This is exactly the impression I got. Every question or task given to an LLM returns a reasonable-looking but flawed result. For coding, those are hard-to-spot but dangerous mistakes: the results all look good and perfectly plausible, but they're just wrong. Anthropic compared Claude Code to a "slot machine", and I feel that AI coding right now is something close to a gambling addiction. Just as small wins keep a gambler placing more bets, correct results from the AI keep developers coming back: "See, it produced a correct solution, let's try again!"
As a startup CTO, I review most of the pull requests from team members, and the team uses AI tools actively. The overall picture strongly confirms your second conclusion.
If someone gives you access to a slot machine which is weighted such that it pays out way more than you put into it, my advice is to start cranking that lever.
If it does indeed start costing more than it's paying out, step away.
This is not about understanding the message, but about switching the user's mental activity.
I've found myself in similar situations many times. One example: I tried to pay my bills in an online banking application but got an error. After several attempts, I actually read the message, and it said "Header size exceed...". That gave me the clue that the app had probably stuffed too much history into cookies. I cleared the browser data, logged in again, and everything worked.
Even though the error message was perfectly understandable given my expertise, it took surprisingly long to switch from one mental activity, "pay bills", to another, "investigate a technical problem". You have to throw away all your short-term memory to switch to the other task. So all the talk about "stupid" users is a direct consequence of how the human mind works.
> This is not about understanding the message, ...
99% of the population have no idea what "Header size exceeded" means, so it absolutely is about understanding the message, if the devs expect people to read the error.
Yeah, I would certainly not expect the user to understand what to do about a "Header size exceeded" error.
But I WOULD expect the user, when sending a message to support, to say they're getting a "Header size exceeded" error, rather than just say "an error".
A lot of discussion around AI vibe coding focuses on its flaws: awful architecture, performance problems, security holes, lack of maintainability, bugs, and low code quality. All correct, but none of it matters if:
- you create a small utility covering only the features you personally need; studies repeatedly show that an individual uses less than 20% of a product's functionality, and your tool covers exactly the 10-20% that matters to you
- it runs only locally, on the user's computer or phone, and never has more than one customer, so performance, security, and compliance don't matter
- the code lives next to the application and is small enough to fix any bug instantly, in a single AI agent run
- as the single user, you don't care about design, UX, or marketing; getting the job done is all that matters
This means the majority of vibe-coded applications fly under the radar, used by only a few individuals. I see it myself: I have a bunch of vibe-coded utilities that were never intended for a broad audience. And many of my friends and customers mention the same: "I vibe coded a utility that does ... for me". This has big consequences for software development: the space for commercial development shrinks, since nothing that can be replaced by a small local utility retains market value.
I've read a book written by Captain Kotzebue, who was on duty protecting Russian holdings in Alaska. They visited San Francisco in 1805 and 1815, and several chapters describe the life of the native people at the mission.
He described harsh conditions, hard labor, no freedom at all, and very high death rates, shocking even to an early nineteenth-century naval officer. Once a year, those people were allowed to visit their tribes and relatives. And they always came back!
So real hunter-gatherers, who had a firsthand comparison of nomadic and agrarian life, preferred near-slavery at the mission to life in the wild.
I've seen this more and more in recent times, but worse is coming:
With my teenage daughter (and her friends), I see that they don't even bother with screenshots; they just take pictures of the screen with their phones...
Funny coincidence. This morning, my news aggregator delivered its daily results. The stack:
Miniflux (https://miniflux.app/) in Docker, fetching 75 RSS feeds I've collected over the years
~200 lines in a Jupyter notebook (rough sketch below):
- Fetch entries from Miniflux API (last 24-48 hours)
- Convert to CSV, feed to LLM. GPT-5 identifies trending stories across sources
- Each article gets web-fetched and summarized via Gemini-2.5-flash
- Results render via IPython.display
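The core loop, boiled down to a sketch (my real notebook differs; this assumes Miniflux's /v1/entries REST endpoint with X-Auth-Token auth and a published_after filter, the openai and google-generativeai client libraries, and API keys in the usual environment variables):

    import csv, io, time
    import requests
    from openai import OpenAI                  # reads OPENAI_API_KEY from env
    import google.generativeai as genai        # assumes GOOGLE_API_KEY in env
    from IPython.display import Markdown, display

    MINIFLUX_URL = "http://localhost:8080"     # assumed Docker port mapping
    MINIFLUX_TOKEN = "..."                     # API key from Miniflux settings

    def recent_entries(hours=48):
        # Miniflux REST API; 'published_after' takes a unix timestamp
        resp = requests.get(
            f"{MINIFLUX_URL}/v1/entries",
            headers={"X-Auth-Token": MINIFLUX_TOKEN},
            params={"published_after": int(time.time()) - hours * 3600,
                    "limit": 1000},
            timeout=30)
        resp.raise_for_status()
        return resp.json()["entries"]

    def to_csv(entries):
        # compact CSV keeps the token count low for the trending call
        buf = io.StringIO()
        w = csv.writer(buf)
        w.writerow(["feed", "title", "url"])
        for e in entries:
            w.writerow([e["feed"]["title"], e["title"], e["url"]])
        return buf.getvalue()

    def trending(csv_text):
        # one call to spot stories covered by several independent feeds
        r = OpenAI().chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user",
                       "content": "Feed entries from the last day as CSV. "
                                  "List the stories reported by multiple "
                                  "sources, with their URLs.\n\n" + csv_text}])
        return r.choices[0].message.content

    def summarize(url):
        # naive fetch; a real version extracts the article body first
        html = requests.get(url, timeout=30).text[:100_000]
        return genai.GenerativeModel("gemini-2.5-flash").generate_content(
            "Summarize this article in three sentences:\n\n" + html).text

    display(Markdown(trending(to_csv(recent_entries()))))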
Ten minutes per day, fully informed.
I prefer not to.
The LLM prompts and feed selection clearly expose my political preferences, interests, and location.
And this is a research project for a more serious task: behind the 200 lines of pipeline code, there are 1000+ more for evaluation and automatic prompt optimization.
But the idea is simple, so implementing it yourself should be no problem.