I think I should write more about this, but I have been feeling very similar. I've recently been exploring using Claude Code/Codex as the "default", so I've decided to implement a side project.
My gripe with AI tools in the past is that the kind of work I do is large and complex and with previous models it just wasn't efficient to either provide enough context or deal with context rot when working on a large application - especially when that application doesn't have a million examples online.
I've been trying to implement a multiplayer game with server authoritative networking in Rust with Bevy. I specifically chose Bevy as the latest version was after Claude's cut off, it had a number of breaking changes, and there aren't a lot of deep examples online.
Overall it's going well, but one downside is that I don't really understand the code "in my bones". If you told me tomorrow that I had to optimize latency or if there was a 1 in 100 edge case, not only would I not know where to look, I don't think I could tell you how the game engine works.
In the past, I could not have ever gotten this far without really understanding my tools. Today, I have a semi functional game and, truth be told, I don't even know what an ECS is and what advantages it provides. I really consider this a huge problem: if I had to maintain this in production, if there was a SEV0 bug, am I confident enough I could fix it? Or am I confident the model could figure it out? Or is the model good enough that it could scan the entire code base and intuit a solution? One of these three questions has to be answered or else brain atrophy is a real risk.
I'm worried about that too. If the error is reproducible, the model can eventually figure it out from experience. But a ghost bug that I can't pattern? The model ends up in a "you're absolutely right" loop as it incorrectly guesses different solutions.
They're real at scale. Plenty of bugs don't surface until you're running under heavy load on distributed infrastructure. Often the culprit is low in the stack. Asking the reporter the right questions may not help in this case. You have full traces, but can't reproduce in a test environment.
When the cause is difficult to source or fix, it's sometimes easier to address the effect by coding around the problem, which is why mature code tends to have some unintuitive warts to handle edge cases.
> Unless your code is multi-threaded, to which I say, good luck!
What isn't multi-threaded these days? Kinda hard to serve HTTP without concurrency, and practically every new business needs to be on the web (or to serve multiple mobile clients; same deal).
All you need is a database and web form submission and now you have a full distributed system in your hands.
You mean in a single-threaded context like Javascript? (Or with Python GIL giving the impression of the same.) That removes some memory corruption races, but leaves all the logical problems in place. The biggest change is that you only have fixed points where interleaving can happen, limiting the possibilities -- but in either scenario, the number of possible paths is so big it's typically not human-accessible.
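To make the "fixed points where interleaving can happen" concrete, here is a minimal sketch in Python's asyncio. Everything in it (the `withdraw` functions, the account dict, the amounts) is made up for illustration. There is one thread and no memory corruption, yet an update is still lost because both tasks read the balance before either writes it back.

```python
import asyncio


async def withdraw(account: dict, amount: int) -> None:
    """Hypothetical read-modify-write with a yield point in the middle."""
    balance = account["balance"]           # read
    await asyncio.sleep(0)                 # yield point: other tasks may run here
    account["balance"] = balance - amount  # write based on a possibly stale read


async def withdraw_locked(account: dict, amount: int, lock: asyncio.Lock) -> None:
    """Same logic, but the whole read-modify-write is one critical section."""
    async with lock:
        balance = account["balance"]
        await asyncio.sleep(0)
        account["balance"] = balance - amount


async def run_unsafe() -> int:
    account = {"balance": 100}
    await asyncio.gather(withdraw(account, 30), withdraw(account, 50))
    return account["balance"]


async def run_safe() -> int:
    account = {"balance": 100}
    lock = asyncio.Lock()
    await asyncio.gather(
        withdraw_locked(account, 30, lock),
        withdraw_locked(account, 50, lock),
    )
    return account["balance"]


if __name__ == "__main__":
    # Both withdrawals should leave 20; the unsafe version loses one of them.
    print("unsafe:", asyncio.run(run_unsafe()))
    print("safe:  ", asyncio.run(run_safe()))
```

The only places the tasks can interleave are the `await`s, which is exactly the "limited possibilities" case. With one yield point and two tasks it is easy to see; with dozens of awaits spread over a real handler it stops being something you can enumerate by hand.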
Webdevs not aware of race conditions -> complex page fails to load. They're lucky in how the domain sandboxes their bugs into affecting just that one page.
nginx is also from the era when fast static file serving was still a huge challenge, and "enough to run a business" for many purposes -- most software written has more mutable state, and much more potential for edge cases.
Historically I would have agreed with you. But since the rise of LLM-assisted coding, I've encountered an increasing number of things I'd call clear "ghost bugs" in single threaded code. I found a fun one today where invoking a process four times with a very specific access pattern would cause a key result of the second invocation to be overwritten. (It is not a coincidence, I don't think, that these are exactly the kind of bugs a genAI-as-a-service provider might never notice in production.)
> I've been trying to implement a multiplayer game with server authoritative networking in Rust with Bevy. I specifically chose Bevy as the latest version was after Claude's cut off, it had a number of breaking changes, and there aren't a lot of deep examples online.
I am interested in doing something similar (Bevy, not multiplayer).
I had the thought that you ought to be able to provide a cargo doc or rust-analyzer equivalent over MCP? This... must exist?
I'm also curious how you test if the game is, um... fun? Maybe it doesn't apply so much for a multiplayer game, I'm thinking of stuff like the enemy patterns and timings in a soulslike, Zelda, etc.
I did use ChatGPT to get some rendering code for a retro RCT/SimCity-style terrain mesh in Bevy and it basically worked, though several times I had to tell it "yeah uh nothing shows up", at which point it said "of course! the problem is..." and then I learned about mesh winding, fine, okay... felt like I was in over my head and decided to go to a 2D game instead so didn't pursue that further.
>I had the thought that you ought be able to provide a cargo doc or rust-analyzer equivalent over MCP? This... must exist?
I've found that there are two issues that arise that I'm not sure how to solve. You can give it docs and point to it and it can generally figure out syntax, but the next issue I see is that without examples, it kind of just brute forces problems like a 14 year old.
For example, the input system originally just let you move left and right, and it popped that into an observer function. As I added more and more controls, it kept littering that function with more and more code, until it was a ~600-line function responsible for a large chunk of game logic.
While trying to parse it I then had it refactor the code - but I don't know if the current code is idiomatic. What would be the cargo doc or rust-analyzer equivalent for good architecture?
I'm running into this same problem when trying to use Claude Code for internal projects. Some parts of the codebase just have really intuitive internal frameworks and Claude Code can rip through them and provide great idiomatic code. Others are bogged down by years of tech debt and performance hacks and Claude Code can't be trusted with anything other than multi-paragraph prompts.
>I'm also curious how you test if the game is, um... fun?
Lucky enough for me this is a learning exercise, so I'm not optimizing for fun. I guess you could ask claude code to inject more fun.
> What would be the cargo doc or rust-analyzer equivalent for good architecture?
Well, this is where you still need to know your tools. You should understand what ECS is and why it is used in games, so that you can push the LLM to use it in the right places. You should understand idiomatic patterns in the languages the LLM is using. Understand YAGNI, SOLID, DDD, etc etc.
Those are where the LLMs fall down, so that's where you come in. The individual lines of code after being told what architecture to use and what is idiomatic is where the LLM shines.
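For anyone following along who, like the parent, hasn't looked at what an ECS actually is: below is a toy sketch of the idea in Python. It is not Bevy's API and it ignores everything that makes a real ECS fast (archetypes, scheduling, parallelism). `World`, `spawn`, `query`, and `movement_system` are names I made up for illustration. The point is just the shape: entities are ids, components are plain data, and systems are small functions that run over whatever entities have the components they ask for.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


class World:
    """Toy storage: component type -> {entity id -> component instance}."""

    def __init__(self) -> None:
        self._next_id = 0
        self._components: dict = {}

    def spawn(self, *components) -> int:
        """Create an entity (just an integer id) with the given components."""
        entity = self._next_id
        self._next_id += 1
        for component in components:
            self._components.setdefault(type(component), {})[entity] = component
        return entity

    def query(self, *types):
        """Yield (entity, components...) for entities that have all `types`."""
        if not types:
            return
        stores = [self._components.get(t, {}) for t in types]
        for entity in stores[0]:
            if all(entity in store for store in stores[1:]):
                yield (entity, *(store[entity] for store in stores))


def movement_system(world: World, dt: float) -> None:
    """A system: one small function that only touches the data it queries."""
    for _entity, pos, vel in world.query(Position, Velocity):
        pos.x += vel.dx * dt
        pos.y += vel.dy * dt


if __name__ == "__main__":
    world = World()
    world.spawn(Position(0.0, 0.0), Velocity(2.0, 0.0))  # moves
    world.spawn(Position(5.0, 5.0))                      # no Velocity, never moves
    movement_system(world, dt=0.5)
    for entity, pos in world.query(Position):
        print(entity, pos)
```

Seen this way, a 600-line input function is a smell because the pattern wants many small systems, each querying a narrow slice of components. That is the kind of thing you can push the LLM toward once you know the shape.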
What you describe is how I use LLM tools today, but the reason I am approaching my project in this way is because I feel I need to brace myself for a future where developers are expected to "know your tools"
When I look around today, it's clear more and more people are diving head first into fully agentic workflows, and I simply don't believe they can churn out 10k+ lines of code a day and be intimately familiar with the code base. Therefore you are left with two futures:
* Agentic-heavy SWEs will eventually blow up under the weight of all their tech debt
* Coding models are going to continue to get better, to the point where tech debt won't matter.
If the answer is (1), then I do not need to change anything today. If the answer is (2), then you need to prepare for a world where almost all code is written by an agent, but almost all responsibility is shouldered by you.
In kind of an ignorant way, I'm actually avoiding trying to properly learn what an ECS is and how the engine is structured, as sort of a handicap. If in the future I'm managing a team of engineers (however that looks) who are building a metaphorical tower of babel, I'd like to develop the heuristics for navigating that mountain.
I ran into similar issues with context rot on a larger backend project recently. I ended up writing a tool that parses the AST to strip out function bodies and only feeds the relevant signatures and type definitions into the prompt.
It cuts down the input tokens significantly which is nice for the monthly bill, but I found the main benefit is that it actually stops the model from getting distracted by existing implementation details. It feels a bit like overengineering but it makes reasoning about the system architecture much more reliable when you don't have to dump the whole codebase into the context window.