Agreed, but: There's been a notable jump over the course of the last few months,...

sublinear · 2026-01-12T04:48:08 1768193288

This doesn't make any sense. If the business can get rid of their engineers, then why can't the user get rid of the business providing the software? Why can't the user use AI to write it themselves?

I think instead the value is in getting a computer to execute domain-specific knowledge organized in a way that makes sense for the business, and in the context of those private computing resources.

It's not about the ability to write code. There are already many businesses running low-code and no-code solutions, yet they still have software engineers writing integration code, debugging and making tweaks, in touch with vendor support, etc. This has been true for at least a decade!

That integration work and domain-specific knowledge is already distilled out at a lot of places, but it's still not trivial. It's actually the opposite. AI doesn't help when you've finally shaved the yak smooth.

chongli · 2026-01-12T05:14:41 1768194881

If the business can get rid of their engineers, then why can't the user get rid of the business providing the software?

A lot of businesses are the only users of their own software. They write and use software in-house in order to accomplish business tasks. If they could get rid of their engineers, they would, since then they'd only have to pay the other employees who use the software.

They're much less likely to get rid of the user employees because those folks don't command engineer salaries.

hypeatei · 2026-01-12T09:57:42 1768211862

So instead of paying a human that "commands an engineer salary" then they'll be forced to pay whatever Anthropic or OpenAI commands to use their LLMs? I don't see how that's a better proposition: the LLM generates a huge volume of code that the product team (or whoever) cannot maintain themselves. Therefore, they're locked-in and need to hope the LLM can solve whatever issues they have, and if it can't, hope that whatever mess it generated can be fixed by an actual engineer without costing too much money.

Also, code is only a small piece and you still need to handle your hosting environment, permissions, deployment pipelines, etc. which LLMs / agentic workflows will never be able to handle IMO. Security would be a nightmare with teams putting all their faith into the LLM and not being able to audit anything themselves.

I don't doubt that some businesses will try this, but on paper it sounds like a money pit and you'd be better off just hiring a person.

chongli · 2026-01-12T15:00:11 1768230011

It’s the same business model as consulting firms. Rather than hiring a few people for 65k each, a VP will bring in a consulting firm for 10M and get a bloated, half-working solution that costs even more to get working. The VP doesn’t care though because he ends up looking like a big shot in front of the other execs.

Thews · 2026-01-12T16:21:00 1768234860

There are lots of developer agencies that hire developers as contractors that companies can use to outsource development to in a cheaper way without needing to pay for benefits or HR. They don't necessarily make bad quality software, but it doesn't feel humane.

sublinear · 2026-01-12T17:40:16 1768239616

Unless we're talking about some sketchy gig work nonsense, the "agency" is a consultancy like any other. They are a legitimate employer with benefits, w2, etc. It's not like they're pimps or something!

Those devs aren't code monkeys and they get paid the same as anyone else working in this industry. In fact, I think a lot of the more ADHD type people on here would strongly prefer working on a new project every 6 months without needing to find a new employer every time. The contracts between the consultancy and client usually also include longer term support than the limited time the original dev spent on it.

Thews · 2026-01-13T14:29:44 1768314584

Agencies commonly use 1099 workers, there's been fierce legal battles on qualifications of agencies. (ABC test)

I believe 1099 worker growth has been outpacing hiring for several years.

sublinear · 2026-01-12T18:06:50 1768241210

The VP doesn't care because the short term result is worth more to the business. The business is not going to trip over dollars to pick up pennies.

Would you prefer that they hire, string those people along, and then fire them? That's a pain in the ass for everyone.

matwood · 2026-01-12T08:01:25 1768204885

> If the business can get rid of their engineers, then why can't the user get rid of the business providing the software?

I have't checked the stats lately, but at one point most software written was in non-tech companies for the single business. The first 1/2 of my career was spent writing in-house software for a company that did everything from custom reporting and performance tracking to scraping data of automated phone dialers. There's so much software out there that effectively has a user base of a single company.

daxfohl · 2026-01-14T04:27:05 1768364825

In some cases that could happen; in particular there may be a lot of UI and cross-app-integration style stuff that starts to get offloaded to users, so users can have AI code up their own UI for using some services together in the way that they want.

But in most cases businesses still need to own their own logic and data, so businesses will still be owning plenty of their own software. Otherwise customers could just write software to buy all your business's products for 99% off!

concats · 2026-01-12T09:17:28 1768209448

> Ultimately I think over the next two years or so, Anthropic and OpenAI will evolve their product from "coding assistant" to "engineering team replacement"

The way I see it, there will always be a layer in the corporate organization where someone has to interact with the machine. The transitioning layer from humans to AIs. This is true no matter how high up the hierarchy you replace the humans, be it the engineers layer, the engineering managers, or even their managers.

Given the above, it feels reasonable to believe that whatever title that person has—who is responsible for converting human management's ideas into prompts (or whatever the future has the text prompts replaced by)—that person will do a better job if they have a high degree of technical competence. That is to say, I believe most companies will still want and benefit if that/those employees are engineers. Converting non-technical CEO fever dreams and ambitions into strict technical specifications and prompts.

What this means for us, our careers, or Anthropic's marketing department, I cannot say.

eloisant · 2026-01-12T11:25:24 1768217124

That reminds me of the time where 3GL languages arrived and bosses claimed they no longer needed developers, because anyone could write code in those English-like languages.

Then when mouse-based tools like Visual Basic arrived, same story, no need for developers because anyone can write programs by clicking!

Now bosses think that with AI anyone will be able to create software, but the truth is that you'll still need software engineers to use those tools.

Will we need less people? Maybe. But in the past 40 years we have been increasing the developers productivity so many times, and yet we still need more and more developers because the needs have grown faster.

OkayPhysicist · 2026-01-12T18:41:58 1768243318

My suspicion is that it will be bad for salaries, mostly because it'll kill the "looks difficult" moat that software development currently has. Developers know that "understanding source code" is far from the hard part of developing software, but non-technical folks' immediate recoiling in the face of the moon runes has kept our profession pretty easy to justify high pay for for ages. If our jobs transition to largely "communing with the machines", then we'll go from a "looks hard, is hard" job, to a "looks easy, is hard" job, which historically hurts bargaining power.

daxfohl · 2026-01-14T03:26:11 1768361171

I don't think "looks difficult" has been driving wages. FAANG etc leadership knows what's difficult and what's not. It's just marginal ROI. If you have a trillion-dollar market and some feature could increase that by 0.0001%, you hire some engineers to give it a try. If other companies are also competing for the same engineers for the same reasons, salaries skyrocket.

brailsafe · 2026-01-13T03:55:16 1768276516

I wonder if the actual productivity changes won't end up mattering for the economics to change dramatically, but change in terms of a rebound in favour of seniors. If I was in school 2 years ago, looking at the career prospects and cost of living, I just straight up wouldn't invest in the career. If that happens at a large enough scale, the replenishment of the discipline may reduce, which would have an effect on what people who already had those skills could ask for. If the middle step, where wild magical productivity gains don't materialize in a way that reduces the need for expert software people who can reasonably be liable for whatever gets shipped, then we'll stick around.

Whether it looks easy or not doesn't matter as much imo. Plumbing looks and probably is easy, but it's not the CEOs job to go and fix the pipes.

aspenmartin · 2026-01-13T15:27:03 1768318023

I think this is the right take. In some narrow but constantly broadening contexts, agents give you a huge productivity edge. But to leverage that you need to be skilled enough to steer, design the initial prompt, understand the impact of what you produce, etc. I don't see agents in their current and medium term inception as being a replacement of engineering work, I see it as a great reshuffling of engineering work.

In some business contexts, the impact of more engineering labor on output gets capped at some point. Meaning once agent quality reaches a certain point, the output increase is going to be minimal with further improvements. There, labor is not the bottleneck.

In other business contexts, labor is the bottleneck. For instance it's the bottleneck for you as an individual: what kind of revenue could you make if you had a large team of highly skilled senior SWEs that operate for pennies on the dollar?

Labor will shift to where the ROI is highest is what I think you'll see.

To be fair, I can imagine a world where we eventually fully replace the "driver" of the agent in that it is good enough to fulfill the role of a ~staff engineer that can ingest very high level business context, strategy, politics and generate a high level system design that can then be executed by one or more agents (or one or more other SWEs using agents). I don't (at this point) see some fundamental rule of physics / economics that prevents this, but this seems much further ahead from where we are now.

catlifeonmars · 2026-01-12T05:05:06 1768194306

I actually think it’s the opposite. We’ll see fewer monorepos because small, scoped repos are the easiest way to keep an agent focused and reduce the blast radius of their changes. Monorepos exist to help teams of humans keep track of things.

daxfohl · 2026-01-12T06:49:32 1768200572

Could be. Most projects I've worked on tend to span multiple services though, so I think AI would struggle more trying to understand and coordinate across all those services versus having all the logic in a single deployable instance.

The way I see feature development in the future is, PM creates a dev cluster (also much easier with a monolith), has AI implement a bunch of features to spec, AI provides some feedback and gets input on anywhere it might conflict with existing functionality, whether eventual consistency is okay, which pieces are performance criticial, etc., and provides the implementation, a bunch of tests for review, and errata about where to find observability data, design decisions considered and chosen, etc. PM does some manual testing across various personas and products (along with PMs from those teams), has AI add feature flags, launches. The feature flag rollout ends up being the long-pole, since generally the product team needs to monitor usage data for some time before increasing the rollout percentage.

So I see that kind of workflow as being a lot easier in a monolithic service. Granted, that's a few years down the road though, before we have AI reliable enough to do that kind of work.

catlifeonmars · 2026-01-13T06:34:52 1768286092

> Most projects I've worked on tend to span multiple services though, so I think AI would struggle more trying to understand and coordinate across all those services versus having all the logic in a single deployable instance.

1. At least CC supports multiple folders in a workspace, so that’s not really a limitation.

2. If you find you are making changes across multiple services, then that is a good indication that you might not have the correct abstraction on the service boundary. I agree that in this case a monolith seems like a better fit.

daxfohl · 2026-01-14T03:40:36 1768362036

Agreed on both counts. Though for the first one it's still easier to implement things when bugs create compile or local unit/integration test errors rather than distributed service mismatches that can only be caught with extensive distributed e2e tests and a platform for running them, plus the lack of distribution cuts down significantly on the amount of code, edge cases, and deployment sequencing that needs to be taken into account.

For the second, yeah, but IME everything starts out well-factored, but almost universally evolves into spaghetti over time. The main advantage monoliths have is that they're safer to refactor across boundaries. With distributed services, there are a lot more backward-compatibility guarantees and concerns you have to work through, and it's harder to set up tests that exercise everything e2e across those boundaries. Not impossible, but hard enough that it usually requires a dedicated initiative.

Anyway, random thoughts.

ericmcer · 2026-01-12T23:53:15 1768261995

If you research how something like Cursor works I don't think you would believe it is inevitable. The jump that would have to happen for it to replace engineers entirely is insurmountable. They can keep expanding contexts and coming up with clever ways to augment generation but I don't see it ever actually having full vision on the system, product and users.

Beyond that it is incredibly biased towards existing code & prompt content. If you wanted to build a voice chat app, and you said "should I use websockets or http?" It would say Websockets. It won't override you and say "Use neither, you should use webRTC", but an experienced engineer would spot that the prompt itself is flawed instantly. LLMs just will bias towards existing tokens in the prompt and won't surface data that would challenge the question itself.

aldanor · 2026-01-13T04:00:59 1768276859

Unless you, well, state in AGENTS.md that prompts may offer suboptimal options in which case it's the machine's duty to question them, treat the prompter like a coworker and not a boss.

apstls · 2026-01-13T21:18:46 1768339126

Sit down and re-read your comment one night with your "I am an engineer and will solve this as an engineering problem" hat firmly on. If you stop thinking of LLMs as lobotimized coworkers trapped inside an API wrapper and instead as computational primitives then things become much more interesting and the future becomes clearer to see.

smt88 · 2026-01-12T03:52:34 1768189954

There's no chance LLMs will be an engineering team replacement. The hallucination problem is unsolvable and catastrophic in some edge cases. Any company using such a team would be uninsurable and sued into oblivion.

eru · 2026-01-12T04:36:34 1768192594

Writing software is actually one of the domains where hallucinations are easiest to fix: you can easily check whether it builds and passes tests.

If you want to go further, you can even require the LLM to produce a machine checkable proof that the software is correct. That's beyond the state of the art at the moment, but it's far from 'unsolvable'.

If you hallucinate such a proof, it'll just not work. Feed back the error message from the proof checker to your coding assistant, and the hallucination goes away / isn't a problem.

thesz · 2026-01-12T05:41:36 1768196496

  > you can easily check whether it builds and passes tests.

This link were on HN recently: https://spectrum.ieee.org/ai-coding-degrades

  "...recently released LLMs, such as GPT-5, have a much more insidious method of failure. They often generate code that fails to perform as intended, but which on the surface seems to run successfully, avoiding syntax errors or obvious crashes. It does this by removing safety checks, or by creating fake output that matches the desired format, or through a variety of other techniques to avoid crashing during execution."

The trend for LLM generated code is to build and pass tests but do not deliver functionality needed.

Also, please consider how SQLite is tested: https://sqlite.org/testing.html

The ratio between test code and code itself is mere 590 times (590 LOC of tests per LOC of actual code), it used to be more than 1100.

Here is notes on current release: https://sqlite.org/releaselog/3_51_2.html

Notice fixes there. Despite being one of the most, if not the most, tested pieces of software in the world, it still contains errors.

  > If you want to go further, you can even require the LLM to produce a machine checkable proof that the software is correct.

Haha. How do you reconcile a proof with actual code?

vanviegen · 2026-01-12T08:07:14 1768205234

I've recently seen Opus, after struggling for a bit, implement an API by having it return JSON that includes instructions for a human to manually accomplish the task I gave it.

It proudly declared the task done.

thesz · 2026-01-12T18:35:33 1768242933

I believe you have used Albanian [1] version of Opus.

[1] https://www.reddit.com/r/ProgrammerHumor/comments/1lw2xr6/hu...

DrSiemer · 2026-01-12T14:08:09 1768226889

Recent models have started to "fix" HTML issues with ugly hacks like !important. The result looks like it works, but the tech debt is considerable.

Still, it's just a temporary hindrance. Nothing a decent system prompt can't take care of until the models evolve.

eru · 2026-01-12T06:49:27 1768200567

> Haha. How do you reconcile a proof with actual code?

You can either proof your Rust code correct, or you can use a proof system that allows you to extract executable code from the proofs. Both approaches have been done in practice.

Or what do you mean?

thesz · 2026-01-12T18:42:13 1768243333

Rust code can have arbitrary I/O effects in any parts of it. This precludes using only Rust's type system to make sure code does what spec said.

The most successful formally proven project I know, seL4 [1], did not extracted executable code from the proof. They created a prototype in Haskell, mapped (by hand) it to Isabelle, I believe, to have a formal proof and then recreated code in C, again, manually.

[1] https://sel4.systems/

Not many formal proof systems can extract executable C source.

eab- · 2026-01-12T10:27:20 1768213640

> Haha. How do you reconcile a proof with actual code?

Languages like Lean allow you to write programs and proofs under the same umbrella.

thesz · 2026-01-12T18:49:44 1768243784

As if Lean does not allow to circumvent it's proof system (the "sorry" keyword).

Also, consider adding code to the bigger system, written in C++. How would you use Lean to prove correctness of your code as part of the bigger system?

daxfohl · 2026-01-14T03:57:27 1768363047

I mean, it's somewhat moot, as even the formal hypothesis ("what is this proof proving") can be more complex than the code that implements it in nontrivial cases. So verifying that the proof is saying the thing that you actually want it to prove can be near impossible for non-experts, and that's just the hypothesis; I'm assuming the proof itself is fully AI-generated and not reviewed beyond running it through the checker.

And at least in backend engineering, for anything beyond low-level algorithms you almost always want some workarounds: for your customer service department, for engineering during incident response, for your VIP clients, etc. If you're relying on formal proof of some functionality, you've got to create all those allowances in your proof algorithm (and hypothesis) too. And additionally nobody has really come up with a platform for distributed proofs, durable proof keys (kinda), or how to deal with "proven" functionality changes over time.

DrammBA · 2026-01-12T05:03:38 1768194218

You focused on writing software, but the real problem is the spec used to produce the software, LLMs will happily hallucinate reasonable but unintended specs, and the checker won’t save you because after all the software created is correct w.r.t. spec.

Also tests and proof checkers only catch what they’re asked to check, if the LLM misunderstands intent but produces a consistent implementation+proof, everything “passes” and is still wrong.

simonw · 2026-01-12T05:04:54 1768194294

This is why every one of my coding agent sessions starts with "... write a detailed spec in spec.md and wait for me to approve it". Then I review the spec, then I tell it "implement with red/green TDD".

tsimionescu · 2026-01-12T06:53:52 1768200832

The premise was that the AI solution would replace the engineering team, so who exactly is writing/reviewing this detailed spec?

eru · 2026-01-12T08:11:01 1768205461

Well, perhaps it'll only shrink the engineering team by 95% then.

LouisSayers · 2026-01-12T11:11:27 1768216287

Why would you shrink the team rather than become 20x more productive as a whole?

daxfohl · 2026-01-14T04:03:54 1768363434

Users don't want changes that rapidly. There's not enough people on the product team to design 20x more features. 20x more features means 400x more cross-team coordination. There's only positive marginal ROI for maybe 1.5-2x even if development is very cheap.

eru · 2026-01-12T13:01:49 1768222909

Either way can work. It depends on what the rest of the business needs.

PurpleRamen · 2026-01-12T11:57:07 1768219027

The premise is in progress. We are only at the beginning of the fourth year of this hype-phase, and we haven't even reached AGI yet. It's obviously not perfect, maybe never will, but we are not a the point yet were we can conclude which future is true. The singularity hasn't happend yet, so we are still moving with (llm-enhanced) human speed at the moment, meaning things need time.

simonw · 2026-01-12T06:55:45 1768200945

That's a bad premise.

tsimionescu · 2026-01-12T07:01:12 1768201272

Maybe, but you're responding to a thread about why AI might or might not be able to replace an entire engineering team:

> Ultimately I think over the next two years or so, Anthropic and OpenAI will evolve their product from "coding assistant" to "engineering team replacement", which will include standard tools and frameworks that they each specialize in (vendor lock in, perhaps), but also ways to plug in other tech as well.

This is the context of how this thread started, and this is the context in which DrammBA was saying that the spec problem is very hard to fix [without an engineering team].

matwood · 2026-01-12T08:08:39 1768205319

Might be good to define the (legacy) engineering team. Instead of thinking 0/1 (ugh, almost nothing happens this way), the traditional engineering team may be replaced by something different. A team mostly of product, spec writers, and testers. IDK.

galaxyLogic · 2026-01-12T08:42:30 1768207350

The job of AI is to do what we tell it to do. It can't "create a spec" on its own. If it did and then implemented that spec, it wouldn't accomplish what we want it to accomplish. Therefore we the humans must come up with that spec. And when you talk about a software application, the totality of its spec written out, can be very complex, very complicated. To write and understand, and evolve and fix such a spec takes engineers, or what used to be called "system analysts".

To repeat: To specify what a "system" we want to create does is a highly complicated task, which can only be dones by human engineers who understand the requirements for the system, and how parts of those requirements/specs interact with other parts of the spec, what are the consequences of one (part of the) spec to other parts of it. We must not writ e"impossible specs" like draw me a round square. Maybe the AI can check whether the spec is impossible or not, but I'm not so sure of that.

So I expect that software engineers will still be in high demand, but they will be much more productive with AI than without it. This means there will be much more software because it will be cheaper to produce. And the quality of the software will be higher in terms of doing what humans need it to do. Usability. Correctness. Evolvability. In a sense the natural language-spec we give the AI is really something written in a very high-level programming-language - the language of engineers.

BTW. As I write this I realize there is no spell-checker integrated into Hacker News. (Or is there?). Why? Because it takes developers to specify and implement such a system - which must be integrated into the current HN implementation. If AI can do that for HN, it can be done, because it will be cheap enough to do it -- if HN can exactly spell out what kind of system it wants. So we do need more software, better software, cheaper software, and AI will helps us do that.

A 2nd factor is that we don't really know if a spec is "correct" until we test the implemented system with real users. At that point we typically find many problems with the spec. So somebody must fix the problems with the spec, evolve the spec and rinse and repeat the testing with real users -- the developers who understand the current spec and why it is is not good enough.

AI can write my personal scripts for me surely. But writing a spec for a system to be used by thousands of humans, still takes a lot of (human) work. The spec must work for ALL users. That makes it complicated and difficult to get right.

daxfohl · 2026-01-12T06:16:50 1768198610

Same, and similarly something like a "create a holistic design with all existing functionality you see in tests and docs plus new feature X, from scratch", then "compare that to the existing implementation and identify opportunities for improvement, ranked by impact, and a plan to implement them" when the code starts getting too branchy. (aka "first make the change easy, then make the easy change"). Just prompting "clean this code up" rarely gets beyond dumb mechanical changes.

Given so much of the work of managing these systems has become so rote now, my only conclusion is that all that's left (before getting to 95+% engineer replacement) is an "agent engineering" problem, not an AI research problem.

ulrikrasmussen · 2026-01-12T04:59:40 1768193980

In order to prove safety you need a formal model of the system and formally defined safety properties that are both meaningful and understandable by humans. These do not exist for enterprise systems

eru · 2026-01-12T05:57:26 1768197446

An exhaustive formal spec doesn't exist. But you can conservatively proof some properties. Eg program termination is far from sufficient for your program to do what you want, but it's probably necessary.

(Termination in the wider sense: for example an event loop has to be able to finish each run through the loop in finite time.)

You can see eg Rust's or Haskell's type system as another light-weight formal model that lets you make and proof some simple statements, without having a full formal spec of the whole desired behaviour of the system.

ulrikrasmussen · 2026-01-12T07:49:15 1768204155

Yeah, but with all respect, that is a totally uninteresting property in an enterprise software system where almost no software bugs actually manifest as non-termination.

The critical bugs here are related to security (DDoS attacks, authorization and authentication, data exfiltration, etc), concurrency, performance, data corruption, transactionality and so forth. Most enterprise systems are distributed or at least concurrent systems which depend on several components like databases, distributed lock managers, transaction managers, and so forth, where developing a proper formal spec is a monumental task and possibly impossible to do in a meaningful way because these systems were not initially developed with formal verification in mind. The formal spec, if faithful, will have to be huge to capture all the weird edge cases.

Even if you had all that, you need to actually formulate important properties of your application in a formal language. I have no idea how to even begin doing that for the vast majority of the work I do.

Proving the correctness of linear programs using techniques such as Hoare logic is hard enough already for anything but small algorithms. Proving the correctness of concurrent programs operating on complex data structures requires much more advanced techniques, setting up complicated logical relations and dealing with things like separation logic. It's an entirely different beast, and I honestly do not see LLMs as a panacea that will suddenly make these things scale for anything remotely close in size to a modern enterprise system.

eru · 2026-01-12T08:12:37 1768205557

Oh, there's lots more simple properties you can state and prove that capture a lot more, even in the challenging enterprise setting.

I just gave the simplest example I could think of.

And termination is actually a much stronger and more useful property than you make it out to be---in the face of locks and concurrency.

tsimionescu · 2026-01-12T07:05:12 1768201512

That is true and very useful for software development, but it doesn't help if the goal is to remove human programmers from the loop entirely. If I'm a PM who is trying to get a program to, say, catalogue books according to the Dewey Decimal system for a library, a proof that the program terminates is not going to help that much when the program is mis-categorizing some books.

seanmcdirmid · 2026-01-12T07:07:54 1768201674

Is removing the human in the loop really the goal, or is the goal right now to make the human a lot more productive? Because...those are both very different things.

tsimionescu · 2026-01-12T07:11:16 1768201876

I don't know what the goal for OpenAI or Anthropic really is.

But the context of this thread is the idea that the user daxfohl launched that these companies will, in the next few years, launch an "engineering team replacement" program; and then the user eru claimed that this is indeed more doable in programming than other domains because you can have specs and tests for programs in a way that you can't for, say, an animated movie.

eru · 2026-01-12T08:14:30 1768205670

OK, so you successfully argued that replacing the entire engineering team is hard. But you can perhaps still shrink it by 99%. To the point where a sole founder can do the remaining tech role part time.

seanmcdirmid · 2026-01-12T08:28:34 1768206514

I have no idea what will happen in a few years, maybe LLM tech will hit a wall and humans will continue to be needed in the loop. But today humans are definitely needed in the loop in some way.

solid_fuel · 2026-01-12T05:01:59 1768194119

> Writing software is actually one of the domains where hallucinations are easiest to fix: you can easily check whether it builds and passes tests.

What tests? You can't trust the tests that the LLM writes, and if you can write detailed tests yourself you might as well write the damn software.

eru · 2026-01-12T05:55:58 1768197358

Use multiple competing LLM. Generative adversarial network style.

solid_fuel · 2026-01-12T06:45:25 1768200325

Cool. That sure sounds nice and simple. What do you do when the multiple LLMs disagree on what the correct tests are? Do you sit down and compare 5 different diffs to see which have the tests you actually want? That sure sounds like a task you would need an actual programmer for.

At some point a human has to actually use their brain to decide what the actual goals of a given task are. That person needs to be a domain expert to draw the lines correctly. There's no shortcut around that, and throwing more stochastic parrots at it doesn't help.

eru · 2026-01-12T06:48:32 1768200512

Just because you can't (yet) remove the human entirely from the loop, doesn't mean that economising on the use of the humans time is impossible.

For comparison have a look at compilers: nowadays approximately no one writes their software by hand, we write a 'prompt' in something like Rust or C, and ask another computer program to create the actual software.

We still need the human in the loop here, but it takes much less human time than creating the ELF directly.

solid_fuel · 2026-01-12T10:16:39 1768212999

It’s not “economizing” if I have to verify every test myself. To actually validate that tests are good I need to understand the system under test, and at that point I might as well just write the damn thing myself.

This is the fundamental problem with this “AI” mirage. If I have to be an expert to validate that the LLM actually did the task I set out, and isn’t just cheating on tests, then I might as well code the solution myself.

daxfohl · 2026-01-14T04:17:27 1768364247

From a PM perspective, the main differentiator between an engineering team and AI is "common sense". As these tools get used more and more, enough training data will be available that AI's "common sense" in terms of coding and engineering decisions could be indistinguishable from a human's over time. At that point, the only advantage a human has is that they're also useful on the ops and incident response side, so it's beneficial if they're also comfortable with the codebase.

Eventually these human advantages will be overcome, and AI will sufficiently pass a "Turing Test" for software engineering. PMs will work with them directly and get the same kinds of guidance, feedback, documentation, and conversational planning and coordination that they'd get from an engineering team, just with far greater speed and less cost. At that point, yeah you'll probably need to keep a few human engineers around to run the system, but the system itself will manage the software. The advantage of keeping a human in the loop will dwindle to zero.

exceptione · 2026-01-12T08:39:28 1768207168

I can see how LLMs can help with testing, but one should never compare LLMs with deterministic tools like compilers. LLMs are entirely a separate category.

somenameforme · 2026-01-12T05:31:23 1768195883

Tests and proofs can only detect issues that you design them to detect. LLMs and other people are remarkably effective at finding all sorts of new bugs you never even thought to test against. Proofs are particularly fragile as they tend to rely on pre/post conditions with clean deterministic processing, but the whole concept just breaks down in practice pretty quickly when you start expanding what's going on in between those, and then there's multithreading...

mohaine · 2026-01-12T05:10:00 1768194600

Ah, most the problem in programming is writing the tests. Once you know what you need the rest is just typing.

I can see an argument where you can get none programers to create the input and output of said tests but if the can do that, they are basically programmers.

This is of course leaving aside that half the stated use cases I hear for AI are that it can 'write the tests for you'. If it is writing the code and the tests it is pointless.

discreteevent · 2026-01-12T09:35:38 1768210538

You need more than tests. Test induced design damage:

https://dhh.dk/2014/test-induced-design-damage.html

shevy-java · 2026-01-12T06:48:54 1768200534

Well - the end result can be garbage still. To be fair: humans also write a lot of garbage. I think in general most software is rather poorly written; only a tiny percentage is of epic prowess.

Marazan · 2026-01-12T09:35:14 1768210514

Who is writing the tests?

rezonant · 2026-01-12T06:53:09 1768200789

Who writes the tests?

eru · 2026-01-12T08:14:36 1768205676

A competing AI.

Marazan · 2026-01-12T09:35:40 1768210540

Ah, it is turtles all the way down.

eru · 2026-01-12T13:02:57 1768222977

Yes. But it's no different from the question of how a non-tech person can make sure that whatever their tech person tells them actually makes sense: you hire another tech person to have a look.

matwood · 2026-01-12T08:05:29 1768205129

These types of comments are interesting to me. Pre-chatGPT there were tons of posts how so many software people were terrible at their jobs. Bugs were/are rampant. Software bugs caused high profile issues, but likely so many more we never heard about.

Today we have chatGPT and only now will teams be uninsurable and sued into oblivion? LOL

elzbardico · 2026-01-12T12:45:42 1768221942

LLMs were trained on exactly that kind of code.

smt88 · 2026-01-12T17:26:31 1768238791

If you've ever used Claude Code in brave mode, I can't understand how you'd think a dev team could make the same categories of mistakes or with the same frequency.

fragmede · 2026-01-12T05:59:53 1768197593

I am but a lowly IC, with no notion of the business side of things. If I am an IC at, say, a FANG company, what insurance has been taken out on me writing code there?

smt88 · 2026-01-12T15:43:35 1768232615

> If I am an IC at, say, a FANG company, what insurance has been taken out on me writing code there?

Every non-trivial software business has liability insurance to cover them for coding lapses that lead to data breaches or other kinds of damages to customers/users.

kristiandupont · 2026-01-12T06:58:54 1768201134

I use LLM's to write the majority of my code. I haven't encountered a hallucination for the better part of a year. It might be theoretically unsolvable but it certainly doesn't seem like a real problem to me.

smt88 · 2026-01-12T15:41:04 1768232464

I use LLMs whenever I'm coding, and it makes mistakes ~80% of the time. If you haven't seen it make a huge mistake, you may not be experienced enough to catch them.

kristiandupont · 2026-01-12T17:52:54 1768240374

Hallucinations, no. Mistakes, yes, of course. That's a matter of prompting.

tdrz · 2026-01-12T21:15:42 1768252542

> That's a matter of prompting.

So when I introduce a bug it's the PM's fault.

twelvedogs · 2026-01-12T09:21:20 1768209680

honestly i think they got the low hanging fruit already. they're bumping up against the limits of what it can do and while it's impressive it's not spectacular

embedding-shape · 2026-01-12T10:54:01 1768215241

Maybe I'm easily impressed, but that LLMs even work to output basic human-like text to me is bananas, and I do understand a bit of how it works, yet it's still up there as "Amazing that huge airplanes even can fly" is for me.