
I still really, really, really struggle to see how humans are going to maintain and validate the programs written by LLMs if we no longer know (intimately) how to program. Any thoughts?


Very few people have the expertise to write efficient assembly code, yet everyone relies on compilers and assemblers to translate high-level code to byte-level machine code. I think the same concept is true here.

Once coding agents become trivial to use, few people will know the details of the programming language or check that intent is correctly translated into code; the majority will focus on other objectives and take LLM programming for granted.


No, that's a completely different concept, because we have faultless machines which perfectly and deterministically translate high-level code into byte-level machine code. This is another case of (nearly) perfect abstraction.

On the other hand, the whole deal of the LLM is that it does so stochastically and unpredictably.


The unpredictable part isn't new - from a project manager's point of view, what's the difference between an LLM and a team of software engineers? Both, from that POV, are a black box. The "how" is not important to them; the details aren't important. What's important is that what they want is made a reality, and that customers can press a button to add a product to their shopping cart (for example).

LLMs mean software developers let go of some control of how something is built, which makes one feel uneasy because a lot of the appeal of software development is control and predictability. But this is the same process that people go through as they go from coder to lead developer or architect or project manager - letting go of control. Some thrive in their new position, having a higher overview of the job, while some really can't handle it.


"But this is the same process that people go through as they go from coder to lead developer or architect or project manager - letting go of control."

In those circumstances, it's delegating control. And it's difficult to judge whether the authority you delegated is being misused if you lose touch with how to do the work itself. This comparison shouldn't be pushed too far, but it's not entirely unlike a compiler developer needing to retain the ability to understand machine code instructions.


As someone who started off debugging assembly issues for a large corporation: assembly code can sometimes contain issues very similar to those in more high-level code, so the perfection of the abstraction is not guaranteed.

But yeah, there's currently a wide gap between that and a stochastic LLM.


We also have machines that can perfectly and deterministically check written code for correctness.

And the stochastic LLM can use those tools to check whether its work is sufficient; if not, it will try again - without human intervention. It will repeat this loop until the deterministic checks pass.
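
A minimal sketch of that loop, with hypothetical stand-ins for the model call and the deterministic tooling (neither is a real API):

  // Hypothetical sketch of the generate-check-retry loop described above.
  // CodeModel.propose() stands in for an LLM call; Checker.check() stands in
  // for the deterministic tools (compiler, tests, linters) and returns null
  // when everything passes, or an error report otherwise.
  interface CodeModel { String propose(String task, String feedback); }
  interface Checker { String check(String code); }

  final class GenerateCheckLoop {
      static String run(CodeModel model, Checker checker, String task, int maxAttempts) {
          String feedback = "";
          for (int attempt = 0; attempt < maxAttempts; attempt++) {
              String code = model.propose(task, feedback); // stochastic step
              feedback = checker.check(code);              // deterministic step
              if (feedback == null) return code;           // checks passed
          }
          throw new IllegalStateException("deterministic checks never passed within the retry budget");
      }

      public static void main(String[] args) {
          // Toy stand-ins so the sketch runs: the "model" fixes its code on the
          // second try, the "checker" demands a return statement somewhere.
          CodeModel model = (task, feedback) -> feedback.isEmpty() ? "int f() {}" : "int f() { return 0; }";
          Checker checker = code -> code.contains("return") ? null : "missing return";
          System.out.println(run(model, checker, "write f", 3));
      }
  }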


> We also have machines that can perfectly and deterministically check written code for correctness.

Please do provide a single example of this preposterous claim.


It's not like testing code is a new thing. JUnit is almost 30 years old today.

For functionality: https://en.wikipedia.org/wiki/Unit_testing

With robust enough test suites you can vibe code an HTML5 parser:

- https://ikyle.me/blog/2025/swift-justhtml-porting-html5-pars...

- https://simonwillison.net/2025/Dec/15/porting-justhtml/

And code correctness:

- https://en.wikipedia.org/wiki/Tree-sitter_(parser_generator)

- https://en.wikipedia.org/wiki/Roslyn_(compiler)

- https://en.wikipedia.org/wiki/Lint_(software)

You can make analysers that check for deeply nested code, methods being called in the wrong order, and whatever else you want to check. At work we've added multiple Roslyn analysers to our build pipeline to check for invalid/inefficient code; no human will be pinged about a PR until the tests pass. And an LLM can't claim "Job's Done" before the analysers say the code is OK.
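
For what it's worth, a check like "no deeply nested code" doesn't need anything fancy. A toy sketch in plain Java (not a real Roslyn analyser, and it ignores braces inside strings and comments):

  // Toy "analyser": fail the build when brace nesting exceeds an agreed limit.
  final class NestingDepthCheck {
      static int maxBraceDepth(String source) {
          int depth = 0, max = 0;
          for (char c : source.toCharArray()) {
              if (c == '{') { depth++; max = Math.max(max, depth); }
              else if (c == '}') { depth = Math.max(0, depth - 1); }
          }
          return max;
      }

      public static void main(String[] args) {
          String snippet = "class A { void m() { if (x) { while (y) { } } } }";
          // A build pipeline would fail the PR instead of printing.
          System.out.println(maxBraceDepth(snippet) > 3 ? "FAIL: too deeply nested" : "OK");
      }
  }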

And you don't need to make one yourself, there are tons you can just pick from:

https://en.wikipedia.org/wiki/List_of_tools_for_static_code_...


> It's not like testing code is a new thing. JUnit is almost 30 years old today.

Unit tests check whether code behaves in specific ways. They certainly are useful to weed out bugs and to ensure that changes don't have unintended side effects.

> And code correctness:

These are tools to check for syntactic correctness. That is, of course, not what I meant.

You're completely off the mark here.


What did you mean then if unit tests and syntactic correctness aren't what you're looking for?


Algorithmic correctness? Unit tests are great for quickly poking holes in obviously algorithmically incorrect code, but far from good enough to ensure correctness. Passing unit tests is necessary, not sufficient.
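
To make "necessary, not sufficient" concrete, here's a toy example (hypothetical, not from this thread) where every test passes and the algorithm is still wrong:

  // This isPrime passes all the asserts below (run with java -ea),
  // yet it is algorithmically incorrect: it only rules out factors 2 and 3.
  final class NecessaryNotSufficient {
      static boolean isPrime(int n) {
          return n >= 2 && (n == 2 || n == 3 || (n % 2 != 0 && n % 3 != 0));
      }

      public static void main(String[] args) {
          assert  isPrime(2) && isPrime(3) && isPrime(7) && isPrime(13);
          assert !isPrime(1) && !isPrime(4) && !isPrime(9) && !isPrime(12);
          // The test suite is green, but 25 = 5 * 5 still slips through.
          System.out.println("isPrime(25) = " + isPrime(25)); // prints true
      }
  }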

Syntactic correctness is more or less a solved problem, as you say. Doesn't matter if the author is a human or an LLM.


It depends on the algorithm of course. If your code is trying to prove P=NP, of course you can't test for it.

But it's disingenuous to claim that even the majority of code written in the world is so difficult algorithmically that it can't be unit-tested to a sufficient degree.


Suppose you're right and the "majority of code" is fully specified by unit testing (I doubt it). The remaining body of code is vast, and the comments in this thread seem to overlook that.


> Very few people have the expertise to write efficient assembly code, yet everyone relies on compilers and assemblers to translate high-level code to byte-level machine code. I think the same concept is true here.

That's a poor analogy which gets repeated in every discussion: compilers are deterministic, LLMs are not.


> That's a poor analogy which gets repeated in every discussion: compilers are deterministic, LLMs are not.

Compilers are not used directly; they are used by human software developers, who are also not deterministic.

From the perspective of an organization with a business or service-based mission, they already know how to supervise non-deterministic LLMs because they already know how to supervise non-deterministic human developers.


Why does it matter if LLMs are not deterministic? Who cares?

There should be tests covering meaningful functionality; as long as the code passes the tests, i.e. the externally observable behaviour is the same, I don't care. (Especially if many of those tests can also be autogenerated with the LLM.)
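
One way to operationalize "only the externally observable behaviour matters" is a differential check against a trusted reference on many inputs. A sketch, where llmSort is a hypothetical stand-in for model-written code (here it just delegates to the library so the example runs):

  import java.util.Arrays;
  import java.util.Random;

  final class DifferentialCheck {
      // Stand-in for the LLM-generated implementation under test.
      static int[] llmSort(int[] xs) {
          int[] copy = xs.clone();
          Arrays.sort(copy); // pretend this body came from the model
          return copy;
      }

      public static void main(String[] args) {
          Random rng = new Random(42);
          for (int i = 0; i < 1_000; i++) {
              int[] input = rng.ints(rng.nextInt(50), -100, 100).toArray();
              int[] expected = input.clone();
              Arrays.sort(expected); // trusted reference behaviour
              if (!Arrays.equals(llmSort(input), expected))
                  throw new AssertionError("behaviour differs on " + Arrays.toString(input));
          }
          System.out.println("observable behaviour matches on 1000 random inputs");
      }
  }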


>>> Very few people have the expertise to write efficient assembly code, yet everyone relies on compilers and assemblers to translate high-level code to byte-level machine code. I think the same concept is true here.

>> That's a poor analogy which gets repeated in every discussion: compilers are deterministic, LLMs are not.

> Why does it matter if LLMs are not deterministic? Who cares?

In the context of this analogy, it matters. If you're not using this analogy, then sure, only the result matters. But when the analogy being drawn is to a deterministic process, then, yes, it matters.

You can't very well claim "We'll compare this non-deterministic process to this other deterministic process that we know works."


Yes, but compilers (in the main) do not have a random number generator deciding what output to produce.


The difference is that if you write in C you can debug in C. You don't have to debug the assembly. You can write an English wish list for an LLM, but you will still have to debug the generated code. To debug it you will need to understand it.


Why would you have to debug generated code? Let the LLM debug it.


And how do you know it did it right?


> how humans are going to maintain and validate the programs written by LLMs if we no longer know (intimately) how to program

Short answer: we wouldn’t be able to. Slightly less short answer: it's unlikely to happen.

Most programmers today can’t explain the physics of computation. That’s fine. Someone else can. And if nobody can, someone else can work backwards to it.


> > how humans are going to maintain and validate the programs written by LLMs if we no longer know (intimately) how to program

> Short answer: we wouldn’t be able to.

That's a huge problem! A showstopper for many kinds of programs!

> Slightly-less short answer: unlikely to happen.

Could you elaborate?

> Most programmers today can’t explain the physics of computation. That’s fine. Someone else can. And if nobody can, someone else can work backwards to it.

That's not the same at all. We have properly abstracted away the physics of computation. A modern computer operates in a way where, if you use it the way you've been instructed to, the physics underlying the computations cannot affect the computation in any undocumented way. Only a very few (and crucially, known and understood!!) physical circumstances can make the physics influence the computations. A layperson does not need to know how those circumstances work, only roughly what their boundaries are.

This is wildly different from the "abstraction" to programming that LLMs provide.


> That's a huge problem! A showstopper for many kinds of programs!

We have automated validation and automated proofs.

How much proof is necessary? Do you validate the theorem prover, or trust that it works? Do you prove that the compiler is correctly compiling the program (when it matters, you should, given that compilers do sometimes rewrite things incorrectly), or trust the compiler?

> We have properly abstracted away the physics of computation. A modern computer operates in a way where, if you use it the way you've been instructed to, the physics underlying the computations cannot affect the computation in any undocumented way.

You trust the hardware the code is running on? You shouldn't.

Rowhammer comes to mind, but it's hardly the only case. The US banned some Chinese chips over unspecified suspicions that something like this was going on.

For some people it's OK to run a few simple tests on the chip's output to make sure it doesn't have something like the Pentium FDIV bug; others remove the silicon wafer from the packaging and scan it with an electron microscope, verifying not just that each transistor is in the right place but also that the wires aren't close enough for currents to quantum-tunnel, or to act as an antenna that leaks out some part of a private key.

Some people will go all the way down to the quantum mechanics. Exploits are possible at any level, and domains where the potential losses exceed the cost of investigation do exist, e.g. big countries and national security.

How much proof is necessary? The abstraction of hardware is good enough for most of us, and given the excessive trust already given to NPM and other package management tools, LLM output that passes automated tests is already sufficient for most.

People like me who don't trust package management tools, or who filed bugs with Ubuntu for not using https enough and think that Ubuntu's responses and keeping the bug open for years smelled like "we have a court order requiring this but can't admit it" (https://bugs.launchpad.net/ubuntu-website-content/+bug/15349...)… well, I can't speak for the paranoid, but I'm also the curious type who learned how to program just because the book was there next to the C64 game tapes.


> We have automated validation and automated proofs.

Example?

> How much proof is necessary? Do you validate the theorem prover, or trust that it works? Do you prove that the compiler is correctly compiling the program (when it matters, you should, given that compilers do sometimes rewrite things incorrectly), or trust the compiler?

I trust that the people who wrote the compiler and use it will fix mistakes. I trust the same people to discover compiler backdoors.

As for the rest of what you wrote: you're missing the point entirely. Rowhammer, the fdiv bug, they're all mistakes. And sure, malevolence also exists. But when mistakes or malevolence are found, they're fixed, or worked around, or at least documented as mistakes. With an LLM you don't even know how it's supposed to behave.


> Example?

Unit tests. Lean. Typed languages. Even more broadly, compilers.
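
For the Lean item, a minimal sketch of what "automated proof" looks like in practice (Lean 4, assuming the built-in omega tactic is available): instead of testing a function on a few inputs, you prove a property for every input.

  -- A tiny Lean 4 example: prove a property of double for all naturals,
  -- rather than spot-checking it with unit tests.
  def double (n : Nat) : Nat := n + n

  theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
    unfold double
    omega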

> I trust the same people to discover compiler backdoors.

https://micahkepe.com/blog/thompson-trojan-horse/

> you're missing the point entirely. Rowhammer, the fdiv bug, they're all mistakes. And sure, malevolence also exists.

Rowhammer was a thing because the physics was ignored. Calling it a mistake is missing the point; it demonstrates the falseness of the previous claim:

  We have properly abstracted away the physics of computation. A modern computer operates in a way where, if you use it the way you've been instructed to, the physics underlying the computations cannot affect the computation in any undocumented way.
Rowhammer *is* the physics underlying the computations affecting the computation in a way that was undocumented prior to it getting discovered and, well, documented. Issues like this exist before they're documented, and by definition nobody knows how many unknown things like this have yet to be found.

> But when mistakes or malevolence are found, they're fixed, or worked around, or at least documented as mistakes.

If you vibe code (as in: never look at the code), then find an error with the resulting product, you can still just ask the LLM to fix that error.

I only had limited time to experiment with this before Christmas (the last few days of a free trial; I thought I'd give it a go to see where the limits were), and what I found it doing wrong was piling up technical debt, not producing a mysterious ball of mud beyond its own ability to rectify.

> With an LLM you don't even know how it's supposed to behave.

LLM-generated source code: only if you've forgotten how to read the source code it made for you, can't re-learn how to read it, and can't run its tests does it become as interpretable as psychology.

The LLMs themselves: yes, this is the "interpretability" problem, people are working on that.


> Unit tests.

Not proof.

> Lean.

Fantastic. But what proportion of developers are ready to formalize their requirements in Lean?

> Typed languages. Even more broadly, compilers.

For sufficiently strong type systems, sure! But then we're back at the point above.

> https://micahkepe.com/blog/thompson-trojan-horse/

I am of course aware. Any malevolent backdoor in your compiler could also exist in your LLM. Or the compiler that compiled the LLM. So you can never do better.

> Rowhammer is the physics underlying the computations affecting the computation in a way that was undocumented prior to it getting discovered and, well, documented. Issues like this exist before they're documented, and by definition nobody knows how many unknown things like this have yet to be found.

Yep. But it's a bug. It's a mistake. The unreliability of LLMs is not.

> If you vibe code (as in: never look at the code), then find an error with the resulting product, you can still just ask the LLM to fix that error.

Of course. But you need skills to verify that it did.

> LLM generated source code: if you've forgotten how to read the source code it made for you to solve your problem and can't learn how to read that source code and can't run the tests of that source code, at which point it's as interpretable as psychology.

Reading source code is such a minute piece of the task of understanding code that I can barely understand what you mean.


> This is wildly different from the "abstraction" to programming that LLMs provide.

I absolutely agree. But consider the unsaid hypothetical here: What if AI coding reaches the point where we can trust it in a similar manner?


At the current time this is essentially science fiction, though. This is something that the best-funded companies on the planet (as well as many, many others) have been working on and seem to be completely unable to achieve despite trying their best for years now, despite incredible hype.

It feels like if those resources had been poured into nuclear fusion, for example, we'd have it production-ready by now.

The field also isn't just a couple of years old; this has been tried for decades. Sure, only now have companies decided to put essentially "unlimited" resources into it, but while that showed that certain things are possible and work extremely well, it also strongly hinted that at least the current approach will not get us there, especially not without significant trade-offs (the whole overtraining vs. "creativity" and hallucination topic).

That doesn't mean it won't come, but it doesn't appear to be a "we just need a bit more development" topic. The state hasn't changed much: models became bigger and bigger, and people added the "thinking" hack, then agents, then agents for agents, but none of that changed much about the initial approach and its limitations, given that these problems haven't been cracked after years of hyped funding.

It would be amazing if we had AIs that automate research and maybe help us fix all the huge problems the world is facing. I'd absolutely love that. I'd also love it if people could easily create tools, games, art. However, that's not the reality we live in. Sadly.


> At the current time this is essentially science fiction

I guess my point is so long as LLMs being trustworthy remains science fiction, so will coders forgetting how to code.


You could use AI to tutor you on how to code in the specific instance where you need it?


Tutoring – whether AI or human – does not provide the in-depth understanding necessary for validation and long-term maintenance. It can be a very useful step on the way there, but only a step.


No, that'll always remain a human skill that can only be taught with knowledge (which a tutor can help you gain) and experience.


Same as how we do it now - look at the end result, test it. Testers never went away.

Besides, your comment goes by the assumption that we no longer know (intimately) how to program - is that true? I don't know C or assembly or whatever very well, but I'm still a valuable worker because I know other things.

I mean, it could be partially true - but it's like having years of access to Google to quickly find just what I need, meaning I never learned how to read e.g. books on software development or scientific papers end to end. I never felt like I needed that skill, but it's a skill that a preceding generation did have.


> Besides, your comment goes by the assumption that we no longer know (intimately) how to program - is that true? I don't know C or assembly or whatever very well, but I'm still a valuable worker because I know other things.

The proposal seems to be for LLMs to take over the task of coding. I posit that if you do not code, you will not gain the skills to do so well.

> I mean, it could be partially true - but it's like having years of access to Google to quickly find just what I need, meaning I never learned how to read e.g. books on software development or scientific papers end to end.

I think you've misunderstood what papers are for or what "the previous generation" used them for. It is certainly possible to extract something useful from a paper without understanding what's going on. Googling can certainly help you. That's good. And useful. But not the main point of the paper.


Fair question but haven't we been doing this for decades? Very few people know how to write assembly and yet software has proliferated. This is just another abstraction.


> Fair question but haven't we been doing this for decades? Very few people know how to write assembly and yet software has proliferated. This is just another abstraction.

Not at all. Given any "layperson input", the expert who wrote the compiler that is supposed to turn it into assembly can describe in excruciating detail what the compiler will do and why. Not so with LLMs.

Said differently: If I perturb a source code file with a few bytes here and there, anyone with a modicum of understanding of the compiler used can understand why the assembly changed the way it did as a result. Not so with LLMs.


But there's a limit to that. There are (relatively) very few people who can explain the details of e.g. a compiler, compared to, for example, React front-end developers who build B2C software (...like me). And these software projects grow, ultimately to the limit of what one person can fit in their head.

Which is why we have lots of "rules" and standards on communication, code style, commenting, keeping history, tooling, regression testing, etc. And I'm afraid those will be the first to suffer when code projects are primarily written by LLMs - do they even write unit tests if you don't tell them to?



