Doesn't change the fact that what I mentioned greatly improves agent accuracy.

dns_snek · 2026-01-12T13:37:30 1768225050

AI-generated implementation with AI-generated tests left me with some of the worst code I've witnessed in my life. Many of the passing tests it generated were tautologies (i.e. they would never fail even if behavior was incorrect).

When the tests failed the agent tended to change the (previously correct) test making it pass but functionally incorrect, or it "wisely" concluded that both the implementation and the test are correct but that there are external factors making the test fail (there weren't).

It behaved much like a really naive junior.

simonw · 2026-01-12T14:33:57 1768228437

Which coding agent and which model?