Why do you assume it’s cheating?

measurablefunc · 2026-01-21T10:55:59 1768992959

Because it's a well know failure mode of neural networks & scalar valued optimization problems in general: https://www.nature.com/articles/s42256-020-00257-z

saagarjha · 2026-01-21T11:41:56 1768995716

Again, you can just read the code

measurablefunc · 2026-01-21T16:46:42 1769014002

You're missing the point. There is no evidence to support their claims which means they are more than likely leaking the memory into the LLM prompt & it is cheating by simply loading constants into memory instead of computing anything. This is why formal specifications are used to constrain optimization. Without proof that the code is equivalent you might as well just load constants into memory & claim victory.

fc417fc802 · 2026-01-21T17:23:57 1769016237

> There is no evidence to support their claims

Do you make a habit of not presuming even basic competence? You believe that Anthropic left the task running for hours, got a score back, and never bothered to examine the solution? Not even out of curiosity?

Also if it was cheating you'd expect the final score to be unbelievably low. Unless you also suppose that the LLM actively attempted to deceive the human reviewers by adding extra code to burn (approximately the correct number of) cycles.

measurablefunc · 2026-01-21T18:48:54 1769021334

This has nothing to do w/ me & consistently making it a personal problem instead of addressing the claims is a common tactic for people who do not know what it means to present evidence for their claims. Anthropic has not provided the necessary evidence for me to conclude that their LLM is not cheating. I have no opinion on their competence b/c that is not what is at issue. They could be incompetent & not notice that their LLM is cheating at their take home exam but I don't care about that.

red75prime · 2026-01-21T11:32:55 1768995175

And? Anthropic is not aware of this 2020 paper? The problem is not solvable?

measurablefunc · 2026-01-21T16:53:35 1769014415

Why are you asking me? Email & ask Anthropic.