Their Readme.md is weirdly obsessed with "2 hours":
"before Claude Opus 4.5 started doing better than humans given only 2 hours"
"Claude Opus 4.5 in a casual Claude Code session, approximately matching the best human performance in 2 hours"
"Claude Opus 4.5 after 2 hours in our test-time compute harness"
"Claude Sonnet 4.5 after many more than 2 hours of test-time compute"
So that does make one wonder where this comes from. It could just be LLM-generated with a talking point of "2 hours"; models can fall in love with that kind of stuff. "after many more than 2 hours" is a bit of a tell.
Would be quite curious to know though. Here is how I usually design take-home assignments:
1. Candidate has several _days_ to complete (usually around a week).
2. I design the task to only _take_ 2-4 hours and tell the candidate so, but that doesn't mean they can't take longer. The subsequent interview usually reveals whether they went overboard or struggled more than expected.
But I can easily picture some places sending a candidate the assignment and asking them to hand in their work within two hours. Similar to good old coding competitions.