A 20k LOC PR isn’t reviewable in any normal workflow/process.
The only moves are refusing to review it, escalating it up the chain of authority, or rubber-stamping it with a note to the effect that it's effectively unreviewable, so a rubber stamp must be the desired outcome.
I don't understand this attitude. Tests are an important part of the codebase. Poorly written tests are a frequent source of headaches in my experience, whether by encoding incorrect assumptions, lying about what they're testing, giving a false sense of security, or adding friction to architectural changes and refactors. I would never want to review even 2k lines of test changes in one go.
Preach. Also, don't forget that they make local testing/CI take longer to run, which costs you both compute and developer context switching.
I've heard people rave about LLMs for writing tests, so I tried having Claude Code generate some tests for a bug I fixed in some autosave functionality (every 200ms, the auto-saver should initiate a save if the last change was in the previous 200ms). Claude wrote five tests that each waited a real 200ms (!), adding a needless full second to the run time of my test suite.
I went in to fix it by mocking out time, and in the process realized that the feature was doing timestamp comparisons when a simpler, less error-prone approach was to increment a logical clock on each change instead.
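To sketch what I mean (names invented for illustration, and the real setInterval wiring left out): the saver just counts changes, and the periodic tick saves whenever the count has moved since the last save. Tests can then call tick() directly and never touch real time. Jest-style assertions assumed; any runner would do.

```typescript
// Sketch only: a logical-clock auto-saver instead of timestamp comparisons.
class AutoSaver {
  private changes = 0; // incremented on every edit
  private savedAt = 0; // value of `changes` at the last save

  constructor(private save: () => void) {}

  notifyChange(): void {
    this.changes++;
  }

  // In production this would be driven by setInterval(() => saver.tick(), 200);
  // in tests it is called directly, so nothing sleeps for real.
  tick(): void {
    if (this.changes !== this.savedAt) {
      this.save();
      this.savedAt = this.changes;
    }
  }
}

test("saves exactly once per tick that follows a change", () => {
  const save = jest.fn();
  const saver = new AutoSaver(save);

  saver.tick();                          // no change yet -> no save
  expect(save).toHaveBeenCalledTimes(0);

  saver.notifyChange();
  saver.notifyChange();
  saver.tick();                          // changes since last save -> one save
  expect(save).toHaveBeenCalledTimes(1);

  saver.tick();                          // nothing new -> still one save
  expect(save).toHaveBeenCalledTimes(1);
});
```

The nice side effect is that the 200ms interval becomes an implementation detail of the scheduling layer rather than something every test has to wait on.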
The tests I've seen Claude write vary from junior-level to flat-out bad. Tests are often the first consumer of a new interface, and delegating them to an LLM means you don't experience the ergonomics of the thing you just wrote.
I think the general takeaway from all of this is that the model can write the code, but you still have to design it. I don't disagree with anything you've said, and my advice is to engage more, iterate more, and work in small steps to get the right patterns and rules laid out. It won't work well on day one if you don't set up the right guidelines and guardrails. That's why it's still software engineering, despite being a different interaction medium.
And if the 10k lines of tests are all garbage, now what? Because tests are the one place you absolutely should not delegate to AI, outside of setting up the boilerplate and descriptions.