
I don't know, I prompted Opus 4.5 "Tell me the reasons why this report is stupid" on one of the example slop reports and it returned a list of pretty good answers.[1]

Give it a presumption of guilt and tell it to make a list, and an LLM can do a pretty good job of judging crap. You could very easily rig up a system to produce this "why is it stupid" report, then grade the reports and only let humans see the ones that score better than a B+.
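A minimal sketch of that two-pass idea, assuming the Anthropic Python SDK (the model id, the prompt wording, and the B+ cutoff are placeholders, not anything from the thread):

    # Sketch only: assumes the Anthropic Python SDK ("pip install anthropic").
    # Model id, rubric wording, and the B+ cutoff are placeholders.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    MODEL = "claude-opus-4-5"       # placeholder model id

    def critique(report_text: str) -> str:
        # Pass 1: presumption of guilt -- ask why the report is stupid.
        msg = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            messages=[{"role": "user",
                       "content": "Tell me the reasons why this report is stupid:\n\n" + report_text}],
        )
        return msg.content[0].text

    def grade(report_text: str, critique_text: str) -> str:
        # Pass 2: grade the report A-F given the critique.
        msg = client.messages.create(
            model=MODEL,
            max_tokens=16,
            messages=[{"role": "user",
                       "content": "Given this critique:\n" + critique_text
                                  + "\n\nGrade the original report A-F. Reply with the grade only.\n\n"
                                  + report_text}],
        )
        return msg.content[0].text.strip()

    def worth_human_attention(report_text: str) -> bool:
        # Only reports grading at B+ or better reach a human.
        return grade(report_text, critique(report_text)) in {"A", "A-", "B+"}

Keeping the "why is it stupid" critique in a separate turn from the grading call is deliberate: the presumption of guilt does the filtering, and the grader only has to map the critique to a letter.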

If you give them the right structure I've found LLMs to be much better at judging things than creating them.

Opus' judgement in the end:

"This is a textbook example of someone running a sanitizer, seeing output, and filing a report without understanding what they found."

1. https://claude.ai/share/8c96f19a-cf9b-4537-b663-b1cb771bfe3f





Ok, run the same prompt on a legitimate bug report. The LLM will pretty much always agree with you

find me one

https://hackerone.com/curl/hacktivity - add a filter for Report State: Resolved. FWIW I agree with you: you can use LLMs to fight fire with fire. It was easy to see coming; it's not uncommon in sci-fi to have scenarios where individuals have their own automation to mediate the abuses of other people's automation.

I tried your prompt with https://hackerone.com/reports/2187833 by copying the markdown. Claude (free Sonnet 4.5) begins: "I can't accurately characterize this security vulnerability report as "stupid." In fact, this is a well-written, thorough, and legitimate security report that demonstrates: ...". https://claude.ai/share/34c1e737-ec56-4eb2-ae12-987566dc31d1

AI sycophancy and over-agreement are annoying, but people who just parrot them as immutable problems or impossible hurdles must never actually try things out.


No. Learn about the burden of proof and use some basic reasoning - your AI sycophancy will simply disappear.

"Tell me the reasons why this report is stupid" is a loaded prompt. The tool will generate whatever output pattern matches it, including hallucinating it. You can get wildly different output if you prompt it "Tell me the reasons why this report is great".

It's the same as if you searched the web for a specific conclusion. You will get matches for it regardless of how insane it is, leading you to believe it is correct. LLMs take this to another level, since they can generate patterns not previously found in their training data, and the output seems credible on the surface.

Trusting the output of an LLM to determine the veracity of a piece of text is a bafflingly bad idea.


>"Tell me the reasons why this report is stupid" is a loaded prompt.

This is precisely the point. The LLM has to overcome its agreeableness to reject the implied premise that the report is stupid. It does do this, though it takes a lot; eventually it will tell you "no, actually this report is pretty good".

Since the point is filtering out slop, we can be perfectly fine with some false rejections.

The process would look like "look at all the reports, generate a list of why each of them is stupid, and then give me a list of the ten most worthy of human attention", and it would do a half-decent job at it. It could also pre-populate judgments to make the reviewer's life easier, so they could quickly glance at each one and decide whether it's worthy of a deeper look.
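A sketch of that triage loop, again assuming the Anthropic Python SDK; the 0-100 scoring scale and the prompt wording are invented for illustration:

    # Sketch of the triage loop described above. Assumes the Anthropic Python SDK;
    # the model id, scoring prompt, and 0-100 scale are placeholders.
    import anthropic

    client = anthropic.Anthropic()
    MODEL = "claude-opus-4-5"  # placeholder model id

    def score(report_text: str) -> int:
        # Ask for a critique followed by a 0-100 score on the last line; keep only the score.
        msg = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            messages=[{"role": "user",
                       "content": "List the reasons this report is stupid, then on the last line "
                                  "output only an integer 0-100 for how much it deserves human review:\n\n"
                                  + report_text}],
        )
        last_line = msg.content[0].text.strip().splitlines()[-1]
        return int(last_line) if last_line.isdigit() else 0

    def top_ten(reports: list[str]) -> list[str]:
        # Rank all reports by score and hand the ten highest to a human.
        return sorted(reports, key=score, reverse=True)[:10]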


And if you ask it why the report is accurate, it'll spaff out another list of pretty convincing answers.

It does indeed, but at the end added:

>However, I should note: without access to the actual crash file, the specific curl version, or ability to reproduce the issue, I cannot verify this is a valid vulnerability versus expected behavior (some tools intentionally skip cleanup on exit for performance). The 2-byte leak is also very small, which could indicate this is a minor edge case or even intended behavior in certain code paths.

Even biased towards positivity it's still giving me the correct answer.

Given a neutral "judge this report" prompt, we get:

"This is a low-severity, non-security issue being reported as if it were a security vulnerability," with a lot more detail as to why.

So positively, neutrally, and negatively biased prompts all arrive at the correct answer: this report is bogus.
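That framing comparison is easy to automate as a sanity check. A sketch, assuming the Anthropic Python SDK; the model id and the three prompts are just examples:

    # Run the same report through negative, neutral, and positive framings
    # and eyeball whether the verdicts agree. Placeholders throughout.
    import anthropic

    client = anthropic.Anthropic()
    MODEL = "claude-opus-4-5"  # placeholder model id

    FRAMINGS = [
        "Tell me the reasons why this report is stupid:",
        "Judge this report:",
        "Tell me the reasons why this report is great:",
    ]

    def verdicts(report_text: str) -> list[str]:
        out = []
        for framing in FRAMINGS:
            msg = client.messages.create(
                model=MODEL,
                max_tokens=512,
                messages=[{"role": "user", "content": framing + "\n\n" + report_text}],
            )
            out.append(msg.content[0].text)
        return out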


Yet this is not reproducible. This is the whole issue with LLMs: they are random.

You cannot trust that it'll do a good job on all reports, so you'll have to manually review the LLM's reports anyway, or hope that no real issues got dismissed as false negatives and no fake ones slipped through as false positives.

This is what I've seen most LLM proponents do: they gloss over the issues and tell everyone it's all fine. Who cares about the details? They don't review the gigantic pile of slop code/answers/results they generate. They skim and say YOLO. Worked for my narrow set of anecdotal tests, so it must work for everything!

IIRC DOGE did something like this to analyze which government jobs were needed and then fired people based on that. Guess how good the result was?

This is a very similar scenario: making judgement calls based on a small set of data. LLMs absolutely suck at it. And I'm not even going to get into the issue of liability, which is another can of worms.



