How would that work when the LLMs produce incorrect reports in the first place? Have a look at the actual HackerOne reports and their comments.
The problem is the complete stupidity of people. They use LLMs to try to convince the curl author that he is wrong to call the report hallucinated. Instead of generating ten LLM comments and doubling down on an incorrect report, they could use a bit of brain power to actually validate it. That doesn't even require much skill; you just have to test it manually.
Let the reporter duke it out with the project's gatekeeping LLM. If the exchange goes on long enough, a human can quickly skim it. It should be immediately obvious whether the reporter is making sensible rebuttals or just throwing more slop at the wall.
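Something like the sketch below is what I have in mind. To be clear, this is purely hypothetical: the `ask_llm`, `triage`, and `Exchange` names are made up, the model call is a canned stand-in, and nothing here reflects curl's or HackerOne's actual tooling. The bot keeps demanding concrete reproduction steps, and only an exchange that runs long enough gets escalated to a human, who then skims the thread.

```python
# Hypothetical "gatekeeping LLM" triage loop for incoming vulnerability
# reports. ask_llm() is a placeholder; a real deployment would call an
# actual model with the full thread and a prompt demanding a reproducible PoC.

from dataclasses import dataclass, field

MAX_ROUNDS = 5  # past this, a maintainer skims the thread


@dataclass
class Exchange:
    report: str
    turns: list = field(default_factory=list)  # (speaker, text) pairs
    needs_human: bool = False


def ask_llm(history: str) -> str:
    # Placeholder response standing in for a real model call.
    return ("Please provide a minimal reproduction: exact curl version, "
            "command line used, and observed vs. expected behaviour.")


def triage(report: str, reporter_reply) -> Exchange:
    """Run the gatekeeping loop. reporter_reply(challenge) returns the
    reporter's next message, or None if they stop responding."""
    ex = Exchange(report=report)
    for _ in range(MAX_ROUNDS):
        history = report + "\n" + "\n".join(f"{w}: {t}" for w, t in ex.turns)
        challenge = ask_llm(history)
        ex.turns.append(("gatekeeper", challenge))
        answer = reporter_reply(challenge)
        if answer is None:          # reporter gave up; thread dies quietly
            return ex
        ex.turns.append(("reporter", answer))
    ex.needs_human = True           # sensible rebuttals or slop? a human decides
    return ex


if __name__ == "__main__":
    # Simulate a reporter who just repeats the same vague claim.
    result = triage("Heap overflow in curl URL parser (no PoC attached).",
                    lambda challenge: "The overflow is obvious, please fix.")
    print("escalate to human:", result.needs_human, "-", len(result.turns), "turns")
```

In the demo run the reporter never produces a reproduction, so the thread hits MAX_ROUNDS and gets flagged, which is exactly the case where a quick human skim settles it.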
I think fighting fire with fire is likely the correct answer here.