Set your site to return 401 Unauthorized with a Basic auth challenge if an Authorization header isn’t sent, and set the challenge’s realm/description text to “Enter anything to proceed. All human access is authorized. Unauthorized non-human access is prohibited.” Crawlers can’t parse the instructions and will deadstop on them, while people will shrug and enter any password, which will work.
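If it helps to see it concretely, here’s a minimal sketch of the idea using Go’s standard net/http; the handler name, port, and wording are illustrative, not anyone’s production config:

    package main

    import "net/http"

    // anyPasswordGate challenges requests that carry no Authorization header
    // and lets anything through once one is supplied. The realm text carries
    // the human-readable terms of access.
    func anyPasswordGate(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if _, _, ok := r.BasicAuth(); !ok {
                w.Header().Set("WWW-Authenticate",
                    `Basic realm="Enter anything to proceed. All human access is authorized. Unauthorized non-human access is prohibited."`)
                http.Error(w, "authorization required", http.StatusUnauthorized)
                return
            }
            // Any username/password combination is accepted; the challenge,
            // not the secret, is the point.
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("welcome, human\n"))
        })
        http.ListenAndServe(":8080", anyPasswordGate(mux))
    }

A bare curl against it gets the 401 and the challenge text; curl -u x:x (or a browser after typing anything into the prompt) gets the page.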
Anubis is also viable and popular, but it lacks the legal threat: with the auth barrier you can file a federal hacking claim over a scraper’s unauthorized intrusion if its operators code it to transmit an empty, invalid, or even valid authentication header.
I think you might be overestimating how much I care whether humans interact with my content. If it’s not worth five seconds of their time to type a word and click OK, I have better things to do than trying to coddle them into caring about me.
What decade do you think it is? :) Depending on who you ask, bots have become better at solving captchas than humans are...
There's almost nothing you can do that "AI" can't, while keeping it easy enough for the average Joe who wants to log in. Especially considering "the grandma test"...
It’s trivial for them to solve this automatically. It’s also illegal under the Computer Fraud and Abuse Act, because then they’re knowingly and intentionally bypassing, for profit (fraud), a clearly defined authorization barrier that explicitly prohibits their (ab)use. No joke, that’s an actual U.S. federal crime, and it would hold up well enough to reach a courtroom to be judged. (Can’t say what would happen after that; the law is not code, etc.) That’s why it’s so effective: it doesn’t matter that they can bypass it; it matters that they could be jailed if they do, and corporations only take risks whose profits can pay for the penalties incurred. The cost of federal prison is well in excess of what most corporate leaders will tolerate :)
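To be concrete about “trivial”: the scraper-side bypass is literally one header. A hypothetical sketch (placeholder URL, throwaway credentials), and that one deliberate line is exactly the act the legal argument hangs on:

    package main

    import (
        "fmt"
        "net/http"
    )

    func main() {
        // Hypothetical scraper-side code: one SetBasicAuth call clears the gate.
        req, err := http.NewRequest("GET", "https://example.com/", nil)
        if err != nil {
            panic(err)
        }
        req.SetBasicAuth("anything", "anything") // any credentials are accepted
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println(resp.Status)
    }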
ps. I learned assembly programming from my grandma. She would have loved to discuss this problem with me.
Just because it's called authentication doesn't mean that all possible applications of it qualify. Following directions provided by the owner isn't breaking into the system. Clicking "Yes I'm over 18" when you aren't isn't a violation of the CFAA any more than any other ToS violation is.
What matters isn't the box presenting the challenge but rather the nature of the challenge itself.
Perhaps! But for some weird reason, I bet you’ll find no one except, e.g., Yandex willing to code their AI scraper to attempt passwords against websites, because, as noted, only a courtroom can determine the legal exposure here with certainty, and so long as the practice isn’t widespread there’s no profit in accepting that risk.
A very feasible solution. I guess the only issue is that we now have to add friction for legitimate users, which will only accelerate their migration to the AI summaries at the top of the page.
AI is on course to destroy anonymous web browsing due to the costs it inflicts on server operators; user friction, whether via manual input, a proof-of-work interstitial, or otherwise, is the only remaining alternative to attested identity, pseudonymous or not. This is HN, so I’m suggesting a non-attestation solution, which therefore means user friction.
Oh I agree with you entirely, not in disagreement at all. Just highlighting the absurdity of the world that this is where we are.
I was discussing with my partner recently that I believe this is all heading toward a licensed, centralised internet. I can very easily foresee the path we are blindly wandering down leading to an authoritarian space in which websites and hosting will only be available to a select few, for a hefty fee.
The justification will be ‘think of the children’ or ‘we need to control what your AI agents can connect to, as there are bad actors who will convince your AI agents to give them your funds and personal, private data’.
Obviously this is just pure speculative hyperbole to be taken with a tbsp of salt, but, yeah, I can see the path quite clearly, judging by how little pushback governments face for their reckless and nefarious actions nowadays.
Ultimately, the web will revert to where it was before full-text search harvested and profited from all the curated indexes of content, and, bluntly, we did fine back then! It was a lot less shitty a web than we have now :) And it’ll be a lot easier to be open with others when someone can’t just full-text-search their way to our bulletin board (forums used to be called that, since they derived their core structure from BBSes!) to hate on us. It will be harder to use the web for problem-solving in esoteric corner cases, because you’ll have to find the right enthusiast community and jump through some hoops to sign up and become a trusted member. Can’t wait, honestly.
> Crawlers can’t parse the instructions and will deadstop on them, while people will shrug and enter any password, which will work.
Well, that just seems like something where they’ll just fix the glitch. Do you really think the devs of these bots can't fix this, rather than that they just haven't fixed it yet?
Corporations can't be jailed; they just pay fines. So I'm not really sure what the point is. You also have to realize that not all devs who create bots are under the jurisdiction of US law. So again I ask, what's your point?
Feeding AI new high-quality content only adds to the problem. We must stop feeding it.