
I'm taking the OP at his word here, but he specifically claims that the proxy service making these requests will also make requests independent of a `go get` or other user-initiated action, sometimes to the tune of a dozen repos at once and 2500 requests per hour. That sounds like a crawler to me, and even if you want to argue about the semantics of the word "crawler," I strongly feel that robots.txt is the best available solution to inform the system what its rate limit should be.
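For what it's worth, if the proxy did treat itself as a crawler, something like this in robots.txt would be the natural way to express the limit. This is just a sketch: Crawl-delay is a non-standard directive that not every bot honors, and I'm assuming a user-agent token of GoModuleMirror, which I haven't verified.

    # Ask the module proxy's fetcher to space out its requests
    # (Crawl-delay is non-standard; honoring it is up to the bot)
    User-agent: GoModuleMirror
    Crawl-delay: 10

    # Fallback for everything else
    User-agent: *
    Crawl-delay: 30

A 10-second delay works out to at most ~360 requests per hour against a single host, well under the 2500/hour the OP reports.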


When I say crawler, I mean something that discovers new pages. Refreshing the same URL isn't really crawling.

But yes, it may be the best available solution in this case, even if I would argue that it isn't really its main purpose.


After reading this and your response to a sibling comment, I wholeheartedly disagree with you on both the specific definition of the word crawler and the "main purpose" of robots.txt, but I'm glad we can agree that Google should be doing more to respect rate limits :)


What you're thinking about, in my opinion, is best referred to as a spider.



