[Grub-dev] Why grub is blacklisted so widely.

Jason Pump jason at dataalchemy.com
Wed Jan 30 22:48:50 UTC 2008


Many webmasters get very upset about the following behaviors by web
crawlers:

- Accessing the same pages over and over again in an unreasonable time frame,
  or accessing the same deep-level page simultaneously from multiple servers.
- Accessing many dead links or other error pages on a web site, causing site
  statistics (e.g. 500 errors per hour) to set off site alerts.
- Accessing a site too fast.
- Accessing a page that is denied by robots.txt.
- Fetching pages without first fetching robots.txt.
- Fetching many more pages for their crawl than the users of that crawl
  ever visit.
- Not refetching robots.txt frequently enough to pick up recent changes.
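Several of the robots.txt items above (fetch it before any page, honor its
Disallow rules, and refetch it periodically) could be handled in one place. A
minimal Python sketch, assuming a per-host cache built on the standard
library's urllib.robotparser; the RobotsCache class, the fetch_robots callback
and the 24-hour refresh interval are hypothetical choices for illustration,
not anything grub actually does.

```python
import time
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical refresh interval; the post only asks for "frequently enough".
ROBOTS_TTL_SECONDS = 24 * 60 * 60


class RobotsCache:
    """Hypothetical per-host robots.txt cache with periodic refetching."""

    def __init__(
        self,
        fetch_robots: Callable[[str], List[str]],  # hypothetical: URL -> lines
        user_agent: str,
        ttl: float = ROBOTS_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_robots
        self._agent = user_agent
        self._ttl = ttl
        self._clock = clock
        # host root ("scheme://netloc") -> (time fetched, parsed rules)
        self._cache: Dict[str, Tuple[float, RobotFileParser]] = {}

    def _rules_for(self, host_root: str) -> RobotFileParser:
        now = self._clock()
        cached = self._cache.get(host_root)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Missing or stale: (re)fetch robots.txt before any page on this host.
        parser = RobotFileParser()
        parser.parse(self._fetch(host_root + "/robots.txt"))
        parser.modified()  # mark as read so can_fetch() trusts the rules
        self._cache[host_root] = (now, parser)
        return parser

    def allowed(self, url: str) -> bool:
        """Return True only if robots.txt for url's host permits fetching it."""
        parts = urlsplit(url)
        host_root = f"{parts.scheme}://{parts.netloc}"
        return self._rules_for(host_root).can_fetch(self._agent, url)
```

Because allowed() always goes through the host's parsed robots.txt, no page is
requested before its robots.txt has been read, and a stale copy is replaced
once the TTL expires. Error handling for the robots.txt fetch itself
(timeouts, 4xx vs. 5xx responses) is left out of this sketch.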

Doing most of these things is what got grub a bad name the first time
around. During development, most of these problems can be avoided with good
architecture and planning. Some thought and discussion should perhaps be
given, at this point, to how we avoid being branded as bad netizens moving
forward.
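As one illustration of the kind of architecture meant above, here is a hedged
Python sketch of a per-host politeness gate. The PolitenessGate class, the
10-second spacing, the one-week refetch window, the error threshold and the
doubling backoff are all assumptions. The hypothetical node_for_host helper
shows one possible way (a stable hash of the host name) to make sure only one
crawler node ever talks to a given site, so the same page is not hit from
several servers at once.

```python
import hashlib
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


def node_for_host(host: str, num_nodes: int) -> int:
    """Map a host to one crawler node with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha1(host.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class PolitenessGate:
    """Hypothetical per-host gate: request spacing, refetch window, error backoff."""

    def __init__(
        self,
        min_delay: float = 10.0,             # assumed seconds between hits on a host
        refetch_window: float = 7 * 86400,   # assumed: skip URLs fetched in past week
        max_errors: int = 5,                 # assumed: give up on a host after this
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_delay = min_delay
        self.refetch_window = refetch_window
        self.max_errors = max_errors
        self.clock = clock
        self._next_allowed: Dict[str, float] = {}  # host -> earliest next request
        self._last_fetched: Dict[str, float] = {}  # url -> time of last fetch
        self._errors: Dict[str, int] = {}          # host -> consecutive error count

    def may_fetch(self, url: str) -> bool:
        now = self.clock()
        host = urlsplit(url).netloc.lower()
        if self._errors.get(host, 0) >= self.max_errors:
            return False  # too many consecutive errors: stop hitting this host
        if now < self._next_allowed.get(host, 0.0):
            return False  # too soon after the previous request to this host
        last = self._last_fetched.get(url)
        if last is not None and now - last < self.refetch_window:
            return False  # this exact page was fetched recently
        return True

    def record(self, url: str, status: int) -> None:
        now = self.clock()
        host = urlsplit(url).netloc.lower()
        self._last_fetched[url] = now
        if status >= 400:
            errors = self._errors.get(host, 0) + 1
        else:
            errors = 0  # a success resets the consecutive-error count
        self._errors[host] = errors
        # Double the spacing for each consecutive error (exponential backoff).
        self._next_allowed[host] = now + self.min_delay * (2 ** errors)
```

In a distributed crawler, each node would only take URLs whose host maps to
itself via node_for_host, and would call may_fetch before and record after
every request.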

Jason