[Grub-dev] Fwd: Mistery of Wikipedia ban revealed
Bartek Jasicki
thindil2 at gmail.com
Sat Aug 2 18:09:41 UTC 2008
> 2008-08-02, 16:04:17
> Balinny <balinny at gmail.com> wrote:
>
> > I always assumed that Wikipedia has simply blocked queries with the
> > substring Grub in the User-Agent.
> > And I stand by that. See the evidence below. You can even see from the
> > queries that it is blocked by the Squids.
> > What's needed in order to crawl Wikipedia is to ask the system
> > administrators to lift the block (or to change
> > the User-Agent). I don't see why the C# client avoids the block.
> > Perhaps it's getting a cached response?
>
OK, everyone can start laughing - after checking the code of the C# client
I found that it sends the wrong User-Agent header. The bug is fixed (a new
version of the client, 0.7.5, is available). So it is 100% certain that the
problem is with Squid.
But there is still a problem with the Accept header. Most crawlers have it
set to:
Accept: */*
Headers for googlebot:
http://209.85.135.104/search?q=cache:Z8EzzvyAqH8J:pgl.yoyo.org/http/browser-headers.php+googlebot+http+headers&hl=en&ct=clnk&cd=5
And for crawl.yahoo:
http://cache.search.yahoo.net/search/cache?ei=UTF-8&p=http%3A%2F%2Fpgl.yoyo.org%2Fhttp%2Fbrowser-headers.php&y=Search&rd=r1&meta=vc%3Dpl&fr=yfp-t-501&fp_ip=PL&u=pgl.yoyo.org/http/browser-headers.php&d=FxR1ZC72ROH4&icp=1&.intl=us
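
To make the two headers discussed above concrete, here is a minimal sketch in Python (not the C# client's actual code) of a request that sets both User-Agent and Accept explicitly. The User-Agent value and the function name are made up for illustration; the real Grub client string is not quoted in this thread.

```python
from urllib.request import Request

# Hypothetical value: the real Grub client User-Agent is not given in this thread.
CRAWLER_USER_AGENT = "ExampleCrawler/0.1"


def build_crawl_request(url, user_agent=CRAWLER_USER_AGENT):
    """Return an unsent Request with explicit User-Agent and Accept headers."""
    headers = {
        "User-Agent": user_agent,  # the header a proxy could match on
        "Accept": "*/*",           # the value most crawlers send
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    request = build_crawl_request("http://example.org/")
    # Print the headers exactly as urllib stores them, without sending anything.
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

Nothing is sent until the request is passed to urllib.request.urlopen, so the headers can be inspected first. Note that urllib stores header names in capitalized form, so User-Agent is shown as User-agent.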
Bartek