<div dir="ltr"><div class="gmail_quote">> 2008-08-02, 16:04:17<br>
> Balinny <<a href="mailto:balinny@gmail.com">balinny@gmail.com</a>> wrote:<br>
><br>
> > I always assumed that Wikipedia has simply blocked queries with the<br>
> > substring Grub in the User-Agent.<br>
> > And I stand by it. See the evidence below. You can even see from the<br>
> > queries that they are blocked by the Squids.<br>
> > What's needed in order to crawl Wikipedia is to ask the system<br>
> > administrators to lift the block (or to change<br>
> > the User-Agent). I don't see why the C# client avoids it.<br>
> > Perhaps it's getting a cached response?<br>
><br>
<br>
OK, everyone can start laughing: after checking the code of the C# client, I found<br>
that it was sending the wrong User-Agent header. The bug is fixed (a new version of the client,<br>
0.7.5, is available). So I am now 100% sure the block comes from Squid.<br>
<br>
But there is still a problem with the Accept header. Most crawlers set it<br>
to:<br>
<br>
Accept: */*<br>
<br>
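To check what a client really sends, it helps to inspect the headers before the request goes out.<br>
Here is a minimal Python sketch of that idea. It is not the C# client's code; the agent string<br>
"ExampleCrawler/0.1", the URL and the helper names build_request and outgoing_headers are made up<br>
for illustration:<br>
<br>
<pre>
```python
import urllib.request

# Hypothetical values for illustration; not the Grub client's real settings.
USER_AGENT = "ExampleCrawler/0.1"
ACCEPT = "*/*"


def build_request(url, user_agent=USER_AGENT, accept=ACCEPT):
    """Return a urllib Request carrying explicit User-Agent and Accept headers."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": accept},
    )


def outgoing_headers(request):
    """Return the explicitly set headers as a dict, for inspection before sending."""
    return dict(request.header_items())


if __name__ == "__main__":
    # Nothing is sent over the network here; we only print what would be sent.
    req = build_request("http://example.org/")
    for name, value in sorted(outgoing_headers(req).items()):
        print(f"{name}: {value}")
```
</pre>
<br>
Note that urllib stores the header name as "User-agent"; HTTP header names are case-insensitive,<br>
so a server or proxy should treat it the same as "User-Agent".<br>
<br>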
Headers sent by Googlebot:<br>
<br>
<a href="http://209.85.135.104/search?q=cache:Z8EzzvyAqH8J:pgl.yoyo.org/http/browser-headers.php+googlebot+http+headers&hl=en&ct=clnk&cd=5" target="_blank">http://209.85.135.104/search?q=cache:Z8EzzvyAqH8J:pgl.yoyo.org/http/browser-headers.php+googlebot+http+headers&hl=en&ct=clnk&cd=5</a><br>
<br>
And for crawl.yahoo:<br>
<br>
<a href="http://cache.search.yahoo.net/search/cache?ei=UTF-8&p=http%3A%2F%2Fpgl.yoyo.org%2Fhttp%2Fbrowser-headers.php&y=Search&rd=r1&meta=vc%3Dpl&fr=yfp-t-501&fp_ip=PL&u=pgl.yoyo.org/http/browser-headers.php&d=FxR1ZC72ROH4&icp=1&.intl=us" target="_blank">http://cache.search.yahoo.net/search/cache?ei=UTF-8&p=http%3A%2F%2Fpgl.yoyo.org%2Fhttp%2Fbrowser-headers.php&y=Search&rd=r1&meta=vc%3Dpl&fr=yfp-t-501&fp_ip=PL&u=pgl.yoyo.org/http/browser-headers.php&d=FxR1ZC72ROH4&icp=1&.intl=us</a><br>
<font color="#888888"><br>
Bartek<br>
</font></div><br></div>