[Search-l] Grub Update
John McCormac
jmcc at hackwatch.com
Thu Aug 2 21:56:21 UTC 2007
jer wrote:
> Where I really start to believe in the distributed crawling is when the
> clients get more intelligent, recognizing 404 pages, junk pages, spider
> traps, common patterns (parked pages), and so on.
This is the danger of confusing the function of crawlers with that of
the search backend. The key to a fast and efficient crawl is a
streamlined crawler that handles as many pages as possible in as short
a time as possible. Breaking out to parse HTML is processor-intensive
and slows down crawling considerably.
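
As a rough illustration of that split, here is a minimal Python sketch. It
is not Grub's code or any real crawler's design; the names crawl, classify,
fake_fetch and FAKE_PAGES, and the "domain for sale" parked-page test, are
all hypothetical. The crawler stage only fetches and hands off raw bytes,
and the backend stage does the parsing afterwards.

```python
import queue
from html.parser import HTMLParser

# Hypothetical canned responses standing in for the network.
FAKE_PAGES = {
    "http://a.example/": (200, b"<html><title>Welcome</title></html>"),
    "http://b.example/": (200, b"<html><title>Domain For Sale</title></html>"),
}


def fake_fetch(url):
    """Stand-in fetcher: returns (status, body); unknown URLs get a 404."""
    return FAKE_PAGES.get(url, (404, b""))


def crawl(urls, fetch, raw_pages):
    """Crawler stage: fetch and hand off raw bytes, with no parsing at all."""
    for url in urls:
        status, body = fetch(url)
        raw_pages.put((url, status, body))


class TitleParser(HTMLParser):
    """Collects the text inside <title>, for the backend stage only."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def classify(raw_pages):
    """Backend stage: drain the queue, parse each page, and label it."""
    labels = {}
    while not raw_pages.empty():
        url, status, body = raw_pages.get()
        if status == 404:
            labels[url] = "not-found"
            continue
        parser = TitleParser()
        parser.feed(body.decode("utf-8", errors="replace"))
        # Assumed heuristic for a parked page, purely for illustration.
        parked = "domain for sale" in parser.title.lower()
        labels[url] = "parked" if parked else "ok"
    return labels
```

Calling crawl(urls, fake_fetch, q) with a queue.Queue() and then classify(q)
keeps the processor-heavy parsing out of the fetch loop, so the junk and
404 detection can run later or on a different machine.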
Regards...jmcc
--
******************************************************
John McCormac * e-mail: jmcc at whoisireland.com
MC2 * voice: +353-51-873640
22 Viewmount * web: http://www.whoisireland.com/
Waterford * blog: http://blog.whoisireland.com
Ireland * Irish Domain Stats & Market Research
******************************************************