[Search-l] Grub Update

John McCormac jmcc at hackwatch.com
Thu Aug 2 21:56:21 UTC 2007


jer wrote:
> Where I really start to believe in the distributed crawling is when the
> clients get more intelligent, recognizing 404 pages, junk pages, spider
> traps, common patterns (parked pages), and so on.

This is the danger of confusing the function of crawlers with that of 
the search backend. The key to a fast and efficient crawl is that the 
crawler is streamlined and handles as many pages as possible in as short 
a time as possible. Breaking out to parse HTML is processor-intensive 
and slows down crawling considerably.
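
To make the split concrete, here is a rough sketch in Python of what I 
mean, not a description of how Grub or any real crawler does it. The 
crawl step only fetches and stores raw bytes; link extraction happens 
later in a separate backend step. The names crawl, process, 
LinkExtractor and fake_fetch are made up for illustration, and 
fake_fetch stands in for a real HTTP fetch so the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser


def crawl(urls, fetch):
    """Fetch each URL and store the raw body untouched; no HTML parsing here."""
    raw_store = deque()
    for url in urls:
        body = fetch(url)  # hypothetical fetcher: url -> bytes
        raw_store.append((url, body))
    return raw_store


class LinkExtractor(HTMLParser):
    """Backend-side parser: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def process(raw_store):
    """Run later, off the crawl path: parse stored pages and extract links."""
    results = {}
    while raw_store:
        url, body = raw_store.popleft()
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        results[url] = parser.links
    return results


def fake_fetch(url):
    # Assumption: stand-in for a real HTTP fetch so the sketch runs offline.
    return b'<html><body><a href="http://example.com/next">next</a></body></html>'


pages = crawl(["http://example.com/"], fake_fetch)
links = process(pages)
```

The point of the design is that crawl() never touches the parser, so 
the expensive work in process() can run on different machines or at a 
different time without holding up the fetch loop.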

Regards...jmcc
-- 
******************************************************
John McCormac  *  e-mail: jmcc at whoisireland.com
MC2            *  voice:  +353-51-873640
22 Viewmount   *  web:  http://www.whoisireland.com/
Waterford      *  blog: http://blog.whoisireland.com
Ireland        *  Irish Domain Stats & Market Research
******************************************************


