[Search-l] Grub Update

John McCormac jmcc at hackwatch.com
Fri Aug 3 00:53:03 UTC 2007


Jimmy Wales wrote:
> And this is the real potential strength of a distributed approach, I 
> think.  With a small number of crawling machines, you perhaps have to 
> fetch fetch fetch fetch without a lot of "thinking".  But with 10,000 or
> 100,000 or 1,000,000 machines pitching in?

If anything, it would require a lot more thinking at the initial stage as 
the handling process has to be well designed. This would be a largely 
asynchronous crawl with crawlers dropping in and out of the network at a 
far greater rate than would happen with dedicated crawlers.
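
As a rough illustration of the kind of handling this implies, here is a
minimal sketch of a work queue that hands out URLs under a time-limited
lease, so that work held by a crawler that drops out goes back into the
pool. This is only one possible approach, not a description of how Grub
does it; the LeaseQueue class, its method names and the 300 second
default are all made up for the example.

```python
import time
from collections import deque


class LeaseQueue:
    """Hypothetical coordinator: hands out URLs under a time-limited lease."""

    def __init__(self, urls, lease_seconds=300.0, clock=time.monotonic):
        self.pending = deque(urls)      # URLs waiting to be assigned
        self.leased = {}                # url -> (crawler_id, expiry time)
        self.done = set()               # URLs that have been fetched
        self.lease_seconds = lease_seconds
        self.clock = clock              # injectable so expiry can be simulated

    def _reclaim_expired(self):
        """Put URLs whose lease has run out back on the pending queue."""
        now = self.clock()
        expired = [u for u, (_, exp) in self.leased.items() if exp <= now]
        for url in expired:
            del self.leased[url]
            self.pending.append(url)

    def checkout(self, crawler_id):
        """Give the next URL to a crawler, or None if nothing is pending."""
        self._reclaim_expired()
        if not self.pending:
            return None
        url = self.pending.popleft()
        self.leased[url] = (crawler_id, self.clock() + self.lease_seconds)
        return url

    def complete(self, crawler_id, url):
        """Record a result; reject it if this crawler no longer holds the lease."""
        holder = self.leased.get(url)
        if holder is None or holder[0] != crawler_id:
            return False
        del self.leased[url]
        self.done.add(url)
        return True
```

A crawler that disappears simply never calls complete(), and its URL is
handed to someone else once the lease runs out. A late result from the
original crawler is then rejected rather than counted twice.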

The crawl is not a one-off thing either. It is an ongoing process that 
involves a number of search indices. Some of them will be live and 
others will be development indices.

> I am not sure what the best architecture for that will end up being -- 
> that's an empirical question and I don't think we have enough experience 
> yet, any of us, to really know the answer.  So, we move forward and 
> learn. :)

Well, there are a lot of possibilities. The trick is choosing the right 
one and turning it into reality.

Regards...jmcc
-- 
******************************************************
John McCormac  *  e-mail: jmcc at whoisireland.com
MC2            *  voice:  +353-51-873640
22 Viewmount   *  web:  http://www.whoisireland.com/
Waterford      *  blog: http://blog.whoisireland.com
Ireland        *  Irish Domain Stats & Market Research
******************************************************


