[Search-l] Using Grid Computing for Wikia Project

Seth Finkelstein sethf at sethf.com
Fri May 11 01:25:33 UTC 2007


On Fri, May 11, 2007 at 09:00:11AM +0800, Grahame Gould wrote:
> Seth,
> 
> As I understand it, the whole point of distributed computing is not to
> tie up your ISP but to use your spare CPU power.  You download a
> project, your computer works on it and sends back results.  All the
> projects I've seen wouldn't use more of your internet than having your
> mail program running.

	Right, because those projects are built to use spare *CPU*. I'm
saying the major resource bottleneck right now for search project
experiments is not *CPU* so much as *bandwidth*.

> I'm not sure what is hoped to be accomplished by this Wikia at Home project.

	A search engine is roughly made up of crawling, storage,
indexing, ranking algorithms, and serving results. Of these, I think
only the last, serving results, lends itself (relatively) *easily*
to grid computing. Storage, indexing, and ranking algorithms can
basically be handled by a single home machine for experimental
purposes. But the crawling requires a huge amount of bandwidth, and
doing that in parallel yet coordinating the results is very hard.
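
	As a rough illustration of the coordination problem, here is a
minimal sketch of one common way to split a crawl across volunteer
machines: hash each URL's hostname to pick the node that owns it. The
names assign_node and partition, and the idea of numbering nodes from
0 to num_nodes - 1, are assumptions made for this example; nothing
here describes an actual Wikia design.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Pick the crawler node that owns this URL (hypothetical scheme).

    Hashing the hostname keeps every page of one site on the same node,
    so per-site politeness limits can be enforced locally.
    """
    host = (urlsplit(url).hostname or "").lower()
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes: int) -> dict:
    """Group URLs by owning node, skipping exact duplicate URLs."""
    buckets = {node: [] for node in range(num_nodes)}
    seen = set()
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        buckets[assign_node(url, num_nodes)].append(url)
    return buckets


if __name__ == "__main__":
    frontier = [
        "http://example.com/a",
        "http://example.com/b",
        "http://example.org/",
        "http://example.com/a",  # duplicate, dropped by partition()
    ]
    for node, owned in partition(frontier, 3).items():
        print(node, owned)
```

	Even this simple scheme leaves the hard parts open: links found by
one node usually belong to another node and have to be forwarded, and
changing the number of nodes reassigns most hosts. Each fetched page
also still costs the volunteer's bandwidth, which is the bottleneck
described above.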

--
Seth Finkelstein  Consulting Programmer  http://sethf.com/
Infothought blog - http://sethf.com/infothought/blog/
Interview: http://sethf.com/essays/major/greplaw-interview.php


