[Search-l] Using Grid Computing for Wikia Project
Seth Finkelstein
sethf at sethf.com
Fri May 11 01:25:33 UTC 2007
On Fri, May 11, 2007 at 09:00:11AM +0800, Grahame Gould wrote:
> Seth,
>
> As I understand it, the whole point of distributed computing is not to
> tie up your ISP connection but to use your spare CPU power. You download
> a project, your computer works on it and sends back the results. All the
> projects I've seen wouldn't use more of your internet connection than
> having your mail program running.
Right, because those are projects for *CPU*. I'm saying the
major resource bottleneck right now for search project experiments is
not *CPU* so much as *bandwidth*.
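To make the bandwidth point concrete, here is a back-of-the-envelope
sketch. The function name and all three inputs are illustrative
assumptions, not measurements of any real crawl, and the estimate
ignores latency, protocol overhead, and re-crawling.

```python
def crawl_days(pages, avg_page_bytes, link_bits_per_sec):
    """Days needed to download `pages` pages over a single link.

    Ignores latency, protocol overhead, retries, and re-crawls, so the
    real figure would be higher.
    """
    total_bits = pages * avg_page_bytes * 8
    seconds = total_bits / link_bits_per_sec
    return seconds / 86400  # seconds per day


# Hypothetical inputs: 100 million pages, 20 KB each, a 1 Mbit/s home link.
days = crawl_days(pages=100_000_000,
                  avg_page_bytes=20_000,
                  link_bits_per_sec=1_000_000)
print(f"about {days:.0f} days")  # about 185 days
```

With these made-up numbers the download alone takes about half a
year on one link, while the CPU sits mostly idle.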
> I'm not sure what this Wikia at Home project hopes to accomplish.
A search engine is roughly made up of crawling, storage,
indexing, ranking algorithms, and serving results. Of these, I think
only the last, serving results, lends itself (relatively) *easily*
to grid computing. Storage, indexing, and ranking algorithms can
basically be handled by a single home machine for experimental
purposes. But crawling requires a huge amount of bandwidth, and
doing it in parallel while coordinating the results is very hard.
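
As a rough illustration of the coordination problem, here is a minimal
sketch of one common way to split a crawl across machines: hash each
URL's host so every node gets a disjoint set of sites. The function
names and the example URLs are hypothetical, and this is not a
description of how any Wikia project works. It only covers the easy
part (dividing up the URLs); merging results, handling nodes that
drop out, and passing on newly discovered links are the hard parts
and are not shown.

```python
import hashlib
from urllib.parse import urlsplit


def assign_worker(url, num_workers):
    """Map a URL to a worker index based on a stable hash of its host."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; stable across runs and machines.
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(urls, num_workers):
    """Split URLs into per-worker lists, skipping exact duplicates."""
    buckets = {i: [] for i in range(num_workers)}
    seen = set()
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        buckets[assign_worker(url, num_workers)].append(url)
    return buckets


# Hypothetical frontier of URLs to crawl.
frontier = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/",
    "http://example.com/a",  # duplicate, dropped
]
buckets = partition(frontier, num_workers=4)
```

Keeping all of a host's pages on one node means only that node needs
to rate-limit requests to that site, but it does nothing about the
total bandwidth each node still has to spend.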
--
Seth Finkelstein Consulting Programmer http://sethf.com/
Infothought blog - http://sethf.com/infothought/blog/
Interview: http://sethf.com/essays/major/greplaw-interview.php