[Search-l] Using Grid Computing for Wikia Project
Michael Christen
mc at yacy.net
Fri May 11 08:27:49 UTC 2007
> I'm saying the
> major resource bottleneck right now for search project experiments is
> not *CPU* so much as *bandwidth*.
>
In recent years the YaCy project has seen many bottlenecks, but the
major one that we currently see is (you will be surprised):
IO-Load
It turned out that indexing is a heavy db-application, and we want
people running YaCy to be able to _work_ on their computer at the same
time while the indexer is running. Therefore we slow down indexing a bit.
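
A minimal sketch of what "slowing down indexing a bit" could look like, assuming a fixed pause after each indexed document. This is not YaCy's actual code; the names throttled_index, index_one and pause_seconds are hypothetical and chosen only for illustration.

```python
import time


def throttled_index(documents, index_one, pause_seconds=0.05, sleep=time.sleep):
    """Index documents one by one, pausing after each to leave IO for the user.

    index_one is a hypothetical callback that writes one document to the index.
    sleep is injectable so the pause can be replaced in tests.
    """
    count = 0
    for doc in documents:
        index_one(doc)        # the IO-heavy db write
        count += 1
        sleep(pause_seconds)  # give the disk back to other applications
    return count
```

A longer pause_seconds trades indexing throughput for a more responsive desktop; a real indexer would probably adapt the pause to the measured load instead of using a constant.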
> A search engine is roughly made up of crawling, storage,
> indexing, ranking algorithms, and serving results. Of these, I think
> only the last, the serving results, lends itself (relatively) *easily*
> to grid computing.
>
Index-chunks must be distributed (DHT-positions) to other grid-nodes
before 'serving results' takes place. That's another network task, but
an underestimated one: it is again much more of a db-task. It needs IO-load.
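
To illustrate the idea of DHT-positions, here is a rough sketch of how index entries could be assigned to grid-nodes on a hash ring. It assumes SHA-1 hashing and a 32-bit ring, which is a generic consistent-hashing scheme and not necessarily the one YaCy uses; ring_position, responsible_node and distribute are hypothetical names.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed ring size, for illustration only


def ring_position(key):
    """Map a word or a node name to an integer position on the ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def responsible_node(word, node_names):
    """Return the first node at or after the word's position, wrapping around."""
    ring = sorted((ring_position(name), name) for name in node_names)
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_left(positions, ring_position(word))
    return ring[i % len(ring)][1]


def distribute(words, node_names):
    """Group words into per-node chunks that would be sent over the network."""
    chunks = {name: [] for name in node_names}
    for word in words:
        chunks[responsible_node(word, node_names)].append(word)
    return chunks
```

Computing the target node is cheap. The expensive part is reading each chunk out of the local db and writing it into the db on the receiving node, which is where the IO-load comes from.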
> Storage, indexing, and ranking algorithms can
> basically be handled by a single home machine for experimental
> purposes. But the crawling requires a huge amount of bandwidth,
>
There is enough bandwidth for every home user. No problem. You need
only a fraction of what you use for file-sharing with other
software.
> and doing that in parallel yet coordinating the results is very hard.
>
This is in fact easy. If you restrict the coordination of crawling to
a specific subset (the leaves of the crawl tree), it is no problem at all.
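
One possible reading of this, as a sketch: each peer crawls its own tree locally up to a depth limit, and only the URLs at that limit (the leaves) are handed over for coordination with other peers. The function crawl_locally and the in-memory links graph are hypothetical stand-ins for a real fetcher, not YaCy code.

```python
from collections import deque


def crawl_locally(start_url, links, max_depth):
    """Breadth-first crawl; returns (crawled_locally, leaves_to_coordinate).

    links is a dict mapping a URL to the URLs found on that page,
    standing in for real page fetches.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    crawled, leaves = [], []
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            leaves.append(url)  # not fetched here, only offered to other peers
            continue
        crawled.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return crawled, leaves


links = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
crawled, leaves = crawl_locally("a", links, max_depth=2)
print(crawled)  # ['a', 'b', 'c']
print(leaves)   # ['d', 'e']
```

Only the leaves list needs any agreement between peers; everything above it is decided by one peer alone, so no coordination is required for the inner part of the tree.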
Greetings,
Michael
yacy.net