[Search-l] Numbers (Using Grid Computing for Wikia Project)

Bani borboleta at gmail.com
Tue May 15 04:51:05 UTC 2007


I don't know about these exact specs, but in case it helps: I'm
participating in a program in which we have to build a complete search
engine, and we finished the crawler this week.
The computers in our lab have roughly 4 GHz CPUs and 2 GB of RAM, and we have a
university broadband connection. The minimum requirement set for
us was to crawl 1 million pages per day, sustained even after the first 10
million. So you should expect about 10 million pages in 10 days if
your algorithm and connection are good, but it gets harder as the size
of your repository grows.
As for disk space, we are only now starting on the indexer, so I
have no idea yet how big the index will be.
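The 1-million-pages-per-day target translates into a sustained fetch rate, which is the number a crawler on a given connection actually has to hit. The following is a small sketch of that arithmetic. It assumes a constant daily rate. The function names are illustrative only and are not from any particular crawler.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def pages_per_second(pages_per_day: float) -> float:
    """Sustained fetch rate needed to hit a daily page target."""
    return pages_per_day / SECONDS_PER_DAY

def days_to_crawl(total_pages: float, pages_per_day: float) -> float:
    """Days needed to reach total_pages, assuming the daily rate stays constant."""
    return total_pages / pages_per_day

if __name__ == "__main__":
    print(f"{pages_per_second(1_000_000):.1f} pages/s")  # 11.6 pages/s
    print(days_to_crawl(10_000_000, 1_000_000))          # 10.0
```

So 1 million pages per day means fetching and storing about 11.6 pages every second, around the clock. Because the rate gets harder to sustain as the repository grows, the 10-day figure for 10 million pages is best read as a best case.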

Vanessa

On 5/15/07, Seth Finkelstein <sethf at sethf.com> wrote:
> with a home broadband connection. How well does this work for a text-only
> crawl of the web, to get something useful, and how long would it take?
> That is, after crawling X days, you can expect a usable index of Y
> documents, which would use Z amount of disk.


