[Search-l] (crawler) Re: call to action....
jer
jeremie at jabber.org
Tue Jun 12 19:25:22 UTC 2007
> Incidentally if we do build on Sami's software I can offer a crawler
> that will do 50 pages/sec using two very modest domestic PCs (one
> crawling/parsing and one saving metadata in a MySQL database). It's
> written in C and is multi-threaded.
I'm pretty sure that would be useful all by itself if you're interested.
I'm curious, since there are a few other C (or C++)-based crawlers
(larbin, htdig, wget), how yours might compare. Do you do anything
special with DNS, robots.txt, duplicate detection, rate management,
spider traps, hostname reduction, etc.? Just wondering what aspect
inspired you to create another one; maybe it was just code
style/control :)
Jer