[Search-l] (crawler) Re: call to action....

jer jeremie at jabber.org
Tue Jun 12 19:25:22 UTC 2007


> Incidentally if we do build on Sami's software I can offer a crawler
> that will do 50 pages/sec using two very modest domestic PCs (one
> crawling/parsing and one saving metadata in a MySQL database). It's
> written in C and is multi-threaded.

I'm pretty sure that would be useful all by itself if you're interested.

I'm curious: since there are a few other C (or C++) based crawlers
(larbin, htdig, wget), how might yours compare? Do you do anything
special with DNS, robots, duplicate detection, rate management,
spider traps, hostname reduction, etc.? Just wondering what aspect
inspired you to create another one; maybe it was just code
style/control :)

Jer



