[Search-l] Introducing the Wikia Search blog

Jimmy Wales jwales at wikia.com
Fri Jul 11 23:10:35 UTC 2008


Anthony, the job of the crawler is a lot more complex than you seem to 
realize.  A publicly available, up-to-date, complete copy of the web is 
nontrivial to produce, simply because much of what you get from HTTP 
requests cannot properly be considered part of "the web" due to spider 
traps (http://en.wikipedia.org/wiki/Spider_trap), etc.
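
To illustrate the kind of filtering a crawler needs, here is a minimal 
sketch of a URL-based spider-trap heuristic. It is not the Wikia Search 
crawler's actual logic; the function name and all thresholds are 
hypothetical, and a real crawler would likely combine many more signals.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_spider_trap(url, max_length=200, max_depth=10, max_repeats=3):
    """Guess whether a URL belongs to a spider trap.

    All thresholds are illustrative assumptions, not tuned values.
    """
    # Very long URLs are often generated endlessly by a script.
    if len(url) > max_length:
        return True

    # Split the path into its non-empty segments.
    segments = [s for s in urlsplit(url).path.split("/") if s]

    # Extremely deep paths suggest an infinitely nested directory tree.
    if len(segments) > max_depth:
        return True

    # The same segment repeating many times suggests a link loop.
    if segments and max(Counter(segments).values()) >= max_repeats:
        return True

    return False
```

A crawler could call this before queueing each discovered link and skip 
any URL for which it returns True, at the cost of occasionally dropping 
legitimate pages.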

In any event, we provide the index, the algorithm, the data, everything 
publicly.

--Jimbo