[Search-l] Introducing the Wikia Search blog
Jimmy Wales
jwales at wikia.com
Fri Jul 11 23:10:35 UTC 2008
Anthony, the job of the crawler is a lot more complex than you seem to
realize. Producing a publicly available, up-to-date, complete copy of the web
is nontrivial, simply because much of what you get from HTTP requests
cannot properly be considered part of "the web" due to spider traps
(http://en.wikipedia.org/wiki/Spider_trap), etc.
In any event, we make everything publicly available: the index, the
algorithm, the data.
--Jimbo