[Search-l] Introducing the Wikia Search blog

Anthony wikiasari at inbox.org
Fri Jul 11 19:39:39 UTC 2008


On Wed, Jul 2, 2008 at 1:26 PM, Dan Lewis <dan at wikia-inc.com> wrote:
> In fact, Jimmy has a post going live in
> about an hour about why Grub is so important to the future of the Internet.
>
Maybe you or someone else on this list can explain this to me...
What's the point of having a publicly available, up-to-date, complete
copy of the web, if I have to access it over the Internet?  I already
have access to a publicly available, up-to-date, complete copy of the
web, over the Internet.  It's the web itself.

I suspect this is mainly just Jimmy and you oversimplifying things
here.  What all search engines need is an *indexed*, publicly
available, up-to-date, complete copy of the web.  But to add value a
search engine needs much more than that, really.  Say I invented the
concept of PageRank and wanted to add it to the generic Wikia
Search index.  Without Grub/Wikia Search, I'd have to crawl the entire
web, noting links.  With Grub/Wikia Search providing me just a copy of the
web (without PageRank data, since we're pretending that hasn't been
invented yet), I haven't saved much.  Sure, I don't have to deal with
pipelining HTTP requests to save on latency, but I still have to
download the entire web in order to analyze the links.  Now,
presumably Grub/Wikia Search will offer me a *filtered* copy of the
web so I only have to download the map of links.  For those familiar
with Wikipedia dumps, something like pagelinks.sql (for the entire
web) would be great.
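To show why a link map alone would be enough, here is a minimal sketch of PageRank computed purely from (source, target) link pairs, the kind of rows a pagelinks-style dump for the whole web might contain. The function name `pagerank`, the edge-list input format, and the parameters are illustrative assumptions, not anything Grub or Wikia Search actually offers; 0.85 is simply the damping factor commonly used for PageRank.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) link pairs.

    Hypothetical helper: the input is assumed to be a plain link map,
    with no page content needed at all.
    """
    nodes = set()
    out_links = {}
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
        out_links.setdefault(src, []).append(dst)

    n = len(nodes)
    if n == 0:
        return {}

    # Start with rank spread evenly across all pages.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over every page.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each page passes a damped share of its rank along each outgoing link.
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Usage: two pages linking to a third; the linked-to page ends up ranked highest.
ranks = pagerank([("b.example", "a.example"), ("c.example", "a.example")])
print(max(ranks, key=ranks.get))
```

Nothing in the computation touches page text, which is the point: the downloaded copy of the web would only matter here for the links extracted from it.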

But just a publicly available, up-to-date, complete copy of the web?
Not useful at all.

Anthony


