[Search-l] Introducing the Wikia Search blog
Anthony
wikiasari at inbox.org
Fri Jul 11 19:39:39 UTC 2008
On Wed, Jul 2, 2008 at 1:26 PM, Dan Lewis <dan at wikia-inc.com> wrote:
> In fact, Jimmy has a post going live in
> about an hour about why Grub is so important to the future of the Internet.
>
Maybe you or someone else on this list can explain this to me...
What's the point of having a publicly available, up-to-date, complete
copy of the web, if I have to access it over the Internet? I already
have access to a publicly available, up-to-date, complete copy of the
web, over the Internet. It is the web itself.
I suspect this is mainly just Jimmy and you oversimplifying things
here. What all search engines need is an *indexed*, publicly
available, up-to-date, complete copy of the web. But to add value a
search engine needs much more than that, really. Say I invented the
concept of pagerank, and wanted to add it on to the generic Wikia
Search index. Without Grub/Wikia Search, I'd have to crawl the entire
web noting links. With Grub/Wikia Search providing me just a copy of the
web (without pagerank data, since we're pretending that hasn't been
invented yet), I haven't saved much. Sure, I don't have to deal with
pipelining http requests to save on latency, but I still have to
download the entire web in order to analyze the links. Now,
presumably Grub/Wikia Search will offer me a *filtered* copy of the
web so I only have to download the map of links. For those familiar
with Wikipedia dumps, something like pagelinks.sql (for the entire
web) would be great.
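
To make the point concrete, here is a minimal sketch of how a pagerank-style score could be computed from nothing but a map of links, assuming that map is available as plain (source, target) pairs. The function name, the pair format, and the damping and iteration defaults are illustrative assumptions, not anything Grub or Wikia Search is known to provide.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) link pairs.

    `links` is a hypothetical stand-in for a web-wide link dump,
    similar in spirit to Wikipedia's pagelinks.sql.
    """
    # Build the set of outgoing links for every page we have seen.
    outlinks = {}
    for src, dst in links:
        outlinks.setdefault(src, set())
        outlinks.setdefault(dst, set())
        if src != dst:  # ignore self-links
            outlinks[src].add(dst)

    n = len(outlinks)
    if n == 0:
        return {}

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in outlinks}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared by everyone.
        dangling = sum(rank[p] for p, targets in outlinks.items() if not targets)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in outlinks}

        # Each page passes its rank on equally to the pages it links to.
        for src, targets in outlinks.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank

    return rank


# Tiny example link map: "a" and "c" both link to "b", and "b" links to "a".
sample_links = [("a", "b"), ("c", "b"), ("b", "a")]
for page, score in sorted(pagerank(sample_links).items()):
    print(page, round(score, 3))
```

Nothing in this sketch touches page content. The only input is the list of link pairs, which is why a filtered dump of links would save far more work than a full copy of every page.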
But just a publicly available, up-to-date, complete copy of the web?
Not useful at all.
Anthony