[Search-l] Introducing the Wikia Search blog
Jeremie Miller
jeremie at jabber.org
Sat Jul 12 17:00:48 UTC 2008
> But just a publicly available, up-to-date, complete copy of the web?
> Not useful at all.
It's not just a big blob. The goal is to have various "functional"
indexes of it and APIs into it: not a typical ranked keyword index, but
the ability to select subsets based on URL, content-type, and other
metadata.
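As a rough illustration of what selecting a subset by metadata could look like (this is only a sketch, not Grub's actual API; the `CrawlRecord` fields and the `select_subset` function are hypothetical names made up for the example):

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical shape of one crawled page plus its metadata."""
    url: str
    content_type: str
    body: bytes


def select_subset(
    records: Iterable[CrawlRecord],
    url_prefix: Optional[str] = None,
    content_type: Optional[str] = None,
) -> Iterator[CrawlRecord]:
    """Yield only the records matching every filter that was supplied."""
    for record in records:
        # Skip records whose URL does not start with the requested prefix.
        if url_prefix is not None and not record.url.startswith(url_prefix):
            continue
        # Skip records whose content type differs from the requested one.
        if content_type is not None and record.content_type != content_type:
            continue
        yield record


if __name__ == "__main__":
    crawl = [
        CrawlRecord("http://example.org/a.html", "text/html", b"<html></html>"),
        CrawlRecord("http://example.org/logo.png", "image/png", b""),
        CrawlRecord("http://example.com/b.html", "text/html", b"<html></html>"),
    ]
    for rec in select_subset(crawl, url_prefix="http://example.org/",
                             content_type="text/html"):
        print(rec.url)
```

The point is that the selection works purely on metadata, with no keyword ranking involved.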
Also, the data is (in various states) loaded into a Hadoop cluster, and
contributed MapReduce jobs can be run against it. The only restriction
is that the MR jobs are open source and their outputs are available to
everyone; this is a community resource.
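To give a feel for the kind of job someone might contribute, here is a small sketch that counts crawled pages per host. It is an assumption-laden toy: it simulates the map, shuffle, and reduce phases locally in plain Python rather than using any Hadoop API, and the `(url, content_type)` input format and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple
from urllib.parse import urlsplit


def map_record(url: str, content_type: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (host, 1) for each crawled page."""
    host = urlsplit(url).hostname or ""
    yield host, 1


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts emitted for one host."""
    return key, sum(values)


def run_job(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Local stand-in for the cluster: group mapper output by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for url, content_type in records:
        for key, value in map_record(url, content_type):
            groups[key].append(value)  # the "shuffle" step
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = [
        ("http://a.example/x", "text/html"),
        ("http://a.example/y", "text/html"),
        ("http://b.example/", "text/plain"),
    ]
    print(run_job(sample))
```

On a real cluster the grouping step would be done by the framework; only the map and reduce functions would be the contributed part.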
It's a little early yet, but work is progressing towards these goals
for Grub :)
Jer