[Search-l] Introducing the Wikia Search blog

Jeremie Miller jeremie at jabber.org
Sat Jul 12 17:00:48 UTC 2008


> But just a publicly available, up-to-date, complete copy of the web?
> Not useful at all.

It's not just a big blob. The goal is to have various "functional"
indexes of it and APIs into it: not a typical ranked keyword index, but
the ability to select subsets based on URL, content-type, and other
metadata.
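
To make "select subsets based on metadata" a bit more concrete, here is a rough sketch in Python. The record layout (`CrawlRecord`) and the `select` helper are hypothetical names made up for illustration; they are not an actual Grub API, just one way such a selection could look.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlRecord:
    """Hypothetical shape of one crawled document plus its metadata."""
    url: str
    content_type: str
    body: bytes


def select(
    records: Iterable[CrawlRecord],
    host_suffix: Optional[str] = None,
    content_type: Optional[str] = None,
) -> Iterator[CrawlRecord]:
    """Yield only the records matching the given metadata filters.

    host_suffix matches the host itself or any of its subdomains.
    content_type is compared against the media type only, ignoring
    parameters such as "; charset=utf-8".
    """
    for rec in records:
        host = urlparse(rec.url).hostname or ""
        if host_suffix is not None:
            if host != host_suffix and not host.endswith("." + host_suffix):
                continue
        if content_type is not None:
            media_type = rec.content_type.split(";")[0].strip().lower()
            if media_type != content_type:
                continue
        yield rec
```

Something like `select(records, host_suffix="example.org", content_type="text/html")` would then stream only the HTML pages from that site and its subdomains, with no ranking involved.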

Also, the data is (in various states) loaded into a Hadoop cluster, and
contributed MapReduce jobs can be run against it. The only restriction
is that the MR jobs are open source and their outputs are available to
everyone. This is a community resource.
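
For anyone unfamiliar with the MapReduce model, a contributed job might have roughly the shape below. This is only a local, in-memory sketch in Python, not Hadoop's API: the function names, the `(url, content_type)` input format, and the `run_local` driver are all assumptions for illustration. On a real cluster the map and reduce steps would be distributed by Hadoop itself.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # assumed input: (url, content_type)


def map_content_type(record: Record) -> Iterator[Tuple[str, int]]:
    """Map step: emit (media type, 1) for each crawled document."""
    _url, content_type = record
    yield content_type.split(";")[0].strip().lower(), 1


def reduce_sum(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: add up all counts emitted for one key."""
    return key, sum(values)


def run_local(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Toy single-process driver: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))
```

Running `run_local(records, map_content_type, reduce_sum)` over a list of `(url, content_type)` pairs gives a count of documents per media type, which is the kind of output that would be published for everyone.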

It's a little early yet, but work is progressing towards these goals  
for Grub :)

Jer




