[Search-l] wrong questions

Jimmy Wales jwales at wikia.com
Thu May 15 14:25:34 UTC 2008


Paul Vixie wrote:
> 1. why is ISC's the only backend?  jer's vision is backend syndication, so,
> if his XML schema is stable and if there's at least one f/l/oss implementation
> of crawling and of indexing, then, why aren't there more crawlers and more
> indexers, conforming to jer's XML, possibly flooding data between each other
> and possibly dividing up the workload so that all crawlers don't have to
> crawl all sites?  ISC ought to have peers, and we ought to be able to have
> gentlemen's agreements, like, "we'll do [a-l].com, you do [m-z].com", etc.

We strongly support this notion of backend syndication, and we are 
hopeful that as the infrastructure and protocols mature, we will get 
more and more people working with us on this.

> 2. why is Wikia's the only frontend?  again referring to the syndication
> model, and knowing that there are other "social search engines", when will
> we see someone other than wikia use ISC's backend, or any other backend whose
> data can be reached using jer's XML?

Hopefully people can start on this soon... as has already been pointed 
out, a great start might be to use our code... sounds like Jer is 
deciding on the license now.

> 3. who is driving the syndication model?  it's clear that ISC knows how to
> provide network and power, and that jer knows how to design the system and
> build various parts of it, but who is the champion for jer's vision -- who
> will drive us to better answers for #1 and #2 above?  who ought to be in here
> answering critics and beating the drum, which is a distraction to jer (and
> candidly he's too busy to do this part well unless he drops other stuff 
> that's already late)?  remembering that jimbo keeps this issue alive in the
> press, the overall project still lacks a day to day "programme manager".

We just transitioned our New York office to full-time work on the search 
project, and Dan Lewis is being put full-time on the task of community 
outreach: answering critics, beating the drum, handling inbound 
inquiries from potential partners who are already interested, reaching 
out to potential partners who are not yet interested, etc.

> 4. what else is jer working on?  has wikia dedicated him to this project or
> does he also handle day to day fire fighting on wikia's existing services to
> justify his paycheck?  and while we're on that topic, what other personnel
> has wikia dedicated to this -- how seriously are they really taking it, in
> terms of cash on the barrel head?

Jer is full-time on search, as are several others.  Dennis, Seth, 
Jeffrey, David, Aaron, Dan... I feel that I am forgetting someone.

We are prepared to ramp up our commitment as we start to get traction, 
as well.  At the present time, every time I ask the team what we need to 
buy, they say "not yet, we are coding". :)

> 5. who else is working on this, outside of wikia?  what outside volunteers
> or wikia competitor's employees have commit access to the source pool for
> the crawler, or indexer, or front end, or have root access to the donated
> back end machines hosted by ISC?  if the answer is nobody, then is that due
> to lack of outreach (see #3 above) or is it wikia's preference that outsiders
> contribute content rather than code and sysops?  (is that written anywhere?)

Our strong preference is to get lots of people coding on a fully open 
system, as they like.  I think so far we have not done a great job of 
outreach, but then again, we have not had everything in place to get 
people oriented and started.

Also, we view ourselves as a "good neighbor" part of the existing Nutch 
project: Dennis is a Nutch committer who is starting to work on a set of 
ideas he is calling "Nutch 2.0".

> 6. where are the mini-articles stored?  if outside volunteers are mostly
> contributing data, is that data stored on wikia's front end?  if so, what are
> the redistribution terms -- would wikia flood this data to competing front
> end operators, and accept incoming floods of similar data from competitors?
> or, is this the "secret sauce", there's no way to get access to contributed
> data of this kind except one article at a time, inside wikia's advertising
> system?

It's all GFDL, and we make database dumps available.  We would have to 
consider a "flood" of incoming data from a community/editorial point of 
view, but totally welcome it, and are totally committed to sharing 
everything extremely liberally.

> 7. given that the idea of "taking on google" is silly, given their size and
> focus and ambition and brand strength and so on, and that what we can
> actually hope to achieve with this project is to change the game and make
> search part of the internet infrastructure, where are the white papers,
> journal articles, and outreach glossies explaining what the new world of
> internet search could look like, and what effect this change will have on
> google, microsoft, yahoo, and the current market hierarchy, and the rest of
> the "social search" scene?

I think this is a really great question. :)

One of the things I have been arguing is that we are no threat to Google 
even if we are wildly successful at "making search part of the internet 
infrastructure" as you put it...

Google's brand is tied up with search, but Google's business is not 
search, per se, but the matching of advertisements to user actions and 
intentions online.  The threat to Google is not an open source 
alternative that helps 1,000 small competitors to flourish, but a single 
large proprietary competitor (Powerset?) that captures enough market 
share to take away the advertising marketplace.

1,000 small competitors are much more likely to simply partner with 
Google for ad revenues, because buyers go where the sellers are, and 
sellers go where the buyers are.

> 8. has anybody reached out to yahoo and microsoft to see if they'd like to
> join this effort or at least sponsor it, since as #2 and #3 in internet
> search today, they're the ones with the most to gain if we change the game.
> and if nobody's doing this now, and i did it, what would wikia say about
> sharing the sponsorship burden with other players, perhaps larger players?

We have done some of this, and would be eager to support you if you want 
to help us with it.  We can talk privately about the status of current 
talks but there is nothing to report and nothing likely to happen right 
away... but there are a lot of interested parties in the industry.

> this list of questions isn't meant to be exhaustive.  but as in my own
> controversial efforts over the years, i find the quality of criticism here
> somewhat low. 

:-)  Quality criticism is extremely valuable.

> also for the record, ISC's hosting of this project has been a cash neutral
> event for us, which is important since we don't have cash for this kind of
> thing.  the 15-ton air handler wikia bought feeds a room that has other
> projects in it too, and our network is a fixed cost, and wikia has agreed
> to pay for the power we use for search, and the servers were all donated,
> and that donation was targetted for this project, and we got a lot more
> servers than we needed, and we've been passing the excess along to other
> f/l/oss and internet security projects.  so no matter whether this project
> changes the world, ISC is already winning.

:-)

--Jimbo


