[Search-l] What Is Wikia and How Real Is It?
Jimmy Wales
jwales at wikia.com
Mon Aug 6 13:25:40 UTC 2007
John McCormac wrote:
> The venture is being portrayed as a Google Killer in the media coverage
> and spin. The problem is that there is no actual basis for such a claim
> other than it gives the media a nice soundbite and keeps the investors
> happy.
Actually, I think it makes the investors wonder what kind of lunatic I
am. :)
We are trying to downplay the "Google killer" story line, but it is a
great story line, and so the media runs with it anyway.
You would have seen the same story lines about Red Hat and Microsoft a few
years ago. It's an interesting story, but has little relationship to
getting some work done.
> So if I read this right, there is no search engine?
There currently is no search engine. This is a project to build one,
but more importantly, to build this:
> It is just an idea for a platform that is scalable and can be used for
> search engine development? But without knowing the processing
> requirements, the storage requirements and the bandwidth requirements,
> it is difficult to design such a platform.
Figuring out those things is part of the process, yes?
> The bandwidth required to spider tens of millions of websites on an
> ongoing basis is considerable. Therefore such a venture would need a lot
> of available bandwidth.
>
> The hardware is also a very significant requirement. It would need a lot
> of servers to do a proper crawl of the web. It would also require a
> backend to process the resulting data into something usable. And a
> search interface would be required.
Yes, so that matches my own very scientific estimates: "a lot of
bandwidth" and "a lot of servers". :)
> The search index is the hard part. It takes a long time to develop a
> good, clean index. The Infinite Monkeys approach to building an index
> (following links and hoping that they will lead to new pages) is not the
> most efficient method of building an index quickly when any of the prior
> requirements are absent or deficient.
I absolutely agree with that. I don't think anyone is proposing an
Infinite Monkey approach to spidering.
> A good index makes the difference between a great search engine and a
> spam infested pile of junk. I'm not convinced that the Wikia people
> quite appreciate the level of work that goes into that aspect of
> developing a search engine. Crawling a clearly defined index such as
> that of Wikipedia or some other silo site is easy. However crawling the
> web is like trying to take a slice of a swirling nebula.
Would it help if I say that I *do* appreciate the level of work that
goes into that aspect of things? Not sure what you are looking for here.
The task at the moment for me is to design the social aspect of the
community part of the site. The goal is to have good tools to allow the
community to control the crawl in intelligent ways. This is not
Infinite Monkeys, and it has to deal with interesting questions about
self-interested editors, trust, etc.
> So what exactly can Wikia offer? Bandwidth? Hardware? Expertise? Can you
> give us some descriptions and specifications of the resources and
> expertise that is available to search engine developers? For most of us,
> we have to deal with the realities imposed by hardware and bandwidth
> limitations. We don't have the luxury of just theorising - everything we
> do is geared towards survival in a highly competitive market. Perhaps we
> SE people really are on a different wavelength to the Wikia people.
Well, we do have the luxury of being able to provide hardware and
bandwidth to the community. So we don't have to cut corners in those areas.
> Perhaps the question foremost in the minds of many of the SE people on
> this list is this: why should we provide the search expertise? Or, to
> put it less diplomatically, why should we make you rich?
I am not asking you to make me rich. If you don't want to participate,
then don't.
If you think you can go out on your own and build a proprietary search
engine that makes you money, go ahead. If you think that you could find
it useful to work with a broader community to leverage each other's
talents so that in whatever you are doing (enterprise search? niche
search on the web? social search?), there is a chance for you to compete
with the big players on a much more level playing field, then come and
help us.
If you want us to build it for you, for free, giving it all to you and
asking for nothing in return, then... well, that's fine too. :) That's
what we do.
--Jimbo