[Search-l] Grub Update
jer
jeremie at jabber.org
Mon Aug 6 13:21:29 UTC 2007
> Yes Jer, but you don't know what I've done and vice versa. So I
> don't know if we have been through quite the same process. :)
In '05 and '06 I built a web search engine as part of a private R&D
project (various iterations had between 100M and 1B pages). That's
when I really got upset about the state of the whole web search
industry and realized that building an open foundation right now will
make a tremendous impact in the next 5-10 years.
> The thing about search on this or any other significant scale is
> that it requires a completely different mindset to that required
> for building a web directory or wiki where each entry can be
> individually validated. I don't think that adding a 'crap-ton' of
> human intelligence to the process is an accurate description of
> what happens.
>
> The indications that mark a site for deletion tend to be clear and
> it is the speed at which this happens that is important. Sometimes,
> this has to be applied to every website on an IP or even on the
> same DNS. It is a very anti-democratic process. Some are easy wins:
> linkswamps that can be identified by a DNS or IP, PPC that can be
> identified from a particular string, duplicate content pages that
> all have the same MD5 hash, etc. The hard part is when it goes
> beyond the easy wins to the stuff that requires a human decision.
That's why it's both an open source and a human platform: each does
its own part as best it can. It's not all one or all the other.
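
As a rough illustration of the machine side of that split, the three
"easy wins" quoted above (a known-bad IP, a tell-tale PPC string, and
identical MD5 hashes) could be sketched as below. This is only a
minimal sketch under stated assumptions: the function name easy_wins,
the sample pages, the blocklisted IP and the marker string are all
hypothetical and do not come from this thread or from any real engine.

```python
import hashlib

def easy_wins(pages, bad_ips, ppc_markers):
    """Return {url: reason} for pages caught by the three cheap checks.

    pages is an iterable of (url, ip, body) tuples (hypothetical layout).
    """
    flagged = {}
    seen_hashes = {}  # MD5 digest of body -> first URL seen with that body
    for url, ip, body in pages:
        digest = hashlib.md5(body.encode("utf-8")).hexdigest()
        if ip in bad_ips:
            # Everything hosted on a known-bad IP goes at once.
            flagged[url] = "ip"
        elif any(marker in body for marker in ppc_markers):
            # A particular string gives away the PPC feed.
            flagged[url] = "ppc"
        elif digest in seen_hashes:
            # Same MD5 as an earlier page: byte-identical duplicate.
            flagged[url] = "duplicate of " + seen_hashes[digest]
        else:
            seen_hashes[digest] = url
    return flagged

# Hypothetical sample data, for illustration only.
PAGES = [
    ("http://a.example/1", "192.0.2.10", "an original article"),
    ("http://b.example/1", "198.51.100.7", "links links links"),
    ("http://c.example/1", "192.0.2.11", "sponsored-feed-id=123 widgets"),
    ("http://d.example/1", "192.0.2.12", "an original article"),
]

RESULT = easy_wins(PAGES, bad_ips={"198.51.100.7"},
                   ppc_markers=["sponsored-feed-id="])
```

Anything that slips past checks like these is the part that still needs
a human decision.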
> Some of us (those lucky enough to survive in the search engine
> wars) have been doing this kind of work independently for years.
> We do talk to each other but there is a slight attitude of "better
> him than me" when some other search engine venture goes dot.bomb.
> Some of the techniques and methodology of search engine development
> are closely held - none more closely than a good search index. The
> tools for building search engines are widely available (Nutch etc).
> It is the human element of the equation that is in short supply.
>
> Many on the second and third tiers (those below GYM (Google/Yahoo/
> Microsoft)) of the search business have been talking on internet
> fora and lists for years. Having spent years developing a good
> search index, many of these people would not particularly want to
> give up such an edge. Though the wiki idea is nice, the mindset is
> somewhat different to that of Wikipedia and the whole "Cathedral
> and the Bazaar" model. Most search engine developers are too busy
> trying to survive without having to subscribe to some happy-clappy
> ethos that could very well put them out of business. These are the
> guys who you will have to convince that there is some value to
> being involved in the Wikia search project.
I don't need to convince anyone. I'll build something I believe in
and share it openly, and if others share in the vision and passion
then they are welcome to participate.
> That's all very laudable but this is a business. The small search
> engines are not going to hand over their survival edge to Jimmy's
> vision, which is essentially that of a competitor who will take
> their work and monetise it. That is the road block that the project
> has to get beyond.
It's an inefficient and closed business, and that will change.
Survival may depend on participating and collaborating much more
openly than it's done today in everyone's search silo.
> But without that essential spark of the search engine developers,
> there is a danger that the project could just be another platform -
> much like Amazon's search and servers product. Being a search
> engine developer is not the same as being a web developer. There is
> a lot more thinking and learning involved. Most thinking is about
> the "searching for what" question. It defines the nature of the
> search engine being developed. It makes the search engine a macro
> search engine or a niche engine. It makes the difference between
> success and failure.
>
> Having a platform for open search is nice. It might attract some
> search engine developers. Having a real search idea to go with that
> platform is better. Is Wikia search just an open platform without
> an idea for a search application?
I think I answered this in the other email: we're here because we
want to move *all* search forward using open and social value
systems. There should be many sparks, many applications, that are
much easier and faster to build atop an open platform with lots of
free tools and resources.
Jer