[Search-l] Old markets and methods vs. new

Aerik Sylvan aerik at thesylvans.com
Wed May 28 17:28:37 UTC 2008


On Wed, May 28, 2008 at 1:15 AM, Rainer Blome <rainer.blome at gmx.de> wrote:

> Aerik Sylvan wrote:
> > [...] Mahalo [...] a search application.
>
> The problem is that when they do have a page, they only
> show what's on that dedicated page, and no more.  It's like Wikia Search
> would only show the mini article, once there is one.  Effectively, the
> "search" part is dropped in those cases.  By design, these cases are
> common, because Mahalo aims to cover the common searches.


Exactly - it's just like looking at a dmoz category, but with a "search"
interface instead of a "browse" interface and keywords instead of
categories.  Same problem too:  like the other point I was making, the
application itself favors the entrenched players!  Some resources may always
be "best" for the average searcher, and therefore the best result to serve
until such a time as personalized searches are the norm, but many other
resources may become stale over time (think technology or medical research),
or they are simply the "incumbent", blocking the other very good resources
from being served as top results.


> > [...] the data being built in Wikipedia (and similar
> > projects) is huge and is under-utilized.  [...] My favorite possibility
> > is category intersections.  A category in Wikipedia is essentially a tag
> > - someone has said that this chunk of information should be associated
> > with this concept.
>
> Wikipedia embodies a semantic network. The links are sometimes not
> unambiguous, but I guess that effective automated use is possible
> (exploiting the "human computation" done there). Some are already trying
> it, just search for "semantic mining wikipedia" or "wikipedia link
> structure".  The categories make the semantic network relatively
> explicit and therefore easier to mine, but mining should be possible
> even without them.  And yes, it would be swell to have a search engine
> which guesses Wikipedia articles and categories and directly links to them.


I think there are a number of such tools, but third-party tools do not
fulfill the whole promise.  The Semantic MediaWiki guys have a great vision,
but it has technology hurdles to overcome.  Category Intersections is quite
doable, and Roan has written a trunk backend for it - it sounds like the
interface will need tweaking, and then it needs to be set up with Lucene for
Wikipedia.  But the main point is this:  any software design needs to
consider all (likely) use cases, and all outputs.

I'm sure we've all bumped into software that was shortsighted in its view
of the necessary outputs, and the application then cannot support them
because the architecture itself cannot (the database schema, for instance).
In Wikipedia, the primary use case is users browsing or searching for
articles in a fairly straightforward manner, e.g. a search for "Elvis".  But
another very powerful use case is searching for articles on intersecting
concepts - e.g., "Americans" and "Rock and Roll Stars".  A more pragmatic
example is my search for video games.  This is *tremendously* powerful, and
this use case needs to be a consideration in the ongoing design of Wikipedia
to facilitate the extraction of that data.
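
To make the intersection idea concrete, here is a minimal sketch in Python.
It is not Roan's backend and does not involve Lucene; it only assumes that
each article carries a list of category names, inverts that into a
category-to-articles index, and intersects the sets.  The function names and
the sample data are made up for illustration.

```python
from collections import defaultdict


def build_category_index(article_categories):
    """Invert {article: [categories]} into {category: {articles}}."""
    index = defaultdict(set)
    for article, categories in article_categories.items():
        for category in categories:
            index[category].add(article)
    return index


def intersect_categories(index, *categories):
    """Return the sorted titles of articles tagged with every given category."""
    if not categories:
        return []
    # index.get() does not create missing keys; unknown categories are empty.
    sets = [index.get(category, set()) for category in categories]
    # Start from the smallest set so the intermediate result stays small.
    sets.sort(key=len)
    result = set(sets[0])
    for other in sets[1:]:
        result &= other
    return sorted(result)


# Hypothetical sample data, not taken from Wikipedia's real category tree.
SAMPLE = {
    "Elvis Presley": ["Americans", "Rock and Roll Stars"],
    "Chuck Berry": ["Americans", "Rock and Roll Stars"],
    "John Lennon": ["Britons", "Rock and Roll Stars"],
    "Mark Twain": ["Americans", "Novelists"],
}

index = build_category_index(SAMPLE)
print(intersect_categories(index, "Americans", "Rock and Roll Stars"))
```

The point of the sketch is the data shape: if the category-to-article
mapping is available as an index, an intersection is a cheap set operation;
if it is not, every such query has to scan the articles.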

Similar methods of human-constructed metadata can be just as powerful,
and are a lot more interesting, innovative, and ultimately useful than
Mahalo.  I know Jimmy sees this - he has made several attempts at harnessing
it, including an early version of Wikia (when it was a search engine powered
by tags and star ratings - the problem was that it had a small number of
contributors and seems to have been spammed to death).  I think there was
promise in that early vision, but that a slightly different model is needed -
the community approach to Wikia Search *feels* like the right direction, and
I still think that tags (as an obvious human-generated data point) are an
important part of the puzzle.  I also think that some harder controls
against spam are necessary, as the ratio of data to contributors is not like
Wikipedia's, and the goodwill of contributors alone is not enough to stop the
spam.

Best Regards,
Aerik

-- 
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!