[Search-l] Parsed text samples

Linas Vepstas linasvepstas at gmail.com
Thu Jul 17 16:10:32 UTC 2008


2008/7/17 Sergio Monge <monge.sergio at gmail.com>:
> what the heck is this stuff for?

Heh. Well, to improve search, of course!  The idea is that
by having lexical and semantic information, the quality of
search results can be improved.  It also should allow
natural-language queries:

  "Who won the 1957 World Series?"

So, relex output identifies subject and object relations.
In this case, "who" is the subject, "the 1957 world series"
is the object, and "win" is the verb.  So, we are looking
for any text which has "win" as the verb, and "world series"
as the object.  Find that, and you've found the answer.

I think the above could actually be fairly straightforward
to implement: you make a giant table of subject, verb,
object, and URL. When a question is typed in, you search
the table for rows whose verb and subject/object match.
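
A minimal sketch of the table idea, just to make it concrete.
Everything here is an assumption for illustration: the rows are
typed in by hand rather than produced by relex, the URLs are
placeholders, the names build_index and answer are made up, and
I've added a verb column since the example question is matched
on "win" plus its object.

```python
from collections import defaultdict

# Hypothetical rows: (subject, verb, object, url). In a real system
# these would be extracted from parsed pages; here they are hand-made
# placeholders.
ROWS = [
    ("milwaukee braves", "win", "1957 world series",
     "http://example.com/braves"),
    ("new york yankees", "win", "1958 world series",
     "http://example.com/yankees"),
]


def build_index(rows):
    """Index rows by (verb, object) so a question is a single lookup."""
    index = defaultdict(list)
    for subject, verb, obj, url in rows:
        index[(verb, obj)].append((subject, url))
    return index


def answer(index, verb, obj):
    """Return (subject, url) pairs whose verb and object match."""
    # .get avoids creating empty entries for questions with no match.
    return index.get((verb, obj), [])


index = build_index(ROWS)
matches = answer(index, "win", "1957 world series")
```

For the question above, "who" is the unknown subject, so the
lookup key is just the verb and the object; whatever subject
comes back in the matching rows is the candidate answer.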

Whether this is better than keyword search, I dunno. Maybe
just some of the time.  But I think you can fold the scores
in with keyword scores, and get better results.
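
One simple way to fold the scores together would be a weighted
average. This is only a sketch under assumptions: both scores
are taken to be already normalized to the range 0..1, and the
weight, the function names, and the candidate format are all
hypothetical, not anything the search engine defines.

```python
def combined_score(keyword_score, relation_score, weight=0.5):
    """Blend two scores in [0, 1]; weight is the relation score's share."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1.0 - weight) * keyword_score + weight * relation_score


def rank(candidates, weight=0.5):
    """Sort URLs best-first.

    candidates maps url -> (keyword_score, relation_score).
    """
    return sorted(
        candidates,
        key=lambda url: combined_score(*candidates[url], weight=weight),
        reverse=True,
    )
```

With weight=0 this degrades to plain keyword ranking, so the
subject/object match can be phased in gradually and compared
against the existing results.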

If we can get even basics like the above working on a large
scale, then there are much fancier things that can be done.

Besides, this is all supposed to be sexy/hot: Microsoft just
paid $100M for Powerset, and, as best as I can tell, Powerset
doesn't do much more than the above.  There are a couple of
other startups playing in this area because it's, uhh, sexy/hot.
So, if nothing else, it allows Wikia to claim it's at the forefront
with the latest "semantic web" technologies.

--linas


