[Search-l] Parsed text samples

Jeremie Miller jeremie at jabber.org
Thu Jul 17 17:59:25 UTC 2008


It's definitely sexy hot, thanks Linas (and RelEx folks)!

I'm looking forward to deploying this in our map-reduce cluster on a
large set of the top pages in the index (and posting the resulting
data, of course), as well as figuring out how we could better
integrate this with Grub as a platform. I think the promise of rich
tagged content here is very, very exciting :)

Jer

On Jul 17, 2008, at 11:10 AM, Linas Vepstas wrote:

> 2008/7/17 Sergio Monge <monge.sergio at gmail.com>:
>> what the heck is this stuff for?
>
> Heh. Well, to improve search, of course!  The idea is that
> by having lexical, semantic information, the quality of
> search results can be improved.  It also should allow
> NLP queries:
>
>  "Who won the 1957 World Series?"
>
> So, RelEx output identifies subject and object relations.
> In this case, "who" is the subject, "the 1957 world series"
> is the object, and "win" is the verb.  So, we are looking
> for any text which has "win" as the verb, and "world series"
> as the object.  Find that, and you've found the answer.
>
> I think the above could actually be fairly straightforward
> to implement: you have to make a giant table of subject,
> object, and URL. When a question is typed in, you search
> the table for matching subject/object.
>
> Whether this is better than keyword search, I dunno. Maybe
> just some of the time.  But I think you can fold the scores
> in with keyword scores, and get better results.
>
> If we can get even basics like the above working on a large
> scale, then there are much fancier things that can be done.
>
> Besides, this is all supposed to be sexy/hot: Microsoft just
> paid $100M for Powerset, and, as best as I can tell, Powerset
> doesn't do much more than the above.  There are a couple of
> other startups playing in this area 'cause it's, uhh, sexy hot.
> So, if nothing else, it allows Wikia to claim it's at the forefront
> with the latest "semantic web" technologies.
>
> --linas
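
A minimal sketch of the lookup Linas describes above, assuming the parser output has already been reduced to (subject, verb, object, URL) rows. Everything here is hypothetical and for illustration only: the `RelationIndex` class, the `blend` helper, the fixed `bonus` value, and the example.org URLs are not part of RelEx or of any Wikia code, and a real table would live in the cluster rather than in a Python dict.

```python
from collections import defaultdict


class RelationIndex:
    """Hypothetical in-memory stand-in for the subject/object/URL table."""

    def __init__(self):
        # (verb, object) -> list of (subject, url) rows
        self._rows = defaultdict(list)

    @staticmethod
    def _norm(text):
        # Lowercase and collapse whitespace so "World  Series" matches "world series".
        return " ".join(text.lower().split())

    def add(self, subject, verb, obj, url):
        self._rows[(self._norm(verb), self._norm(obj))].append((subject, url))

    def answer(self, verb, obj):
        # For a "who" question the subject is unknown: match on verb + object.
        return list(self._rows.get((self._norm(verb), self._norm(obj)), []))


def blend(keyword_scores, relation_urls, bonus=0.5):
    """Add a fixed bonus (an assumption) to URLs that also matched a relation."""
    urls = set(keyword_scores) | set(relation_urls)
    scored = {
        u: keyword_scores.get(u, 0.0) + (bonus if u in relation_urls else 0.0)
        for u in urls
    }
    # Highest score first; ties broken by URL for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


index = RelationIndex()
index.add("Milwaukee Braves", "win", "1957 World Series", "http://example.org/a")
index.add("New York Yankees", "lose", "1957 World Series", "http://example.org/b")

hits = index.answer("win", "1957 world series")
ranked = blend(
    {"http://example.org/a": 0.25, "http://example.org/b": 0.5},
    {url for _, url in hits},
)
```

Here `hits` holds the rows whose verb and object match the question, and `ranked` folds that match into made-up keyword scores. An additive bonus is only one way to combine the two signals; the right weighting would need to be tuned against real queries.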
