[Search-l] [relevancy of search results]

jer jeremie at jabber.org
Thu Jun 7 17:42:11 UTC 2007


Very well thought out, Bill. As you pointed out, and as anyone can
plainly see, Google also believes it can make search more relevant
by knowing the user better.

I always fall back on the tools I'm comfortable with, so I see
simple solutions to the privacy issues in open standards. It's far
more reasonable to have either local tools on your desktop or
trusted third parties, like the Attention Trust, intelligently
compile intention "vectors" for you. These vectors can use simple
common definitions and a simple format, and you decide whether to
include them with your search queries.

Ultimately, I don't believe the search engine itself should know
anything about you. It should simply support the ability to search
more intelligently (beyond just keywords).
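
To make that concrete, here is a rough sketch of how it might look.
Everything in it is an assumption for illustration: the vector format
(topic label mapped to a weight), the "intent" field name, and the
scoring are hypothetical, not an existing standard or any engine's
actual API. The point is only that the vector travels with the query
when the user opts in, and the engine ranks from what arrived in that
one request without keeping any history.

```python
import json
from typing import Dict, List, Optional

# Hypothetical format: topic label -> weight. Not a real standard.
IntentionVector = Dict[str, float]

# Toy index; the "topics" labels are made up for this example.
DOCS = [
    {"id": "jaguar-car", "text": "jaguar car dealership prices", "topics": ["autos"]},
    {"id": "jaguar-cat", "text": "jaguar habitat in the rainforest", "topics": ["wildlife"]},
]


def build_query(keywords: str, vector: Optional[IntentionVector] = None) -> str:
    """Client side: attach the intention vector only if the user chooses to."""
    request = {"q": keywords}
    if vector is not None:
        request["intent"] = vector
    return json.dumps(request, sort_keys=True)


def rank(request_json: str, documents: List[dict]) -> List[str]:
    """Engine side: stateless ranking from this request alone."""
    request = json.loads(request_json)
    terms = request["q"].lower().split()
    intent = request.get("intent", {})
    scored = []
    for doc in documents:
        words = doc["text"].lower().split()
        keyword_hits = sum(words.count(term) for term in terms)
        if keyword_hits == 0:
            continue  # plain keyword match still decides what is eligible
        # Boost documents whose topics appear in the supplied vector.
        boost = sum(intent.get(topic, 0.0) for topic in doc["topics"])
        scored.append((keyword_hits + boost, doc["id"]))
    # Highest score first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: -pair[0])
    return [doc_id for _, doc_id in scored]


plain = rank(build_query("jaguar"), DOCS)
personal = rank(build_query("jaguar", {"wildlife": 0.9}), DOCS)
```

Without the vector both documents tie on the keyword alone; with it,
the wildlife page moves ahead, and the engine has stored nothing about
who asked.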

Jer

On Jun 6, 2007, at 11:59 AM, William Surowiec wrote:

> (This is a plain text reposting of an earlier, accidental html posting
> with an additional link at the end.)
>
> An interesting article
> (http://jcmc.indiana.edu/vol12/issue3/vancouvering.html#schemas) has
> begun to change my mind.
>
> I have been somewhat of a "lurker" waiting to gain access to crawl
> results to pass them through a "natural language processing" pipeline
> (see UIMA: http://incubator.apache.org/uima/). I admit to not
> believing in the success of a voluntary group rating system (note,
> this is far from saying I believe in the opposite: that it will
> fail). I know that I do not, and cannot, know the outcome till we
> get there.
>
> The following quote from the article has forced me to question the
> potential efficacy of both my approach and "the" (quotes because it is
> only my impression of what I believe is still evolving) voluntary
> group rating system being discussed.
>
> *** quoted text follows ***
>
> What is relevance? In a small, well-defined database, it is relatively
> easy to sort relevant from irrelevant documents. On the Web, this is
> not necessarily as simple. One interviewee commented that the standard
> of relevance has changed from when he began to work with information
> retrieval systems:
>
> [W]here the systems used to only be the Dialogues and the
> Lexis-Nexises, you know, I think they strove for a more academic
> standard of relevance, where you define relevance as the
> relationship between the subject that is in the document with what
> the user is asking about. So it is sort of topical relevance.
> Whereas in the practical world where the search engines are
> reaching today, something being useful to the user and something
> where the user grabs the information and continues, has become, I
> think, more important and less emphasis on say, getting the best
> document. (Interviewee G)
>
> In other words, as this interviewee says elsewhere, it is about
> "satisfying users." Relevance has changed from some type of topical
> relevance based on an applied classification to something more
> subjective.
>
> *** end quoted text ***
>
> If this is so (and others may fairly argue against that point) then a
> determination of the user relevance of a link needs to be in alignment
> with the intentions of the user and is neither inherent in the
> document nor in _any_ metadata associated with the link that is not
> so aligned.
>
> I believe this leads to requiring knowledge about the user that cannot
> be derived solely from the query. To impute the user's intent will
> require:
>
>   1. identification of the user (may be anonymous, but a specific
>      anonymous user - a token in the user's possession)
>
>   2. the newly entered query from this user
>
>   3. the search history (the ordered collection of query and results
>      returned and user action taken) of this user and many others
>
>   4. an ability to impute a current relevancy value for a link in a
>      result set for a query given this user and the actions taken by
>      similar user/query requests - the hard part
>
> I know that collecting this data will justifiably be offensive to
> some - given enough data, an anonymous user may be identified, and a
> careless user far sooner. And, as we are open, this data _will_ be
> closely examined, sometimes by not nice people. Some users will
> doubtlessly be hurt. (It is neither cold heartedness nor
> insensitivity that prevents me from ameliorating that statement - if
> we collect this data we should do it knowing the consequences.)
>
> Given enough data I believe this approach will be both used and yield
> more relevant results than any other. The "used" and "yield" part of
> that sentence is the conversion in me wrought by the article. I now
> doubt a user would make the effort to use even a "semantic search" if
> one were available over a simple keyword search yielding good enough
> results with less effort on their part - sigh. Of course a semantic
> search would be preferentially used by "intelligent agents" - both
> software and some humans. But I sense neither is our target audience.
>
> I believe user history (aka personalization) will be a component in
> the approach taken by the "big boys" (I am intentionally trying to
> communicate a negative in that phrasing, as I am annoyed by the belief
> that it is being done quietly by those who will possess a de facto,
> significant, and user-appreciated advantage that will be well
> managed so as not to "cause trouble.")
>
> I do not claim that its being technically feasible, or that others
> are doing it, is sufficient reason for us to do it. But I do not
> believe there is another way to deliver the most relevant results to
> a user (I am open to any data - especially contrary data).
>
> One saving grace we might have, if we were to do this, would be our
> openness. This will help research efforts, inform the public, and
> possibly influence rule makers and others.
>
> We now have servers - they are being provisioned. Shall we load the
> data released by AOL last year and begin exploring how to use this
> type of data?
>
> Bill
>
> ps - I discovered the article via a blog entry by Seth Finkelstein
> (http://sethf.com/). I intend this as a public thank you but realize it
> may yield other fruit :)
>
> pps - I've become aware of an additional article bearing on this
> point:
> http://jeffnolan.com/wp/2007/05/22/google-flirts-with-evil/
>
>
>
> _______________________________________________
> Search-l mailing list
> Search-l at wikia.com
> http://lists.wikia.com/mailman/listinfo/search-l
> Change options or unsubscribe:
> http://lists.wikia.com/mailman/options/search-l



