[Search-l] [relevancy of search results]

William Surowiec wsurowiec at gmail.com
Sun Jun 10 11:57:07 UTC 2007


Thank you, Jer. I have installed the Firefox plug-in for Attention Trust
(http://attentiontrust.org/) and have begun collecting local data on my 
browsing usage.

A tool that allows a person to "own" their browsing history and share it 
as they see fit _is_ empowering. (Of course the devil lies in the
details, but that is only a touch of "show me, let me examine the
evidence," not active concern.)  I truly hope a mechanism for the
sharing of the user's data opens for researchers and that a significant 
number of users opt in.

Upon reflection I realize I may have been reaching for a trumpet, in my 
prior post, to offer to others the ability to loudly proclaim a "clear 
and present danger." While that is a concern I hold, it is not cause for 
me to act as if it were a demonstrable fact.

I know Google, as a significant example, allows me to review my search 
history and to opt out of their program collecting the data. I also
acknowledge that if they shared the data, more immediate, direct
harm would probably arise through the preying actions of the "not_nice_ones."  I
honestly do not view them (Google) as inimical to my interests. But I 
worry that they will change, especially when managers more oriented to 
"Wall Street concerns" become ascendant.

I know the power accruing to a few individuals in large firms is a 
problem endemic to our times. But being common does not mitigate the 
risk. We should be mindful that we are witnessing the creation of 
tomorrow's economic Leviathans.

(As I've grown older I appreciate more what George Washington did at the 
end of his presidency than what he did before.  The latter was 
overcoming hardships when his position was weak; the former was acting 
for the better good of all when he could have done otherwise - something 
far harder.)

Bill

jer wrote:
> Very well thought out Bill, and as you pointed out and anyone can 
> plainly see, Google believes that they can make search more relevant 
> by knowing the user better too.
>
> I always have to fall back on the tools I'm comfortable with, so I see 
> simple solutions to the privacy issues by using open standards.  It's 
> far more reasonable to have either local tools on your desktop or 
> trusted 3rd parties like the Attention Trust, who intelligently 
> compile intention "vectors" for you.  These vectors can be simple 
> common definitions and a simple format, that you can decide to include 
> with your search queries.
>
> Ultimately, I don't believe that it's the search engine itself that 
> should know anything about you, it should simply support the ability 
> to search more intelligently (beyond just keywords).
>
> Jer
>
> On Jun 6, 2007, at 11:59 AM, William Surowiec wrote:
>
>> (This is a plain text reposting of an earlier, accidental html posting
>> with an additional link at the end.)
>>
>> An interesting article
>> (http://jcmc.indiana.edu/vol12/issue3/vancouvering.html#schemas) has
>> begun to change my mind.
>>
>> I have been somewhat of a "lurker" waiting to gain access to crawl
>> results to pass them through a "natural language processing" pipeline
>> (see UIMA: http://incubator.apache.org/uima/). I admit to not believing
>> in the success of a voluntary group rating system (note, this is far
>> from saying I believe in the opposite: that it will fail). I know that I
>> do not, and cannot, know the outcome till we get there.
>>
>> The following quote from the article has forced me to question the
>> potential efficacy of both my approach and "the" (quotes because it is
>> only my impression of what I believe is still evolving) voluntary group
>> rating system being discussed.
>>
>> *** quoted text follows ***
>>
>> What is relevance? In a small, well-defined database, it is relatively
>> easy to sort relevant from irrelevant documents. On the Web, this is not
>> necessarily as simple. One interviewee commented that the standard of
>> relevance has changed from when he began to work with information
>> retrieval systems:
>>
>> [W]here the systems used to only be the Dialogues and the Lexis-Nexises,
>> you know, I think they strove for a more academic standard of relevance,
>> where you define relevance as the relationship between the subject that
>> is in the document with what the user is asking about. So it is sort of
>> topical relevance. Whereas in the practical world where the search
>> engines are reaching today, something being useful to the user and
>> something where the user grabs the information and continues, has
>> become, I think, more important and less emphasis on say, getting the
>> best document. (Interviewee G)
>>
>> In other words, as this interviewee says elsewhere, it is about
>> "satisfying users." Relevance has changed from some type of topical
>> relevance based on an applied classification to something more 
>> subjective.
>>
>> *** end quoted text ***
>>
>> If this is so (and others may fairly argue against that point) then a
>> determination of the user relevance of a link needs to be in alignment
>> with the intentions of the user and is neither inherent in the document
>> nor _any_ meta data associated with the link that is not so aligned.
>>
>> I believe this leads to requiring knowledge about the user that cannot
>> be derived solely from the query - to impute the user's intent will 
>> require:
>>
>>   1. identification of the user (may be anonymous, but a specific
>>      anonymous user - a token in the user's possession)
>>
>>   2. the newly entered query from this user
>>
>>   3. the search history (the ordered collection of query and results
>>      returned and user action taken) of this user and many others
>>
>>   4. an ability to impute a current relevancy value for a link in a
>>      result set for a query given this user and the actions taken by
>>      similar user/query requests - the hard part
>>
>> I know that collecting this data will justifiably be offensive to some -
>> given enough data, an anonymous user may be identified and a careless
>> user far sooner. And, as we are open, this data _will_ be closely
>> examined, sometimes by not nice people. Some users will doubtlessly be
>> hurt. (It is neither cold heartedness nor insensitivity that prevents me
>> from ameliorating that statement - if we collect this data we should do
>> it knowing the consequences.)
>>
>> Given enough data I believe this approach will be both used and yield
>> more relevant results than any other. The "used" and "yield" part of
>> that sentence is the conversion in me wrought by the article. I now
>> doubt a user would make the effort to use even a "semantic search" if
>> one were available over a simple keyword search yielding good enough
>> results with less effort on their part - sigh. Of course a semantic
>> search would be preferentially used by "intelligent agents" - both
>> software and some humans. But I sense neither is our target audience.
>>
>> I believe user history (aka personalization) will be a component in the
>> approach taken by the "big boys" (I am intentionally trying to
>> communicate a negative in that phrasing as I am annoyed by the belief
>> that it is being done quietly by those who will possess a de facto,
>> significant, and user appreciated advantage that will be well managed to
>> not "cause trouble.").
>>
>> I do not claim that technical feasibility, or the fact that others are
>> doing it, is sufficient reason for us to do it. But I do not believe
>> there is another way to deliver the most relevant results to a user (I
>> am open to any data - especially contrary data).
>>
>> One saving grace we might have, if we were to do this, would be our
>> openness. This will help research efforts, inform the public, and
>> possibly influence rule makers and others.
>>
>> We now have servers - they are being provisioned. Shall we load the data
>> released by AOL last year and begin exploring how to use this type of 
>> data?
>>
>> Bill
>>
>> ps - I discovered the article via a blog entry by Seth Finkelstein
>> (http://sethf.com/). I intend this as a public thank you but realize it
>> may yield other fruit :)
>>
>> pps - I've become aware of an additional article bearing on this point:
>> http://jeffnolan.com/wp/2007/05/22/google-flirts-with-evil/
>>
>>
>>
>> _______________________________________________
>> Search-l mailing list
>> Search-l at wikia.com
>> http://lists.wikia.com/mailman/listinfo/search-l
>> Change options or unsubscribe: 
>> http://lists.wikia.com/mailman/options/search-l
>
>


