<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
An interesting article (<a
href="http://jcmc.indiana.edu/vol12/issue3/vancouvering.html#schemas">http://jcmc.indiana.edu/vol12/issue3/vancouvering.html#schemas</a>)
has begun to change my mind. <br>
<br>
I have been something of a "lurker", waiting to gain
access to crawl results so that I can pass them through a "natural language
processing" pipeline (see UIMA: <a
href="http://incubator.apache.org/uima/">http://incubator.apache.org/uima/</a>).
I admit to not believing in the success of a voluntary group rating
system. (Note that this is far from saying I believe the opposite, that
it will fail.) I know that I do not, and cannot, know the outcome until
we get there. <br>
<br>
The following quote from the article has forced me to question the
potential efficacy of both my approach and "the" (quotes because it is
only my impression of what I believe is still evolving) voluntary group
rating system being discussed.<br>
<br>
<blockquote type="cite">What
is relevance? In a small, well-defined database, it is relatively easy
to sort relevant from irrelevant documents. On the Web, this is not
necessarily as simple. One interviewee commented that the standard of
relevance has changed from when he began to work with information
retrieval systems:
<br>
[W]here
the systems used to only be the Dialogues and the Lexis-Nexises, you
know, I think they strove for a more academic standard of relevance,
where you define relevance as the relationship between the subject that
is in the document with what the user is asking about. So it is sort of
topical relevance. Whereas in the practical world where the search
engines are reaching today, something being useful to the user and
something where the user grabs the information and continues, has
become, I think, more important and less emphasis on say, getting the
best document. (Interviewee G)<br>
In
other words, as this interviewee says elsewhere, it is about
"satisfying users." Relevance has changed from some type of topical
relevance based on an applied classification to something more
subjective.</blockquote>
If this is so (and others may fairly argue against that point), then the
relevance of a link to a user has to be determined in alignment
with that user's intentions. It is inherent neither in the document
nor in _any_ metadata associated with the link that is not so aligned.<br>
<br>
I believe this requires knowledge about the user that cannot
be derived solely from the query. Imputing the user's intent will
require:<br>
<ol>
<li>identification of the user (may be anonymous, but a specific
anonymous user - a token in the user's possession)<br>
</li>
<li>the newly entered query from this user</li>
<li>the search history (the ordered collection of queries, results
returned, and user actions taken) of this user and many others<br>
</li>
<li>an ability to impute a current relevancy value for a link in a
result set for a query, given this user and the actions taken in similar
user/query requests - the hard part<br>
</li>
</ol>
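The four items above can be sketched in a few lines. What follows is only an
illustration under stated assumptions, not a proposed design: the names
(Event, user_token, impute_relevance) are hypothetical, the history holds
only clicked results, and "similar users" is approximated by the Jaccard
overlap of the URLs two users have clicked. A real system would need far
more than this.<br>
<pre>
```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One history entry (hypothetical schema): a query and the clicked result."""
    user_token: str   # opaque anonymous token, not a real identity
    query: str
    clicked_url: str


def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def impute_relevance(history, user_token, query):
    """Score URLs for (user_token, query) from clicks made by other users.

    Every click another user made on the same query counts as a vote,
    weighted by how much that user's clicked URLs overlap with ours.
    A user with no history gets a score of 0.0 for every URL.
    """
    clicks_by_user = defaultdict(set)
    for event in history:
        clicks_by_user[event.user_token].add(event.clicked_url)

    mine = clicks_by_user.get(user_token, set())
    scores = defaultdict(float)
    for event in history:
        if event.query != query or event.user_token == user_token:
            continue
        scores[event.clicked_url] += jaccard(mine, clicks_by_user[event.user_token])
    return dict(scores)


# Made-up history: "me" has clicked the same programming page as "u_dev".
history = [
    Event("me", "java", "docs.example/programming"),
    Event("u_dev", "java", "docs.example/programming"),
    Event("u_dev", "python", "python.example/language"),
    Event("u_zoo", "snakes", "zoo.example/reptiles"),
    Event("u_zoo", "python", "zoo.example/python"),
]
scores = impute_relevance(history, "me", "python")
for url in sorted(scores, key=scores.get, reverse=True):
    print(url, scores[url])
```
</pre>
With this made-up history, the ambiguous query "python" ranks the programming
page above the zoo page for "me", purely because of what a similar user
clicked. Nothing in the document or its metadata decided that.<br>
<br>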
I know that collecting this data will justifiably be offensive to some.
Given enough data, an anonymous user may be identified, and a careless
user far sooner. And, as we are open, this data _will_ be closely
examined, sometimes by people who are not nice. Some users will doubtless be
hurt. (It is neither cold-heartedness nor insensitivity that prevents
me from softening that statement: if we collect this data, we should
do it knowing the consequences.)<br>
<br>
Given enough data, I believe this approach will both be used and yield
more relevant results than any other. The "used" and "yield" parts of
that sentence are the conversion wrought in me by the article. I now
doubt a user would make the effort to use even a "semantic search", if
one were available, over a simple keyword search yielding good-enough
results with less effort on their part - sigh. Of course, a semantic
search would be preferentially used by "intelligent agents" - both
software and some humans. But I sense neither is our target audience.<br>
<br>
I believe user history (aka personalization) will be a component in the
approach taken by the "big boys". (I am intentionally trying to
communicate a negative in that phrasing, as I am annoyed by my belief
that it is being done quietly by those who will possess a de facto,
significant, and user-appreciated advantage, one that will be well managed
so as not to "cause trouble.")<br>
<br>
I do not claim that technical feasibility, or the fact that others are
doing it, is sufficient reason for us to do it. But I do not believe there is
another way to deliver the most relevant results to a user. (I am open
to any data - especially contrary data.)<br>
<br>
One saving grace we might have, if we were to do this, would be our
openness. It would help research efforts, inform the public, and
possibly influence rule makers and others.<br>
<br>
We now have servers - they are being provisioned. Shall we load the
data released by AOL last year and begin exploring how to use this type
of data?<br>
<br>
Bill<br>
<br>
PS - I discovered the article via a blog entry by Seth Finkelstein (<a
href="http://sethf.com/">http://sethf.com/</a>). I intend this as a
public thank-you but realize it may yield other fruit <span
class="moz-smiley-s1"><span> :-) </span></span><br>
<br>
<br>
</body>
</html>