[Search-l] more than just interoperability
Tall Street
contact at tallstreet.com
Fri Jun 8 21:43:27 UTC 2007
You bring up an interesting idea. Of course, the first concern with any
data retrieved from the client is privacy. It has to be clear exactly
what is being sent back, and preferably the client should authorize
everything that is sent beforehand. The extension shouldn't passively
collect information in the background and just send it; otherwise there
is a big risk of sending private information that the client may not
wish others to know.
Having said that, why limit this to social networks? How about a toolbar
extension that collects your browser history and associates metadata
with it, such as the search terms you used before you located the
site or the anchor text of the link you clicked to get there? Time
spent and number of visits could also be recorded to help determine
usefulness. Such an extension would be useful in and of itself for
helping people locate sites they remember visiting but whose URL or
title they have forgotten, when they recall only a few things about
the site. Initially, collect the data, store it locally, and allow
the client to search it. Then add a feature that lets the client
review the sites they visited and that recommends which links they
should send back and under which keywords.
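
To make the local-only first stage concrete, here is a rough Python
sketch of the kind of store I have in mind. It is only an illustration
under my own assumptions: the names VisitRecord, LocalHistory,
record_visit and search are made up, matching is plain substring
matching, and a real extension would live in the browser rather than
in Python. Nothing here is sent anywhere.

```python
from dataclasses import dataclass, field


@dataclass
class VisitRecord:
    """Metadata kept locally for one URL (all field names are hypothetical)."""
    url: str
    title: str = ""
    search_terms: set = field(default_factory=set)
    anchor_texts: set = field(default_factory=set)
    visits: int = 0
    seconds_spent: float = 0.0


class LocalHistory:
    """In-memory history that never leaves the client's machine."""

    def __init__(self):
        self.records = {}

    def record_visit(self, url, title="", search_terms="", anchor_text="", seconds=0.0):
        # Create the record on first visit, then accumulate metadata.
        rec = self.records.setdefault(url, VisitRecord(url=url))
        if title:
            rec.title = title
        rec.search_terms.update(search_terms.lower().split())
        if anchor_text:
            rec.anchor_texts.add(anchor_text.lower())
        rec.visits += 1
        rec.seconds_spent += seconds

    def search(self, query):
        """Return URLs matching any query word, best match first."""
        words = query.lower().split()
        hits = []
        for rec in self.records.values():
            haystack = " ".join(
                [rec.url.lower(), rec.title.lower(), *rec.search_terms, *rec.anchor_texts]
            )
            matched = sum(1 for w in words if w in haystack)
            if matched:
                # Ties are broken by visit count, then by time spent.
                hits.append((matched, rec.visits, rec.seconds_spent, rec.url))
        hits.sort(reverse=True)
        return [url for _, _, _, url in hits]
```

With this, a client who only remembers searching for "bowline" could
call search("bowline") and get back the page they reached from that
query, even if the word never appears in the URL or title.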
Use something like http://www.tallstreet.com/ to rank the data, so that
people who have no history, or a history of sending results that were
not useful, get only a tiny weighting in the ranking and people who
have a history of sending back useful links get a greater weighting.
You will definitely have a more interesting and more useful dataset
than what you would get if you just counted links to a page.
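
As a rough illustration of the weighting idea only (this is not how
Tall Street actually ranks anything), here is a small Python sketch.
The function names, the BASE_WEIGHT constant and the smoothing formula
are all assumptions of mine, chosen so that a sender with no track
record starts with a tiny weight and a sender with mostly useful past
submissions approaches a weight of 1.

```python
from collections import defaultdict

BASE_WEIGHT = 0.05  # assumed tiny weight for senders with no track record


def sender_weight(useful, not_useful):
    """Smoothed fraction of a sender's past submissions judged useful."""
    total = useful + not_useful
    return (useful + BASE_WEIGHT) / (total + 1.0)


def rank_links(submissions, history):
    """Rank URLs per keyword by the summed weight of the senders behind them.

    submissions: iterable of (sender, keyword, url) tuples
    history:     dict mapping sender -> (useful_count, not_useful_count)
    """
    scores = defaultdict(float)
    for sender, keyword, url in submissions:
        useful, not_useful = history.get(sender, (0, 0))
        scores[(keyword, url)] += sender_weight(useful, not_useful)

    ranked = defaultdict(list)
    for (keyword, url), score in scores.items():
        ranked[keyword].append((score, url))
    # Highest combined weight first within each keyword.
    return {k: [u for _, u in sorted(v, reverse=True)] for k, v in ranked.items()}
```

The point of the design is that one submission from a sender with a
good record can outrank several submissions from unknown senders,
which is what makes this harder to game than a raw count.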
Any thoughts?
Gary
On 6/5/07, Fred Benenson <fred.benenson at gmail.com> wrote:
> Instead of publicly crawling the human indexes (del.icio.us / stumbleupon,
> etc.) ourselves, why don't we have our users do it? It's not a complete
> workaround, but might be an approach that works for a bit. Here's how I
> envision it:
>
> * A client side crawler (similar to yacy, but not targeting the entire web,
> just metadata rich places) implemented through a Firefox extension or
> Greasemonkey script.
> * When a client visits a social network with valuable data (as determined by
> a list managed by us) their local client makes a copy of all the data
> delivered to their client side browser.
> * The server can't tell the difference between a user surfing with the
> extension or without the extension.
> * That data is then meta-tagged and packaged properly locally and sent to
> the Wikia Search servers from the client's machine.
> * The Wikia Search servers then index and make sense of all of this data
> culled from the various clients running the Wikia Search client / extension.
>
> This way we're able to work around the bandwidth concerns that Yahoo and
> company would have with us crawling their databases. And the data that we're
> getting is merely stuff that is being browsed naturally, by live humans, so
> it's likely of more value.
>
> But bandwidth is obviously not just what they're concerned with. As has been
> mentioned, these sites view these databases of useful human tagged
> information as enormously valuable assets that give them a competitive edge.
> So then it's a question of the "intellectual property" contained in those
> databases. Now, I'm not sure if other networks do this, but Del.icio.us and
> Flickr have Creative Commons license implementation. That means that a
> particular user's stream of content that they've created (links, photos,
> etc.) can be set for people to share it. I think this would be a perfect
> opportunity for our distributed crawlers to take advantage of.
>
> Thoughts?
>
>
> Fred
>
>
>