[Search-l] Privacy and sharing browsing data (Seth Ford)
MJE Sales, LLC
mjesales at gmail.com
Fri Aug 3 15:20:24 UTC 2007
This is my first reply to something, so if I screwed it up, I'm sorry.
I like the idea of having something run in the browser that would know
which URLs to spider based on our browsing history. Lots of people
already use the Google Toolbar, the Alexa toolbar, or the Compete
toolbar, all of which send the server the website or websites you are
at, and everything else.
A little Firefox button that simply logged the URLs (or just the
domains) and sent them anonymously to the server would be a great way
of developing an index of sites that you know people are actually
visiting.
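To make the "just the domains" idea a bit more concrete, here is a minimal sketch in Python of the trimming step such a button would need before anything leaves the browser. The function names (to_domain, anonymous_record) and the record layout are made up for illustration, not part of any existing toolbar or extension, and a real Firefox add-on would of course do this in JavaScript.

```python
from typing import Optional
from urllib.parse import urlsplit


def to_domain(url: str) -> Optional[str]:
    """Reduce a visited URL to its host name, or None if it should not be logged.

    Only http/https pages are kept. The path, query string, port, and any
    user:password part are dropped, since those are the pieces most likely
    to identify a person.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    # urlsplit().hostname is already lowercased and excludes port and userinfo.
    return parts.hostname


def anonymous_record(url: str) -> Optional[dict]:
    """Build the payload to send: the domain only, with no user id or cookie.

    Hypothetical layout; what the server would actually accept is an assumption.
    """
    domain = to_domain(url)
    if domain is None:
        return None
    return {"domain": domain}
```

Sending only the host keeps things like search terms and session ids out of the log, though a long enough list of domains from one person can still say a lot about them.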
If you are looking at hindering spam, there has to be some sort of AI
component or a human element. Why not create a StumbleUpon-type thing
where sites are flagged as spam or not spam? To reduce the load on
all the servers, it could send 10, 25, or 50 URLs at a time. But then
again, I guess each person's definition of spam is a little different.
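The "send 10, 25, or 50 URLs at a time" part could look roughly like the sketch below: hold visited URLs in a small buffer and only call the server once the buffer is full. UrlBatcher and its send callback are hypothetical names for illustration; how the batch actually gets uploaded is left to whatever function is passed in.

```python
from typing import Callable, List


class UrlBatcher:
    """Collect URLs and hand them to `send` in groups of `batch_size`."""

    def __init__(self, send: Callable[[List[str]], None], batch_size: int = 25):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send = send
        self._batch_size = batch_size
        self._pending: List[str] = []

    def add(self, url: str) -> None:
        """Queue one URL; upload automatically once the batch is full."""
        if url in self._pending:
            return  # skip repeats within the current batch
        self._pending.append(url)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is queued, e.g. when the browser is closing."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)
```

With a batch size of 10, browsing 25 distinct pages results in two uploads during the session and a third, smaller one when flush() is called at the end, instead of 25 separate requests.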
We run several large domains with hundreds of thousands of pages, so
our approach is a tad different.
Have a Great Day!
Life is what you make of it!
Matt Ellsworth
MJE Sales, LLC
702-953-5733
Skype: mjesales
yahoo: mattseo
http://www.mjesales.com
http://www.articlesnatch.com
"The richest people in the world look for and build networks, everyone
else looks for work." ~ Robert Kiyosaki
RE: Date: Mon, 30 Jul 2007 13:35:19 -0600
From: "Seth Ford" <seth.ford at gmail.com>
Subject: Re: [Search-l] Privacy and sharing browsing data
To: "Jimmy Wales" <jwales at wikia.com>
Cc: search-l at wikia.com
Message-ID:
<ff963f940707301235u71c2ccccwc87a664d238e5024 at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
That's why I think it has to be a mash-up. You have to allow people to look
to the community first and then look to the crawl, be it tab based or
inline. It seems people are more interested in participating once they
trust they can find the data they are looking for and are then given
encouragement to participate and organize it in a more reasonable fashion. I
have sent out some of the implementation I have done along these lines
internally where I work. It does seem like it comes down to a matter of
trust: internally it's much easier to do a community-powered search engine
built off a wiki mashed with a crawl. Externally, how do you hinder spam and
gaming and foster the sense of identity? Maybe it's a /.-like implementation,
or simply Wikipedia is as good as it gets...?
Seth
On 7/28/07, Jimmy Wales <jwales at wikia.com> wrote:
>
> (This was about faroo.com )
>
> jer wrote:
> > Yeah, noticed them too, completely not open source...
>
> Yup! But doing this:
>
> >> "When an user opens a page with the browser, it will be automatically
> >> inserted into the distributed index of the p2p network. The
> >> additional network load and the site submission of a traditional
> >> crawler is omitted. Assuming a wide spread of FAROO this enables an
> >> almost complete index, updated in real time."
>
> Seems pretty easy to do with a simple Firefox extension.
>
> The difficult bit is thinking about user privacy and stopping spam. Let
> me explain what I mean:
>
> When we have a public way for people to submit, tag, and rate urls there
> are no particularly difficult issues with privacy because when you submit
> something, you are doing it publicly and if you want privacy, you'd best
> use a pseudonym to login... just like with any wiki. Anyone who is
> inserting junk into the index will be quickly detected and blocked or
> rated as a spammer, and there you go.
>
> But simply browsing the web is a different matter. I would not be happy
> with having my click stream of what I am surfing made public -- even if
> I was using a pseudonym. There are simply too many ways to guess who I
> am from my click stream.
>
> And yet, if no one can see my click stream, then I might just be a
> spammer merrily trolling around on my own spamtastic crap site.
>
> I think some clever solutions to this are possible. One would be
> that my browsing history would never be made public BUT if urls that I
> have submitted made it into the index, and people subsequently mark them
> as spam, then this fact shows up publicly in the form of a number: "This
> user has submitted X urls which were subsequently judged by the
> community to be spam." This could be said without revealing what they
> were.
>
> That is just the first thought of how to go about it.
>
> I am eager to think about ways that we can encourage passive
> participation by GOOD people who simply believe in our mission, would
> like to give us good data on real browsing patterns, but who rightly
> value their privacy, while at the same time preventing spammers from
> wasting too much of our time.
>
> --Jimbo
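
Jimbo's idea above of a public number ("This user has submitted X urls which were subsequently judged by the community to be spam") that never reveals the URLs themselves could be sketched roughly like this. It is only an illustration under my own assumptions: SubmissionLedger and its method names are hypothetical, everything lives in memory, and who gets to call mark_spam is left open.

```python
from collections import defaultdict
from typing import Dict, Set


class SubmissionLedger:
    """Keep who-submitted-what private; expose only a per-user spam count."""

    def __init__(self) -> None:
        self._submitters: Dict[str, Set[str]] = defaultdict(set)  # private: url -> users
        self._flagged: Set[str] = set()                           # urls judged spam
        self._spam_counts: Dict[str, int] = defaultdict(int)      # public: user -> count

    def submit(self, user: str, url: str) -> None:
        """Record a passive submission; nothing here is ever published."""
        users = self._submitters[url]
        if user in users:
            return
        users.add(user)
        if url in self._flagged:
            # The URL was already judged spam, so it counts right away.
            self._spam_counts[user] += 1

    def mark_spam(self, url: str) -> None:
        """Community verdict: charge the URL once to each user who submitted it."""
        if url in self._flagged:
            return
        self._flagged.add(url)
        for user in self._submitters.get(url, ()):
            self._spam_counts[user] += 1

    def public_spam_count(self, user: str) -> int:
        """The only figure shown publicly: a number, never the URLs behind it."""
        return self._spam_counts.get(user, 0)
```

The click stream stays on the private side of the ledger; the public side only ever answers "how many", which is the trade-off described in the quoted message.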