[Search-l] Privacy and sharing browsing data (Seth Ford)
jer
jeremie at jabber.org
Fri Aug 3 21:38:27 UTC 2007
Definitely check out the Attention Trust. These folks are on the
right track, and I hope that we can become one of the opt-in Attention
Services whose goal is to source/weight items for a crawler (with a
public crawl output).
http://www.attentiontrust.org/
Jer
On Aug 3, 2007, at 10:20 AM, MJE Sales, LLC wrote:
> This is my first reply to something, so if I screwed it up, I'm sorry.
>
> I like the idea of having something run in the browser that would know
> what URLs to spider based on our browsing history. Lots of people use
> the Google Toolbar, the Alexa Toolbar, or the Compete toolbar, all of
> which send the server the website or websites you are on and
> everything else.
>
> A little Firefox button that simply logged the URLs (or just the
> domains) and sent them anonymously to the server would be a great way
> of developing an index of sites that you knew people were actually
> visiting.
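A minimal sketch of the client-side half of the browser-button idea, written in Python for readability (a real Firefox extension would be JavaScript running against the browser's own extension APIs, which are not shown here). The names `to_domain` and `build_report` and the payload shape are hypothetical. It reduces each visited URL to its host name, skips non-web schemes, and reports only a sorted, de-duplicated set, so no user ID, timestamp, or visit order leaves the browser.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlsplit

# Assumption: only ordinary web pages are reported; about:, file:, etc. are skipped.
REPORTABLE_SCHEMES = {"http", "https"}


def to_domain(url: str) -> Optional[str]:
    """Return the host name of a web URL, or None if it should not be reported.

    Path, query string, and fragment are discarded because they often carry
    session IDs, search terms, or other personal details.
    """
    parts = urlsplit(url)
    if parts.scheme not in REPORTABLE_SCHEMES or not parts.hostname:
        return None
    return parts.hostname  # urlsplit already lowercases .hostname


def build_report(history: Iterable[str]) -> Dict[str, List[str]]:
    """Collapse a browsing history into an anonymous payload of unique domains."""
    domains = {d for d in (to_domain(u) for u in history) if d}
    # Deliberately no user ID, timestamps, or visit order in the payload.
    return {"domains": sorted(domains)}


if __name__ == "__main__":
    print(build_report([
        "https://Example.com/a?sid=123",
        "http://example.com/b",
        "about:blank",
        "https://www.wikia.com/wiki/Search",
    ]))
```

Sorting and de-duplicating is the design choice that matters here: a per-visit log would let the server rebuild a click stream, while a bare set of domains is much harder (though not impossible) to trace back to one person.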
>
>
> If you are looking at hindering spam, there has to be some sort of AI
> component or a human element. Why not create a StumbleUpon-type thing
> where sites are flagged as spam or not spam? To reduce the load on
> all the servers, it could send 10, 25, or 50 URLs at a time. But then
> again, I guess each person's definition of spam is a little different.
> We run several large domains with hundreds of thousands of pages, so
> our approach is a tad different.
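One way to read the StumbleUpon-style proposal is sketched below, assuming reviewers receive URLs in fixed-size batches (the 10, 25, or 50 from the message) and that a simple majority of votes decides each URL. The majority rule is an assumption added here to cope with people's differing definitions of spam; the names `batches` and `tally` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

ALLOWED_BATCH_SIZES = (10, 25, 50)  # batch sizes suggested in the message


def batches(urls: List[str], size: int) -> List[List[str]]:
    """Split the review queue into consecutive slices of at most `size` URLs."""
    if size not in ALLOWED_BATCH_SIZES:
        raise ValueError(f"batch size must be one of {ALLOWED_BATCH_SIZES}")
    return [urls[start:start + size] for start in range(0, len(urls), size)]


def tally(votes: List[Tuple[str, bool]]) -> Dict[str, str]:
    """Turn (url, is_spam) votes into a verdict per URL by simple majority.

    Ties are left "undecided", so one reviewer's idea of spam cannot
    settle a URL on its own when another reviewer disagrees.
    """
    spam: Counter = Counter()
    not_spam: Counter = Counter()
    for url, is_spam in votes:
        (spam if is_spam else not_spam)[url] += 1
    verdicts: Dict[str, str] = {}
    for url in set(spam) | set(not_spam):
        if spam[url] > not_spam[url]:
            verdicts[url] = "spam"
        elif not_spam[url] > spam[url]:
            verdicts[url] = "not spam"
        else:
            verdicts[url] = "undecided"
    return verdicts
```

With a queue of 60 URLs and a batch size of 25, `batches` returns slices of 25, 25, and 10 URLs.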
>
>
>
> Have a Great Day!
>
> Life is what you make of it!
>
> Matt Ellsworth
> MJE Sales, LLC
> 702-953-5733
> Skype: mjesales
> yahoo: mattseo
> http://www.mjesales.com
> http://www.articlesnatch.com
>
> "The richest people in the world look for and build networks, everyone
> else looks for work." ~ Robert Kiyosaki
>
> RE: Date: Mon, 30 Jul 2007 13:35:19 -0600
> From: "Seth Ford" <seth.ford at gmail.com>
> Subject: Re: [Search-l] Privacy and sharing browsing data
> To: "Jimmy Wales" <jwales at wikia.com>
> Cc: search-l at wikia.com
>
> That's why I think it has to be a mash-up. You have to allow people to
> look to the community first and then look to the crawl, be it tab-based
> or inline. It seems people are more interested in participating once
> they trust that they can find the data they are looking for and are
> then given encouragement to participate in organizing it in a more
> reasonable fashion. I have shared some of the implementation work I
> have done along these lines internally where I work. It does seem to
> come down to a matter of trust: internally, it's much easier to build a
> community-powered search engine out of a wiki mashed up with a crawl.
> Externally, how do you hinder spam and gaming and foster a sense of
> identity? Maybe it's a /.-like implementation, or maybe Wikipedia
> simply is as good as it gets...?
> Seth
>
> On 7/28/07, Jimmy Wales <jwales at wikia.com> wrote:
>>
>> (This was about faroo.com )
>>
>> jer wrote:
>>> Yeah, noticed them too, completely not open source...
>>
>> Yup! But doing this:
>>
>>>> "When an user opens a page with the browser, it will be
>>>> automatically inserted into the distributed index of the p2p
>>>> network. The additional network load and the site submission of a
>>>> traditional crawler is omitted. Assuming a wide spread of FAROO this
>>>> enables an almost complete index, updated in real time."
>>
>> Seems pretty easy to do with a simple Firefox extension.
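To make "inserted into the index when a user opens a page" concrete, here is a toy, single-machine inverted index in Python. It leaves out the distributed p2p part that FAROO describes entirely, and `VisitIndex` and `on_page_visit` are hypothetical names standing in for whatever hook a browser extension would call after a page loads.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Very naive tokenizer: lowercase alphanumeric runs only."""
    return TOKEN.findall(text.lower())


class VisitIndex:
    """Inverted index (word -> set of URLs) filled only by page visits."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[str]] = defaultdict(set)

    def on_page_visit(self, url: str, text: str) -> None:
        """Hypothetical hook: called with a page's URL and visible text after it loads."""
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query: str) -> Set[str]:
        """Return URLs containing every query word (AND semantics)."""
        words = tokenize(query)
        if not words:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        results = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            results &= self.postings.get(word, set())
        return results
```

Because pages enter the index only when someone visits them, the index leans toward pages people actually read; it also means anyone can push a page in just by loading it, which is the spam problem raised next.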
>>
>> The difficult bit is thinking about user privacy and stopping spam.
>> Let me explain what I mean:
>>
>> When we have a public way for people to submit, tag, and rate URLs,
>> there are no particularly difficult issues with privacy, because when
>> you submit something, you are doing it publicly, and if you want
>> privacy, you'd best use a pseudonym to log in... just like with any
>> wiki. Anyone who is inserting junk into the index will be quickly
>> detected and blocked or rated as a spammer, and there you go.
>>
>> But simply browsing the web is a different matter. I would not be
>> happy with having my click stream of what I am surfing made public,
>> even if I was using a pseudonym. There are simply too many ways to
>> guess who I am from my click stream.
>>
>> And yet, if no one can see my click stream, then I might just be a
>> spammer merrily trolling around on my own spamtastic crap site.
>>
>> I think some clever solutions to this are possible. One would be
>> that my browsing history would never be made public, BUT if URLs that
>> I have submitted made it into the index, and people subsequently
>> marked them as spam, then this fact would show up publicly in the form
>> of a number: "This user has submitted X URLs which were subsequently
>> judged by the community to be spam." This could be said without
>> revealing what they were.
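A minimal sketch of the public-counter idea, assuming the server keeps a private mapping from each submitted URL to its submitters and publishes only a per-user count. `SubmissionLedger` and its methods are hypothetical, and the private mapping would itself still need protecting (access control, retention limits), which this sketch does not attempt.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class SubmissionLedger:
    """Keeps who-submitted-what private and exposes only a spam count per user."""

    def __init__(self) -> None:
        # Private: URL -> users who submitted it. Never published.
        self._submitters: DefaultDict[str, Set[str]] = defaultdict(set)
        # URLs the community has already judged to be spam.
        self._judged_spam: Set[str] = set()
        # Public: user -> number of their submissions later judged spam.
        self._spam_count: DefaultDict[str, int] = defaultdict(int)

    def record_submission(self, user: str, url: str) -> None:
        """Remember privately that `user` submitted `url`.

        In this sketch, submissions recorded after a URL was already judged
        spam are not counted retroactively.
        """
        self._submitters[url].add(user)

    def mark_spam(self, url: str) -> None:
        """Community verdict: charge one point to every submitter of `url`, once."""
        if url in self._judged_spam:
            return
        self._judged_spam.add(url)
        for user in self._submitters.get(url, set()):
            self._spam_count[user] += 1

    def spam_count(self, user: str) -> int:
        return self._spam_count.get(user, 0)

    def public_profile(self, user: str) -> str:
        """The only per-user output: a number, never the URLs themselves."""
        return (f"This user has submitted {self.spam_count(user)} URLs which were "
                "subsequently judged by the community to be spam.")
```

Counting each judged URL only once keeps a single spam verdict from being inflated by repeated flagging.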
>>
>> That is just the first thought of how to go about it.
>>
>> I am eager to think about ways that we can encourage passive
>> participation by GOOD people who simply believe in our mission, would
>> like to give us good data on real browsing patterns, but who rightly
>> value their privacy, while at the same time preventing spammers from
>> wasting too much of our time.
>>
>> --Jimbo
> _______________________________________________
> Search-l mailing list
> Search-l at wikia.com
> http://lists.wikia.com/mailman/listinfo/search-l
> Change options or unsubscribe: http://lists.wikia.com/mailman/options/search-l