[Search-l] more than just interoperability

peter burden peter.burden at gmail.com
Fri Jun 1 23:01:41 UTC 2007


Aerik Sylvan wrote:
>
>
> On 6/1/07, *jer* <jeremie at jabber.org <mailto:jeremie at jabber.org>> wrote:
>
>>         So, here it is:  Getting data from existing social
>>         bookmarking services may be an option we should consider.
>>         Think of it - aggregating data from del.icio.us,
>>         stumbleupon, etc.  Now, I can't imagine how we'd get Yahoo
>>         to give us the data from del.icio.us, but maybe there are
>>         other providers who would be willing to do this.  Or
>>         perhaps we look at paying them for it, at least enough to
>>         cover their bandwidth and other overhead.
>>
>>         Anybody got any ideas around this type of thing?
>>
>>
>>     Yeah, it's a good way to find the actual interests of people
>>     through social bookmarking, digg and many other social websites.
>>     But it all depends on whether they are ready to release their
>>     data to open source search projects like this one.
>
>     For the most part, all of those sites and all of that data *is*
>     open, it just needs to be intelligently crawled and indexed. 
>     They're great seed sites for keeping a crawler fresh.
>
>     Sure it would be nice to have it in a more digestible form, but
>     it's all there already :)
>
>
> I guess that's kind of what I was talking about - if you're stubborn 
> and clever enough you can crawl (scrape) just about anything - but 
> getting some data in a reasonably digestible form, with permission, 
> would be huge...

I'd have concerns about the quality of the information. Once it became
clear that this sort of "social" information was affecting rankings,
which are important to commercial web sites, a small business would
find it very tempting to give good, positive/relevant ratings to its
own pages and negative/irrelevant ratings to those of competitors. The
scale of the WWW is such that I cannot conceive of any community effort
that would be able to police and resolve such actions.

However, the sites mentioned would be excellent sources of crawler
seeds, although effective crawling of dynamic (Web 2.0)
database/CMS-driven sites poses some significant problems, especially
if they're using Ajax.
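
To make the seed idea concrete, here is a rough sketch (not anything
the project has built) of pulling outbound links from one page of a
bookmarking site and treating them as crawler seeds. The function
name extract_seeds, the sample markup and the bookmarks.example.com
URL are all made up for illustration. It only sees links present in
the static HTML, so anything a site loads later via Ajax would be
missed, which is exactly the problem mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_seeds(html, page_url):
    """Return external http(s) links from html, in order, without duplicates.

    Hypothetical helper: page_url is the address the HTML came from.
    """
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(page_url).netloc
    seeds = []
    seen = set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # drop javascript:, mailto: and similar links
        if parts.netloc == own_host:
            continue  # skip the bookmarking site's own navigation
        if absolute not in seen:
            seen.add(absolute)
            seeds.append(absolute)
    return seeds


# Made-up sample page standing in for a bookmarking site's tag listing.
SAMPLE = """
<a href="/popular">Popular</a>
<a href="http://example.org/article">An article</a>
<a href="http://example.org/article">Same article again</a>
<a href="javascript:void(0)">More</a>
<a href="https://example.net/">Another site</a>
"""

seeds = extract_seeds(SAMPLE, "http://bookmarks.example.com/tag/search")
print(seeds)
```

Note that this does nothing about the quality concern: a link that
was bookmarked only to game rankings would come through as a seed
just like any other.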
>
> Aerik




