I've been thinking about something that is somewhat tangential to this. One of the things we've discussed is collecting a large amount of proactive human data: tags, or something like them. To be really useful, it would take a very large number of tags (or whatever form the data takes), ideally millions of websites, each tagged by hundreds of people with at least several tags apiece. So a dataset of perhaps a billion records is easy to imagine.
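As a rough back-of-envelope check on that figure, the sketch below counts one record per (site, person, tag) triple. The specific inputs (2 million sites, 100 taggers per site, 5 tags each) are illustrative assumptions chosen from the low end of the ranges above, not measured numbers.

```python
# Back-of-envelope size estimate for a social-tagging dataset.
# All three inputs are illustrative assumptions, not measured figures.
SITES = 2_000_000        # "millions of websites"
TAGGERS_PER_SITE = 100   # "hundreds of people" (low end)
TAGS_PER_TAGGER = 5      # "at least several tags"


def record_count(sites: int, taggers: int, tags: int) -> int:
    """Count one record per (site, person, tag) triple."""
    return sites * taggers * tags


total = record_count(SITES, TAGGERS_PER_SITE, TAGS_PER_TAGGER)
print(f"{total:,} records")  # prints "1,000,000,000 records"
```

Even with these conservative inputs the total reaches a billion records, and raising any one factor pushes it well past that.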
<br><br>But it's not easy to accumulate or process. Processing it is a technical hurdle that will be fun to tackle, but accumulating the data is a whole other matter.<br><br>So, here it is: getting data from existing social bookmarking services may be an option we should consider. Think of it: aggregating data from
<a href="http://del.icio.us">del.icio.us</a>, StumbleUpon, etc. Now, I can't imagine how we'd get Yahoo to give us the data from <a href="http://del.icio.us">del.icio.us</a>, but maybe there are other providers who would be willing to do this. Or perhaps we could look at paying them for it, at least enough to cover their bandwidth and other overhead.
<br><br>Anybody got any ideas about this type of thing?<br><br>Aerik<br>