[atlas-l] cleaned up a few bits and a super alpha collector
Robert Ackland
robert.ackland at anu.edu.au
Tue Jul 22 00:03:45 UTC 2008
Jer,
As someone who is following the Atlas project closely and also working
with ARC/WARC files as part of my research using large-scale web crawls,
what you are planning with the processing of ARC files is highly relevant
for me. So I'll be watching the list with interest and will start looking
at the code too.
Rob
-------------------------------------
Dr Robert Ackland
Fellow, Australian Demographic and Social Research Institute
College of Arts and Social Sciences
The Australian National University
e-mail: robert.ackland at anu.edu.au
homepage: http://adsri.anu.edu.au/people/robert.php
project site: http://voson.anu.edu.au
ph./fax/mob.: +61 2 6125 0312/+61 2 6125 2992/+61 438 833 525
mail: Coombs Building, 9
Canberra, ACT 0200
AUSTRALIA
-------------------------------------
On Mon, 21 Jul 2008, Jeremie Miller wrote:
>I've made a few edits to http://search.wikia.com/wiki/Atlas to clean
>up the knugget definition, with things I'm learning as I continue to
>prototype the first factory and collector. One refinement I've made
>is really focusing the definition of a knugget to be intrinsic,
>so that it only describes itself and not its larger context: a knugget
>is an atomic entity and is only given context by some other knugget
>that references it.
>
>I also threw together a really really rough perl prototype "collector"
>that parsed the knuggets from the experimental factory and indexed
>them using sqlite3's full text search (it was low hanging fruit).
>This doesn't do anything useful yet, but I have ~7 thousand urls
>indexed using it and you can kinda query it via:
> http://people.swlabs.org/cgi/collector?q=java
>
>I think my next step is to make the factory process ARC files and
>churn out a bunch of static knuggets, then have the test collector
>index those. There's a lot of interaction between the
>factory<->collector that has to be thought through and that should
>help get that moving.
>
>PS: I'll be at OSCON on Wednesday if anyone on this list happens to be
>around, track me down, I'd love to talk about Atlas more in person :)
>
>Jer
>_______________________________________________
>Atlas-l mailing list
>Atlas-l at wikia.com
>http://lists.wikia.com/mailman/listinfo/atlas-l
>
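For anyone following along, here is a minimal sketch of the collector idea described in the quoted message: take knuggets emitted by a factory and index them with SQLite's full-text search so they can be queried by keyword. It is written in Python rather than Perl and is not Jer's actual code. The knugget shape (a dict with "url" and "body" keys), the table name, and the function names are all assumptions made for illustration, and the FTS module that is available depends on how the local SQLite was built.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open a database and create a full-text table for knuggets.

    Tries the newest FTS module first and falls back to older ones,
    since availability depends on the local SQLite build.
    """
    conn = sqlite3.connect(path)
    for module in ("fts5", "fts4", "fts3"):
        try:
            conn.execute(
                f"CREATE VIRTUAL TABLE IF NOT EXISTS knuggets "
                f"USING {module}(url, body)"
            )
            return conn
        except sqlite3.OperationalError:
            continue  # module not compiled in; try the next one
    raise RuntimeError("this SQLite build has no full-text search module")


def index_knuggets(conn, knuggets):
    """Insert knuggets (hypothetical dicts with 'url' and 'body') into the index."""
    conn.executemany(
        "INSERT INTO knuggets (url, body) VALUES (?, ?)",
        [(k["url"], k["body"]) for k in knuggets],
    )
    conn.commit()


def search(conn, query):
    """Return the URLs of knuggets whose text matches the query."""
    rows = conn.execute(
        "SELECT url FROM knuggets WHERE knuggets MATCH ?", (query,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_index()
    index_knuggets(conn, [
        {"url": "http://example.org/a", "body": "An introduction to Java threads"},
        {"url": "http://example.org/b", "body": "Perl one-liners for log files"},
    ])
    print(search(conn, "java"))
```

A query such as q=java would map onto search(conn, "java") here. How a real collector should receive knuggets from the factory is exactly the open question raised in the message, so this sketch simply takes them as an in-memory list.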