[atlas-l] knugget archives, the ARK format
Jeremie Miller
jeremie at jabber.org
Tue Aug 5 22:03:55 UTC 2008
One of the things I had to solve quickly was how to package a whole
bunch of knuggets together for working with them, something easier
than thousands of little JS[on] files.
Somewhat like the Internet Archive format (ARC), I made an uber-simple
format that looks kind of like this in pseudocode:
gzip("http://factory.tld/knugget1" + \n + "{...JSON...}" + \n) +
gzip("http://factory.tld/knugget2#doc" + \n + "{...JSON...}" + \n) +
gzip("http://factory.tld/knugget3" + \n + "{...JSON...}" + \n) +
...
It's one file that consists of individually gzip'd line-pairs, one
being the URL of the knugget, the other being the JSON contents you
would find if you requested it via HTTP.
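As a rough illustration of the layout described above, here is a minimal Python sketch that writes and reads such a file. The function names write_ark and read_ark are made up for this example, and UTF-8 encoding is an assumption; neither comes from the Atlas code. The sketch relies on Python's gzip module reading concatenated gzip members as one continuous stream, which is the same behavior gzcat gives you.

```python
import gzip
import io
import json

def write_ark(pairs, stream):
    """Append each (url, knugget) pair to a binary stream as its own gzip member."""
    for url, knugget in pairs:
        # json.dumps emits no raw newlines by default, so the pair stays two lines.
        record = url + "\n" + json.dumps(knugget) + "\n"
        stream.write(gzip.compress(record.encode("utf-8")))

def read_ark(stream):
    """Yield (url, knugget) pairs from a stream of concatenated gzip members."""
    with gzip.open(stream, "rt", encoding="utf-8") as lines:
        for url in lines:
            body = next(lines, None)
            if body is None:
                raise ValueError("URL line without a JSON line: " + url.strip())
            yield url.rstrip("\n"), json.loads(body)

# Round trip through an in-memory buffer (the knugget contents are placeholders).
buf = io.BytesIO()
write_ark([("http://factory.tld/knugget1", {"title": "one"}),
           ("http://factory.tld/knugget2#doc", {"title": "two"})], buf)
buf.seek(0)
for url, knugget in read_ark(buf):
    print(url, knugget)
```

The same two functions work on a real file opened in binary mode in place of the buffer.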
There's a test one checked into svn, and you can just gzcat it to see
the contents. It's very easy to parse (colndx.pl takes it on STDIN). The
only caveat in creating it is that each url+json pair must be
individually compressed and the resulting compressed binary appended
to the whole. This stream of gzip chunks makes it much easier to
reference individual pieces within a larger ARK file later.
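One way the per-pair compression could pay off is random access: if you record where each gzip chunk starts and how long it is, a single knugget can be pulled out without decompressing the rest of the file. The Python sketch below assumes that approach. The offset index and the names write_ark_indexed and fetch_knugget are hypothetical, invented for this example, and are not part of the ARK format as described here.

```python
import gzip
import io
import json

def write_ark_indexed(pairs, stream):
    """Write pairs as separate gzip members; return {url: (offset, length)}.

    The index is an assumption of this sketch, not part of the ARK file itself.
    `stream` must be a seekable binary stream.
    """
    index = {}
    for url, knugget in pairs:
        record = url + "\n" + json.dumps(knugget) + "\n"
        chunk = gzip.compress(record.encode("utf-8"))
        index[url] = (stream.tell(), len(chunk))
        stream.write(chunk)
    return index

def fetch_knugget(stream, index, url):
    """Decompress only the one gzip member that holds `url`."""
    offset, length = index[url]
    stream.seek(offset)
    text = gzip.decompress(stream.read(length)).decode("utf-8")
    found_url, body = text.split("\n")[:2]
    if found_url != url:
        raise ValueError("index points at " + found_url + ", expected " + url)
    return json.loads(body)

# Usage with an in-memory buffer (the knugget contents are placeholders).
buf = io.BytesIO()
index = write_ark_indexed([("http://factory.tld/knugget1", {"n": 1}),
                           ("http://factory.tld/knugget2#doc", {"n": 2})], buf)
print(fetch_knugget(buf, index, "http://factory.tld/knugget2#doc"))
```

Because every chunk is a complete gzip member, the file written this way is still readable front to back with gzcat.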
There's not much to an ARK, but it greatly simplifies the process of
moving lots of knuggets between a factory and a collector.
Jer