[atlas-l] knugget archives, the ARK format

Jeremie Miller jeremie at jabber.org
Tue Aug 5 22:03:55 UTC 2008


One of the things I had to solve quickly was how to package a whole
bunch of knuggets together so they're easier to work with than
thousands of little JSON files.

Somewhat like the Internet Archive format (ARC), I made an uber-simple
format that looks kind of like this in pseudocode:

gzip("http://factory.tld/knugget1" + "\n" + "{...JSON...}" + "\n") +
gzip("http://factory.tld/knugget2#doc" + "\n" + "{...JSON...}" + "\n") +
gzip("http://factory.tld/knugget3" + "\n" + "{...JSON...}" + "\n") +
...

It's one file that consists of individually gzip'd line pairs: the
first line is the URL of the knugget, the second is the JSON contents
you would get if you requested that URL via HTTP.
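
As a rough illustration of the layout described above, here is a
minimal Python sketch of writing an ARK. The helper names (ark_member,
build_ark) and the knugget contents are made up for the example, and
it assumes UTF-8 text and JSON serialized onto a single line; it is
not the code used to produce the test file in svn.

```python
import gzip
import json


def ark_member(url, knugget):
    """Compress one URL line plus one JSON line into a single gzip member."""
    # json.dumps with no indent emits one line; newlines inside strings
    # are escaped, so the JSON cannot spill onto a third line.
    payload = url + "\n" + json.dumps(knugget) + "\n"
    return gzip.compress(payload.encode("utf-8"))


def build_ark(pairs):
    """Concatenate the individually compressed members into one ARK."""
    return b"".join(ark_member(url, knugget) for url, knugget in pairs)


# Hypothetical knugget contents; the real JSON schema is not shown here.
ark = build_ark([
    ("http://factory.tld/knugget1", {"title": "first"}),
    ("http://factory.tld/knugget2#doc", {"title": "second"}),
])

# Like gzcat, gzip.decompress reads through every concatenated member.
print(gzip.decompress(ark).decode("utf-8"), end="")
```

Writing the bytes in `ark` to a file should give something gzcat can
read back as alternating URL and JSON lines.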

There's a test one checked into svn, and you can just gzcat it to see
the contents. It's very easy to parse (colndx.pl takes it on STDIN).
The only caveat in creating one is that each URL+JSON pair must be
compressed individually and the resulting compressed bytes appended to
the whole. Because every chunk in the stream is a complete gzip member
that can be decompressed on its own, it is much easier later to
reference individual pieces within a larger ARK file.
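
To show why separate gzip chunks help with referencing individual
pieces, here is a hedged Python sketch that walks an ARK, records the
byte offset of each chunk, and then decompresses a single knugget from
its offset alone. The function names (read_member, index_ark) and the
sample data are assumptions for illustration; this is not what
colndx.pl does, and slicing the whole buffer as done here is only
reasonable for small files.

```python
import gzip
import json
import zlib


def read_member(data, offset):
    """Decompress the single gzip member starting at offset.

    Returns (url, knugget, next_offset).
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    text = d.decompress(data[offset:]).decode("utf-8")
    url, body = text.split("\n", 1)
    # Bytes past the end of this member are left in unused_data.
    next_offset = len(data) - len(d.unused_data)
    return url, json.loads(body), next_offset


def index_ark(data):
    """Map each knugget URL to the byte offset of its gzip member."""
    index, offset = {}, 0
    while offset < len(data):
        url, _, next_offset = read_member(data, offset)
        index[url] = offset
        offset = next_offset
    return index


# Hypothetical sample ARK built in memory for the demonstration.
sample = b"".join(
    gzip.compress((url + "\n" + json.dumps(doc) + "\n").encode("utf-8"))
    for url, doc in [
        ("http://factory.tld/knugget1", {"n": 1}),
        ("http://factory.tld/knugget2#doc", {"n": 2}),
        ("http://factory.tld/knugget3", {"n": 3}),
    ]
)

index = index_ark(sample)
# Jump straight to the third knugget without touching the first two.
url, knugget, _ = read_member(sample, index["http://factory.tld/knugget3"])
print(url, knugget)
```

With a real file on disk, the same idea would apply by seeking to the
stored offset and reading from there instead of slicing a byte string.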

There's not much to an ARK, but it greatly simplifies the process of  
moving lots of knuggets between a factory and a collector.

Jer


