[Grub-dev] some notes about making a "valid" arc from a workunit

jer jeremie at jabber.org
Wed Jan 9 16:45:28 UTC 2008


Here is a very short workunit example:

GET /cobe/index.html HTTP/1.0
Host: members.home.nl
Agent: Grub WU1

GET / HTTP/1.0
Host: www.kimmoritsugu.com
Agent: Grub WU1

GET / HTTP/1.0
Host: everettlofts.net
Agent: Grub WU1

GET / HTTP/1.0
Host: canadajournal.whitesnow.jp
Agent: Grub WU1

PUT /arcs/jeremie.53b28400fc6d8f886726435f3e119d9e411f7735.arc.gz HTTP/1.0
Host: dispatch.grub.swlabs.org
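
To make the structure concrete, here is a minimal Python sketch of
reading a workunit like the one above into an ordered list of fetches
plus the final PUT.  It assumes that requests are separated by blank
lines and that each request line, including the PUT, is a single line
on the wire.  The names Request and parse_workunit are made up for
illustration and are not part of any Grub client.

```python
from typing import List, NamedTuple, Tuple


class Request(NamedTuple):
    method: str
    path: str
    host: str


def parse_workunit(text: str) -> Tuple[List[Request], Request]:
    """Split a workunit into its ordered GET requests and the final PUT."""
    requests = []
    # Assumption: requests are separated by blank lines.
    for block in text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        method, path, _version = lines[0].split()
        headers = dict(ln.split(": ", 1) for ln in lines[1:])
        requests.append(Request(method, path, headers["Host"]))
    # List order is workunit order, which is the order the ARC must follow.
    fetches = [r for r in requests if r.method == "GET"]
    put = next(r for r in requests if r.method == "PUT")
    return fetches, put


WORKUNIT = """\
GET /cobe/index.html HTTP/1.0
Host: members.home.nl
Agent: Grub WU1

GET / HTTP/1.0
Host: www.kimmoritsugu.com
Agent: Grub WU1

PUT /arcs/jeremie.53b28400fc6d8f886726435f3e119d9e411f7735.arc.gz HTTP/1.0
Host: dispatch.grub.swlabs.org
"""

fetches, put = parse_workunit(WORKUNIT)
for r in fetches:
    print(r.host, r.path)
print("upload to", put.host, put.path)
```

Keeping the fetches in a plain list means the order they are processed
in is the order they appear in the workunit.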


If you're implementing workunit processing, there is one unstated but  
important requirement when you build the resulting ARC file:  
ordering.  Each entry in the workunit must have a corresponding entry  
in the ARC file, in the same order, because the sequence of entries is  
cumulatively hashed, and the hash code in the final PUT acts as a sort  
of checksum that lets the server later verify that it's a valid ARC.
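
To see why ordering matters for a cumulative hash, here is a toy
Python sketch.  The real hashing scheme is not described in this
message, so everything here is an assumption: SHA-1 is used only
because the hash in the example PUT has 40 hex digits, and the strings
fed to the hypothetical cumulative_hash function are placeholders for
whatever the server actually hashes per entry.

```python
import hashlib
from typing import Iterable


def cumulative_hash(entries: Iterable[str]) -> str:
    """Chain a hash over entries so that order and count both matter."""
    digest = b""
    for entry in entries:
        # Each step folds the previous digest into the next one.
        digest = hashlib.sha1(digest + entry.encode("utf-8")).digest()
    return digest.hex()


in_order = ["members.home.nl/cobe/index.html", "www.kimmoritsugu.com/"]
swapped = list(reversed(in_order))
missing = in_order[:1]

print(cumulative_hash(in_order))
print(cumulative_hash(in_order) == cumulative_hash(swapped))
print(cumulative_hash(in_order) == cumulative_hash(missing))
```

With any chained hash of this kind, reordering the entries or dropping
one gives a different final value, which is why the ARC has to follow
the workunit entry for entry.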

One implication of this is that every entry must have a response.  If  
you've played with the babygrub.pl script, you can see that even when  
DNS fails or connecting to the server fails, it generates an internal  
500 HTTP response (with an appropriate human-readable error, much like  
a proxy would) and saves that into the ARC.  I don't think we need  
standard templates for these client-generated errors yet, as long as  
*some* 500 error is generated.
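
A rough Python sketch of that fallback is below.  It is not how
babygrub.pl does it; the function names (synthetic_500, fetch_or_500),
the status text and the body wording are all invented for
illustration, and the actual fetch is passed in as a callable so the
fallback logic stands on its own.

```python
import socket
from typing import Callable


def synthetic_500(reason: str) -> bytes:
    """Build a client-generated 500 response with a readable error body."""
    body = f"Grub client error: {reason}\n".encode("utf-8")
    head = (
        "HTTP/1.0 500 Internal Client Error\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def fetch_or_500(host: str, path: str,
                 fetch: Callable[[str, str], bytes]) -> bytes:
    """Return the fetched response, or a synthetic 500 if the fetch fails."""
    try:
        return fetch(host, path)
    except OSError as exc:
        # OSError covers DNS failures (socket.gaierror), refused
        # connections and socket timeouts.
        return synthetic_500(f"{type(exc).__name__}: {exc}")


def failing_fetch(host: str, path: str) -> bytes:
    # Stand-in for a fetch whose DNS lookup fails.
    raise socket.gaierror(-2, "Name or service not known")


print(fetch_or_500("nonexistent.invalid", "/", failing_fetch).decode())
```

Either way the caller always gets back some response bytes, so every
workunit entry ends up with an ARC entry.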

Another implication is that redirects aren't followed: whatever HTTP  
response the server gives back is saved into the ARC verbatim.  The  
ARC processing on the server will follow the redirects and generate  
new entries in new workunits.
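
One way to get the raw response without any redirect handling is to
skip HTTP libraries and talk to the socket directly.  The Python
sketch below assumes plain HTTP on port 80 and an HTTP/1.0 server that
closes the connection when the response is complete; fetch_verbatim
is a made-up name, and the request headers simply mirror the workunit
example.

```python
import socket


def fetch_verbatim(host: str, path: str, port: int = 80,
                   timeout: float = 30.0) -> bytes:
    """Send one HTTP/1.0 GET and return the raw response bytes, unparsed."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Agent: Grub WU1\r\n"
        "\r\n"
    ).encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        # Read until the server closes the connection.
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    # Status line, headers and body are returned untouched, so a 301 or
    # 302 is stored as-is rather than being followed.
    return b"".join(chunks)
```

Because the function never looks at the status code or a Location
header, a redirect is just another response to save.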

Anyone had a chance to play with the workunits yet?  Is all this  
making sense, and/or simple enough?

Jer


