[Grub-dev] some notes about making a "valid" arc from a workunit
jer
jeremie at jabber.org
Wed Jan 9 16:45:28 UTC 2008
Here is a very short workunit example:
GET /cobe/index.html HTTP/1.0
Host: members.home.nl
Agent: Grub WU1
GET / HTTP/1.0
Host: www.kimmoritsugu.com
Agent: Grub WU1
GET / HTTP/1.0
Host: everettlofts.net
Agent: Grub WU1
GET / HTTP/1.0
Host: canadajournal.whitesnow.jp
Agent: Grub WU1
PUT /arcs/jeremie.53b28400fc6d8f886726435f3e119d9e411f7735.arc.gz HTTP/1.0
Host: dispatch.grub.swlabs.org
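As a rough illustration of reading such a workunit, here is a minimal Python sketch that splits the text into ordered entries. The function name parse_workunit and the dictionary layout are made up for this example, and the parsing rules (a GET or PUT request line starts an entry, "Name: value" lines are its headers, anything else is ignored) are assumptions based only on the sample above, not a description of what babygrub.pl does.

```python
def parse_workunit(text):
    """Split a workunit into an ordered list of request entries.

    Assumption: each entry starts with a GET or PUT request line and is
    followed by its header lines. Unrecognized lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split()
        if parts[0] in ("GET", "PUT") and len(parts) >= 2:
            # A request line opens a new entry; order is preserved.
            entries.append({"method": parts[0], "path": parts[1], "headers": {}})
        elif ":" in line and entries:
            name, value = line.split(":", 1)
            entries[-1]["headers"][name.strip()] = value.strip()
    return entries


sample = """GET /cobe/index.html HTTP/1.0
Host: members.home.nl
Agent: Grub WU1
GET / HTTP/1.0
Host: www.kimmoritsugu.com
Agent: Grub WU1
PUT /arcs/jeremie.53b28400fc6d8f886726435f3e119d9e411f7735.arc.gz HTTP/1.0
Host: dispatch.grub.swlabs.org
"""
entries = parse_workunit(sample)
```

The GET entries are the URLs to crawl, in order; the final PUT entry says where the finished ARC goes.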
If you're implementing workunit processing, there is one unstated but
important requirement when you build the resulting ARC file: ordering.
Each entry in the workunit must have a corresponding entry in the ARC
file, in the same order. The sequence of entries is cumulatively hashed,
and the hash in the final PUT acts as a sort of checksum that lets the
server later verify that it's a valid ARC.
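To show why order matters for a cumulative hash, here is a small Python sketch. It is only an illustration: this message doesn't say which hash function is used or exactly which bytes of each entry are hashed. SHA-1 is assumed here because the 40 hex characters in the PUT path are the length of a SHA-1 digest, and the function name cumulative_hash and the choice to hash plain URL strings are invented for the example.

```python
import hashlib


def cumulative_hash(entries):
    """Fold each entry, in order, into one running digest.

    Assumption: SHA-1 over the entry strings. The real scheme may hash
    different data; only the order-sensitivity is the point here.
    """
    running = hashlib.sha1()
    for entry in entries:
        running.update(entry.encode("utf-8"))
        running.update(b"\n")  # separator so entry boundaries are unambiguous
    return running.hexdigest()


in_order = [
    "http://members.home.nl/cobe/index.html",
    "http://www.kimmoritsugu.com/",
    "http://everettlofts.net/",
]
swapped = [in_order[1], in_order[0], in_order[2]]

# Same entries, different order: the digests no longer match.
same = cumulative_hash(in_order) == cumulative_hash(swapped)
```

Swapping or dropping a single entry changes the final digest, which is why an ARC built out of order would fail the server's check.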
The implication is that every entry must have a response. If you've
played with the babygrub.pl script, you can see that even when DNS or
the connection to the server fails, it generates an internal HTTP 500
response (with an appropriate human-readable error, much like a proxy
would) and saves that into the ARC. I don't think we need standard
templates (yet) for these client-generated errors, as long as *some*
500 error is generated.
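A client-generated 500 could look something like the following Python sketch. The names synthetic_500 and response_or_500, the reason phrase, and the body wording are all hypothetical; since no template is required, any reasonable 500 response should do. The fetch argument stands for whatever function the client uses to retrieve a URL.

```python
def synthetic_500(reason):
    """Build a proxy-style HTTP 500 response with a human-readable body."""
    body = ("Grub client error: %s\n" % reason).encode("utf-8")
    head = (
        "HTTP/1.0 500 Internal Client Error\r\n"  # reason phrase is made up
        "Content-Type: text/plain\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % len(body)
    ).encode("ascii")
    return head + body


def response_or_500(fetch, host, path):
    """Return fetch(host, path), or a synthetic 500 if the fetch fails.

    DNS failures (socket.gaierror), refused connections and timeouts are
    all OSError subclasses in Python 3, so one except clause covers them.
    """
    try:
        return fetch(host, path)
    except OSError as exc:
        return synthetic_500("%s: %s" % (host, exc))


def always_fails(host, path):
    # Stand-in for a fetch whose DNS lookup fails.
    raise OSError("name lookup failed")


saved = response_or_500(always_fails, "everettlofts.net", "/")
```

Either way the caller gets bytes to store, so the entry count in the ARC always matches the workunit.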
Another implication is that redirects aren't followed: whatever HTTP
response the server gives back is saved into the ARC verbatim. The ARC
processing on the server will follow the redirects and generate new
entries in new workunits.
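One way to get this behaviour is to talk to the server over a plain socket and keep every byte that comes back, rather than using an HTTP library that follows redirects on its own. The Python sketch below assumes port 80 and a 30-second timeout, and the name fetch_verbatim and the connect parameter (there so the function can be exercised without a network) are made up for the example. Errors are left to the caller.

```python
import socket


def fetch_verbatim(host, path, connect=socket.create_connection, timeout=30):
    """Send one HTTP/1.0 GET and return the raw response bytes unchanged.

    A 301 or 302 is returned like any other response; nothing is followed.
    """
    request = (
        "GET %s HTTP/1.0\r\n"
        "Host: %s\r\n"
        "Agent: Grub WU1\r\n"
        "\r\n" % (path, host)
    ).encode("ascii")
    sock = connect((host, 80), timeout)
    try:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0: server closes the connection when done
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks)


class FakeSocket:
    """Tiny in-memory stand-in for a socket, used only to try the function."""

    def __init__(self, data):
        self.data = data
        self.sent = b""

    def sendall(self, payload):
        self.sent += payload

    def recv(self, size):
        out, self.data = self.data[:size], self.data[size:]
        return out

    def close(self):
        pass


redirect = b"HTTP/1.0 302 Found\r\nLocation: http://example.org/\r\n\r\n"
fake = FakeSocket(redirect)
raw = fetch_verbatim("www.kimmoritsugu.com", "/", connect=lambda addr, t: fake)
```

Here the 302 comes back exactly as the fake server sent it, Location header included, ready to be written into the ARC.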
Has anyone had a chance to play with the workunits yet? Is all this
making sense, and is it simple enough?
Jer