[Grub-dev] back working on some grub stuff :) - New workunit format

Balinny balinny at gmail.com
Mon May 12 10:57:15 UTC 2008


Jeremie Miller wrote:
> Good suggestions on the headers, I also think we need to use the exact  
> same accept encoding as IE since any squid proxies will serve it from  
> the cache then.
>   
I don't see why Squid would refuse to serve gzipped content to a user asking
for gzip, even if the original request came from a user who also accepted
deflate...
It's an HTTP proxy, so I'd expect it to understand compression, and even to
compress or decompress the cached content by itself (unless disabled by
configuration).

> We'll definitely work on another more advanced alternative format to
> the workunit one, after the back-end is running a little more
> smoothly :)
>
> Jer
OK, I'll bite :-)
After discussing with Bartek how to improve the output ARC format, I have
also been thinking about the input format.
What do you think of the following?

NGW/0.2 100 Go crawling!
 max-download=1073741824; output-format=arc,zip required;
User-Agent: GrubNG 20080128
Accept: text/*
Accept-Charset: utf-8

GET http://homepage3.nifty.com/naonaorin/
If-Modified-Since: Wed, 3 Nov 2004 17:21:05 GMT

GET http://www.miastoplusa.pl/ output-content: headers-only

GET http://www.vinolentus.nl:1234/

...

PUT http://soap.grub.org:57/arcs/Balinny.bbdacb62d1c82d8f114d71d79f954caea120861c.arc.gz
Cookie: workunitID=7



In detail:

NGW/0.2
New Generation Workunit, version 0.2.
Clients SHALL NOT try to run a workunit targeted at a version they don't
understand.
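A minimal Python sketch of that version check, assuming the status line has
the form NGW/<major>.<minor> <three-digit code> <reason>, as in the example
above. The names parse_status_line and SUPPORTED_VERSIONS are hypothetical,
chosen only for illustration.

```python
import re

# Versions this hypothetical client implements (an assumption for illustration).
SUPPORTED_VERSIONS = {(0, 2)}

# "NGW/<major>.<minor> <3-digit status> <free-text reason>"
STATUS_LINE = re.compile(r"^NGW/(\d+)\.(\d+) (\d{3}) (.*)$")


def parse_status_line(line):
    """Return (version, status, reason), or raise ValueError."""
    m = STATUS_LINE.match(line)
    if m is None:
        raise ValueError("not an NGW status line: %r" % line)
    version = (int(m.group(1)), int(m.group(2)))
    if version not in SUPPORTED_VERSIONS:
        # Clients SHALL NOT run a workunit for a version they don't understand.
        raise ValueError("unsupported workunit version %d.%d" % version)
    return version, int(m.group(3)), m.group(4)
```

For the example workunit, parse_status_line("NGW/0.2 100 Go crawling!")
yields ((0, 2), 100, "Go crawling!"), while an NGW/0.3 workunit is refused
before any of its requests are looked at.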

Status code: reserved for future use. To be conservative, a client receiving
a code other than 100 should at least ask the user for confirmation before
continuing (in effect, disabling automatic mode for that workunit until a
human answers).
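A small sketch of that conservative rule, assuming the client exposes some
way to ask the user; the ask_user callback and the function name are
hypothetical.

```python
def may_run_automatically(status, ask_user):
    """Decide whether a workunit with the given status code may be processed.

    status:   numeric code from the NGW status line.
    ask_user: hypothetical callback that asks a human and returns True/False.
    """
    if status == 100:
        # The only code currently defined: proceed without asking.
        return True
    # Any other code: leave automatic mode for this workunit and wait
    # for a human decision.
    return bool(ask_user(status))
```

The callback is only consulted for codes other than 100, so a normal
workunit never blocks on user input.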

The status line is followed by a number of option lines.
Option lines begin with a space and contain parameters through which the
server gives the client recommendations on how to handle the requests. Their
values and types are defined on a case-by-case basis. Options that are
undesired or not understood can be ignored, unless the 'required' property
is specified.
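A possible way to parse an option line, assuming (from the example above)
that options are separated by ';', each is name=value, and an optional
trailing word 'required' marks it as mandatory. KNOWN_OPTIONS and both
function names are hypothetical.

```python
# Options this hypothetical client understands.
KNOWN_OPTIONS = {"max-download", "output-format"}


def parse_option_line(line):
    """Parse ' name=value [required]; ...' into {name: (value, required)}."""
    if not line.startswith(" "):
        raise ValueError("option lines must begin with a space")
    options = {}
    for item in line.split(";"):
        item = item.strip()
        if not item:
            continue  # a trailing ';' leaves an empty item
        words = item.split()
        required = len(words) > 1 and words[-1] == "required"
        if required:
            words = words[:-1]
        name, _, value = " ".join(words).partition("=")
        options[name] = (value, required)
    return options


def check_options(options):
    """Ignore unknown optional options; reject unknown required ones."""
    usable = {}
    for name, (value, required) in options.items():
        if name in KNOWN_OPTIONS:
            usable[name] = value
        elif required:
            raise ValueError("required option not understood: " + name)
    return usable
```

With the example line, output-format comes back flagged as required and
max-download as a plain recommendation; a client that did not know
output-format would have to refuse the workunit rather than silently skip it.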

Then comes a list of common headers. These shall be included in each of the
subsequent requests. No option line may appear after the first common
header. However, a common header may be split across several lines
(folded), as the HTTP RFC allows.
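Because option lines and folded header continuations both start with
whitespace, the "no option line after the first common header" rule is what
makes the preamble unambiguous. A sketch of that split, assuming lines have
already had their CRLF removed; split_preamble is a hypothetical name.

```python
def split_preamble(lines):
    """Split the lines after the status line into option lines and headers.

    lines: workunit lines after the status line, CRLF stripped, up to
           (not including) the first empty line.
    Returns (option_lines, headers), headers being a list of (name, value).
    """
    option_lines = []
    headers = []
    for line in lines:
        if not headers and line.startswith(" "):
            # Before the first header, a leading space marks an option line.
            option_lines.append(line)
        elif headers and line[:1] in (" ", "\t"):
            # After it, leading whitespace means a folded header continuation.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("malformed header line: %r" % line)
            headers.append((name.strip(), value.strip()))
    return option_lines, headers
```

Here a User-Agent header folded over two lines is joined back into a single
value, while the leading option line is kept apart for the option parser.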

After a separating double CRLF come the GET lines.
Each line specifies a location to be fetched. Instead of relying on headers
with a special meaning, like Host, that are needed to start fetching the
resource, everything is embedded in an absolute URL. HTTP/1.1 itself
contains provisions for the use of absolute URIs in future versions.
This approach also makes it easy to extend the format to other protocols. A
"protocol not understood" code is to be added to the output format.
All clients must detect and correctly handle non-default port numbers.
Support for a user name (for authorization) in the URL is optional, though.
The URL must be properly URL-encoded. Specifically, characters such as
spaces, tabs, NULs and newlines shall not appear unescaped.
Note that options specific to that request may be provided after the URI.
Headers specific to that request come after the request line, up to the
next empty line (CRLF CRLF), and shall be appended to the common headers
for that request.
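A sketch of parsing one request block under these rules. Since a properly
encoded URL contains no spaces, the first space after the URL can safely
separate it from any per-request options. The function name and the shape
of the returned dict are hypothetical; header folding inside request blocks
is not handled here.

```python
from urllib.parse import urlsplit


def parse_request_block(block_lines, common_headers):
    """Parse one 'GET <absolute-url> [options]' block.

    block_lines:    lines of the block, CRLF stripped, without the blank line.
    common_headers: list of (name, value) pairs from the workunit preamble.
    """
    method, _, rest = block_lines[0].partition(" ")
    if method != "GET":
        raise ValueError("expected a GET line: %r" % block_lines[0])
    # The URL is URL-encoded, so it cannot contain a space.
    url, _, options = rest.partition(" ")
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError("URL must be absolute: %r" % url)
    # Per-request headers are appended after the common ones.
    headers = list(common_headers)
    for line in block_lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return {
        "scheme": parts.scheme,      # caller reports "protocol not understood"
        "host": parts.hostname,
        "port": parts.port,          # None means the scheme's default port
        "path": parts.path or "/",
        "options": options.strip(),
        "headers": headers,
    }
```

For GET http://www.vinolentus.nl:1234/ this yields port 1234, while
GET http://www.miastoplusa.pl/ output-content: headers-only keeps the default
port and passes "output-content: headers-only" on as request options.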

As a special case, a PUT line sets the location where the resulting file is
to be sent. Its format is the same as that of requests, so it can have
specific options and headers and must end with a double CRLF. However, it
differs in that it is not affected by the common headers.
Each workunit must have exactly one PUT line, at the end. Behaviour for
workunits with several PUT lines is unspecified.
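A sketch of separating the trailing PUT block, assuming the workunit body has
already been split into blocks of lines. Since several PUT lines are left
unspecified, this sketch simply rejects them, which is one conservative
choice rather than anything the proposal mandates. split_put is a
hypothetical name.

```python
def split_put(blocks):
    """Separate the trailing PUT block from the GET blocks.

    blocks: list of blocks, each a list of lines (CRLF stripped).
    Returns (get_blocks, put_url, put_headers). PUT headers are returned on
    their own: unlike GET requests, they do not inherit the common headers.
    """
    methods = [block[0].split(" ", 1)[0] for block in blocks]
    if methods.count("PUT") != 1 or methods[-1] != "PUT":
        # Assumption of this sketch: refuse anything but exactly one final PUT.
        raise ValueError("workunit must end with exactly one PUT block")
    put = blocks[-1]
    _, _, rest = put[0].partition(" ")
    put_url = rest.split(" ", 1)[0]
    if not put_url:
        raise ValueError("PUT line has no URL")
    put_headers = [
        (name.strip(), value.strip())
        for name, _, value in (line.partition(":") for line in put[1:])
    ]
    return blocks[:-1], put_url, put_headers
```

Applied to the example, the Cookie: workunitID=7 header travels only with the
upload, and the common User-Agent/Accept headers are not added to it.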

All line breaks shall be CRLF. No bare CR or LF may appear in a workunit,
and clients should reject a workunit that does not comply.
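The strict CRLF rule is cheap to enforce: after splitting on CRLF, any CR or
LF left inside a line must be a bare one. A minimal sketch; the function
name is hypothetical.

```python
def split_crlf_lines(data):
    """Split a raw workunit (bytes) into lines, rejecting bare CR or LF."""
    lines = data.split(b"\r\n")
    for line in lines:
        # Any CR or LF surviving the split was not part of a CRLF pair.
        if b"\r" in line or b"\n" in line:
            raise ValueError("bare CR or LF found; rejecting workunit")
    return lines
```

An empty element in the result marks a double CRLF, i.e. the boundary
between the preamble and a request block or between two request blocks.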

Workunits tell clients *what* to fetch, not *how* to fetch it. Clients are
free to use whichever protocol they wish, negotiate extra capabilities with
the server (such as compression), etc.



I tried to improve the workunit format by adding versioning, removing
redundancy and making it more extensible, while still keeping the basic
format and not making parsing much harder. Although existing clients would
break with this format, upgrading seems quite safe, especially given that
there aren't many users yet, so we aren't constrained by the existing
protocol.

Opinions?


