[Grub-dev] back working on some grub stuff :) - New workunit format
Balinny
balinny at gmail.com
Mon May 12 10:57:15 UTC 2008
Jeremie Miller wrote:
> Good suggestions on the headers, I also think we need to use the exact
> same accept encoding as IE since any squid proxies will serve it from
> the cache then.
>
I don't see why a Squid proxy would refuse to serve gzipped content to a
user asking for gzip, even if the original query came from a user who
also accepted deflate...
It's an HTTP proxy, so I'd expect it to understand compression and even
to compress or decompress the cached content by itself (unless disabled
by configuration).
> We'll definitely work on another more advanced alternative format to
> the workunit one, after the back-end is running a little more
> smoothly :)
>
> Jer
OK, I'll bite :-)
After discussing with Bartek how to improve the output ARC format, I have
also been thinking about the input format.
What do you think about the following one?
NGW/0.2 100 Go crawling!
 max-download=1073741824; output-format=arc,zip required;
User-Agent: GrubNG 20080128
Accept: text/*
Accept-Charset: utf-8

GET http://homepage3.nifty.com/naonaorin/
If-Modified-since: Wed, 3 Nov 2004 17:21:05 GMT

GET http://www.miastoplusa.pl/ output-content: headers-only

GET http://www.vinolentus.nl:1234/

...
PUT http://soap.grub.org:57/arcs/Balinny.bbdacb62d1c82d8f114d71d79f954caea120861c.arc.gz
Cookie: workunitID=7
In detail:
NGW/0.2
New Generation Workunit, version 0.2.
A client SHALL NOT try to run a workunit targeted at a version it
doesn't understand.
Status code: reserved for future use. To be conservative, a client that
receives a code != 100 should at least ask the user for confirmation
before continuing (in effect disabling automatic mode for that workunit
until the human answers).
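A minimal sketch, in Python, of how a client might act on the status line described above. The three-digit shape of the status code, the set of supported versions, and the names parse_status_line and decide are assumptions made for illustration; the proposal only shows the single example "NGW/0.2 100 Go crawling!".

```python
import re

# Assumption: this hypothetical client only understands NGW version 0.2.
SUPPORTED_VERSIONS = {"0.2"}


def parse_status_line(line):
    """Split 'NGW/<version> <code> <reason>' into (version, code, reason)."""
    # Assumption: the code is three digits, as in the one example given.
    m = re.fullmatch(r"NGW/(\d+\.\d+) (\d{3})(?: (.*))?", line)
    if m is None:
        raise ValueError("not an NGW status line: %r" % line)
    return m.group(1), int(m.group(2)), m.group(3) or ""


def decide(line):
    """Return 'run', 'ask-user' or 'reject' for a workunit status line."""
    version, code, _reason = parse_status_line(line)
    if version not in SUPPORTED_VERSIONS:
        return "reject"    # never run a version we do not understand
    if code != 100:
        return "ask-user"  # leave automatic mode until a human confirms
    return "run"
```

Here decide("NGW/0.2 100 Go crawling!") yields "run", while an unknown version is rejected before the status code is even considered.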
The status line is followed by a number of sequential option lines.
Option lines begin with a space and contain a number of parameters with
which the server gives the client recommendations on how to handle the
requests. Their values and types are defined on a case-by-case basis. If
undesired or not understood, options can be ignored unless the
'required' property is specified.
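A rough Python sketch of reading one option line. The grammar used here (parameters separated by ";", each of the form name=value, optionally followed by the word "required") is inferred from the single example line in this post and is an assumption, as are the function names and the KNOWN_OPTIONS set. What a client should do when it finds a required option it cannot honour is not stated above; this sketch only reports such options.

```python
def parse_option_line(line):
    """Parse ' name=value; name=value required;' into {name: (value, required)}."""
    if not line.startswith(" "):
        raise ValueError("option lines must begin with a space")
    options = {}
    for item in line.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # trailing ';' leaves an empty piece
        required = False
        if item.endswith(" required"):
            required = True
            item = item[: -len(" required")].rstrip()
        name, _, value = item.partition("=")
        options[name.strip()] = (value.strip(), required)
    return options


# Hypothetical: the options this imaginary client knows how to honour.
KNOWN_OPTIONS = {"max-download"}


def unsupported_required(options, known=KNOWN_OPTIONS):
    """Names of options marked 'required' that the client does not understand."""
    return sorted(
        name for name, (_value, required) in options.items()
        if required and name not in known
    )
```

Options that are not marked required and are not understood simply never show up in the result of unsupported_required, which matches the "can be ignored" rule.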
Then we have a list of common headers. These shall be included in each of
the subsequent requests. No option line may appear after the first
common header. However, a common header may be split over several lines
per the HTTP RFC.
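Because option lines and folded header continuations both start with whitespace, the rule that no option line may follow the first common header is what keeps the two apart. A small Python sketch of that distinction, assuming the caller has already split the preamble (everything between the status line and the first empty line) into lines; split_preamble is a hypothetical name:

```python
def split_preamble(lines):
    """Separate option lines from common headers.

    Returns (option_lines, headers), where headers is a list of
    (name, value) pairs with folded continuation lines joined.
    """
    option_lines, headers = [], []
    for line in lines:
        if headers and line[:1] in (" ", "\t"):
            # After the first header, leading whitespace means a folded header.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        elif line.startswith(" "):
            # Before any header, a leading space marks an option line.
            option_lines.append(line)
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("malformed header line: %r" % line)
            headers.append((name.strip(), value.strip()))
    return option_lines, headers
```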
After a separating double CRLF come the GET lines.
Each line specifies a location to be fetched. Instead of having headers
with special meaning like Host, which are needed to start getting the
resource, everything is embedded in an absolute URL. HTTP/1.1 itself
contains provisions for the use of absolute URIs in future versions.
It also allows for easy extension to other protocols. A "protocol not
understood" code is to be added to the output format.
All clients must detect and correctly handle port numbers other than the
default. The user name for authorization is optional, though.
The URL must be properly urlencoded. Specifically, characters such as
spaces, tabs, nulls, newlines... shall not be present unescaped.
Note that options specific to that request may be provided after the URI.
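A Python sketch of taking a request line apart, using the standard urllib.parse.urlsplit. It assumes the method, the URL and any trailing options are separated by single spaces, as in the example lines, and that this particular client only understands plain HTTP; parse_request_line and DEFAULT_PORTS are illustrative names, and the options are returned unparsed because their syntax is not pinned down here.

```python
from urllib.parse import urlsplit

# Assumption: this sketch only knows plain HTTP and its default port.
DEFAULT_PORTS = {"http": 80}


def parse_request_line(line):
    """Split '<METHOD> <absolute-url> [options]' into a dict of its parts."""
    method, _, rest = line.partition(" ")
    url, _, options = rest.partition(" ")
    if method not in ("GET", "PUT"):
        raise ValueError("unknown method: %r" % method)
    # Spaces, tabs, NULs, newlines etc. must arrive escaped, never raw.
    if not url or any(ord(c) <= 0x20 or ord(c) == 0x7F for c in url):
        raise ValueError("URL missing or not properly escaped")
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError("protocol not understood: %r" % parts.scheme)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return {
        "method": method,
        "host": parts.hostname,
        "port": port,
        "user": parts.username,  # None when no user name is given
        "path": parts.path or "/",
        "options": options,      # left as a raw string in this sketch
    }
```

For "GET http://www.vinolentus.nl:1234/" this gives port 1234, and for a URL without an explicit port it falls back to 80.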
Headers specific to that connection come after the request line, up to
the next empty line (CRLF CRLF) and shall be appended to the common
headers for that request.
As a special case, a PUT line sets the location where the resulting
file is to be sent. Its format is the same as for requests, so it can have
specific options and headers, and it must end with a double CRLF. However,
it differs in that it is not affected by the global headers.
Each workunit must have exactly one PUT line, placed at the end.
Behaviour for workunits with several PUT lines is unspecified.
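The header rules for GET and PUT can be summed up in a few lines of Python. This is only a sketch: headers are assumed to be lists of (name, value) pairs, and effective_headers and check_single_put are hypothetical helper names.

```python
def effective_headers(method, common, specific):
    """Headers to send for one request line.

    GET requests get the common headers with their own appended;
    the PUT line is not affected by the common headers.
    """
    if method == "PUT":
        return list(specific)
    return list(common) + list(specific)


def check_single_put(methods):
    """True if the workunit has exactly one PUT line and it comes last."""
    return methods.count("PUT") == 1 and methods[-1] == "PUT"
```

With the example workunit, the first GET would be sent with User-Agent, Accept, Accept-Charset and If-Modified-since, while the PUT would carry only its Cookie header.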
All line breaks shall be CRLF. No bare CR or LF may appear in a workunit,
and clients should reject a workunit that does not comply.
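The CRLF rule is easy to check before any parsing starts. A minimal Python sketch, assuming the workunit is available as raw bytes; has_only_crlf_breaks is a hypothetical name:

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_BARE_BREAK = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def has_only_crlf_breaks(data):
    """True if every CR and LF in the workunit is part of a CRLF pair."""
    return _BARE_BREAK.search(data) is None
```

A client following the rule above would reject the workunit whenever this returns False.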
Workunits give the clients a list of *what* to get, not *how* to get it.
Clients are free to use whatever protocol they wish, to negotiate extra
capabilities with the server (such as compression), etc.
I tried to improve the workunit format by adding versioning, removing
redundancy and making it more extensible, while still keeping the basic
format and not making parsing much harder. As existing clients would
break with this format, it seems quite safe to upgrade, especially given
that there aren't many users, so we aren't yet constrained to the
existing protocol.
Opinions?