[Grub-dev] do we even really need a native client
jer
jeremie at jabber.org
Thu Jan 10 21:21:59 UTC 2008
> Maybe I'm missing something.
> If you add them with
> print $arc "http://$host$path $ip 19691231175959 $ctype ",length($body),"\n$body";
> The file will look like:
>
> http://www.example.com 127.0.0.1 19691231175959 message/http 11
> \nHello Worldhttp://www.example.net 127.0.0.2 19691231175959
> message/http 7\nGoodbye
>
> Which would correspond with <doc><doc>
> And each of <doc> would consist of <URL-record><nl><network_doc>
>
> Whereas it should be <nl><URL-record><nl><network_doc>
>
> \nhttp://www.example.com 127.0.0.1 19691231175959 message/http 11
> \nHello World\nhttp://www.example.net 127.0.0.2 19691231175959
> message/http 7\nGoodbye
>
> You see \n before http: in the archives because almost every
> webpage ends with some line feeds. But they belong to
> <network_doc> (they are included in the count).
So, I think you're right and it's missing a \n, but maybe it's
missing TWO of them?
doc == <nl><URL-record><nl><network_doc>
URL-record-v1 == <url><sp>
                 <ip-address><sp>
                 <archive-date><sp>
                 <content-type><sp>
                 <length><nl>
So, there should be a \n before each URL record, and two of them
after it, one defined as the terminator in URL-record-v1, and one
defined as the separator between URL-record and network_doc. Is that
correct?
print $arc "\nhttp://$host$path $ip 19691231175959 $ctype ",
    length($body),"\n\n$body";
Is that correct? Can anyone else verify?
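A minimal Perl sketch of that literal reading of the grammar may make the question easier to check. The sub name arc_doc and the sample values are hypothetical, and the second newline after the URL record is only the interpretation being asked about here, not something confirmed.

```perl
use strict;
use warnings;

# Hypothetical helper: build one <doc> by reading the grammar literally:
#   doc           == <nl><URL-record><nl><network_doc>
#   URL-record-v1 == five fields separated by <sp>, terminated by <nl>
# Assumes $body is a byte string, so length() counts bytes.
sub arc_doc {
    my ($url, $ip, $date, $ctype, $body) = @_;

    # The five space-separated fields, ending with the record's own <nl>.
    my $url_record = join(' ', $url, $ip, $date, $ctype, length($body)) . "\n";

    # Leading <nl>, the record, the separator <nl>, then the raw document.
    return "\n" . $url_record . "\n" . $body;
}

my $doc = arc_doc('http://www.example.com', '127.0.0.1',
                  '19691231175959', 'message/http', 'Hello World');
print $doc;
```

Under this reading each record is preceded by one newline and followed by two. If the spec's <nl> after <length> and the <nl> in doc turn out to be the same character, the separator "\n" in the return line is the only thing that needs to go.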
> Probably for the benefit of having persistent connections, as the hosts
> to connect to are spread out.
The workunits are random by design, so it is very unlikely that two
requests from a single workunit will go to the same host.
> However, getting the result compressed seems worthwhile.
The workunits can (someday) start to define HTTP/1.1 with a
Connection: close and an Accept-Encoding: gzip. A client supporting
the current workunit format shouldn't know or care about the difference, right?
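As a rough illustration, such a request might look like the output of the Perl sketch below. It assumes the workunit carries the request text for the client to send as-is; the sub name build_request and the host and path values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the request text a future workunit might define.
sub build_request {
    my ($host, $path) = @_;
    return join("\r\n",
        "GET $path HTTP/1.1",
        "Host: $host",            # Host is mandatory in HTTP/1.1
        "Accept-Encoding: gzip",  # ask for a compressed response body
        "Connection: close",      # no persistent connection wanted
        "", "");                  # empty line terminates the header block
}

print build_request('www.example.com', '/');
```

A client that only relays the request would not need to change, but whatever stores the response would have to cope with a gzip-encoded body.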
> The workunits have a header: "Agent: Grub WU1", but such a header is -to
> my knowledge- not defined anywhere. I think it was meant to be User-Agent
> (RFC 1945 section 10.15 / RFC 2616 section 14.43).
Doh! My bad, I can fix it when I generate some more workunits :)
Jer