[Grub-dev] do we even really need a native client

jer jeremie at jabber.org
Thu Jan 10 21:21:59 UTC 2008


> Maybe i'm missing something.
> If you add them with
>   print $arc "http://$host$path $ip 19691231175959 $ctype ",length($body),"\n$body";
> The file will look like:
>
> http://www.example.com 127.0.0.1 19691231175959 message/http 11 
> \nHello Worldhttp://www.example.net 127.0.0.2 19691231175959  
> message/http 7\nGoodbye
>
> Which would correspond with <doc><doc>
> And each of <doc> would consist of <URL-record><nl><network_doc>
>
> Whereas it should be <nl><URL-record><nl><network_doc>
>
> \nhttp://www.example.com 127.0.0.1 19691231175959 message/http 11 
> \nHello World\nhttp://www.example.net 127.0.0.2 19691231175959  
> message/http 7\nGoodbye
>
> You see \n before http: on the archives because almost every  
> webpage ends with some line feeds. But they belong to  
> <network_doc>  (are into the count).

So, I think you're right and it's missing a \n, but maybe it's  
missing TWO of them?

doc == <nl><URL-record><nl><network_doc>

URL-record-v1 == <url><sp>
<ip-address><sp>
<archive-date><sp>
<content-type><sp>
<length><nl>

So, there should be a \n before each URL record, and two of them  
after it, one defined as the terminator in URL-record-v1, and one  
defined as the separator between URL-record and network_doc.  Is that  
correct?

print $arc "\nhttp://$host$path $ip 19691231175959 $ctype ",length($body),"\n\n$body";

Is that correct?  Can anyone else verify?
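
To make the question easier to check, here is a minimal sketch of the two-newline reading described above, written in Python rather than Perl purely for illustration. The function names format_doc and parse_docs are hypothetical, and the layout is only my interpretation of the grammar (one \n before the URL record, one terminating it, one separating it from network_doc), not a verified implementation of the ARC format.

```python
def format_doc(url, ip, date, ctype, body):
    """Build one <doc> as <nl><URL-record><nl><network_doc> (assumed reading)."""
    # The URL record ends with its own <nl>; the second "\n" is the separator
    # between the URL record and the network_doc. Length counts bytes of body.
    record = f"\n{url} {ip} {date} {ctype} {len(body)}\n\n"
    return record.encode("ascii") + body


def parse_docs(data):
    """Split an archive produced by format_doc back into its documents."""
    docs = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 1] != b"\n":
            raise ValueError(f"expected newline before URL record at byte {pos}")
        end = data.index(b"\n", pos + 1)  # newline terminating the URL record
        fields = data[pos + 1:end].decode("ascii").split(" ")
        url, ip, date, ctype, length = fields
        if data[end + 1:end + 2] != b"\n":
            raise ValueError(f"expected separator newline at byte {end + 1}")
        start = end + 2
        body = data[start:start + int(length)]
        if len(body) != int(length):
            raise ValueError("archive is shorter than the declared length")
        docs.append((url, ip, date, ctype, body))
        pos = start + int(length)
    return docs


archive = (
    format_doc("http://www.example.com", "127.0.0.1", "19691231175959",
               "message/http", b"Hello World")
    + format_doc("http://www.example.net", "127.0.0.2", "19691231175959",
                 "message/http", b"Goodbye")
)
for doc in parse_docs(archive):
    print(doc)
```

Because the parser relies on the declared length rather than on searching for the next URL, trailing line feeds inside a page stay part of network_doc and cannot be mistaken for the record separator.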


> Probably for the benefit of having persistent connections, as the
> hosts to connect are spread.

The workunits are deliberately randomized, so it is very unlikely that  
two requests from a single workunit will go to the same host.

> However, getting the result compressed seems worthwhile.

The workunits can (someday) start to define HTTP/1.1 requests with a  
Connection: close and an Accept-Encoding: gzip header.  A client  
supporting the current workunit format shouldn't care or know any  
different, right?
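
As a rough sketch of what such a request could look like (in Python, for illustration only): build_request and decode_body are hypothetical names, and the exact request a workunit would carry is an assumption, not the current workunit format. The client would send the bytes as given; only something reading the stored response later would need to undo the gzip encoding.

```python
import gzip


def build_request(host, path):
    """Assemble an HTTP/1.1 GET that asks for gzip and a non-persistent connection."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",            # Host is mandatory in HTTP/1.1
        "Connection: close",        # no keep-alive, hosts are not reused anyway
        "Accept-Encoding: gzip",    # allow a compressed response body
        "",
        "",                         # blank line terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def decode_body(headers, body):
    """Undo gzip if the response declares it; headers is a dict with lowercase keys."""
    if headers.get("content-encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


request = build_request("www.example.com", "/")
print(request.decode("ascii"))
```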

> The workunits have a header: "Agent: Grub WU1", but such header is
> -to my knowledge- not defined anywhere. I think it was meant to be
> User-Agent (rfc1945 section 10.15 / rfc2616 sect 14.43).

Doh!  My bad, I can fix it when I generate some more workunits :)

Jer


