[Grub-dev] back working on some grub stuff :)
Balinny
balinny at gmail.com
Sun May 11 13:33:02 UTC 2008
Bartek Jasicki wrote:
> On 2008-05-10 at 18:14:38
> Jeremie Miller <jeremie at jabber.org> wrote:
>
>
>> I've been toying with re-generating a bunch of workunits, and I'd
>> like to include this header:
>>
>> Accept: text/html
>>
>> Are there any other headers that we should include by default?
>>
>
> Maybe change this to:
>
> Accept: text/html, text/*
>
> Then the crawler can get all text types but prefer html. Later the
> client or server can convert, for example, .pdf, .doc or .odf files to
> .html files. Plus there is still the problem of servers which do not
> send any content type headers.
>
Better, but keep in mind that those formats wouldn't be delivered: pdf
is application/pdf, doc is application/vnd.ms-word, and odf is
application/vnd.oasis.opendocument.*
<http://www.iana.org/assignments/media-types/application/vnd.oasis.opendocument.text-web>.
I think this should be preferred:

Accept: application/xhtml+xml, text/html, text/*,
application/vnd.oasis.opendocument.*, application/pdf,
application/vnd.ms-word

Problem: a partial wildcard doesn't seem to be in the standards, though
it could be sensible to use if understood by some major servers.
And there are a lot of opendocument MIME types...
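
To illustrate the wildcard problem, here is a rough Python sketch of how
a server that follows the standard might evaluate such a header. It is
only an illustration, not Grub code: the function names and the q-values
in the example header are my own assumptions. In HTTP/1.1 the order of
the list does not express preference by itself; the q parameter does,
and only */*, type/* and type/subtype are defined as media ranges.

```python
def parse_accept(header):
    """Split an Accept header value into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def range_matches(media_range, content_type):
    """True if content_type falls under media_range.

    Only the standard forms are honoured. A partial wildcard such as
    application/vnd.oasis.opendocument.* is compared literally, so it
    never matches a real type.
    """
    content_type = content_type.split(";")[0].strip().lower()
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    c_type, _, c_sub = content_type.partition("/")
    if r_type != c_type:
        return False
    return r_sub == "*" or r_sub == c_sub


def specificity(media_range):
    """Rank ranges so that type/subtype beats type/*, which beats */*."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def quality(header, content_type):
    """q-value of the most specific matching range, 0.0 if none matches."""
    best_rank, best_q = -1, 0.0
    for media_range, q in parse_accept(header):
        if range_matches(media_range, content_type):
            rank = specificity(media_range)
            if rank > best_rank:
                best_rank, best_q = rank, q
    return best_q


# Hypothetical header; the q-values are made up for the example.
ACCEPT = ("application/xhtml+xml, text/html, text/*;q=0.8, "
          "application/pdf;q=0.5, application/vnd.oasis.opendocument.*;q=0.5")
```

With this matcher, quality(ACCEPT, "text/plain") falls back to the
text/* range, while application/vnd.oasis.opendocument.text matches
nothing, so each opendocument type would have to be listed explicitly
unless servers really do understand partial wildcards.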
> Maybe add:
>
> Accept-Encoding: gzip,deflate
>
> too? For now the C# client has it by default. This can save bandwidth,
> and if I saw correctly, some servers give content only when it is set
> (example: Wikipedia returns one page on an uncompressed connection and
> another for a compressed one)
>
Then the arc could contain the page body compressed. If the arc is
supposed to contain the page uncompressed, the header shouldn't appear
in the workunit (but could be added by the client).
deflate is probably not needed, as very few servers support it.
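
If the client adds Accept-Encoding itself and the arc should hold the
uncompressed page, the client has to undo the encoding before storing
the body. A minimal Python sketch of that step follows; decode_body is a
hypothetical helper, not part of any existing Grub client, and it
assumes the whole response body is already in memory.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Return the uncompressed body for a given Content-Encoding value.

    Hypothetical helper for illustration only.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # What the standard asks for: a zlib-wrapped stream.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream without the wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)
```

The two code paths for deflate are one more reason to leave it out and
advertise only gzip.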