[Grub-dev] back working on some grub stuff :)

Bartek Jasicki thindil2 at gmail.com
Sun May 11 15:27:25 UTC 2008


On 2008-05-11 at. 15:33:02
Balinny <balinny at gmail.com> wrote:

> Bartek Jasicki wrote:
> > On 2008-05-10 at. 18:14:38
> > Jeremie Miller <jeremie at jabber.org> wrote:
> >
> >   
> >> I've been toying with re-generating a bunch of workunits, and I'd
> >> like to include this header:
> >>
> >> 	Accept: text/html
> >>
> >> Are there any other headers that we should include by default?
> >>     
> >
> > Maybe change this to:
> >
> > 	Accept: text/html, text/*
> >
> > Then the crawler can fetch any text type but prefer html. Later the
> > client or server could convert, for example, .pdf, .doc or .odf files
> > to .html files. There is also still the problem of servers which do
> > not send any Content-Type header.
> >   
> Better, but keep in mind that those formats wouldn't be delivered:
> pdf is application/pdf, doc is application/vnd.ms-word, and odf is
> application/vnd.oasis.opendocument.* (e.g.
> <http://www.iana.org/assignments/media-types/application/vnd.oasis.opendocument.text-web>).
> 

You are right, my mistake, sorry.

> I think this should be preferred:
> Accept: application/xhtml+xml, text/html, text/*,
> application/vnd.oasis.opendocument.*, application/pdf,
> application/vnd.ms-word
> 
> Problem: A partial wildcard doesn't seem to be in the standards,
> though it could be sensible to use if understood by some major servers.
> And there are a lot of opendocument mimes...
> 

Then maybe for now we add only pdf?

Accept: application/xhtml+xml, text/html, text/*, application/pdf

I think this may be enough for now.
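
One detail worth noting: in HTTP, the order of entries in Accept does not by itself express a preference; relative preference is signalled with q-values. Below is a minimal Python sketch of building such a header. The helper name build_accept and the weights 0.8 and 0.5 are assumptions chosen for illustration, not values agreed on in this thread.

```python
def build_accept(preferences):
    """Join (media_range, q) pairs into an Accept header value.

    A q of 1.0 is the HTTP default, so it is left implicit.
    """
    parts = []
    for media_range, q in preferences:
        if q >= 1.0:
            parts.append(media_range)
        else:
            parts.append(f"{media_range};q={q:.1f}")
    return ", ".join(parts)


# Hypothetical weights: prefer (X)HTML, then any other text, then PDF.
ACCEPT = build_accept([
    ("application/xhtml+xml", 1.0),
    ("text/html", 1.0),
    ("text/*", 0.8),
    ("application/pdf", 0.5),
])
```

With these weights a server that honours content negotiation would pick HTML over other text types and over PDF when several variants exist.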

> > Maybe add: 
> >
> > 	Accept-Encoding: gzip,deflate
> >
> > too? For now the C# client has it by default. This can save bandwidth
> > and, if I saw correctly, some servers give the content only when it is
> > set (example: Wikipedia returns one page on an uncompressed connection
> > and a different one on a compressed connection).
> >   
> Then, the arc could contain the page body compressed. If the arc is
> supposed to contain the page uncompressed, it shouldn't appear on the
> workunit (but could be added by the client).
> deflate is probably not needed, as very few servers support it.
> 
> 

Or the .arc file can contain the page body uncompressed too. I am basing
this on what any modern web browser does: it uses a compressed
connection to the server, then decompresses the response after receiving
it and before showing it to the user. The arc file is compressed before
it is sent to the server, so imho compressing the page body and later
the whole .arc file would be a little weird and make more work for the
server.
Plus this proposal is entirely optional, I don't want to force it into
the basic client options ;)
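
To illustrate the browser-like behaviour described above, here is a minimal Python sketch that decodes a response body according to its Content-Encoding before it would be written to the arc. The function name decode_body is hypothetical, and how the real clients store bodies in the arc is not shown here.

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Return the uncompressed body for a given Content-Encoding value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # Per the HTTP spec, "deflate" means zlib-wrapped data.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No (or unknown) encoding: store the bytes as received.
    return raw
```

The client would then store the returned bytes, so the page body is compressed only once, when the whole .arc file is compressed.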


And one other thing, about the If-Modified-Since header. If we are going
to talk about modifying the workunit format (which is needed for the
If-Modified-Since header), how about changing it from plain text to an
XML file? I think this would be easier for clients to parse and more
flexible.
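
As a rough illustration of what this could look like, here is a Python sketch that parses a made-up XML workunit. The element and attribute names (workunit, headers, header, url, if-modified-since) are purely hypothetical; no such format has been agreed on.

```python
import xml.etree.ElementTree as ET

# Hypothetical workunit layout, invented for this example only.
SAMPLE_WORKUNIT = """<workunit>
  <headers>
    <header name="Accept">application/xhtml+xml, text/html, text/*, application/pdf</header>
  </headers>
  <url if-modified-since="Sat, 10 May 2008 18:14:38 GMT">http://example.org/</url>
  <url>http://example.org/new-page</url>
</workunit>"""


def parse_workunit(xml_text):
    """Return a list of (url, headers) pairs, one per URL to crawl."""
    root = ET.fromstring(xml_text)
    # Headers that apply to every URL in the workunit.
    default_headers = {
        h.get("name"): (h.text or "").strip()
        for h in root.findall("./headers/header")
    }
    jobs = []
    for url_el in root.findall("url"):
        headers = dict(default_headers)
        since = url_el.get("if-modified-since")
        if since:
            # Per-URL conditional request header.
            headers["If-Modified-Since"] = since
        jobs.append(((url_el.text or "").strip(), headers))
    return jobs
```

A per-URL attribute like this is one way to carry If-Modified-Since without changing how the other URLs in the workunit are described.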

Bartek

