[Grub-dev] LZMA support in upload server

Bartek Jasicki thindil2 at gmail.com
Wed Feb 25 21:52:20 UTC 2009


On 2009-02-25, at 22:27:35
Balinny <balinny at gmail.com> wrote:

> See? Two minutes :)
> 
> @@ -41,1 +41,1 @@
> -print "PUT /arcs/$ENV{REMOTE_USER}.$key.arc.gz HTTP/1.0\r\nHost: soap.grub.org\r\n\r\n";
> +print "PUT /arcs/$ENV{REMOTE_USER}.$key.arc HTTP/1.0\r\nHost: soap.grub.org\r\nAccept-Encoding: gzip,lzma\r\n\r\n";
> 
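
Here is roughly the same request head in Python, as a sketch. It assumes `$key` is a plain string identifier. The key value "0001" and the fallback user "alice" are made up. It reproduces the patched line as written, including the Accept-Encoding header. Each header line ends with CRLF, and an empty line ends the header block. That is what the trailing `\r\n\r\n` in the Perl string does.

```python
import os


def build_put_request(remote_user: str, key: str) -> bytes:
    """Build the request head printed by the patched line 41 (sketch)."""
    # Request line first, then one header per line, each ending in CRLF;
    # a bare CRLF (empty line) terminates the header section.
    lines = [
        f"PUT /arcs/{remote_user}.{key}.arc HTTP/1.0",
        "Host: soap.grub.org",
        "Accept-Encoding: gzip,lzma",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    # REMOTE_USER mirrors $ENV{REMOTE_USER}; "alice" and "0001" are made-up examples.
    user = os.environ.get("REMOTE_USER", "alice")
    print(build_put_request(user, "0001").decode("ascii"), end="")
```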

Hmm, then why do I still get workunits in the old format? ;)

> >> What does the dispatch server actually do? Which interface does it use
> >> to grab the URLs?
> >>     
> >
> > Dispatch server: for me, this is the server which sends workunits to
> > users, plus the workunit generator, plus the robots.txt checker (I put
> > all these things in one bag).
> >   
> I wasn't taking robots.txt into account. Although, if it's not
> crawling, there isn't much need to check robots.txt.
> 

The server must check robots.txt before adding URLs to a workunit, so for
this reason I consider it part of the server.
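
Here is a minimal sketch of that check. It assumes the server already has the robots.txt text for each host, so fetching it is left out. It also assumes a host with no known robots.txt allows everything. `filter_allowed` and the agent name `grub-client` are hypothetical. Only `urllib.robotparser` is a real standard-library module.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def filter_allowed(urls, robots_by_host, agent="grub-client"):
    """Keep only the URLs that robots.txt lets `agent` fetch (sketch)."""
    # Parse each host's robots.txt text once.
    parsers = {}
    for host, text in robots_by_host.items():
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        parsers[host] = parser

    allowed = []
    for url in urls:
        parser = parsers.get(urlsplit(url).netloc)
        # In this sketch, a host without a known robots.txt allows everything.
        if parser is None or parser.can_fetch(agent, url):
            allowed.append(url)
    return allowed


if __name__ == "__main__":
    robots = {"example.com": "User-agent: *\nDisallow: /private/\n"}
    candidates = [
        "http://example.com/index.html",
        "http://example.com/private/report.html",
        "http://example.org/",
    ]
    print(filter_allowed(candidates, robots))
```

With `Disallow: /private/` for example.com, the /private/ URL is dropped before it can reach a workunit.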

> > About URLs: AFAIK (unless, again, something was changed without a public
> > announcement ;) Jeremie, please correct me if I'm wrong), Grub has its
> > own database of URLs from which the workunit generator gets URLs to fetch.
> >   
> So, there's a database with URLs. How do I get to the database? There's
> workunit.pl, but it just reads URLs from stdin...
> 

AFAIK, this part is missing from the repository. But remember, I'm not
100% sure how the dispatch server works.
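
If the URL database can only be reached through a script like workunit.pl reading stdin, the generator side might look roughly like this sketch. The batch size and the idea of one workunit per batch are assumptions. The real workunit format is not shown in this thread, and `batch_urls` is a hypothetical name.

```python
import sys
from typing import Iterable, Iterator, List


def batch_urls(lines: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group URLs (one per line) into batches, dropping blanks and duplicates."""
    seen = set()
    batch: List[str] = []
    for line in lines:
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        batch.append(url)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover URLs form a final, smaller batch
        yield batch


if __name__ == "__main__":
    # Read URLs from stdin, as workunit.pl does; one hypothetical workunit per batch.
    for number, unit in enumerate(batch_urls(sys.stdin)):
        print(f"workunit {number}: {len(unit)} URLs")
```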

> > And the only way to get new URLs into the system is through sitemaps
> > generated by clients or sent by users (when creating workunits, Grub
> > does not connect to Nutch).
> >   
> How sad. Does it really not take the page contents into account?
> 

Probably not. Jeremie?
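
This sketch assumes the client-generated sitemaps follow the standard sitemaps.org XML format, which this thread does not confirm. Under that assumption, pulling the URLs out of one is short. `urls_from_sitemap` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list:
    """Return the <loc> URLs listed in a sitemap document (sketch)."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the whole tree, so this also picks up the <loc>
    # entries of a sitemap index file (which point at further sitemaps).
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/</loc></url>"
        "<url><loc>http://example.com/about.html</loc></url>"
        "</urlset>"
    )
    print(urls_from_sitemap(sample))
```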

> >>>> As a bonus, also add Accept-Encoding: gzip, lzma
> >>>>     
> >>> 1. Where should I add it?
> >>>   
> >> To the last workunit entry.
> >>     
> >
> > But Accept-Encoding only applies to the response from the server, so if
> > you add it, the server sends you compressed information about the .arc
> > file ;)
> >   
> 
> Good point. It would be a header from the server, but inserted among
> what the client should send to the server :/
> Any suggestions on a better way to express that?

But first, could you please explain why you want to add this header?
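
This sketch illustrates the point about Accept-Encoding. The client lists the encodings it can accept in the response. The server then picks one and names it in a Content-Encoding header on that response. Below is a rough server-side sketch. The preference order (lzma before gzip) and the helper names are assumptions. Note that "lzma" is the label used in this thread, not a registered HTTP content coding. Wildcards (`*`) are ignored for brevity.

```python
import gzip
import lzma

# Server-side preference order; an assumption for this sketch.
SUPPORTED = ("lzma", "gzip")


def choose_encoding(accept_encoding: str, supported=SUPPORTED) -> str:
    """Pick a response encoding from a client's Accept-Encoding header."""
    offered = {}
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0  # no q parameter means "fully acceptable"
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        offered[token] = q
    # Wildcards ("*") are ignored here for brevity.
    for coding in supported:
        if offered.get(coding, 0.0) > 0:
            return coding
    return "identity"


def encode_body(data: bytes, coding: str) -> bytes:
    """Compress a response body with the chosen coding."""
    if coding == "gzip":
        return gzip.compress(data)
    if coding == "lzma":
        return lzma.compress(data)  # .xz container (FORMAT_XZ) by default
    return data
```

Under these assumptions, a client sending `Accept-Encoding: gzip,lzma` would get an lzma-compressed response labelled `Content-Encoding: lzma`. The header says nothing about the body the client itself uploads.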

Bartek

-- 
Grub Next Generation: http://grub.org
Mailing List: grub-dev at wikia.com
IRC: #wikia-search at irc.freenode.net
Jabber: thindil at jabberpl.org

