[Grub-dev] LZMA support in upload server

Balinny balinny at gmail.com
Wed Feb 25 21:27:35 UTC 2009


Bartek Jasicki wrote:
> On 2009-02-25, at 18:57:10
> Balinny <balinny at gmail.com> wrote:
>
>   
>>>> It's a two-minute change. Especially given that the last entry
>>>> must be generated on-the-fly anyway.
>>>>         
>>>   
>>>
>>> That's the same as running new workunits again with the Accept
>>> header ;) If I count right, these 2 minutes have now taken around
>>> 1 month ;)
>>>       
>> Workunits are pre-generated, but I think the last entry (containing
>> the user name) is made on request?
>>
>>     
>
> AFAIK, yes - http://svn.swlabs.org/grubng/trunk/perl/dispatch.cgi
>   
See? Two minutes :)

@@ -41,1 +41,1 @@
-print "PUT /arcs/$ENV{REMOTE_USER}.$key.arc.gz HTTP/1.0\r\nHost: soap.grub.org\r\n\r\n";
+print "PUT /arcs/$ENV{REMOTE_USER}.$key.arc HTTP/1.0\r\nHost: soap.grub.org\r\nAccept-Encoding: gzip,lzma\r\n\r\n";
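
For anyone who doesn't read Perl, here is a minimal Python sketch of the same idea: the last workunit entry is built at request time from the user name and a key, so adding one header is a small change. The function name and parameters are hypothetical; this is not what dispatch.cgi actually contains, only an illustration of the patch above.

```python
def last_workunit_entry(remote_user, key, encodings=("gzip", "lzma")):
    """Build the upload request that closes a workunit (hypothetical helper).

    remote_user and key stand in for $ENV{REMOTE_USER} and $key in the patch.
    """
    lines = [
        f"PUT /arcs/{remote_user}.{key}.arc HTTP/1.0",
        "Host: soap.grub.org",
    ]
    if encodings:
        # Same header value as in the patch: comma-separated, no spaces.
        lines.append("Accept-Encoding: " + ",".join(encodings))
    # A blank line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(repr(last_workunit_entry("alice", "k1")))
```

Passing an empty tuple for encodings leaves the header out, which makes it easy to compare against the unpatched output.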


>>> Unfortunately Jeremie
>>> doesn't have free time to make any changes to the current dispatch
>>> server. So this 2-minute task will probably have to wait a few months
>>> until I start work on a new dispatch server (at the moment I don't
>>> have access to the dispatch server).
>>>   
>>>       
>> What does the dispatch server really do? Which interface does it use
>> to grab the URLs?
>>     
>
> Dispatch server: for me, this is the server which sends workunits to
> users + the workunit generator + the robots.txt checker (I put all
> these things in one bag).
>   
I wasn't taking robots.txt into account. Although if it's not crawling,
there's not much need to check robots.txt.
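
For reference, a robots.txt check at workunit-generation time could look roughly like the sketch below. It uses Python's standard urllib.robotparser; the function name and the "grub-client" agent string are made up for illustration, and nothing here is taken from the real dispatch server.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, agent="grub-client"):
    """Keep only the URLs that the given robots.txt text permits for agent.

    robots_txt is the already-downloaded file content, so no network
    access happens here. The agent name is a placeholder.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    candidates = [
        "http://example.org/index.html",
        "http://example.org/private/x.html",
    ]
    print(allowed_urls(rules, candidates))
```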


> About URLs: AFAIK (or again something was changed without a public
> announcement ;) Jeremie, please correct me if I'm wrong) Grub has its
> own database of URLs, from which the workunit generator gets URLs to
> fetch.
>   
So, there's a database with URLs. How do we get to the database? There's
workunit.pl, but it just reads URLs from stdin...


> And the only way to get new URLs into the system is sitemaps generated
> by clients or sent by users (while creating workunits, Grub does not
> connect to Nutch).
>   
How sad. Is it really not taking into account the page contents?


>>>> As a bonus, also add  Accept-Encoding: gzip, lzma
>>>>     
>>>>         
>>> 1. Where to add it?
>>>   
>>>       
>> To the last workunit entry.
>>     
>
> But Accept-Encoding only applies to the response from the server, so
> if you add this, the server sends you compressed information about the
> .arc file ;)
>   

Good point. It would be a header from the server, but inserted among
what the client should send to the server :/
Any suggestion on a better way to express that?
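
One possible way to express it, offered only as an assumption and not as anything Grub implements: treat the "gzip,lzma" list as what the upload server is willing to receive, and let the client pick one and label its own PUT body with Content-Encoding, which is the header that describes a message body. The sketch below shows the client side in Python. The function name and preference order are hypothetical, the "lzma" token is simply the one used in this thread, and FORMAT_ALONE is chosen on the assumption that the legacy .lzma container is meant.

```python
import gzip
import lzma

# Compressors keyed by the encoding tokens used in this thread.
COMPRESSORS = {
    "lzma": lambda data: lzma.compress(data, format=lzma.FORMAT_ALONE),
    "gzip": gzip.compress,
}
# Hypothetical client preference: try the denser format first.
PREFERENCE = ("lzma", "gzip")


def encode_upload(body, accepted):
    """Pick an encoding the server listed and compress the body with it.

    accepted is the comma-separated list taken from the workunit entry.
    Returns (extra_headers, payload); falls back to the raw body when
    none of the listed encodings is supported.
    """
    offered = {token.strip().lower() for token in accepted.split(",")}
    for name in PREFERENCE:
        if name in offered:
            return {"Content-Encoding": name}, COMPRESSORS[name](body)
    return {}, body


if __name__ == "__main__":
    headers, payload = encode_upload(b"example arc data" * 100, "gzip,lzma")
    print(headers, len(payload))
```

With this reading, an old client that ignores the list keeps uploading as before, and a newer one switches to lzma only when the server has said it can take it.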





