[Grub-dev] tiered client, other thoughts...

Balinny balinny at gmail.com
Wed Jan 9 20:46:36 UTC 2008


Yousef Ourabi wrote:
> So as I play around with my currently less broken patched babygrub, a 
> few ideas are floating around that I wanted to share with the list and 
> get feedback.
>
> 1) Have tiered client modes: The current implementation holds the 
> clients at arm's length by specifying they should not follow redirects,
> or follow links. There are strong advantages to this approach as it 
> keeps the overall design simple. However, it might make sense to 
> incorporate the notion that not all clients are equal. For
> example, for someone with weaker hardware on a slower connection,
> fetching 250 pages might be a heavy burden, so instead there could be
> a "validation" tier of clients that simply do HEAD requests and either
> validate current existence, or report an error without. There could
> be "super" clients that are in some way "authenticated" or vetted by 
> the grub server that take more of the processing burden, such as 
> parsing the page for outbound links...etc.
Maybe.

> 2) Ability to report failure -- every time a request is made to the 
> dispatcher, it generates a new work-list. What if the client thread is 
> stopped for some reason, and the admin wants to explicitly re-fetch 
> the last list (I'm thinking of me in my debug mode right now). Does 
> this functionality exist? If not, it should.
I think it's a client problem. The client should download the work list
to a file and work from it. While the file exists, it keeps working with
it. Then it PUTs the results and deletes the file.
It doesn't need to be transactional, just able to retry.
Having it automatically get a new set on start is not the best way.
I requested a number of units, but between the programs not working,
runs which I aborted, and an error in the upload code...
I think not even one was uploaded.
I have some of them as files, though.
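
A rough sketch of the retry behaviour I mean, in Python. Everything named
here is a placeholder of mine, not the real client API: the file name
worklist.txt, and the fetch_worklist, crawl and upload_results callables
that stand in for the dispatcher request, the page fetch and the PUT.

```python
import os

WORKLIST_PATH = "worklist.txt"  # hypothetical local file name


def get_worklist(fetch_worklist, path=WORKLIST_PATH):
    """Return the pending work list, asking the dispatcher only if none is saved."""
    if os.path.exists(path):
        # An earlier run did not finish: resume from the saved list.
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
    urls = list(fetch_worklist())
    # Write to a temp file and rename, so a crash never leaves a partial list.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write("\n".join(urls))
    os.replace(tmp, path)
    return urls


def run_once(fetch_worklist, crawl, upload_results, path=WORKLIST_PATH):
    """Process one work list; the saved file is deleted only after a good upload."""
    urls = get_worklist(fetch_worklist, path)
    results = [crawl(url) for url in urls]
    upload_results(results)  # if this raises, the file stays and we can retry
    os.remove(path)
    return results
```

If the upload fails or the run is aborted, the next start picks up the same
list instead of asking the dispatcher for a new one.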

> 3) Significance of result order -- I have somewhat mixed feelings on 
> this. Having the hash of the URIs in order is cool, but I'm wondering
> if it would be just as effective if they were out of order, because it
> shouldn't make a difference if I crawl hosts a, b, and c in that order
> or c, a, b. Also the individual URI is a nice unit to divide work in a
> multi-threaded client, which would then place the burden of 
> re-ordering the results on the client side just so the hash can 
> match...  I don't know, maybe I just need to be convinced some more on 
> this.
Maybe the server could send the URLs sorted, and the client could then
re-sort the results before hashing?
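
Something like this, as a sketch only. I am assuming each result is a
(url, body_hash) pair of strings and that SHA-1 is the digest; I have not
checked what the server actually hashes, so treat both as placeholders.

```python
import hashlib


def results_digest(results):
    """Hash (url, body_hash) pairs in URL-sorted order, so fetch order is irrelevant."""
    h = hashlib.sha1()  # assumed digest; the real protocol may use another
    for url, body_hash in sorted(results):
        h.update(url.encode("utf-8"))
        h.update(b"\0")  # separator so adjacent fields cannot run together
        h.update(body_hash.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()
```

A multi-threaded client can then finish the URIs in any order and still
produce the same hash as the server, as long as both sides sort the same way.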


PS: patch v2 wasn't attached.

