So as I play around with my currently less broken patched babygrub, a few ideas are floating around that I wanted to share with the list to get feedback.<br><br>1) Have tiered client modes: The current implementation holds the clients at arm's length by specifying that they should follow neither redirects nor links. There are strong advantages to this approach, as it keeps the overall design simple. However, it might make sense to incorporate the notion that not all clients are equal. For example, for someone with weaker hardware on a slower connection, fetching 250 pages might be a heavy burden, so instead there could be a "validation" tier of clients that simply do HEAD requests and either validate current existence or report an error, without fetching the page body. There could also be "super" clients that are in some way "authenticated" or vetted by the grub server and take on more of the processing burden, such as parsing the page for outbound links, etc.
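<br><br>To make the "validation" tier concrete, here is a rough sketch of what such a client might do per URI. This is not grub code: the function name `validate_uri`, the return shape, and the choice of Python's standard `urllib` are all my own assumptions for illustration. It sends a HEAD request, refuses to follow redirects (matching the current arm's-length rule), and reports either the HTTP status or a network failure.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 3xx is reported instead of chased."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def validate_uri(uri, timeout=10.0):
    """Hypothetical validation-tier check: HEAD the URI, never fetch the body.

    Returns (uri, status), where status is the HTTP status code, or None
    if the host could not be reached at all.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return uri, response.status
    except urllib.error.HTTPError as err:
        # 3xx (not followed), 4xx and 5xx all land here; report the code.
        return uri, err.code
    except OSError:
        # DNS failure, refused connection, timeout and similar.
        return uri, None
```

A validation client would call something like `validate_uri("http://example.com/")` for each entry in its work-list and send back only the status codes, leaving full fetches to the heavier tiers.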
<br><br>2) Ability to report failure -- every time a request is made to the dispatcher, it generates a new work-list. What if the client thread is stopped for some reason, and the admin wants to explicitly re-fetch the last list (I'm thinking of me in my debug mode right now)? Does this functionality exist? If not, it should.
<br><br>3) Significance of result order -- I have somewhat mixed feelings on this. Having the hash of the URIs in order is cool, but I'm wondering if it would be just as effective if they were out of order, because it shouldn't make a difference whether I crawl hosts a, b, and c in that order or c, a, b. Also, the individual URI is a nice unit for dividing work in a multi-threaded client, which would then place the burden of re-ordering the results on the client side just so the hash can match... I don't know, maybe I just need to be convinced some more on this.
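<br><br>For what it's worth, here is one way an order-independent hash could work. I don't know exactly what the current hash covers, so everything here is an assumption: the names `item_digest` and `worklist_digest`, the use of SHA-256, and the idea that each result is a (URI, status) pair. Each result is hashed on its own, the per-item digests are sorted, and the sorted digests are hashed together, so threads can finish in any order.

```python
import hashlib


def item_digest(uri, status):
    """Digest of a single result; the (uri, status) shape is hypothetical."""
    return hashlib.sha256(f"{uri}\n{status}".encode("utf-8")).digest()


def worklist_digest(results):
    """Combine per-item digests so that the order of `results` is irrelevant.

    Sorting the digests gives a canonical order that does not depend on
    the order in which the dispatcher handed out the URIs.
    """
    outer = hashlib.sha256()
    for digest in sorted(item_digest(uri, status) for uri, status in results):
        outer.update(digest)
    return outer.hexdigest()


in_order = [("http://a/", 200), ("http://b/", 404), ("http://c/", 200)]
shuffled = [("http://c/", 200), ("http://a/", 200), ("http://b/", 404)]

# Same results in a different order produce the same digest.
print(worklist_digest(in_order) == worklist_digest(shuffled))
```

The client still sorts, but only a list of fixed-size digests at the very end, and it never has to remember the order in which the work-list arrived.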
<br>