[Grub-dev] Newbie Needs Some Pointers

jer jeremie at jabber.org
Thu Jan 17 08:09:31 UTC 2008


Rough spec with these details: http://search.wikia.com/wiki/GrubWorkUnit

Jer

On Jan 17, 2008, at 1:30 AM, Yousef Ourabi wrote:

> 1) No. The client makes a simple HTTP GET request with no parameters to
> http://dispatch.grub.swlabs.org/do/workunit -- the server decides
> the number of URLs the client should fetch.
>
> 2) The server will give a list, which is now 250 URLs, but this may change.
>
> 3) Yes
>
> 4) Yes
>
> 5) Not exactly. The client writes both the HTTP headers and the
> response body (the HTML). They must be in the same order that the
> server gave them in the original work list.
>
> 6, 7) The client doesn't change any HTML; it writes the pages to the ARC
> file exactly as received. But the basic idea is correct.
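
To make points 5 to 7 concrete, here is a minimal Python sketch of writing every response, untouched and in work-list order. The function name `write_records`, the `responses` mapping, and the one-line record header (URL plus byte length) are placeholders invented for illustration. The real ARC record header carries more fields, so check the ARC email linked further down the thread before relying on this framing.

```python
import io


def write_records(out, work_list, responses):
    """Append one record per fetched URL, in the order of the work list.

    responses maps url -> (raw_headers, body), both bytes, stored untouched.
    There is no hash check and no filtering of "unchanged" pages.
    """
    for url in work_list:  # iterate the server's list, not the dict
        if url not in responses:
            # Sketch's own choice: how unreachable hosts should be
            # reported is not covered in this thread.
            continue
        raw_headers, body = responses[url]
        payload = raw_headers + body
        # HYPOTHETICAL framing line, not the real ARC record header.
        out.write(f"{url} {len(payload)}\n".encode("utf-8"))
        out.write(payload)
        out.write(b"\n")


if __name__ == "__main__":
    buf = io.BytesIO()
    write_records(
        buf,
        ["http://a.example/", "http://b.example/"],
        {
            "http://b.example/": (b"HTTP/1.0 404 Not Found\r\n\r\n", b"nope"),
            "http://a.example/": (b"HTTP/1.0 200 OK\r\n\r\n", b"<html></html>"),
        },
    )
    print(buf.getvalue().decode("utf-8"))
```

Fetches can finish in any order (for example from a thread pool); walking `work_list` at write time is what keeps the file in the order the server expects.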
>
>
> Hope this helps. Feel free to keep asking questions here or on the
> IRC channel #searchwikia.
>
> Good luck.
> Yousef
>
>
> On 1/16/08, Mir Tanvir Hossain <mir.tanvir.hossain at gmail.com> wrote:
> Hello Yousef,
>
> Lemme briefly write here what I have understood so far.
>
> 1. Client will request a list of 250 URLs from the server.
> 2. Server will give 250 URLs and a PUT URL to upload back the ARC.
> 3. With the URLs in hand, client will start crawling those 250 URLs.
> 4. Client will not follow any redirects.
> 5. Client will dump all the HTML and check it against a known hash for
> any change.
> 6. The client will make an ARC file with all the changed HTML pages.
> 7. It will upload the ARC back to the server with the changed HTML pages.
>
> Am I correct? Please tell me if I am wrong and correct me.
>
> Thanks again for your time.
>
> Tanvir
>
>
>
> On Wed, 2008-01-16 at 22:43 -0800, Yousef Ourabi wrote:
> > Tanvir,
> >
> > The "documentation" is all in the mailing list. There is nothing more
> > formal. Here is a brief description:
> >
> > The client makes an HTTP GET request to the URL.
> > The server returns a list of 250 URLs to fetch, with a user-agent. The
> > last line is an HTTP PUT URL where the client should upload the
> > resulting ARC file.
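
A rough Python sketch of that exchange, assuming the work unit comes back as plain text with one URL per line and the upload URL alone on the last non-empty line. That layout, and the helper names `request_workunit` and `parse_workunit`, are assumptions and not taken from the spec. The user-agent part of the response is left out because its placement isn't described here. The dispatch URL is the one given elsewhere in this thread.

```python
import urllib.request

# Dispatch URL as quoted in this thread; it may have changed since.
WORKUNIT_URL = "http://dispatch.grub.swlabs.org/do/workunit"


def request_workunit(url=WORKUNIT_URL, timeout=30.0):
    """Plain GET with no parameters; the server picks how many URLs to send."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_workunit(text):
    """Split a work unit into (urls_to_fetch, upload_url).

    ASSUMED layout: one URL per line, upload URL on the last non-empty line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("work unit needs at least one URL and an upload line")
    *urls, upload_url = lines
    return urls, upload_url


if __name__ == "__main__":
    sample = "http://a.example/\nhttp://b.example/x\nhttp://upload.example/put/1\n"
    print(parse_workunit(sample))
```

Nothing here hard-codes 250: the client just takes however many lines the server sends.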
> >
> > Clients do not follow redirects, i.e. HTTP 301, 302, 307...
> > Clients do not parse outbound links.
> > Clients report HTTP headers verbatim, including errors.
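
One way to satisfy all three rules at once is to skip the HTTP library and read the raw response off the socket: nothing follows a redirect or parses links, and the status line and headers come back byte for byte, errors included. A minimal Python sketch, assuming plain `http://` URLs and an HTTP/1.0 request; `build_request`, `fetch_raw`, and `split_response` are hypothetical names, not part of any Grub client.

```python
import socket
from urllib.parse import urlsplit


def build_request(url, user_agent):
    """Build a bare HTTP/1.0 GET request for url as bytes."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {parts.netloc}",
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",
        "",  # the two empty items produce the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(url, user_agent, timeout=30.0):
    """Return the server's response bytes exactly as sent (no TLS support)."""
    parts = urlsplit(url)
    with socket.create_connection((parts.hostname, parts.port or 80),
                                  timeout=timeout) as sock:
        sock.sendall(build_request(url, user_agent))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)


def split_response(raw):
    """Split raw bytes into (headers including the blank line, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    return head + sep, body
```

A 302 then shows up as an ordinary response whose headers contain a `Location:` line; the client records it like any other page and moves on to the next URL.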
> >
> >
> > To learn about the arc format read this:
> > http://lists.wikia.com/pipermail/grub-dev/2008-January/000079.html
> >
> > Read all other emails here:
> > http://lists.wikia.com/pipermail/grub-dev/2008-January/thread.html
> >
> > Ask many questions!
> >
> > Thanks,
> > Yousef
> >
> > On 1/16/08, Mir Hossain <mir.tanvir.hossain at gmail.com> wrote:
> >         Hello Yousef, Thanks for your prompt reply. I will try the
> >         Perl version right now. I know C#. Maybe I will try to
> >         implement the code in C#. Before that, I need to know how the
> >         protocol works. Is there any documentation about the protocol?
> >         Please let me know.
> >
> >         Thanks
> >         Tanvir
> >
> >
> >         On Jan 16, 2008 10:06 PM, Yousef Ourabi
> >         <yourabi at zero-analog.com> wrote:
> >                 Tanvir,
> >                 The new SVN repository is http://svn.swlabs.org/grubng
> >
> >                 We are currently rewriting the code to work with the
> >                 new RESTful API Jer (Jeremie) is implementing -- so
> >                 both the client and the server code are moving
> >                 targets.
> >
> >                 The *most* developed client is currently the Perl
> >                 client, http://svn.swlabs.org/grubng/trunk/perl -- but
> >                 many others are working on other language
> >                 implementations of the same protocol -- Balinny is
> >                 working on a C implementation, etc.
> >
> >                 If you are interested in learning a new language, it
> >                 might not be a bad idea to start a new language
> >                 implementation of the protocol.
> >
> >                 -Yousef
> >
> >
> >                 On 1/16/08, Mir Tanvir Hossain
> >                 <mir.tanvir.hossain at gmail.com> wrote:
> >
> >                         Hello everybody, I joined the mailing list a
> >                         couple of weeks ago and have been reading the
> >                         mails regularly, but I am not understanding
> >                         much. I am a Computer Science student and
> >                         would like to contribute some code to the
> >                         project. However, I am not sure where to
> >                         begin. Could anybody please give me some
> >                         pointers on where I can start?
> >
> >                         Sincerely
> >
> >                         Tanvir
> >
> >
> >                         _______________________________________________
> >                         Grub-dev mailing list
> >                         Grub-dev at wikia.com
> >                         http://lists.wikia.com/mailman/listinfo/grub-dev
> >


