[Grub-dev] Newbie Needs Some Pointers
Yousef Ourabi
yourabi at zero-analog.com
Thu Jan 17 07:30:56 UTC 2008
1) No. The client makes a simple HTTP GET request with no parameters to
http://dispatch.grub.swlabs.org/do/workunit -- the server decides the number
of urls the client should fetch.
2) The server will give a list, which currently has 250 urls -- but this may change.
3) Yes
4) Yes
5) Not exactly. The client writes both the HTTP headers and the response
body (the html). The records must be in the same order that the server gave
the urls in the original work list.
6,7) The client doesn't change any html; it writes the responses to the ARC
file exactly as received. But the basic idea is correct.
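
To make the flow above concrete, here is a rough Python sketch. It is not the
perl client's code, only an illustration under assumptions: the work unit is
assumed to be plain text with one url per line and the PUT url as the last
non-empty line (the exact wire format, including how the user-agent is sent,
is not spelled out here). The names parse_work_unit, fetch_verbatim, crawl
and NoRedirect are made up for this example. Headers are recorded as urllib
parses them, which may not be byte-for-byte what the server sent, and the ARC
serialization and the upload are left out entirely.

```python
import urllib.error
import urllib.request

# Work unit endpoint quoted in this message.
WORKUNIT_URL = "http://dispatch.grub.swlabs.org/do/workunit"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow 301/302/307...; urllib then raises HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def parse_work_unit(text):
    """Split a work unit into (urls, put_url).

    ASSUMPTION: one url per line, last non-empty line is the PUT target.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("work unit needs at least one url and a PUT url")
    return lines[:-1], lines[-1]


def fetch_verbatim(url, opener):
    """Return (status, headers, body) without following redirects."""
    try:
        with opener.open(url, timeout=30) as resp:
            return resp.status, str(resp.headers), resp.read()
    except urllib.error.HTTPError as err:
        # Redirects and HTTP errors land here; they are recorded, not dropped.
        return err.code, str(err.headers), err.read()


def crawl(urls, fetch):
    """Fetch every url, keeping the order the server gave them in."""
    return [(url, *fetch(url)) for url in urls]
```

A caller would build the opener with urllib.request.build_opener(NoRedirect),
read the work unit from WORKUNIT_URL, and pass
lambda u: fetch_verbatim(u, opener) to crawl. Connection failures
(urllib.error.URLError) are not handled in this sketch.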
Hope this helps. Feel free to keep asking questions here or on the IRC
channel #searchwikia.
Good luck.
Yousef
On 1/16/08, Mir Tanvir Hossain <mir.tanvir.hossain at gmail.com> wrote:
>
> Hello Yousef,
>
> Lemme briefly write here what I have understood so far.
>
> 1. Client will request a list of 250 urls from server.
> 2. Server will give 250 urls and a PUT url to upload back the ARC.
> 3. With the urls in hand, client will start crawling those 250 urls.
> 4. Client will not follow any redirects.
> 5. Client will dump all the html and check it with a known hash for any
> change.
> 6. The client will make an ARC file with all the changed html pages.
> 7. It will upload the ARC back to the server with changed html pages.
>
> Am I correct? Please tell me if I am wrong and correct me.
>
> Thanks again for your time.
>
> Tanvir
>
>
>
> On Wed, 2008-01-16 at 22:43 -0800, Yousef Ourabi wrote:
> > Tanvir,
> >
> > The "documentation" is all in the mailing list. There is nothing more
> > formal. Here is a brief description:
> >
> > client makes HTTP GET request to url
> > server returns list of 250 urls to fetch, with user-agent; the last
> > line is an HTTP PUT url where the client should upload the resulting arc
> > file
> >
> > Clients do not follow redirects, i.e. HTTP 301, 302, 307...
> > Clients do not parse outbound links
> > Clients report HTTP headers verbatim, including errors
> >
> >
> > To learn about the arc format read this:
> > http://lists.wikia.com/pipermail/grub-dev/2008-January/000079.html
> >
> > Read all other emails here:
> > http://lists.wikia.com/pipermail/grub-dev/2008-January/thread.html
> >
> > Ask many questions!
> >
> > Thanks,
> > Yousef
> >
> > On 1/16/08, Mir Hossain <mir.tanvir.hossain at gmail.com> wrote:
> > Hello Yousef, Thanks for your prompt reply. I will try the
> > perl version right now. I know C#. Maybe I will try to
> > implement the code in C#. Before that, I need to know how the
> > protocol works. Is there any documentation about the protocol?
> > Please let me know.
> >
> > Thanks
> > Tanvir
> >
> >
> > On Jan 16, 2008 10:06 PM, Yousef Ourabi
> > <yourabi at zero-analog.com> wrote:
> > Tanvir,
> > The new SVN repository is http://svn.swlabs.org/grubng
> >
> > We are currently re-writing the code to work with the
> > new RESTful API Jer (Jeremie) is implementing -- so
> > both the client and the server code are a moving
> > target.
> >
> > The *most* developed client is currently the perl
> > client http://svn.swlabs.org/grubng/trunk/perl -- but
> > many others are working on other language
> > implementations of the same protocol -- Balinny is
> > working on a C implementation...etc
> >
> > If you are interested in learning a new language it
> > might not be a bad idea to start a new language
> > implementation of the protocol?
> >
> > -Yousef
> >
> >
> > On 1/16/08, Mir Tanvir Hossain
> > <mir.tanvir.hossain at gmail.com> wrote:
> >
> > Hello everybody, I joined the mailing
> > list a couple of weeks ago.
> > I have been reading the mails regularly, but I do not
> > understand that much. I am a
> > Computer Science student and would like to
> > contribute some code for the
> > project. However, I am not sure where to
> > begin. Could anybody please give
> > some pointers on where I can start?
> >
> > Sincerely
> >
> > Tanvir
> >
> >
> > _______________________________________________
> > Grub-dev mailing list
> > Grub-dev at wikia.com
> > http://lists.wikia.com/mailman/listinfo/grub-dev