[Grub-dev] Newbie Needs Some Pointers
jer
jeremie at jabber.org
Thu Jan 17 08:09:31 UTC 2008
A rough spec with these details is here: http://search.wikia.com/wiki/GrubWorkUnit
Jer
On Jan 17, 2008, at 1:30 AM, Yousef Ourabi wrote:
> 1) No. The client makes a plain HTTP GET request, with no parameters, to
> http://dispatch.grub.swlabs.org/do/workunit -- the server decides
> the number of URLs the client should fetch.
>
> 2) The server will give a list, currently of 250 URLs -- but this may change.
>
> 3) Yes
>
> 4) Yes
>
> 5) Not exactly. The client writes both the HTTP headers and the
> response body (the HTML). They must be in the same order that the
> server gave them in the original work list.
>
> 6,7) The client doesn't change any HTML; it writes the responses to the
> ARC file exactly as received. But the basic idea is correct.
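Answers 5 to 7 can be illustrated with a minimal Python sketch of the ARC-writing step. It assumes the version 1 record layout of the Internet Archive ARC format (a header line of URL, IP address, 14-digit UTC timestamp, content type and length, followed by the response bytes and a newline) and omits the file-header record that a real ARC file starts with. The Grub-specific details are in the ARC post linked later in this thread, so treat this as an illustration, not the project's format. `ArcRecord`, `write_arc` and `arc_bytes` are hypothetical names.

```python
import io
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import BinaryIO, Iterable


@dataclass
class ArcRecord:
    """One fetched URL (hypothetical structure for illustration)."""
    url: str
    ip: str               # IP address the response came from
    fetched_at: datetime  # when the response was received (timezone-aware)
    content_type: str     # value of the Content-Type response header, may be ""
    raw_response: bytes   # status line + headers + blank line + body, unmodified


def write_arc(out: BinaryIO, records: Iterable[ArcRecord]) -> None:
    """Append ARC v1 style records to `out` in exactly the order given."""
    for rec in records:
        stamp = rec.fetched_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
        # The header line is space-separated, so drop parameters such as
        # "; charset=utf-8"; "no-type" marks a missing type (assumption).
        mime = rec.content_type.split(";")[0].strip() or "no-type"
        header = f"{rec.url} {rec.ip} {stamp} {mime} {len(rec.raw_response)}\n"
        out.write(header.encode("utf-8"))
        out.write(rec.raw_response)  # byte-for-byte: the HTML is never altered
        out.write(b"\n")             # newline separating records


def arc_bytes(records: Iterable[ArcRecord]) -> bytes:
    """Convenience wrapper that builds the whole ARC body in memory."""
    buf = io.BytesIO()
    write_arc(buf, records)
    return buf.getvalue()
```

The writer copies `raw_response` unchanged and never reorders anything, so the caller is responsible for passing records in the order the server listed the URLs.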
>
>
> Hope this helps. Feel free to keep asking questions, here or on the
> IRC channel #searchwikia.
>
> Good luck.
> Yousef
>
>
> On 1/16/08, Mir Tanvir Hossain <mir.tanvir.hossain at gmail.com> wrote:
>
> Hello Yousef,
>
> Let me briefly write down what I have understood so far.
>
> 1. The client will request a list of 250 URLs from the server.
> 2. The server will give 250 URLs and a PUT URL to upload the ARC back to.
> 3. With the URLs in hand, the client will start crawling those 250 URLs.
> 4. The client will not follow any redirects.
> 5. The client will dump all the HTML and check it against a known hash
> for any change.
> 6. The client will make an ARC file with all the changed HTML pages.
> 7. It will upload the ARC file, with the changed HTML pages, back to the server.
> Am I correct? If I am wrong, please correct me.
>
> Thanks again for your time.
>
> Tanvir
>
>
>
> On Wed, 2008-01-16 at 22:43 -0800, Yousef Ourabi wrote:
> > Tanvir,
> >
> > The "documentation" is all in the mailing list. There is nothing more
> > formal. Here is a brief description:
> >
> > The client makes an HTTP GET request to the URL.
> > The server returns a list of 250 URLs to fetch, along with a user-agent;
> > the last line is an HTTP PUT URL where the client should upload the
> > resulting ARC file.
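The GET/PUT cycle just described might look like the following Python sketch. Several details are assumptions not confirmed by the thread: the work list is treated as plain text with one entry per line and the PUT URL on the last line; the user-agent entry Yousef mentions is not parsed; and the Content-Type sent on upload is a guess. `parse_work_unit`, `fetch_work_unit` and `upload_arc` are hypothetical names, and the dispatch URL is the one quoted at the top of the thread, which may no longer be live. The GrubWorkUnit spec is the authority on the real format.

```python
import urllib.request
from typing import List, Tuple

# Dispatch URL quoted in this thread (2008); it may no longer be live.
DISPATCH_URL = "http://dispatch.grub.swlabs.org/do/workunit"


def parse_work_unit(text: str) -> Tuple[List[str], str]:
    """Split a work list into (urls_to_fetch, put_url).

    Assumes one entry per line with the upload (PUT) URL on the last line.
    The URL count comes from the response, not from a hardcoded 250.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty work unit")
    return lines[:-1], lines[-1]


def fetch_work_unit(url: str = DISPATCH_URL, timeout: float = 30.0) -> Tuple[List[str], str]:
    """Plain GET with no query parameters; the server decides how many URLs to send."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return parse_work_unit(resp.read().decode(charset))


def upload_arc(put_url: str, arc_data: bytes, timeout: float = 60.0) -> int:
    """PUT the finished ARC file to the URL from the work list; returns the HTTP status."""
    req = urllib.request.Request(
        put_url,
        data=arc_data,
        method="PUT",
        # Content-Type is an assumption; the thread does not specify one.
        headers={"Content-Type": "application/octet-stream"},
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Usage would be roughly `urls, put_url = fetch_work_unit()`, then fetching each URL, building the ARC, and calling `upload_arc(put_url, data)`. Taking the count from the response rather than hardcoding it follows the note that 250 may change.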
> >
> > Clients do not follow redirects, i.e. HTTP 301, 302, 307...
> > Clients do not parse outbound links.
> > Clients report HTTP headers verbatim, including errors.
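The three client rules above can be illustrated with Python's `http.client`, which never follows redirects on its own, so a 301, 302 or 307 is recorded as-is. This is a sketch, not Grub's client: `fetch_verbatim` is a hypothetical name, the user-agent string is a placeholder (the real one comes from the work list), and the header block is rebuilt from `http.client`'s parsed headers. That keeps their order and names but may not be byte-identical to the wire (for example with folded headers, or a chunked body that `http.client` has already de-chunked); a client that must archive the exact bytes would need to read the socket directly.

```python
import http.client
from urllib.parse import urlsplit

# Placeholder; a real client would send the user-agent given in the work list.
USER_AGENT = "grub-client-sketch"


def fetch_verbatim(url: str, timeout: float = 30.0) -> bytes:
    """GET one URL and return status line + headers + blank line + body.

    http.client never follows redirects, so a 301/302/307 response is
    returned as-is, and error statuses (404, 500, ...) are returned rather
    than raised. The body is not parsed for links.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers={"User-Agent": USER_AGENT})
        resp = conn.getresponse()
        # Note: a chunked body comes back already de-chunked by http.client.
        body = resp.read()
        version = "HTTP/1.1" if resp.version == 11 else "HTTP/1.0"
        lines = [f"{version} {resp.status} {resp.reason}"]
        # getheaders() keeps the received order and header-name spelling.
        lines += [f"{name}: {value}" for name, value in resp.getheaders()]
        # http.client decodes headers as ISO-8859-1, so encoding back the
        # same way restores the original characters.
        head = ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1")
        return head + body
    finally:
        conn.close()
```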
> >
> >
> > To learn about the ARC format, read this:
> > http://lists.wikia.com/pipermail/grub-dev/2008-January/000079.html
> >
> > Read all the other emails here:
> > http://lists.wikia.com/pipermail/grub-dev/2008-January/thread.html
> >
> > Ask many questions!
> >
> > Thanks,
> > Yousef
> >
> > On 1/16/08, Mir Hossain <mir.tanvir.hossain at gmail.com> wrote:
> > Hello Yousef, thanks for your prompt reply. I will try the
> > Perl version right now. I know C#, so maybe I will try to
> > implement the code in C#. Before that, I need to know how the
> > protocol works. Is there any documentation about the protocol?
> > Please let me know.
> >
> > Thanks
> > Tanvir
> >
> >
> > On Jan 16, 2008 10:06 PM, Yousef Ourabi
> > <yourabi at zero-analog.com> wrote:
> > Tanvir,
> > The new SVN repository is http://svn.swlabs.org/grubng
> >
> > We are currently rewriting the code to work with the
> > new RESTful API Jer (Jeremie) is implementing -- so
> > both the client and the server code are a moving
> > target.
> >
> > The *most* developed client is currently the Perl
> > client, http://svn.swlabs.org/grubng/trunk/perl -- but
> > many others are working on implementations of the same
> > protocol in other languages -- Balinny is working on a
> > C implementation, etc.
> >
> > If you are interested in learning a new language, it
> > might not be a bad idea to start a new language
> > implementation of the protocol.
> >
> > -Yousef
> >
> >
> > On 1/16/08, Mir Tanvir Hossain
> > <mir.tanvir.hossain at gmail.com> wrote:
> >
> > Hello everybody, I joined the mailing list
> > a couple of weeks ago and have been reading
> > the mails regularly, but I don't understand
> > that much yet. I am a Computer Science student
> > and would like to contribute some code to the
> > project. However, I am not sure where to
> > begin. Could anybody please give me
> > some pointers on where I can start?
> >
> > Sincerely
> >
> > Tanvir
> >
> >
> >
> > _______________________________________________
> > Grub-dev mailing list
> > Grub-dev at wikia.com
> > http://lists.wikia.com/mailman/listinfo/grub-dev