[Grub-dev] Newbie Needs Some Pointers
Mir Tanvir Hossain
mir.tanvir.hossain at gmail.com
Thu Jan 17 08:07:57 UTC 2008
Hello, I am a bit confused about the ARC file. So the client crawls
each of the given URLs and appends the resulting HTTP headers, as well
as any HTML, to a single file, right?
Then the program compresses the file using the ARC format and uploads
it to the server. Right?
Tanvir
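
To make the question above concrete, here is a rough Python sketch of
the append-then-compress idea. It is not the project's code. The
one-line record prefix is a made-up placeholder rather than the real
ARC record header (the format post linked further down the thread is
the place to look for that), gzip is only an assumed choice of
compression, and build_archive and compress_archive are hypothetical
names.

```python
import gzip
import io


def build_archive(pages):
    """Concatenate headers + body for each page, in the order given.

    `pages` is a list of (url, raw_headers, body) tuples of bytes, where
    raw_headers already ends with the blank line that closes the headers.
    """
    out = io.BytesIO()
    for url, raw_headers, body in pages:
        payload = raw_headers + body  # written unchanged
        # Placeholder record line (assumption): URL and payload length.
        out.write(url + b" " + str(len(payload)).encode("ascii") + b"\n")
        out.write(payload)
        out.write(b"\n")
    return out.getvalue()


def compress_archive(raw):
    """Gzip the finished archive before upload (compression is assumed)."""
    return gzip.compress(raw)
```

Pages for which the fetch failed would still get a record, so that the
archive lines up with the list the server handed out.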
On Wed, 2008-01-16 at 23:30 -0800, Yousef Ourabi wrote:
> 1) No. The client makes a simple HTTP GET request with no parameters to
> http://dispatch.grub.swlabs.org/do/workunit -- the server decides the
> number of URLs the client should fetch.
>
> 2) The server will give a list, which is now 250 URLs -- but this may
> change.
>
> 3) Yes
>
> 4) Yes
>
> 5) Not exactly. The client writes both the HTTP headers and the
> response body (the HTML). They must be in the same order that the
> server gave them in the original work list.
>
> 6,7) The client doesn't change any HTML; it writes the pages to the ARC
> file exactly as received. But the basic idea is correct.
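
The answers above could be sketched in Python roughly as follows. This
is only an illustration: it assumes the work unit is plain text with
one entry per line and the upload (PUT) URL on the last non-empty line,
it ignores any other fields the server may send, and parse_workunit and
crawl_in_order are hypothetical names.

```python
def parse_workunit(text):
    """Split a work unit into (urls, put_url).

    Assumes one entry per line, with the PUT URL on the last non-empty
    line; the real wire format may differ.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("work unit needs at least one URL and a PUT URL")
    return lines[:-1], lines[-1]


def crawl_in_order(urls, fetch):
    """Fetch every URL once, keeping the server's original order.

    `fetch` is any callable returning (raw_headers, body) for a URL.
    The number of URLs is whatever the server sent, not a fixed 250.
    """
    return [(url,) + tuple(fetch(url)) for url in urls]
```

Because the result is built by walking the list as received, the
records come out in the same order as the original work list.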
>
>
> Hope this helps. Feel free to continue asking questions here or on the
> IRC channel #searchwikia
>
> Good luck.
> Yousef
>
>
> On 1/16/08, Mir Tanvir Hossain <mir.tanvir.hossain at gmail.com> wrote:
> Hello Yousef,
>
> Let me briefly write here what I have understood so far.
>
> 1. Client will request a list of 250 URLs from the server.
> 2. Server will give 250 URLs and a PUT URL to upload the ARC back to.
> 3. With the URLs in hand, client will start crawling those 250 URLs.
> 4. Client will not follow any redirects.
> 5. Client will dump all the HTML and check it against a known hash
> for any change.
> 6. Client will make an ARC file with all the changed HTML pages.
> 7. It will upload the ARC, with the changed HTML pages, back to the
> server.
>
> Am I correct? Please tell me if I am wrong and correct me.
>
> Thanks again for your time.
>
> Tanvir
>
>
>
> On Wed, 2008-01-16 at 22:43 -0800, Yousef Ourabi wrote:
> > Tanvir,
> >
> > The "documentation" is all in the mailing list. There is nothing more
> > formal. Here is a brief description:
> >
> > client makes HTTP GET request to url
> > server returns list of 250 urls to fetch, with user-agent
> > the last line is an HTTP PUT where the client should upload the
> > resulting ARC file
> >
> > Clients do not follow redirects, i.e. HTTP 301, 302, 307...
> > Clients do not parse outbound links
> > Clients report http headers verbatim, including errors
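
One way to get the "no redirects, report errors too" behaviour with
Python's standard urllib is sketched below. It is an illustration, not
the project's client. The NoRedirect, format_headers and fetch names
and the default user-agent string are made up. urllib also parses the
headers, so the block rebuilt here is a close copy rather than the
exact bytes from the wire; a client that must report headers truly
verbatim would need to read the raw response itself.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects such as 301, 302 and 307."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means no follow-up request is made; urllib
        # then raises HTTPError for the 3xx response instead.
        return None


def format_headers(status, reason, headers):
    """Rebuild a header block from a status and (name, value) pairs.

    The protocol version is fixed to HTTP/1.1 here for simplicity.
    """
    lines = ["HTTP/1.1 %d %s" % (status, reason)]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")


def fetch(url, user_agent="example-client/0.1"):
    """Return (header_block, body) for one URL without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        resp = opener.open(req, timeout=30)
        status = resp.status
    except urllib.error.HTTPError as err:
        # 3xx/4xx/5xx are still responses: keep their headers and body.
        resp, status = err, err.code
    try:
        body = resp.read()
        return format_headers(status, resp.reason, resp.headers.items()), body
    finally:
        resp.close()
```

Connection-level failures (DNS errors, timeouts) raise URLError and are
not handled in this sketch.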
> >
> >
> > To learn about the ARC format read this:
> > http://lists.wikia.com/pipermail/grub-dev/2008-January/000079.html
> >
> > Read all other emails here:
> > http://lists.wikia.com/pipermail/grub-dev/2008-January/thread.html
> >
> > Ask many questions!
> >
> > Thanks,
> > Yousef
> >
> > On 1/16/08, Mir Hossain <mir.tanvir.hossain at gmail.com> wrote:
> > Hello Yousef, thanks for your prompt reply. I will try the
> > perl version right now. I know C#. Maybe I will try to
> > implement the code in C#. Before that, I need to know how the
> > protocol works. Is there any documentation about the protocol?
> > Please let me know.
> >
> > Thanks
> > Tanvir
> >
> >
> > On Jan 16, 2008 10:06 PM, Yousef Ourabi
> > <yourabi at zero-analog.com> wrote:
> > Tanvir,
> > The new SVN repository is http://svn.swlabs.org/grubng
> >
> > We are currently re-writing the code to work with the
> > new RESTful API Jer (Jeremie) is implementing -- so
> > both the client and the server code are a moving target.
> >
> > The *most* developed client is currently the perl client
> > http://svn.swlabs.org/grubng/trunk/perl -- but
> > many others are working on other language
> > implementations of the same protocol -- Balinny is
> > working on a C implementation...etc
> >
> > If you are interested in learning a new language it
> > might not be a bad idea to start a new language
> > implementation of the protocol?
> >
> > -Yousef
> >
> >
> > On 1/16/08, Mir Tanvir Hossain
> > <mir.tanvir.hossain at gmail.com> wrote:
> >
> > Hello everybody, I have been on the mailing list for a couple of
> > weeks now, reading the mails regularly. But I am not
> > understanding that much. I am a Computer Science student and
> > would like to contribute some code to the project. However, I
> > am not sure where to begin. Could anybody please give some
> > pointers on where I can start?
> >
> > Sincerely
> >
> > Tanvir
> >
> >
> >
> > _______________________________________________
> > Grub-dev mailing list
> > Grub-dev at wikia.com
> > http://lists.wikia.com/mailman/listinfo/grub-dev
> >