[Grub-dev] Newbie Needs Some Pointers

Mir Tanvir Hossain mir.tanvir.hossain at gmail.com
Thu Jan 17 08:07:57 UTC 2008


Hello, I am still a bit confused about the ARC file. So the client
crawls each of the given URLs and appends the resulting HTTP headers,
as well as any HTML, to a single file, right?
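
As a rough sketch of that reading (not the actual Grub client), the
snippet below appends each response's raw headers and body to one
buffer, in work-list order. The `fetch` callable, the name
`build_crawl_file`, and the one-line record prefix are assumptions made
for illustration only; the real ARC record header is described in the
ARC message linked further down in this thread.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical fetcher: takes a URL and returns (raw_headers, raw_body) as
# bytes. It is assumed not to follow redirects and to return error
# responses unchanged.
Fetcher = Callable[[str], Tuple[bytes, bytes]]


def build_crawl_file(urls: Iterable[str], fetch: Fetcher) -> bytes:
    """Append headers + body for every URL, in the order the URLs were given."""
    out = bytearray()
    for url in urls:  # keep the server's order
        headers, body = fetch(url)
        payload = headers + body  # written verbatim, nothing is edited
        # Placeholder record line (URL and payload length).
        # This is NOT the real ARC record header, just a stand-in.
        out += f"{url} {len(payload)}\n".encode("utf-8")
        out += payload
        out += b"\n"
    return bytes(out)
```

A redirect response such as a 301 would simply be stored with its
headers and an empty body, since the client is not supposed to follow it.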

Then the program will compress the file using the ARC format and upload
it to the server, right?
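
If so, the last step might look like the sketch below. It assumes the
finished file is gzip-compressed as a whole and sent with an HTTP PUT to
the upload URL from the work unit; whether that is the right compression
is part of what is being asked here. The name `prepare_upload` and the
example URL are made up for illustration.

```python
import gzip
import urllib.request


def prepare_upload(put_url: str, archive: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the compressed archive."""
    compressed = gzip.compress(archive)  # assumption: plain gzip of the whole file
    return urllib.request.Request(put_url, data=compressed, method="PUT")


# Hypothetical upload URL; the real one would come from the work unit.
req = prepare_upload("http://example.invalid/upload/1", b"record one\nrecord two\n")
# Sending it would be: urllib.request.urlopen(req)
```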

Tanvir

 
On Wed, 2008-01-16 at 23:30 -0800, Yousef Ourabi wrote:
> 1) No. The client makes a simple HTTP GET request with no parameters to
> http://dispatch.grub.swlabs.org/do/workunit -- the server decides the
> number of URLs the client should fetch.
> 
> 2) The server will give a list, which is now 250 URLs -- but this may change.
> 
> 3) Yes
> 
> 4) Yes
> 
> 5) Not exactly. The client writes both the HTTP headers and the
> response body (the HTML). They must be in the same order that the
> server gave them in the original work list.
> 
> 6,7) The client doesn't change any HTML; it writes the responses to the
> ARC file exactly as received. But the basic idea is correct.
> 
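
Points 1 and 2 could be sketched as below. The dispatch URL is the one
given above; the work-unit layout (one entry per line, upload target on
the last line) is only an assumption based on the earlier description in
this thread, and `parse_workunit` is a hypothetical helper.

```python
from typing import List, Tuple

DISPATCH_URL = "http://dispatch.grub.swlabs.org/do/workunit"  # from this thread


def parse_workunit(text: str) -> Tuple[List[str], str]:
    """Split a work unit into (urls_to_fetch, upload_line).

    Assumed layout: one entry per line, with the last non-empty line
    naming the PUT target for the finished ARC file.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("work unit needs at least one URL and an upload line")
    return lines[:-1], lines[-1]


# Fetching the work unit would be a plain GET with no parameters, e.g.
#   text = urllib.request.urlopen(DISPATCH_URL).read().decode("utf-8")
```

The client does not pass a count; it just takes however many URLs come
back, so the 250 is never hard-coded.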
> 
> Hope this helps. Feel free to keep asking questions here or on the IRC
> channel #searchwikia
> 
> Good luck.
> Yousef
> 
> 
> On 1/16/08, Mir Tanvir Hossain <mir.tanvir.hossain at gmail.com> wrote:
>         Hello Yousef,
>         
>         Lemme briefly write here what I have understood so far.
>         
>         1. Client will request a list of 250 URLs from the server.
>         2. Server will give 250 URLs and a PUT URL to upload back the
>         ARC.
>         3. With the URLs in hand, client will start crawling those 250
>         URLs.
>         4. Client will not follow any redirects.
>         5. Client will dump all the HTML and check it against a known
>         hash for any change.
>         6. The client will make an ARC file with all the changed HTML
>         pages.
>         7. It will upload the ARC back to the server with the changed
>         HTML pages.
>         
>         Am I correct? Please tell me if I am wrong and correct me. 
>         
>         Thanks again for your time.
>         
>         Tanvir
>         
>         
>         
>         On Wed, 2008-01-16 at 22:43 -0800, Yousef Ourabi wrote:
>         > Tanvir,
>         >
>         > The "documentation" is all in the mailing list. There is
>         > nothing more formal. Here is a brief description:
>         >
>         > client makes http get request to url
>         > server returns list of 250 urls to fetch, with user-agent
>         > the last line is an HTTP put where the client should upload
>         > the resulting arc file
>         >
>         > Clients do not follow redirects, i.e. HTTP 301, 302, 307...
>         > Clients do not parse outbound links
>         > Clients report HTTP headers verbatim, including errors
>         >
>         >
>         > To learn about the arc format read this:
>         > http://lists.wikia.com/pipermail/grub-dev/2008-January/000079.html
>         >
>         > Read all other emails here:
>         > http://lists.wikia.com/pipermail/grub-dev/2008-January/thread.html
>         >
>         > Ask many questions!
>         >
>         > Thanks,
>         > Yousef
>         >
>         > On 1/16/08, Mir Hossain <mir.tanvir.hossain at gmail.com> wrote:
>         >         Hello Yousef, Thanks for your prompt reply. I will try the
>         >         Perl version right now. I know C#. Maybe I will try to
>         >         implement the code in C#. Before that, I need to know how
>         >         the protocol works. Is there any documentation about the
>         >         protocol? Please let me know.
>         >
>         >         Thanks
>         >         Tanvir
>         >
>         >
>         >         On Jan 16, 2008 10:06 PM, Yousef Ourabi 
>         >         <yourabi at zero-analog.com> wrote:
>         >                 Tanvir,
>         >                 The new SVN repository is http://svn.swlabs.org/grubng
>         >
>         >                 We are currently re-writing the code to work with the
>         >                 new RESTful API Jer (Jeremie) is implementing -- so
>         >                 both the client and the server code are a moving
>         >                 target.
>         >
>         >                 The *most* developed client is currently the Perl
>         >                 client http://svn.swlabs.org/grubng/trunk/perl -- but
>         >                 many others are working on other language
>         >                 implementations of the same protocol -- Balinny is
>         >                 working on a C implementation...etc
>         >
>         >                 If you are interested in learning a new language it
>         >                 might not be a bad idea to start a new language
>         >                 implementation of the protocol?
>         >
>         >                 -Yousef
>         >
>         >
>         >                 On 1/16/08, Mir Tanvir Hossain
>         >                 <mir.tanvir.hossain at gmail.com> wrote:
>         >
>         >                         Hello everybody, I joined the mailing
>         >                         list a couple of weeks ago and have been
>         >                         reading the mails regularly, but I do not
>         >                         understand much yet. I am a Computer
>         >                         Science student and would like to
>         >                         contribute some code to the project.
>         >                         However, I am not sure where to begin.
>         >                         Could anybody please give some pointers
>         >                         on where I can start?
>         >
>         >                         Sincerely
>         >
>         >                         Tanvir
>         >
>         >
>         >
>         >                         _______________________________________________
>         >                         Grub-dev mailing list
>         >                         Grub-dev at wikia.com
>         >                         http://lists.wikia.com/mailman/listinfo/grub-dev


