[Grub-dev] grub server/client

bruce bedouglas at earthlink.net
Fri Jan 9 18:09:20 UTC 2009


Hi Bartek,

Thanks for the reply.

OK, it sounds like this might be useful. Here's my situation: I have a group of
sites that I want to parse, and I've developed small parsing scripts that
crawl the sites, drill down, get behind the forms, use passwords, etc.,
to get my information. Each step of the parsing process generates a
separate page, which I need to parse to get the links to the next level of
parsing. My scripts currently handle this process.

So I can essentially call my script, passing it the information needed to
parse the next level, until I get to the final level that I care about.
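The level-by-level drill-down described above can be sketched as a loop in which each level's parser returns the links for the next level, and a final parser extracts the data. This is a minimal sketch under stated assumptions: the `PAGES` dict stands in for real HTTP fetches (no forms or passwords are handled), and `fetch`, `extract_links`, `extract_record`, `drill_down` and the example.com URLs are hypothetical, not bruce's actual scripts.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for fetched pages: URL -> page text.
# A real crawler would fetch these over HTTP (submitting forms, logging in, etc.).
PAGES: Dict[str, str] = {
    "http://example.com/start": "link:http://example.com/a link:http://example.com/b",
    "http://example.com/a": "link:http://example.com/a/detail",
    "http://example.com/b": "link:http://example.com/b/detail",
    "http://example.com/a/detail": "record: alpha",
    "http://example.com/b/detail": "record: beta",
}

def fetch(url: str) -> str:
    """Hypothetical fetch: look the page up in PAGES instead of using the network."""
    return PAGES[url]

def extract_links(text: str) -> List[str]:
    """Hypothetical parser for intermediate levels: collect 'link:' tokens."""
    return [tok[len("link:"):] for tok in text.split() if tok.startswith("link:")]

def extract_record(text: str) -> str:
    """Hypothetical parser for the final level: return the data after 'record:'."""
    return text.split("record:", 1)[1].strip()

def drill_down(seeds: List[str],
               levels: List[Callable[[str], List[str]]],
               final: Callable[[str], str]) -> List[str]:
    """Apply one link-extracting parser per level, then the final parser to the last URLs."""
    urls = seeds
    for parse_level in levels:
        next_urls: List[str] = []
        for url in urls:
            next_urls.extend(parse_level(fetch(url)))
        urls = next_urls
    return [final(fetch(url)) for url in urls]

if __name__ == "__main__":
    results = drill_down(["http://example.com/start"],
                         [extract_links, extract_links], final=extract_record)
    print(results)  # ['alpha', 'beta']
```

Passing the per-level parsers as a list keeps the "call my script for the next level" idea explicit: each level only needs the URLs produced by the previous one.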

So, if I understand what you posted below, I could use Grub, starting with a
list of initial URLs that I want to parse. I would then create my 'workunits',
which the client app uses to fetch each URL's page/text.

In this case, a client would fetch the text and return it to the server in
the ".arc" file. Is this correct?

I'd like to be able to modify the client process for my needs. In my model,
the client process would request one or more URLs from the server in the form
of a 'workunit', and then call/invoke my Python scripts on the client
machine.
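The modified client described above, which pulls a workunit of URLs and hands each one to a local Python script, might look roughly like this. It is a sketch under assumptions: the workunit is taken to be a plain newline-separated list of URLs, and `parse_page.py`, `parse_workunit` and `run_workunit` are hypothetical names. This is not Grub's actual client or workunit format.

```python
import subprocess
import sys
from typing import Dict, List

def parse_workunit(text: str) -> List[str]:
    """Assumed workunit format: one URL per line; blank lines and '#' comments ignored."""
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")]

def run_workunit(urls: List[str], script_cmd: List[str]) -> Dict[str, str]:
    """Invoke the parsing script once per URL, passing the URL as the last argument,
    and collect each run's stdout keyed by URL. Raises CalledProcessError on failure."""
    results: Dict[str, str] = {}
    for url in urls:
        proc = subprocess.run(script_cmd + [url], capture_output=True, text=True, check=True)
        results[url] = proc.stdout.strip()
    return results

if __name__ == "__main__":
    workunit = "http://example.com/a\n\n# comment\nhttp://example.com/b\n"
    # In practice script_cmd might be [sys.executable, "parse_page.py"] (hypothetical script).
    # An inline script stands in for it here so the example runs anywhere.
    demo_cmd = [sys.executable, "-c", "import sys; print('parsed ' + sys.argv[1])"]
    print(run_workunit(parse_workunit(workunit), demo_cmd))
```

Running each script as a separate process keeps the client independent of the scripts' language and isolates a crashing parser to a single URL's run (the `check=True` flag surfaces that failure as an exception).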

I envision running all of this on the Amazon/Google cloud services, so
security isn't an issue...

This is similar to the approach I'm considering with BOINC.

Does Grub have a Python client for the client machine?

Thanks

-bruce


-----Original Message-----
From: grub-dev-bounces at wikia.com [mailto:grub-dev-bounces at wikia.com]On
Behalf Of Bartek Jasicki
Sent: Friday, January 09, 2009 8:37 AM
To: grub-dev at wikia.com
Subject: Re: [Grub-dev] grub server/client


On 2009-01-09, at 07:45:38
"bruce" <bedouglas at earthlink.net> wrote:

> Hi...
>
> New to Grub. Just discovered it... (or rediscovered it)
>
> I have a possible project that I'm playing around with. In the
> concept stage right now. But I wanted to get some
> information/understanding about Grub, beyond what I've found on the
> net (grub.org).
>
> As I understand Grub, it's a server/client app that allows clients
> to be run on multiple client servers, and these client apps
> actually perform the fetching of the pages from the targeted sites...
> Do I have this correct?
>

Hi

Yes. To clarify a little:
1) The dispatch server creates a file (we call it a workunit) with a list of
URLs to crawl.
2) Clients connect to the dispatch server, download the workunit file, start
visiting the URLs, and write the results to an .arc file.
3) After crawling, clients send the .arc file to the upload server.
4) Back to #1.
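The four-step cycle above can be simulated in a single process to show how the pieces relate. This is an illustrative sketch only: `DispatchServer`, `UploadServer`, `fake_fetch`, the batch size, and the simplified ARC-style record layout (header fields loosely modelled on ARC v1: URL, IP, date, content type, length) are all assumptions, not Grub's real protocol, API, or exact .arc format.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Deque, List

class DispatchServer:
    """Hypothetical dispatch server: splits pending URLs into workunits."""
    def __init__(self, urls: List[str], batch_size: int = 2) -> None:
        self.pending: Deque[str] = deque(urls)
        self.batch_size = batch_size

    def next_workunit(self) -> List[str]:
        """Step 1: build a workunit of up to batch_size URLs (empty when nothing is left)."""
        batch: List[str] = []
        while self.pending and len(batch) < self.batch_size:
            batch.append(self.pending.popleft())
        return batch

class UploadServer:
    """Hypothetical upload server: just stores the .arc payloads it receives."""
    def __init__(self) -> None:
        self.archives: List[str] = []

    def upload(self, arc_text: str) -> None:
        self.archives.append(arc_text)

def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET so the sketch runs without network access."""
    return f"<html>{url}</html>"

def crawl_to_arc(urls: List[str]) -> str:
    """Step 2: visit each URL and write one simplified ARC-style record per page."""
    date = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    records = []
    for url in urls:
        body = fake_fetch(url)
        length = len(body.encode("utf-8"))
        # 0.0.0.0 is a placeholder; a real crawler would record the server's IP.
        records.append(f"{url} 0.0.0.0 {date} text/html {length}\n{body}\n")
    return "".join(records)

def run_cycle(dispatch: DispatchServer, upload: UploadServer) -> int:
    """Repeat the cycle until the dispatcher runs out of URLs; return workunits processed."""
    rounds = 0
    while True:
        workunit = dispatch.next_workunit()   # steps 1-2: dispatcher builds it, client downloads it
        if not workunit:
            return rounds
        arc_text = crawl_to_arc(workunit)     # step 2: visit URLs, write .arc data
        upload.upload(arc_text)               # step 3: send .arc to the upload server
        rounds += 1                           # step 4: back to step 1

if __name__ == "__main__":
    dispatch = DispatchServer([f"http://example.com/{i}" for i in range(5)], batch_size=2)
    uploads = UploadServer()
    print(run_cycle(dispatch, uploads), "workunits processed")  # 3 workunits processed
```

In the real system the dispatch and upload servers are separate network services; collapsing them into in-memory objects here only serves to make the control flow of the loop visible.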

> I'm toying with an idea for a very specific crawling kind of
> function, and it might make sense to use/examine the grub
> server/client process to see how it works. I'm also looking into the
> BOINC project as well.
>
> Is there someone I can talk to to get additional information, or is
> this the best forum for these questions... Is there an IRC channel as
> well?
>
> thanks
>
> -bruce
>

If you have any questions, you can ask them here (probably the fastest
way to get an answer), on the page forum, or on our IRC channel, #wikia-search
at irc.freenode.net (though IRC is the slowest way to get help with
Grub, mainly because it is not very popular ;) ).

Best Regards

Bartek Jasicki

--
Grub Next Generation: http://grub.org
Mailing List: grub-dev at wikia.com
IRC: #wikia-search at irc.freenode.net
Jabber: thindil at jabberpl.org
_______________________________________________
Grub-dev mailing list
Grub-dev at wikia.com
http://lists.wikia.com/mailman/listinfo/grub-dev