[Grub-dev] Grub Architect Question...

bruce bedouglas at earthlink.net
Fri Jan 16 14:51:07 UTC 2009


Hi Bartek,

We talked a couple of weeks ago. I'm looking at possibly using/extending
Grub for a distributed crawler, based on the Grub client/server app. Right
now, I'm focusing on how the client-side architecture will have to work for
my needs. I'd like to get your (or anyone's) feedback on some of my
thoughts.

For the client side/system, I'm looking to have a process where the "client
side" continually calls the server and fetches input files. Each input file
is essentially a list of URLs to be parsed, along with the parsing script to
run against those URLs. The idea is that the client side allows multiple
parsing scripts to run simultaneously, in parallel, so I can ensure that X
parsing scripts are always running.
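
As a rough sketch of the "keep X scripts running" idea, assuming each parsing
script is just a command line the client can launch: run_pool and its
parameters are hypothetical names for illustration, not anything from the
Grub code base, and the local list of jobs stands in for the input files
that would really be fetched from the server.

```python
import subprocess
import sys
import time
from collections import deque


def run_pool(jobs, max_running, poll_interval=0.05):
    """Run every command in `jobs`, keeping up to `max_running` alive at once.

    `jobs` is a list of argument lists (hypothetical stand-ins for
    "parsing script + input file"). Returns {job index: exit code}.
    """
    pending = deque(enumerate(jobs))
    running = {}   # job index -> subprocess.Popen
    results = {}   # job index -> exit code

    while pending or running:
        # Top up the pool whenever a slot is free.
        while pending and len(running) < max_running:
            index, command = pending.popleft()
            running[index] = subprocess.Popen(command)

        # Reap finished scripts so their slots can be reused.
        for index, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                results[index] = code
                del running[index]

        if running:
            time.sleep(poll_interval)

    return results


if __name__ == "__main__":
    # Four trivial "parsing scripts"; the last one exits with an error code.
    ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_pool([ok, ok, ok, bad], max_running=2))
```

In a real client, the inner "top up" step would be where the next input file
is requested from the server instead of being popped from a local queue.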

Keep in mind, my need is a little different from the baseline Grub app. I
will have complete control over the systems running the server/client. Also,
I'm looking to deeply parse a specific set of sites, each of which will
have its own parsing script.

Right now, I'm working through what has to be on the client side so I can
effectively manage the multiple copies of the parsing scripts: tracking
their progress, the health/status of the client app, and when all the URLs
in an input file have been parsed. Each parsing script generates an
output_file containing another list of URLs/data, which is copied back to
the master server, where it becomes the basis for an input file to a
different client parsing script... This process repeats until the parsing
scripts have gathered all the required data for the targeted site.
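
To make the tracking part concrete, here is a minimal sketch of the kind of
per-script record the client could keep. ScriptStatus, its fields, and the
timeout-based stall check are all assumptions of mine for illustration; I
don't know of an equivalent in Grub.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ScriptStatus:
    """Progress record for one running copy of a parsing script."""
    input_file: str
    total_urls: int
    done_urls: int = 0
    last_heartbeat: float = field(default_factory=time.monotonic)

    def record_url_done(self):
        """Call after each URL is parsed; doubles as a heartbeat."""
        self.done_urls += 1
        self.last_heartbeat = time.monotonic()

    @property
    def finished(self):
        """True once every URL in the input file has been parsed."""
        return self.done_urls >= self.total_urls

    def is_stalled(self, timeout_seconds, now=None):
        """True if the script is unfinished and has been silent too long."""
        now = time.monotonic() if now is None else now
        return (not self.finished
                and now - self.last_heartbeat > timeout_seconds)


if __name__ == "__main__":
    status = ScriptStatus(input_file="site_a.input", total_urls=2)
    status.record_url_done()
    status.record_url_done()
    print(status.finished)
```

The client would hold one such record per running script, report them to the
server as its health/status, and upload the output_file once a record is
marked finished.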

At this point, I'm wondering if anyone related to Grub has thought about
implementing a kind of process that allows/tracks multiple copies of a client
app downloading/uploading files from/to the master server, so you can more
efficiently stuff/use the client box?

Thoughts/Comments/Etc... are welcome.

Thanks

-bruce





