[Grub-dev] grub server/client
Bartek Jasicki
thindil2 at gmail.com
Sat Jan 10 17:12:04 UTC 2009
On 2009-01-10, at 07:41:13
"bruce" <bedouglas at earthlink.net> wrote:
> hi again bartek!
>
Welcome again ;)
> thanks for the replies on this one....
>
> so, as i understand:
>
> -the full code for the server, and client apps for grub are open
> source, and can be downloaded from the grub.org site
Yes, from the website or directly from our Subversion repository:
http://svn.swlabs.org/grubng/ or
http://people.swlabs.org/~bartek/websvn/listing.php?repname=GrubNG&path=%2F&sc=0
> -the server/client architecture is such that the server basically
> maintains a list of urls to fetch, and it replies to requests
> from the clients, distributing the urls on a 1st come, 1st served
> basis.
Yes
> -the client app is a "dumb" app that fetches the url(s) that it
> has received from the server
>
Exactly
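For readers new to this design, the first-come, first-served split between a URL-holding server and a "dumb" client can be sketched in Python. This is a minimal in-memory illustration, not Grub's actual Perl server or any client's code: `UrlServer`, `next_workunit`, `dumb_client` and the two-URL `batch_size` are hypothetical names and values, and `fetch` is any callable you supply.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class UrlServer:
    """Hypothetical in-memory stand-in for the Grub workunit server."""
    pending: deque = field(default_factory=deque)
    batch_size: int = 2  # URLs per workunit (assumed value, for illustration)

    def add_urls(self, urls):
        self.pending.extend(urls)

    def next_workunit(self):
        """Hand the next batch of URLs to whichever client asks first."""
        unit = []
        while self.pending and len(unit) < self.batch_size:
            unit.append(self.pending.popleft())
        return unit  # an empty list means there is no work left


def dumb_client(server, fetch):
    """Fetch exactly the URLs the server hands out; never pick URLs itself."""
    results = {}
    while True:
        unit = server.next_workunit()
        if not unit:
            return results
        for url in unit:
            results[url] = fetch(url)
```

Each call to `next_workunit` pops URLs off the front of the queue, so whichever client asks first gets the next batch; the client never decides what to crawl.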
> a few questions:
> -does the server do any kind of quality assurance, checking the
> returned data from the client url fetch? is there any kind
> of built-in redundancy for fetching the same url from multiple
> clients to assure that the data/page content is valid?
At the moment, the server only checks which links the client returned
in the .arc file and in what order (plus it checks that the .arc file
itself is correct). The option you ask about was discussed some time
ago, but it is not yet implemented in the upload server. The same goes
for the ETag header. The main reason for this: for the last 3 months
the server development team has consisted of 1 person (before that it
was 0 people ;) ). So this option will probably be added to the server
in the future.
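A rough idea of the kind of upload check described above (which URLs came back, in what order, in a well-formed .arc file) is sketched below. It assumes a simplified ARC version 1 layout: each record is a header line `<url> <ip> <date> <content-type> <length>` with a 14-digit date, followed by exactly `<length>` bytes of content and a newline. The `filedesc://` file header and other ARC details are ignored, and `parse_arc_records` and `check_upload` are hypothetical helpers, not the upload server's real code.

```python
def parse_arc_records(data: bytes):
    """Split a simplified ARC v1 body into (url, payload) pairs.

    Assumed record layout: "<url> <ip> <14-digit date> <content-type> <length>\\n",
    then <length> payload bytes, then an optional "\\n" separator.
    Raises ValueError on any malformed record.
    """
    records = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\n", pos)  # ValueError if the header line is unterminated
        fields = data[pos:end].split(b" ")
        if len(fields) != 5:
            raise ValueError(f"bad header at byte {pos}")
        url, _ip, date, _ctype, length = fields
        if len(date) != 14 or not date.isdigit():
            raise ValueError(f"bad date in header at byte {pos}")
        n = int(length)  # ValueError if the length is not an integer
        payload = data[end + 1:end + 1 + n]
        if len(payload) != n:
            raise ValueError(f"truncated record for {url.decode()}")
        records.append((url.decode(), payload))
        pos = end + 1 + n
        if data[pos:pos + 1] == b"\n":
            pos += 1  # skip the record separator
    return records


def check_upload(workunit, data):
    """Accept an upload only if it parses and its URLs match the workunit, in order."""
    try:
        records = parse_arc_records(data)
    except ValueError:
        return False
    return [url for url, _ in records] == list(workunit)
```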
> -does the app permit multiple clients to be run on a given
> client server at the same time (simultaneous clients running
> on the same server)?
Sure. For example, one C# client can run up to 50 crawlers
simultaneously. Other clients can be started several times to get the
same effect. Because each client/crawler gets its own workunit, there
is no problem running any number of clients/crawlers on one machine
(limited only by CPU, memory and network connection).
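Why several crawlers on one machine don't step on each other can be shown with a small thread-based sketch: the hypothetical `WorkunitQueue` hands each request a distinct batch under a lock, so no URL is fetched twice. Real Grub clients are separate processes or crawlers talking to a remote server over the network; the in-process queue, `run_crawlers`, and the injected `fetch` callable are assumptions made to keep the example self-contained.

```python
import threading
from collections import deque


class WorkunitQueue:
    """Hypothetical thread-safe server stand-in: every call returns a distinct workunit."""

    def __init__(self, urls, batch_size=2):
        self._pending = deque(urls)
        self._batch_size = batch_size
        self._lock = threading.Lock()

    def next_workunit(self):
        with self._lock:  # two crawlers can never receive the same URL
            unit = []
            while self._pending and len(unit) < self._batch_size:
                unit.append(self._pending.popleft())
            return unit


def crawler(name, queue, fetch, results, results_lock):
    """One crawler: keep asking for workunits until none are left."""
    while True:
        unit = queue.next_workunit()
        if not unit:
            return
        for url in unit:
            page = fetch(url)
            with results_lock:
                results.append((name, url, page))


def run_crawlers(urls, fetch, n_crawlers=4):
    queue = WorkunitQueue(urls)
    results, results_lock = [], threading.Lock()
    threads = [
        threading.Thread(target=crawler,
                         args=(f"crawler-{i}", queue, fetch, results, results_lock))
        for i in range(n_crawlers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```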
> -does the server track the status/health of the overall
> client servers/client apps for the network?
>
No, servers and clients are independent. At the moment, as I wrote
earlier, the servers are simple Perl scripts. Besides, this would
conflict with one of the core principles of Grub (and Wikia Search):
privacy. Most clients run on volunteers' computers, not on project
machines. I plan to add the ability to check the upload server's own
status, but not the clients' status.
> in evaluating BOINC, it appears that BOINC doesn't easily permit
> multiple boinc client apps to be run in a simultaneous manner, which
> means i'd have to craft a client that in effect would spawn off
> child threads/processes on the local machine which would perform the
> actual work. this would cause issues, as the page fetch of some pages
> might complete, and the app would essentially have to wait for the
> stragglers to complete... if i could have a client process, that
> would continually go back to the server to fetch data, based on the
> available system resources... then i could maximize the client
> servers for this function....
>
> my hope is that grub might handle this (or be able to be adapted to
> handle this) easily if i can't accomplish it with BOINC.
>
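Going back to the server as soon as a slot frees up, rather than waiting for stragglers, can be sketched in Python with `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)`: keep a bounded number of fetches in flight and refill each slot the moment one finishes. `continuous_fetch`, `get_url` and `fetch` are hypothetical names, and sizing the pool from `os.cpu_count()` is only a crude assumption standing in for "available system resources"; network-bound fetching usually tolerates more in-flight requests than there are cores.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def continuous_fetch(get_url, fetch, max_in_flight=None):
    """Keep up to max_in_flight fetches running; refill a slot as soon as any finishes.

    get_url() is a hypothetical callable that asks the server for the next URL
    and returns None when there is no more work.
    """
    if max_in_flight is None:
        # Crude stand-in for "available system resources" (assumption).
        max_in_flight = 4 * (os.cpu_count() or 1)
    results = {}
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        in_flight = {}  # Future -> URL
        exhausted = False
        while True:
            # Top up free slots from the server.
            while not exhausted and len(in_flight) < max_in_flight:
                url = get_url()
                if url is None:
                    exhausted = True
                    break
                in_flight[pool.submit(fetch, url)] = url
            if not in_flight:
                break
            # Block only until the first fetch completes, not all of them.
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                results[in_flight.pop(fut)] = fut.result()
    return results
```

Because the loop wakes on the first completed fetch, one slow page only ties up its own slot while the other slots keep pulling new work.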
IMO, the Grub system would be easier to adapt for this case. In Grub
you have:
- the ability to run multiple crawlers simultaneously
- a simple page-fetching mechanism which writes all data in one
standard format (.arc files are used not only by Grub but also by the
Internet Archive)
- simple client/server authentication
- a few clients have an automatic work mode (something similar to Fire
and Forget ;) )
- simple server/client apps which can be used to create your own
system (IMO it is better to start with existing code than to write
something from scratch ;) )
What Grub doesn't have:
- crawling links from a visited page if they aren't in the workunit
(Grub has never implemented this option)
- quality checking of the URLs sent back by clients (Grub doesn't have
it yet)
- status checking of clients/servers (for servers it may come; for
clients, probably never).
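The missing quality check, together with the redundancy idea raised in the question, could look something like the sketch below: send the same URL to several clients, hash each returned copy, and accept the version a quorum agrees on. `verify_by_redundancy`, its `quorum` default of 2 and the client ids are hypothetical; nothing like this exists in Grub's upload server as described in this thread.

```python
import hashlib
from collections import Counter


def verify_by_redundancy(fetches, quorum=2):
    """Compare several clients' copies of one page.

    fetches maps a (hypothetical) client id to the bytes it returned.
    Returns (accepted_digest, suspicious_client_ids). accepted_digest is None
    when no single version reaches the quorum or the top versions are tied.
    """
    digests = {client: hashlib.sha256(body).hexdigest()
               for client, body in fetches.items()}
    if not digests:
        return None, []
    ranked = Counter(digests.values()).most_common(2)
    digest, votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes < quorum or tied:
        return None, sorted(fetches)  # no trustworthy majority: refetch elsewhere
    suspicious = sorted(c for c, d in digests.items() if d != digest)
    return digest, suspicious
```

Exact hashes are strict: pages with timestamps, rotating ads or session ids can differ between honest fetches, so a real implementation would probably need to normalise content before comparing.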
So, even if Grub doesn't work for you, you can simply examine the code,
take the parts that are useful and save some time ;) Remember, BOINC
was created mainly to use the memory and CPU of machines; it is not
interested at all in using the network connection of client machines
(I haven't looked at the BOINC client code for a long time, so maybe
this has changed).
> thoughts/comments/etc..
>
> -bruce
>
>
Bartek
--
Grub Next Generation: http://grub.org
Mailing List: grub-dev at wikia.com
IRC: #wikia-search at irc.freenode.net
Jabber: thindil at jabberpl.org