[Grub-dev] Wondering about Architecture, Bug-base, Discussions, ...

Jeremie Miller jeremie at jabber.org
Tue Oct 28 07:49:59 UTC 2008


I'm just following up on this thread; I was out for almost a week on
vacation and am finally catching up :)

Bartek answered all the questions wonderfully so there's not much for  
me to add, except perhaps to explain why it's such a small project  
yet: it only barely works, *grin*.

The very basics work fine: workunits are handed out, and clients are
crawling them and uploading the ARCs just fine. It's all the other
smarts that are missing: tracking a global list of URLs to generate
good workunits (right now it just dumps them from a manually made
list), tracking which pages have errors (to retry or ignore) or aren't
changing, re-using If-Modified-Since (I-M-S) and ETag headers, etc.
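
To make the If-Modified-Since / ETag point concrete, here is a rough
sketch in Python of how a crawler could revalidate a page it has already
fetched. It is not taken from any Grub client or server; the function
names (conditional_headers, fetch_if_changed) and the idea of passing in
the previously saved ETag and Last-Modified values are assumptions made
for illustration only.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build revalidation headers from values saved on a previous crawl."""
    headers = {}
    if etag:
        # Ask the server to reply 304 if the entity tag still matches.
        headers["If-None-Match"] = etag
    if last_modified:
        # Ask the server to reply 304 if the page is unchanged since then.
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=30):
    """Fetch url, skipping the body when the server reports no change.

    Returns (status, body, etag, last_modified); body is None on a 304.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.status,
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return 304, None, etag, last_modified
        raise
```

A tracker could store the returned ETag and Last-Modified values next to
each URL and pass them back in on the next crawl; a 304 result would then
mark the page as "not changing" without downloading it again.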

One of the challenges has been how to store and process the incoming
ARCs in Hadoop and track them using HBase. Both are pretty new
(particularly HBase) but should scale up well.

I'd love to see the whole system evolving further and hope to bring  
some more attention and usage out of it soon :)

Jer


On Oct 22, 2008, at 4:28 AM, Bartek Jasicki wrote:

> 2008-10-22, 08:09:40
> Swaroop C H <swaroop at swaroopch.com> wrote:
>
>>
>> Out of curiosity, why are there so many clients?
>> Especially because they are not all in sync?
>>
>
> 1. Most of the clients were created when we started work on Grub. Our
> development model is (using terminology from
> http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ )
> a typical bazaar: we only show what we want to achieve, not the way to
> get there. So, if you want to start work on (for example) a Python
> client, or continue work on any existing client, there is no problem ;)
> 2. We don't delete any code, even if it was made as an experiment (like
> the Bash client, which for me is only an experiment).
> 3. The C and C# clients focus on different groups of users. The C client
> is made mainly for use on servers, while the C# client is focused on
> desktop computers. Of course, each can be used the other way too ;)
>
>> If it is considered useful, I can work on the Perl client, if someone
>> is willing to guide me on Grub :)
>>
>
> Any help is welcome ;) Short documentation about the client is here:
> http://grub.org/?q=/node/140
> If you have any questions about this documentation, feel free to
> ask ;) At the moment, the Perl client IMO needs support for proxies,
> content coding, and creation of sitemaps from crawled URLs. Additionally,
> it could have an automatic mode.
>
>> Oh wow, didn't realize there are so few people involved.
>>
>> Not intending to rake up a controversy, but is there a reason why? It
>> seems that both these people are working full-time for Wikia? Why is
>> there a lack of interest from outside contributors? Or is there
>> something else to the bigger picture (more interest in Nutch?)
>>
>
> Difficult questions ;) IMO there are a few reasons:
> 1. For a long time, the Grub page was dead, with old content, broken
> links, and no information about the current progress of work on Grub.
> So, the whole project looked dead. The current page is only 4-5 months
> old, and much of the documentation and information appeared on it in
> the last month or two. And of course, this information is still
> incomplete.
> 2. Information about Grub on Wikia Search is well hidden or
> outdated (yes, again I rant about this) ;) I'm really surprised that
> anyone can find Grub ;)
> 3. Grub is not as spectacular as the Wikia Search interface ;) Fewer
> people use it, so if you join Grub you gain less fame ;)
> 4. If you look at a very similar project, BOINC, you see a similar
> situation: a program used by hundreds of thousands of people around the
> world, but developed by only a few developers.
>
> And one more thing: yes, there are only a few developers, but
> other people contribute to Grub too: a few people have made
> translations of the C# client or the page, reported bugs, etc. In
> distributed projects like Grub, users are as important as (or
> even more important than) developers, because the most important part
> of the project is not the client or the servers but the output produced
> by the project (the database). Developers can "only" make user-friendly
> (and stable) tools, but only users can produce this output ;) So, IMO
> every Grub user is a part of the team ;)
>
> Bartek
>
> -- 
> Grub Next Generation: http://grub.org
> Mailing List: grub-dev at wikia.com
> IRC: #wikia-search at irc.freenode.net
>


