[Grub-dev] abandon ship

Mark (Markie) newsmarkie at googlemail.com
Wed Apr 1 18:02:03 UTC 2009


Detailed crawls of Wikipedia and its sister projects (including Commons), Flickr,
and I'm sure there are more free-content sites.

regards

mark

2009/4/1 Bani <borboleta at gmail.com>

> Even if we find a starting point, the process of discovering new URLs
> (which I think is one of the main features of a crawler) may not work very
> well.
> One potential starting point is Wikimedia Commons.
>
>
> On Wed, Apr 1, 2009 at 2:31 PM, Jeremie Miller <jeremie at jabber.org> wrote:
>
>> I've not looked much yet, but is there any obvious list of URLs with
>> (potential) CCed content we can start adding into the grub
>> dispatcher?  If there were enough, we could replace the entire
>> dispatch list and immediately start archiving CCed only content :)
>>
>> Jer
>>
>> On Apr 1, 2009, at 3:55 AM, Giorgos Logiotatidis wrote:
>>
>> > This is bad news to hear, but the closing of Wikia Search because of
>> > the economic crisis does not mean that the project has no potential.
>> > Grub should survive, and another index, as you all said, should be
>> > built.
>> >
>> > I would prefer to focus on Creative Commons content, as there is some
>> > challenge in indexing the growing set of web pages offering CCed content.
>> > Also, the creativecommons.org guys may be kind enough to add us to the
>> > http://search.creativecommons.org/ webpage so we start getting some
>> > traffic immediately.
>> >
>> > I would love to see some map of the machines and/or workflow that grub now
>> > uses to dispatch workunits and store its content. Firstly, so we can
>> > have some documentation about the procedure, and secondly, if we need
>> > to move away from these, maybe some people can donate machines or money.
>> >
>> > Regards,
>> > Giorgos
>> >
>> >
>> > _______________________________________________
>> > Grub-dev mailing list
>> > Grub-dev at wikia.com
>> > http://lists.wikia.com/mailman/listinfo/grub-dev
>> >