Even if we find a starting point, the process of discovering new URLs (which I think is one of the main features of a crawler) may not work very well.<br>One potential starting point is Wikimedia Commons.<br><br><div class="gmail_quote">
On Wed, Apr 1, 2009 at 2:31 PM, Jeremie Miller <span dir="ltr">&lt;<a href="mailto:jeremie@jabber.org">jeremie@jabber.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I've not looked much yet, but is there any obvious list of URLs with<br>
(potential) CCed content we can start adding into the grub<br>
dispatcher? If there were enough, we could replace the entire<br>
dispatch list and immediately start archiving CCed only content :)<br>
<br>
Jer<br>
<div><div></div><div class="h5"><br>
On Apr 1, 2009, at 3:55 AM, Giorgos Logiotatidis wrote:<br>
<br>
> This is bad news to hear, but the shutdown of Wikia Search because of<br>
> the economic crisis does not mean that the project has no<br>
> potential.<br>
> Grub should survive, and another index, as you have all said,<br>
> should be<br>
> built.<br>
><br>
> I would prefer to focus on Creative Commons content, as there is some<br>
> challenge in indexing the growing set of web pages offering CCed content.<br>
> Also,<br>
> the <a href="http://creativecommons.org" target="_blank">creativecommons.org</a> guys may be kind enough to add us to the<br>
> <a href="http://search.creativecommons.org/" target="_blank">http://search.creativecommons.org/</a> webpage so we start getting some<br>
> traffic<br>
> immediately.<br>
><br>
> I would love to see a map of the machines and/or workflow that grub now<br>
> uses to dispatch workunits and store its content. Firstly, so we can<br>
> have some documentation about the procedure, and secondly, if it's<br>
> needed<br>
> to move away from these, maybe some people can donate machines or money.<br>
><br>
> Regards,<br>
> Giorgos<br>
><br>
><br>
> _______________________________________________<br>
> Grub-dev mailing list<br>
> <a href="mailto:Grub-dev@wikia.com">Grub-dev@wikia.com</a><br>
> <a href="http://lists.wikia.com/mailman/listinfo/grub-dev" target="_blank">http://lists.wikia.com/mailman/listinfo/grub-dev</a><br>
><br>
<br>
</div></div></blockquote></div><br>