[Grub-dev] Submitting URLs for the Grub Clients to fetch?

Jeremie Miller jeremie at jabber.org
Tue Sep 23 19:12:41 UTC 2008


Awesome!

I set up a place where anyone can upload sitemap XML files. I generate
workunits from them and add those to the current source pool (which
gets recycled often, so yes, you will see duplicates):
	http://lists.wikia.com/pipermail/grub-dev/2008-May/000265.html

I need to manually run the sitemap->workunit converter, so just let me  
or the list know once you upload a bunch and I'll re-populate our main  
dispatcher pool.
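
For anyone preparing an upload from a plain list of URLs, here is a minimal
sketch of writing sitemap XML files in Python. It only assumes the public
sitemaps.org format (urlset/url/loc elements, at most 50,000 URLs per file).
The function name build_sitemaps is made up for illustration, and nothing
here reflects how the sitemap->workunit converter itself works or what it
requires beyond a valid sitemap.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file limit from the sitemaps.org protocol


def build_sitemaps(urls, max_urls=MAX_URLS_PER_FILE):
    """Return a list of sitemap XML documents (bytes) for the unique URLs.

    Hypothetical helper: not part of any Grub tooling.
    """
    # Drop blanks and duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    documents = []
    for start in range(0, len(unique), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in unique[start:start + max_urls]:
            # ElementTree escapes characters such as "&" in the URL text.
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        documents.append(
            ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
        )
    return documents


if __name__ == "__main__":
    wanted = ["http://example.com/", "http://example.com/a?x=1&y=2"]
    for doc in build_sitemaps(wanted):
        print(doc.decode("utf-8"))
```

Each returned document can be written to its own .xml file before uploading.
Removing duplicates up front keeps the files small, though the pool may
still hand out the same URL more than once as it is recycled.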

There's been some discussion about making at least two source pools,
one for "long term" stable/static pages and one for dynamic,
constantly changing pages. Until someone does the analysis to
determine which is which, that hasn't really been practical, but it
sounds like that might be changing with your work :)

Jer

On Sep 23, 2008, at 10:16 AM, Chris McLennan wrote:

> Hello fellow Grubbers,
>
> Some folks I work with and I have downloaded and processed a  
> large number of the output files from the Grub clients into a  
> database (making sure we remove dups and only insert the latest and  
> greatest html of a site).  We've also parsed the html of the sites  
> to extract links (external, internal, etc.) and have cross  
> referenced the host+domain of the link w/ some data from Amazon's  
> Alexa traffic rankings to come up with a list of URLs that we would  
> like to 'get', and then to continue to spider.  Is there a way we  
> can submit the URLs of the pages we would like to obtain to a  
> process at Grub.org so that the Grub clients would work to pull them  
> down?
>
> Please advise!
>
> FYI: We see a number of the same URLs being pulled down by our Grub  
> clients week to week, so maybe the pages we are requesting would be  
> of benefit to others in the Grub community as well!
>
> Best regards,
>
> Chris
> Wikia ID: piper984
>
> _______________________________________________
> Grub-dev mailing list
> Grub-dev at wikia.com
> http://lists.wikia.com/mailman/listinfo/grub-dev


