[Grub-dev] Wikis url extraction

Balinny balinny at gmail.com
Tue Jan 15 23:58:35 UTC 2008


From some URLs I got, it seems you have been extracting URLs from wiki
sources. Please note that any trailing ] should be removed. You should
also skip any URLs containing braces: a URL with {{ in it is not valid
and therefore doesn't need to be crawled. The URLs generated by such a
template do need to be crawled, though, so the best approach is to use
the externallinks table.
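If you do keep extracting from wiki source text, the two cleanup rules above could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, and rejecting any "{" or "}" (not just "{{") is an assumption based on braces not being valid unescaped URL characters. It does not replace reading the externallinks table, which also gives you the template-generated URLs that this filter skips.

```python
from typing import Iterable, List, Optional


def clean_wiki_url(raw: str) -> Optional[str]:
    """Hypothetical helper: clean one URL pulled from wiki markup.

    Returns the cleaned URL, or None if it should not be crawled.
    """
    # Drop surrounding whitespace and any trailing ']' left over from
    # wiki link syntax such as [http://example.org/ label].
    url = raw.strip().rstrip("]")
    # Assumption: URLs still containing template braces (e.g. {{PAGENAME}})
    # are unexpanded templates, not real links, so skip them.
    if "{" in url or "}" in url:
        return None
    return url


def clean_wiki_urls(raws: Iterable[str]) -> List[str]:
    """Hypothetical helper: clean a batch, dropping URLs that were skipped."""
    cleaned = []
    for raw in raws:
        url = clean_wiki_url(raw)
        if url is not None:
            cleaned.append(url)
    return cleaned
```

For example, clean_wiki_urls(["http://example.org/page]", "http://example.org/{{PAGENAME}}"]) keeps only "http://example.org/page".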


BTW, which page parser are you using?


