[Grub-dev] Yet another invalid workunit entry

Jeremie Miller jeremie at jabber.org
Wed Jul 9 19:54:07 UTC 2008


It's not been touched, and since there's a master copy of the  
workunits handed out I can grep through them to track these down:

find master -type f -exec grep chiesadicristo {} \;
Host: firenze.chiesadicristo.org
Host: www.chiesadicristofe.org
Host: www.chiesadicristo.it
Host: www.chiesadicristo-padova.it
Host: firenze.chiesadicristo.org

Maybe it's something else messing it up?
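
One way to rule the master copy in or out would be to scan it for entries that don't have the expected shape, rather than grepping for one hostname at a time. Below is a rough sketch in Python. It assumes each entry is exactly a GET line, a Host: line and a User-Agent: line, with entries separated by blank lines. That layout is inferred from the samples quoted below, not taken from the real workunit format, and the function name find_invalid_entries is made up for illustration.

```python
import re

# Assumed shape of a valid entry, inferred from the quoted samples.
REQUEST_LINE = re.compile(r"^GET \S+ HTTP/1\.[01]$")
HOST_LINE = re.compile(r"^Host: [A-Za-z0-9.-]+(:\d+)?$")
AGENT_LINE = re.compile(r"^User-Agent: \S.*$")


def find_invalid_entries(text):
    """Return (index, entry) pairs for entries that are not GET/Host/User-Agent."""
    # Normalise CRLF so the check works with either line ending.
    text = text.replace("\r\n", "\n")
    # Assumption: entries are separated by one or more blank lines.
    entries = [e for e in re.split(r"\n\s*\n", text) if e.strip()]
    invalid = []
    for index, entry in enumerate(entries):
        lines = entry.strip("\n").split("\n")
        ok = (
            len(lines) == 3
            and REQUEST_LINE.match(lines[0])
            and HOST_LINE.match(lines[1])
            and AGENT_LINE.match(lines[2])
        )
        if not ok:
            invalid.append((index, entry))
    return invalid
```

Run over every file under master, this would flag a missing or truncated Host: line, two entries fused together, and a literal backslash-n in place of a line break, since none of those leave three well-formed lines. If the master files come back clean, the damage is more likely happening after the workunits are handed out.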

Also, I checked the hack of a Perl script that makes these workunits
into grubng svn, just to make sure... with any luck it'll go away soon,
to be replaced by map-reduce jobs that work on the submitted ARCs and
sitemap files :)

Jer

On Jul 9, 2008, at 12:31 PM, Balinny wrote:

> And more invalid ones:
>
>
> GET / HTTP/1.0
> Host: www.tice.de
> User-Agent: GrubNG 20080128
>
> GET /bunrei.html HTTP/1.0
> He.chiesadicristo.org
> User-Agent: GrubNG 20080128
>
> GET /en/index.php HTTP/1.0
> Host: carmencuevas.com
> User-Agent: GrubNG 20080128
>
> -----
>
> GET /topsite.php?id=36616 HTTP/1.0
> Host: click.listinus.de
> User-Agent: GrubNG 20080128
>
> GET /reference.htm HTTP/1.0
> HostGET /u2_hime/lotus/ HTTP/1.0
> Host: www.geocities.jp
> User-Agent: GrubNG 20080128
>
> GET /wwwboard.html HTTP/1.0
> Host: www.klotz-kamelien.de
> User-Agent: GrubNG 20080128
>
> ---
>
>
> GET /region6/index.htm HTTP/1.0
> Host: www.stcregion.org
> User-Agent: GrubNG 20080128
>
> GET /story/arts/national/2006/06/27/toronto-film-cannes.ht\nUser-Agent: GrubNG 20080128
>
> GET /2000/ALLPOLITICS/stories/12/06/senateleaders.ap/ HTTP/1.0
> Host: cnn.com
> User-Agent: GrubNG 20080128
>
> In the last one there's a literal \n (ASCII 10) instead of CRLF (+the
> missing host).
>
> Has the workunit generator been changed recently?
>
> _______________________________________________
> Grub-dev mailing list
> Grub-dev at wikia.com
> http://lists.wikia.com/mailman/listinfo/grub-dev
>


