[Grub-dev] problem of understanding "crawl-corruption" - anyone care to explain?

Balinny balinny at gmail.com
Mon Feb 4 14:45:46 UTC 2008


ab wrote:
> I have posted examples of wrong and valid workunits (according to which
> didn't and which did get accepted by the SOAP server) in the ticket,
> in comments #22 and #23
>
> <http://dev.grub.org/cgi-bin/trac.cgi/ticket/9#comment:22>
>   
fail1 contains 248 URLs instead of 250. Notice that the first two URLs
in the workunit were skipped.

> <http://dev.grub.org/cgi-bin/trac.cgi/ticket/9#comment:23>
>   
fail3 and fail4 are missing the first two URLs; fail5 is missing only the first.
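To see exactly which workunit entries were dropped, and not just how many, one could compare the workunit's URL list against the URLs actually present in the archive. A minimal sketch, assuming both lists have already been extracted; `missing_urls` is a hypothetical helper, not part of the Grub client or server, and the example URLs are placeholders:

```python
from collections import Counter
from typing import List, Tuple


def missing_urls(workunit_urls: List[str], archived_urls: List[str]) -> List[Tuple[int, str]]:
    """Return (position, url) for each workunit URL with no matching ARC record.

    Occurrences are counted, so a URL listed twice in the workunit but
    archived only once is reported once (at its second position).
    """
    remaining = Counter(archived_urls)
    missing: List[Tuple[int, str]] = []
    for i, url in enumerate(workunit_urls):
        if remaining[url] > 0:
            remaining[url] -= 1  # matched one archived copy
        else:
            missing.append((i, url))
    return missing


work = ["http://a/", "http://b/", "http://c/", "http://d/"]
archived = ["http://c/", "http://d/"]
print(missing_urls(work, archived))  # [(0, 'http://a/'), (1, 'http://b/')]
```

If the reported positions are always 0 and 1, that points at the first results of the workunit, as in fail1, fail3 and fail4.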

> Maybe you can also take a look at what the heck is wrong with those
> resulting files and/or workunits.
>
> thanks and cheers
It seems that errors on the first results aren't written to the ARC file.
I also notice that none of the archives ends with a final \n, but the
correctly uploaded ones lack it too, so the server isn't complaining about it.
For the format itself, see the spec at
http://www.archive.org/web/researcher/ArcFileFormat.php
and the discussions in this list's archives on how to interpret it.
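A rough way to list the archived URLs and check for the trailing newline is to walk the records directly. This is a simplified reader, not a full implementation of the spec: it assumes an uncompressed ARC file in which every record (including the leading filedesc:// version block) starts with a whitespace-separated header line whose first field is the URL and whose last field is the length in bytes of the content that follows, with newlines separating records. `read_arc_urls` is a hypothetical helper.

```python
from typing import List, Tuple


def read_arc_urls(data: bytes) -> Tuple[List[str], bool]:
    """Return (record URLs, ends_with_newline) for an uncompressed ARC file.

    Simplified sketch: assumes every record, including the leading
    filedesc:// version block, starts with a whitespace-separated header
    line whose first field is the URL and whose last field is the length
    in bytes of the content that follows. Not a full spec implementation.
    """
    urls: List[str] = []
    pos = 0
    n = len(data)
    while pos < n:
        # Newlines between records are separators; skip them.
        if data[pos:pos + 1] == b"\n":
            pos += 1
            continue
        end = data.find(b"\n", pos)
        if end == -1:
            raise ValueError(f"truncated header line at byte {pos}")
        fields = data[pos:end].decode("latin-1").split()
        url, length = fields[0], int(fields[-1])
        # Jump over the header line and exactly `length` bytes of content.
        pos = end + 1 + length
        if pos > n:
            raise ValueError(f"record for {url} runs past end of file")
        # The version block describes the file itself, not a crawled URL.
        if not url.startswith("filedesc://"):
            urls.append(url)
    # Only checks the last byte, so it cannot tell a separator newline
    # from content that happens to end in one.
    return urls, data.endswith(b"\n")
```

For example, `urls, ends_nl = read_arc_urls(open("fail1.arc", "rb").read())` (the filename is a placeholder) gives the record count as `len(urls)`. If an archive is gzip-compressed, pass its bytes through `gzip.decompress` first.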

