[Grub-dev] problem of understanding "crawl-corruption" - anyone care to explain?
Balinny
balinny at gmail.com
Mon Feb 4 14:45:46 UTC 2008
ab wrote:
> i have posted examples of wrong and valid workunits (according to which
> didnt and which did get accepted by the soap-server) in the ticket
> inside comments #22 and #23
>
> <http://dev.grub.org/cgi-bin/trac.cgi/ticket/9#comment:22>
>
In fail1 there are 248 URLs instead of 250. Notice that the first two
URLs of the workunit were skipped.
> <http://dev.grub.org/cgi-bin/trac.cgi/ticket/9#comment:23>
>
fail3 and fail4 are missing the first two URLs; fail5 is missing only
the first.
> maybe you can also take a look what the heck is wrong with those
> resulting files and/or workunits.
>
> thanks and cheers
It seems that errors on the first results aren't written to the ARC file.
A final \n is also missing from all the archives, but it is missing from
the correctly uploaded ones too, so the server isn't complaining about it.
As for the format used, you can see the spec at
http://www.archive.org/web/researcher/ArcFileFormat.php
and the discussions in this list's archives on how to interpret it.
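
To check which URLs of a workunit never made it into an archive, a rough
Python sketch like the one below might help. It assumes the version 1
record header from the spec above (URL, IP address, archive date, content
type and length, separated by spaces, with the length as the last field)
and that the length is an accurate byte count. The files the client writes
may deviate from that, so treat it as a starting point only. The function
names arc_record_urls and missing_urls are made up for this example.

```python
import io


def arc_record_urls(data):
    """Return the URLs of the records in an ARC v1 byte string, in order.

    Assumes each record starts with a header line of the form
    'URL IP-address archive-date content-type length' and that
    'length' is the exact size of the record body in bytes.
    """
    stream = io.BytesIO(data)
    urls = []
    while True:
        header = stream.readline()
        if not header:              # end of data
            break
        if not header.strip():      # blank separator line between records
            continue
        fields = header.split()
        url = fields[0].decode("ascii", "replace")
        length = int(fields[-1])    # last header field is the body length
        stream.seek(length, io.SEEK_CUR)   # skip the body without parsing it
        if not url.startswith("filedesc://"):  # leave out the file header record
            urls.append(url)
    return urls


def missing_urls(workunit_urls, data):
    """Return the workunit URLs that have no record in the archive."""
    archived = set(arc_record_urls(data))
    return [u for u in workunit_urls if u not in archived]
```

Called with the URL list of a workunit and the bytes of the matching
archive, missing_urls returns the skipped URLs in workunit order, which
should make it easy to see whether it is always the first ones.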