[Grub-dev] ARC format (was Re: do we even really need a native client)

jer jeremie at jabber.org
Tue Jan 15 01:23:44 UTC 2008


I've checked in a few tweaks and the current version in SVN right now  
appears to be producing legit ARC files.  I also checked in a script  
using the BnfArcTools module that parses the produced ARCs.  I tried  
to check against libarc 0.2 but it requires gzip'd members, whereas  
babygrub is producing one big gzip file.

So, I think this is authoritative:

FILE IP TIME CTYPE LENGTH\nHEADER
\nURL IP TIME CTYPE LENGTH\nBODY
\nURL IP TIME CTYPE LENGTH\nBODY
\nURL IP TIME CTYPE LENGTH\nBODY
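
A minimal Python sketch of the layout above, assuming LENGTH counts the bytes of the HEADER or BODY that follows and that the leading \n separates each URL record from the one before it. The helper names (arc_record, build_arc) and all sample values are hypothetical and are not taken from babygrub:

```python
def arc_record(name, ip, time, ctype, payload):
    """Return one 'NAME IP TIME CTYPE LENGTH\\nPAYLOAD' record as bytes."""
    # LENGTH is assumed to be the byte length of the payload that follows.
    head = f"{name} {ip} {time} {ctype} {len(payload)}\n".encode("ascii")
    return head + payload


def build_arc(file_record, url_records):
    """Join the FILE record and the URL records, each URL record preceded by \\n."""
    out = arc_record(*file_record)
    for rec in url_records:
        out += b"\n" + arc_record(*rec)
    return out


# Hypothetical sample values, for illustration only.
arc = build_arc(
    ("filedesc://example.arc", "0.0.0.0", "20080115012344", "text/plain", b"HEADER"),
    [("http://example.com/", "192.0.2.1", "20080115012344", "text/html", b"<html></html>")],
)
```

Note that in this sketch nothing is written after the final BODY; the \n only ever appears in front of the next URL record.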

If a grub client wants to gzip each "member" individually then that's  
perfectly fine as well, but not required.
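
To illustrate why either choice can work for a reader that handles multi-member gzip: concatenated gzip members decompress to the same bytes as one big gzip stream. A small Python sketch with made-up record contents follows; Python's gzip module is used here only as an example reader, and libarc 0.2 still expects the per-member form:

```python
import gzip

# Made-up records in the "URL IP TIME CTYPE LENGTH\nBODY" shape.
records = [
    b"filedesc://example.arc 0.0.0.0 20080115012344 text/plain 6\nHEADER",
    b"\nhttp://example.com/ 192.0.2.1 20080115012344 text/html 5\nhello",
]

# Option 1: one big gzip stream over the whole file.
one_big = gzip.compress(b"".join(records))

# Option 2: each member gzipped on its own, then concatenated.
per_member = b"".join(gzip.compress(r) for r in records)

# Both forms decompress to the same ARC bytes.
same = gzip.decompress(one_big) == gzip.decompress(per_member)
```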

Jer

On Jan 13, 2008, at 10:00 PM, Yousef Ourabi wrote:

> I did a simple crawl of Digg, Slashdot, etc. with Heritrix 1.12;
> here are some "official" ARC files.
>
> These links will be valid for 48 hours, everyone is welcome to  
> download so they can verify.
>
> https://s3.amazonaws.com/06B761M1QFYBMGSM2602.FreeBSD/IAH-20080114022148-00000-titus.arc.gz?AWSAccessKeyId=06B761M1QFYBMGSM2602&Expires=1200455885&Signature=M9%2FYNeQRkgDhLDNBUWosghAP46M%3D
> https://s3.amazonaws.com/06B761M1QFYBMGSM2602.FreeBSD/IAH-20080114022148-00001-titus.arc.gz?AWSAccessKeyId=06B761M1QFYBMGSM2602&Expires=1200455920&Signature=J2XcW7IgEFY6HqWS6FEJZ9KRlMg%3D
>
> I haven't cracked them open yet to check them against the format
> discussed here, so someone else may beat me to it, but when I get
> around to it I'll post to this thread.
>
> Thanks,
> Yousef
>
> On 1/13/08, Yousef Ourabi <yourabi at zero-analog.com> wrote:
> I'll do a quick crawl with Heritrix and upload the ARC somewhere we
> can all download it.
>
> That will hopefully answer this question once and for all :-)
>
>
>
>
> On 1/13/08, Balinny <balinny at gmail.com> wrote:
> jer wrote:
> > I *believe* after quite a while testing here, that the correct thing
> > to do is:
> >
> >       print $arc "http://$host$path $ip 19691231175959 $ctype ",length($body),"\n$body\n";
> >
> > That is, each URL record needs a terminating \n after the body.
> >
> > Can anyone else verify this?
> >
> > Thanks,
> >
> > Jer
> I'm not so sure. They have a \n after the body just because the next
> record has a \n before.
> So there is no need for a \n at the end, and after the version block
> there would be three LFs!
> Does archive.org provide any ARC files for download?
>
> _______________________________________________
> Grub-dev mailing list
> Grub-dev at wikia.com
> http://lists.wikia.com/mailman/listinfo/grub-dev


