[Grub-dev] do we even really need a native client
Yousef Ourabi
yourabi at zero-analog.com
Fri Jan 11 07:42:09 UTC 2008
I've got to re-read all the stuff about the arc format and incorporate it
into the patch. Expect v3 sometime late tomorrow.
I'm also going to have to re-read the emails Jer just sent to the wikia
mailing list to fully digest them, but I really look forward to learning
more about the Nutch setup wikia is using, to get the full "perspective" on
the back-end aspects of wikia search.
Per the generated work-units -- Jer: how are you generating them now? I'm
assuming this isn't the current "server" but some modified version you have
running? It would be great to learn a bit about your next steps around that.
More tomorrow.
Thanks.
Yousef
On 1/10/08, jer <jeremie at jabber.org> wrote:
>
> >> So, I think you're right and it's missing a \n, but maybe it's
> >> missing TWO of them?
> >>
> >> doc == <nl><URL-record><nl><network_doc>
> >>
> >> URL-record-v1 == <url><sp>
> >> <ip-address><sp>
> >> <archive-date><sp>
> >> <content-type><sp>
> >> <length><nl>
> >>
> >> So, there should be a \n before each URL record, and two of them
> >> after it, one defined as the terminator in URL-record-v1, and one
> >> defined as the separator between URL-record and network_doc. Is that
> >> correct?
> >>
> >> print $arc "\nhttp://$host$path $ip 19691231175959 $ctype",
> >>     length($body), "\n\n$body";
> >>
> >> Is that correct? Can anyone else verify?
> >>
> > So it seems.
>
> Can anyone else verify this is correct? \n URL-stuff \n \n CONTENT ?
>
> >> The workunits can (someday) start to define HTTP/1.1 with a
> >> Connection: close, and an Accept-encoding: gzip. A client supporting
> >> the current workunit format shouldn't care or know any different,
> >> right?
> >>
> > The client's bandwidth might care ;-)
>
> Yep, easy enough to add these headers in the workunits as well :)
>
> >> Doh! My bad, I can fix it when I generate some more workunits :)
> > Aren't they generated on-the-fly?
>
> Heh, nope, there's no DB in this back-end so it's much faster and
> easier to pre-generate batches of these from flat lists right now.
>
> Jer
> _______________________________________________
> Grub-dev mailing list
> Grub-dev at wikia.com
> http://lists.wikia.com/mailman/listinfo/grub-dev
>
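
For reference, the record layout discussed in the quoted thread can be sketched as below. This is only a minimal illustration in Python (not the Perl used in the thread), and it follows the reading proposed above, which was still awaiting verification: a newline before the URL record, the record's own terminating newline, and one more newline separating it from the document. The function name, argument names, and sample values are hypothetical. Note that the quoted grammar calls for a space between the content type and the length.

```python
def format_arc_record(url: str, ip: str, archive_date: str,
                      content_type: str, body: bytes) -> bytes:
    """Serialize one document as <nl><URL-record><nl><network_doc>.

    Hypothetical helper: it mirrors the grammar quoted in the thread,
    not a verified ARC writer.
    """
    # URL-record-v1: five fields separated by single spaces and
    # terminated by a newline; the length is the body size in bytes.
    url_record = f"{url} {ip} {archive_date} {content_type} {len(body)}\n"
    # Leading newline, the record (with its terminator), the separator
    # newline, then the raw document bytes.
    return b"\n" + url_record.encode("ascii") + b"\n" + body


# Example with made-up values.
record = format_arc_record(
    "http://example.com/", "192.0.2.1", "20080110000000",
    "text/html", b"<html></html>",
)
```

With this layout the bytes between the record's length field and the start of the content are exactly two newlines, which is the "\n URL-stuff \n \n CONTENT" pattern asked about above.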