[Grub-dev] do we even really need a native client

Yousef Ourabi yourabi at zero-analog.com
Fri Jan 11 07:42:09 UTC 2008


I need to re-read the discussion of the ARC format and incorporate it into
the patch. Expect v3 sometime late tomorrow.
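For reference while revising the patch, here is a minimal Python sketch of writing one ARC v1 document using the layout proposed in the quoted thread below: a leading newline, the space-separated URL-record, its terminating newline, one more separator newline, then the body. That newline reading had not been confirmed on the list when this was written, so treat it as an assumption. The function name `write_arc_record` and its parameters are hypothetical. Note the space between the content type and the length, which the quoted URL-record-v1 grammar requires.

```python
import io


def write_arc_record(out, url, ip, archive_date, content_type, body):
    """Append one ARC v1 document to the binary stream `out`.

    Layout (as read in the thread): <nl><URL-record><nl><network_doc>,
    where URL-record-v1 itself ends with <nl>.
    """
    # URL-record-v1 fields are separated by single spaces; note the space
    # between content type and length.
    header = f"{url} {ip} {archive_date} {content_type} {len(body)}"
    out.write(b"\n")                   # <nl> that opens each doc
    out.write(header.encode("ascii"))  # URL-record-v1 fields
    out.write(b"\n")                   # <nl> terminating URL-record-v1
    out.write(b"\n")                   # <nl> separating the record from network_doc
    out.write(body)                    # network_doc: exactly len(body) bytes


if __name__ == "__main__":
    buf = io.BytesIO()
    write_arc_record(buf, "http://example.com/", "192.0.2.1",
                     "19691231175959", "text/html", b"<html></html>")
    print(repr(buf.getvalue()))
```

The length field is counted in bytes of whatever is passed as `body`, so the document should be handed in as `bytes` rather than as a decoded string.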

I'm also going to have to re-read the emails Jer just sent to the Wikia
mailing list to digest them fully, but I look forward to learning more
about the Nutch setup Wikia is using, to get the full perspective on the
back-end side of Wikia Search.

Regarding the generated work units, Jer: how are you generating them now?
I'm assuming this isn't the current "server" but a modified version you
have running? It would be great to learn a bit about your next steps there.
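Jer's reply quoted below says the work units are pre-generated in batches from flat URL lists, with no database, and that HTTP/1.1 headers such as Connection: close and Accept-Encoding: gzip could be added to them. The actual work-unit format isn't described in this thread, so the following is only a hypothetical Python sketch: `batch_workunits`, `make_request`, the batch size, and the (url, raw request) entry shape are all assumptions.

```python
from urllib.parse import urlsplit


def make_request(url):
    """Build a raw HTTP/1.1 GET request for one URL (hypothetical entry format)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "Accept-Encoding: gzip\r\n"
        "\r\n"
    )


def batch_workunits(lines, batch_size):
    """Yield lists of (url, request) pairs, at most batch_size per batch."""
    batch = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comments in the flat list
        batch.append((url, make_request(url)))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # last batch may be shorter


if __name__ == "__main__":
    urls = ["http://example.com/a", "http://example.org/?q=1", "http://example.net"]
    for n, unit in enumerate(batch_workunits(urls, 2)):
        print(n, [u for u, _ in unit])
```

Generating batches this way is a single pass over the list, which fits the "pre-generate from flat lists" approach without needing any database state.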

More tomorrow.

Thanks.
Yousef


On 1/10/08, jer <jeremie at jabber.org> wrote:
>
> >> So, I think you're right and it's missing a \n, but maybe it's
> >> missing TWO of them?
> >>
> >> doc == <nl><URL-record><nl><network_doc>
> >>
> >> URL-record-v1 == <url><sp>
> >> <ip-address><sp>
> >> <archive-date><sp>
> >> <content-type><sp>
> >> <length><nl>
> >>
> >> So, there should be a \n before each URL record, and two of them
> >> after it, one defined as the terminator in URL-record-v1, and one
> >> defined as the separator between URL-record and network_doc.  Is that
> >> correct?
> >>
> >> print $arc "\nhttp://$host$path $ip 19691231175959 $ctype ",length($body),"\n\n$body";
> >>
> >> Is that correct?  Can anyone else verify?
> >>
> > So it seems.
>
> Can anyone else verify that this is correct?  \n URL-stuff \n \n CONTENT ?
>
> >> The workunits can (someday) start to specify HTTP/1.1 with
> >> Connection: close and Accept-Encoding: gzip.  A client supporting
> >> the current workunit format shouldn't care or know any different,
> >> right?
> >>
> > The client's bandwidth might care ;-)
>
> Yep, easy enough to add these headers in the workunits as well :)
>
> >> Doh!  My bad, I can fix it when I generate some more workunits :)
> > Aren't they generated on-the-fly?
>
> Heh, nope, there's no DB in this back-end so it's much faster and
> easier to pre-generate batches of these from flat lists right now.
>
> Jer
> _______________________________________________
> Grub-dev mailing list
> Grub-dev at wikia.com
> http://lists.wikia.com/mailman/listinfo/grub-dev
>
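On the bandwidth point in the quote: if work units request Accept-Encoding: gzip, servers may answer with Content-Encoding: gzip, and the client then receives compressed bytes. A minimal Python sketch of the decoding side follows; `decode_body` and the lower-cased header dict are hypothetical, and whether the client should store the compressed or the decoded bytes in the ARC file isn't settled in this thread.

```python
import gzip


def decode_body(headers, body):
    """Remove a gzip content-coding from an HTTP response body, if present.

    `headers` is assumed to map lower-cased header names to values, and
    `body` to be the raw entity bytes after any chunked transfer coding
    has already been removed (HTTP/1.1 servers may send chunked responses;
    this sketch does not handle that).
    """
    coding = headers.get("content-encoding", "").strip().lower()
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body  # no gzip coding: return the bytes as received


if __name__ == "__main__":
    page = b"<html>" + b"hello world " * 500 + b"</html>"
    wire = gzip.compress(page)
    print(len(page), "bytes uncompressed,", len(wire), "bytes compressed")
    assert decode_body({"content-encoding": "gzip"}, wire) == page
```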