[atlas-l] Collector Query (CQ), <strike>SQL</strike>

jer jeremie at jabber.org
Tue Sep 11 17:55:09 UTC 2007


> Looking good. I don't think SQL is quite such a mess as you suggest
> but for this context, it's ludicrous overkill.

And it doesn't evolve appropriately for this use :)

> Some minor comments on the queries. Note that the date query requires
> an absolute date - this isn't quite "how old it is", that would be a
> relative date.
> Would relative dating be a better idea? Smaller fields, possibly less
> calculation?

I'm pretty sure most or all indexes are going to store it by date, so  
it's easiest to just specify it that way IMO.  It also allows any  
query to be static, not relative to the situation in which it was  
made, which is generally a good rule of thumb.
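
To illustrate the static-query point, here is a minimal sketch of a client
resolving a relative age into an absolute date once, before sending. The
function name and the "date" field are hypothetical, not taken from the CQ
draft; the timestamp format is also just an assumption.

```python
from datetime import datetime, timedelta, timezone


def absolute_cutoff(max_age: timedelta, now: datetime) -> str:
    """Resolve a relative age into an absolute UTC timestamp string."""
    cutoff = now.astimezone(timezone.utc) - max_age
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


# Resolved once by the client, so the query means the same thing whenever
# and wherever a collector evaluates it.
sent = datetime(2007, 9, 11, 17, 55, 9, tzinfo=timezone.utc)
query = {"date": absolute_cutoff(timedelta(days=7), sent)}  # hypothetical field
```

A relative "7 days" would have to be re-interpreted by every collector at
evaluation time; the absolute form can be stored, forwarded, or cached as-is.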

> URL query refers to "prefix" of URL. Does this simply mean any initial
> sub-sequence of the URL string? If so, we need to clarify whether such
> queries should include the method (http://) part or not. Or is the
> collector expected to understand this? [Similar comments apply to host
> port number etc.]

Good question, suggestions?

> Clearly collectors do the ranking and different collectors are free to
> implement different rankings (presumably). Some ranking algorithms
> may, quite properly, be coarse-grained and only give rank as an
> integer in range 0-10 (for example). How would this interact with
> the "skip" field as there may be many "knuggets" with the same
> rank?

Skip is simply the number of knuggets to skip, not anything relating
to the rank.  You are correct though that different collectors
can/should rank things differently, and some could be very coarse.
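
As a rough sketch of skip being a plain offset into the ranked list, here
is one way a collector might apply it. The "limit" parameter, the dict
layout, and the id tie-breaker are all assumptions for illustration; nothing
in this thread says how ties are ordered.

```python
def page(knuggets, skip, limit):
    """Return up to `limit` knuggets after skipping the first `skip`."""
    # Sort by rank, highest first. The id tie-breaker is an assumption,
    # added so equal-rank knuggets come back in a repeatable order.
    ordered = sorted(knuggets, key=lambda k: (-k["rank"], k["id"]))
    return ordered[skip:skip + limit]


results = [
    {"id": "a", "rank": 7},
    {"id": "b", "rank": 7},
    {"id": "c", "rank": 7},
    {"id": "d", "rank": 3},
]
second_page = page(results, skip=2, limit=2)
```

With a coarse ranking, many knuggets share a rank, so skip counts items and
never rank values; here skip=2 lands in the middle of the rank-7 group.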

> Had a quick look at "fuz", looks sensible to me. Only observation
> at this stage is that if lines MUST end with "\n" only, this could
> cause problems with machines that have the misfortune to be running
> anything other than Unix/Linux. Think about the network
> standard (as in HTTP) CR/LF.

*nod*, I forgot about this actually during all the various
rewrites... the FuzView API is supposed to automatically remove any
trailing \r when accessing it as a word/noun.  This way, CR/LF is
just fine when people are manually typing things, and if binary/raw
is being used it will expect things to be exact (as people won't be
typing them).
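
The behaviour described above could look roughly like the sketch below.
This is not the FuzView API; as_word and as_raw are hypothetical names, and
the sketch assumes a line arrives as bytes with its "\n" terminator still
attached.

```python
def as_word(line: bytes) -> bytes:
    """Text-style access: drop the "\\n" terminator and any trailing "\\r"."""
    if line.endswith(b"\n"):
        line = line[:-1]
    return line.rstrip(b"\r")


def as_raw(line: bytes) -> bytes:
    """Binary/raw access: return the bytes exactly as received."""
    return line


typed_on_windows = b"hello\r\n"
typed_on_unix = b"hello\n"
same_word = as_word(typed_on_windows) == as_word(typed_on_unix)
```

Both hand-typed variants yield the same word, while raw access leaves the
"\r" in place for callers that need exact bytes.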

Jer


