[atlas-l] Collector Query (CQ), <strike>SQL</strike>
peter burden
peter.burden at gmail.com
Sun Sep 2 15:02:53 UTC 2007
jer wrote:
> I'm a dork, I totally have to backpedal on suggesting SQL as a basis
> for the CQ :)
>
> Since I began very prelim prototyping of the various systems, I
> quickly realized that SQL is sorely deficient for doing exactly the
> type of queries that a Collector has to handle. My initial impetus
> towards SQL was based on my desire to keep the understanding of a
> Collector as a very simple table-like system, keywords, knuggets, and
> ranking.
>
Looking good. I don't think SQL is quite such a mess as you suggest, but for
this context it's ludicrous overkill.
Some minor comments on the queries. Note that the date query requires an
absolute date. That isn't quite "how old it is", which would be a
relative date.
Would relative dating be a better idea? Smaller fields, possibly less
calculation?
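
To make the absolute/relative distinction concrete: one possible reading is
that knuggets keep an absolute timestamp, and a relative query ("at most N
seconds old") is converted to an absolute cutoff once per query. A minimal
Python sketch under that assumption; cutoff_from_age and newer_than are
hypothetical names, not part of the CQ proposal:

```python
from datetime import datetime, timedelta, timezone

def cutoff_from_age(max_age_seconds, now=None):
    """Turn a relative age ("at most N seconds old") into an absolute cutoff."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(seconds=max_age_seconds)

def newer_than(stamps, cutoff):
    """Keep only the absolute timestamps at or after the cutoff."""
    return [s for s in stamps if s >= cutoff]
```

With this arrangement the stored field never changes as time passes; only
the query side does one subtraction per request.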
The URL query refers to a "prefix" of the URL. Does this simply mean any
initial sub-sequence of the URL string? If so, we need to clarify whether
such queries should include the method (http://) part or not. Or is the
collector expected to understand this? [Similar comments apply to the host,
port number, etc.]
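
One way a collector could sidestep the question is to normalise both the
stored URL and the query prefix before comparing them. The sketch below is
only an illustration of that idea, not something the CQ proposal specifies:
the names normalize and url_has_prefix, the default scheme of "http", and
the decision to drop default ports and the query string are all assumptions.

```python
from urllib.parse import urlsplit

# Ports that are implied by the scheme and can be dropped (assumption).
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize(url, default_scheme="http"):
    """Lower-case scheme and host, add a missing scheme, drop default ports."""
    if "://" not in url:
        url = default_scheme + "://" + url
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{port}"
    # The query string and fragment are ignored in this sketch.
    return f"{scheme}://{host}{parts.path}"

def url_has_prefix(url, prefix):
    """Plain string-prefix test on the normalised forms."""
    return normalize(url).startswith(normalize(prefix))
```

Even after normalisation, a plain string prefix such as "example.org" also
matches "example.organic.com", so the spec would still need to say whether
a prefix has to end on a host or path boundary.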
Clearly collectors do the ranking, and different collectors are free to
implement different rankings (presumably). Some ranking algorithms
may, quite properly, be coarse grained and only give the rank as an
integer in the range 0-10 (for example). How would this interact with
the "skip" field, as there may be many "knuggets" with the same
rank?
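
One possible answer is for the collector to break ties with a second,
stable key, so that the ordering is total and "skip" always lands in the
same place. A rough sketch, assuming each knugget has some unique
identifier; the field names "rank" and "id" and the "limit" parameter are
hypothetical, not taken from the CQ draft:

```python
def page(knuggets, skip, limit):
    """Return one page of results, highest rank first.

    Ties in rank are broken by the knugget id, so repeated queries with
    different skip values neither repeat nor lose knuggets.
    """
    ordered = sorted(knuggets, key=lambda k: (-k["rank"], k["id"]))
    return ordered[skip:skip + limit]
```

Without a tie-breaker of this kind, two requests with skip=0 and skip=10
could return overlapping sets whenever more than ten knuggets share a rank.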
Had a quick look at "fuz"; it looks sensible to me. My only observation
at this stage is that if lines MUST end with "\n" only, this could cause
problems for machines that have the misfortune to be running
anything other than Unix/Linux. Consider the network-standard
CR/LF line ending (as in HTTP).
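
A common compromise for line-oriented formats is to be strict when writing
and lenient when reading, i.e. accept both "\n" and "\r\n" as terminators.
The following is a generic sketch of such a reader, not a description of how
"fuz" actually parses its input; split_lines is a hypothetical helper.

```python
def split_lines(data: bytes):
    """Split a byte string into lines, accepting LF or CR/LF endings."""
    lines = data.split(b"\n")
    # A trailing terminator leaves one empty element at the end; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    # Strip a single trailing CR left over from a CR/LF pair.
    return [ln[:-1] if ln.endswith(b"\r") else ln for ln in lines]
```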