<br><br><div><span class="gmail_quote">On 7/9/07, <b class="gmail_sendername">jer</b> <<a href="mailto:jeremie@jabber.org">jeremie@jabber.org</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> Ahh, so that's what I seem to have missed in the definition of a<br>> Collector - that it has structured content only.<br>> As the famous Emily Litella, on Saturday Night Live, used to say<br>> "Ohh!,
<br>> ....Ne-e-ver M-i-ind!"<br><br>The "definition" is only really happening through the discussion here<br>at the moment so no worries, it all helps :)<br><br>> So we don't expect to do queries that involve text matching when
<br>> querying a collector, right?<br><br>In a general sense of match the exact phrase "the cat and the dog ran<br>away" or the ilk of full-text matching, not really. I consider that<br>secondary if at all in scope for the general Collector.
</blockquote><div><br>So I think some of this calls for further refinement of the definitions. We've got to assume the results served by a broker may be very, very smart - even if it's outside our ability right now, we've got to spec this to allow for growth. So, a broker should be able to (theoretically at least) serve results for queries as complex as we can dream up (again - even if the technology doesn't allow it now, the *infrastructure* shouldn't stop us from doing it in the future). So then - if the collector collects (and ranks?) results, where do we draw the line between broker and collector? I think factory is probably pretty clean - a factory is basically a crawler and little else, right? But we should talk about how much intelligence the broker and collector may have.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> I still seem to be thinking of a collector as the analog of the<br>> "search
<br>> engine" as we know it now and I need to reset my thinking on that.<br><br>I'm trying to separate all the functions including the various query<br>types first, before deciding that any of them can/should be<br>re-combined. So, consider a Collector to be a specific<br>keyword->knuggets ranked mapping, with various other fixed attributes (time,<br>host, location, etc) that can filter the results.</blockquote><div><br>Jer, I would like to suggest that we not limit the collector to a pre-conceived data structure. Certainly some data structures will be more common than others, but let's not build that limitation into Atlas. This kind of alludes to your "SELECT document, rank FROM content WHERE keyword='searchword'" example. Perhaps you build in some pre-conceived fields (document, rank, time, host, location, id, ???) but the structure should be forward-looking and expandable. The good thing about an SQL-like syntax is that it probably accommodates that, but I agree that we should not leap to conclusions about the best solution. Let's define the requirements first (like any good project, eh? Define the requirements, then match potential solutions against them).
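<br><br>To make the "pre-conceived fields plus room to grow" idea a bit more concrete, here is a rough sketch, not a proposal for the actual solution. It assumes an in-memory SQLite table as a stand-in for a collector; the fixed columns are the fields listed above, while the "extra" column and the helper names (make_collector, add_entry, query) are made up purely for illustration.<br>
<pre>
```python
import json
import sqlite3

# Hypothetical schema: the fixed columns are the fields named in this thread;
# "extra" is an assumed catch-all (JSON text) for attributes not yet defined.
SCHEMA = """
CREATE TABLE content (
    id       INTEGER PRIMARY KEY,
    keyword  TEXT NOT NULL,
    document TEXT NOT NULL,
    rank     REAL NOT NULL,
    time     TEXT,
    host     TEXT,
    location TEXT,
    extra    TEXT NOT NULL DEFAULT '{}'
)
"""


def make_collector():
    """Create an empty in-memory stand-in for a collector."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_entry(conn, keyword, document, rank, extra=None, **fixed):
    """Store one keyword-to-document mapping with its rank and attributes."""
    conn.execute(
        "INSERT INTO content (keyword, document, rank, time, host, location, extra) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            keyword,
            document,
            rank,
            fixed.get("time"),
            fixed.get("host"),
            fixed.get("location"),
            json.dumps(extra or {}),
        ),
    )


def query(conn, keyword, host=None, **extra_filters):
    """Return (document, rank) pairs for a keyword, best rank first.

    "host" filters on a fixed column in SQL; any other keyword argument is
    matched against the expandable "extra" attributes in Python.
    """
    sql = "SELECT document, rank, extra FROM content WHERE keyword = ?"
    params = [keyword]
    if host is not None:
        sql += " AND host = ?"
        params.append(host)
    sql += " ORDER BY rank DESC"

    results = []
    for document, rank, extra in conn.execute(sql, params):
        attrs = json.loads(extra)
        if all(attrs.get(name) == value for name, value in extra_filters.items()):
            results.append((document, rank))
    return results
```
</pre>
The point of the sketch is only that a new attribute (say, a hypothetical "language") can be stored and filtered on without touching the fixed fields. Whether that kind of open-ended part belongs in the collector at all is exactly the requirements question.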
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Jer<br><br></blockquote></div><br>Best Regards,<br>Aerik<br><br clear="all"><br>
-- <br><a href="http://www.wikidweb.com">http://www.wikidweb.com</a> - the Wiki Directory of the Web