[atlas-l] thread - the fundamental unit: knugget
jer
jeremie at jabber.org
Sat Jul 7 16:48:24 UTC 2007
Everything within Atlas revolves around one unit, the knugget, a
shorthand for "knowledge nugget" that is still pronounced the same as
plain "nugget". This is the atom that everything builds from: a Factory
produces knuggets, a Collector ranks them, and a Broker processes them.
A knugget is informally thought of as a single search result, not
unlike what you often see in typical search results (title, link,
clipping), but that isn't quite accurate, since what a Broker
ultimately delivers as a single search result will likely combine
a few knuggets.
The hardest part to understand here is that a knugget is a *human*
definition, not a technical one, and Atlas is ultimately serving
humans both on the incoming content and the outgoing results, so be
warned that deciding what counts as a knugget is going to involve a
lot of judgment calls, and that's OK.
The formal definition is: the smallest standalone unit of context
that most average people would recognize. What this really means is
a string of text that anyone could make some sense of, could
understand, outside of any other context about that text. Examples
of knuggets are a title + link, a sentence stating something, a row
from a table, an object + description, etc., all with a reference to
their source URL of course.
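As a rough illustration of the example types just listed, a knugget
can be pictured as a small record holding standalone text plus its
source URL. This is only a sketch and not a wire format: the class
name, the field names, and every example.org value below are
assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knugget:
    """Hypothetical shape: one standalone piece of text plus its source."""
    text: str        # readable on its own, without surrounding context
    source_url: str  # every knugget keeps a reference to its source


# One knugget per example type from the list above (all values invented).
examples = [
    # title + link
    Knugget("Atlas mailing list archive - http://example.org/atlas-l/",
            "http://example.org/"),
    # a sentence stating something
    Knugget("A Collector ranks knuggets.",
            "http://example.org/intro.html"),
    # a row from a table
    Knugget("Role: Factory | Job: produces knuggets",
            "http://example.org/roles.html"),
    # an object + description
    Knugget("Broker: the part that answers searches",
            "http://example.org/glossary.html"),
]
```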
Given the loose nature of the definition, every Factory has a
tremendous amount of leeway in how it wants to generate knuggets.
Its job is to take content and break it into the smallest and most
valuable units possible, to do the best job in understanding the
content and serving it. A Factory then publishes these knuggets to
whatever Collectors it has relationships with, who index the
individual knuggets.
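To make the Factory's job concrete, here is a minimal sketch of one
possible strategy: split a page into sentences, treat each sentence as
a knugget, and hand each one to every Collector the Factory has a
relationship with. Sentence splitting is just one choice a Factory
might make given the leeway described above; the names
split_into_knuggets, publish, ToyCollector, and knugget_id are all
hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Knugget:
    knugget_id: str  # hypothetical identifier: source URL plus position
    text: str
    source_url: str


def split_into_knuggets(source_url, content):
    """Break content into one knugget per sentence (a naive heuristic)."""
    sentences = re.split(r"(?<=[.!?])\s+", content.strip())
    return [
        Knugget(f"{source_url}#{n}", sentence, source_url)
        for n, sentence in enumerate(s for s in sentences if s)
    ]


class ToyCollector:
    """Stand-in Collector that records only references to knuggets."""

    def __init__(self):
        self.refs = []

    def accept(self, knugget):
        self.refs.append(knugget.knugget_id)


def publish(knuggets, collectors):
    """Send every knugget to every Collector the Factory works with."""
    for collector in collectors:
        for knugget in knuggets:
            collector.accept(knugget)


page = ("Atlas has three roles. A Factory produces knuggets. "
        "A Collector ranks them.")
knuggets = split_into_knuggets("http://example.org/intro.html", page)
collectors = [ToyCollector(), ToyCollector()]
publish(knuggets, collectors)
```

A real Factory would presumably use something smarter than
punctuation to find unit boundaries (table rows, headings, link text),
but the shape stays the same: content in, small standalone units out.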
Importantly, a Collector only indexes a reference to the original
knugget and does not copy or store the knugget itself, so when a
Broker queries a Collector it only gets back a list of references and
rankings. A Broker must then also retrieve the knuggets it's
interested in from the original Factory. This second fetch primarily
supplies the titles and clippings that a Broker shows in its
results, but critically it also acts as a validation stage, checking
that the Collector hasn't been poisoned and that the reputation of
the Factory is intact.
So, the pipeline looks like F -> C -> B -> F. The knugget is the
vehicle by which all data is moved within Atlas through this pipeline.
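The F -> C -> B -> F loop can be sketched in a few lines. This is a
toy model under stated assumptions, not a proposed design: the class
and method names are invented, ranking is a simple count of matching
query terms, and "validation" here just means the Factory still has
the knugget and its text still contains a query term. The text above
does not specify how ranking or validation would actually work.

```python
import re
from collections import defaultdict


def words(text):
    return re.findall(r"\w+", text.lower())


class Factory:
    def __init__(self):
        self.knuggets = {}  # knugget id -> text, held only by the Factory

    def publish(self, knugget_id, text, collector):   # F -> C
        self.knuggets[knugget_id] = text
        collector.index(self, knugget_id, text)

    def fetch(self, knugget_id):                      # B -> F
        return self.knuggets.get(knugget_id)


class Collector:
    def __init__(self):
        # term -> set of (factory, knugget id) references; no text kept
        self.index_by_term = defaultdict(set)

    def index(self, factory, knugget_id, text):
        for term in words(text):
            self.index_by_term[term].add((factory, knugget_id))

    def query(self, terms):                           # C -> B
        scores = defaultdict(int)
        for term in terms:
            for ref in self.index_by_term.get(term, ()):
                scores[ref] += 1
        return sorted(scores.items(), key=lambda item: -item[1])


class Broker:
    def search(self, collector, query):
        terms = words(query)
        results = []
        for (factory, knugget_id), score in collector.query(terms):
            text = factory.fetch(knugget_id)
            # Validation stage: drop references the Factory cannot back up.
            if text is None or not set(words(text)) & set(terms):
                continue
            results.append((score, text))
        return results


factory, collector = Factory(), Collector()
factory.publish("k1", "A knugget is a knowledge nugget", collector)
factory.publish("k2", "A Collector ranks references", collector)
# Simulate a poisoned Collector entry pointing at a knugget that
# the Factory never produced.
collector.index_by_term["knugget"].add((factory, "missing"))
results = Broker().search(collector, "knugget")
```

Here the bogus "missing" reference is silently dropped because the
Factory has nothing to return for it, so results contains only the
genuine k1 knugget.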
That's perhaps enough to chew on for now, I'll follow up in a few
days with some wire-format ideas for what knuggets could look like in
practice.
Jer