[atlas-l] thread - the fundamental unit: knugget

jer jeremie at jabber.org
Sat Jul 7 16:48:24 UTC 2007


Everything within Atlas revolves around one unit, the knugget, a
shorthand for "knowledge nugget" that is still pronounced the same as
plain "nugget".  This is the atom that everything builds from: a Factory
produces knuggets, a Collector ranks them, and a Broker processes them.

Informally, a knugget can be thought of as a single search result, much
like the entries on a typical results page (title, link, clipping).
That definition isn't very accurate, though, since what a Broker
ultimately delivers as a single search result will likely combine a few
knuggets.

The hardest part to understand here is that a knugget is a *human*
definition, not a technical one.  Atlas is ultimately serving humans on
both the incoming content and the outgoing results, so be warned that
deciding what counts as a knugget is going to involve a lot of judgment
calls, and that's OK.

The formal definition is: the smallest standalone unit of context
that most average people would recognize.  What this really means is
a string of text that anyone could make sense of without any other
context around it.  Examples of knuggets are a title + link, a sentence
stating something, a row from a table, an object + description, etc.,
all with a reference to their source URL, of course.
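
To make the definition a bit more concrete, here is a rough sketch of a
knugget as a data structure.  This is NOT a wire format, just an
illustration: the `Knugget` class, its field names, and the example.org
URLs are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knugget:
    """Hypothetical in-memory shape of a knugget (not a wire format)."""
    text: str        # a string a person can make sense of on its own
    source_url: str  # reference back to where the text came from


# One illustrative knugget for each kind of example named above.
examples = [
    Knugget("Atlas design notes - http://example.org/atlas/notes",
            "http://example.org/atlas/"),                      # title + link
    Knugget("A Collector ranks knuggets.",
            "http://example.org/atlas/notes"),                 # a sentence
    Knugget("Factory | produces knuggets",
            "http://example.org/atlas/roles"),                 # a table row
    Knugget("Broker: queries Collectors and builds results",
            "http://example.org/atlas/roles"),                 # object + description
]
```

The only things the definition really demands are some text that stands
alone and a pointer to its source, which is why the sketch has just two
fields.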

Given the loose nature of the definition, every Factory has a
tremendous amount of leeway in how it wants to generate knuggets.
Its job is to take content and break it into the smallest and most
valuable units possible, to do the best job in understanding the
content and serving it.  A Factory then publishes these knuggets to
whatever Collectors it has relationships with, which index the
individual knuggets.
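
As one possible illustration of that leeway, the sketch below is about
the most naive Factory imaginable: it treats every sentence as a
knugget and hands each one to its Collectors.  Everything here is an
assumption made for the example (the `SentenceFactory` name, modelling
a Collector as a plain callable, the example.org URL); a real Factory
would be expected to understand its content far better than a regex.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Knugget:
    text: str
    source_url: str


class SentenceFactory:
    """Hypothetical Factory that splits content on sentence boundaries."""

    def __init__(self, collectors):
        # Each Collector is modelled as a callable that accepts a Knugget.
        self.collectors = list(collectors)

    def make_knuggets(self, content, source_url):
        # Split after '.', '!' or '?' when followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", content.strip())
        return [Knugget(s, source_url) for s in sentences if s]

    def publish(self, content, source_url):
        knuggets = self.make_knuggets(content, source_url)
        for collector in self.collectors:
            for knugget in knuggets:
                collector(knugget)
        return knuggets


# Usage: a list's append method stands in for a real Collector.
seen = []
factory = SentenceFactory([seen.append])
published = factory.publish(
    "Atlas has three roles. A Factory makes knuggets. A Collector ranks them.",
    "http://example.org/atlas/roles",
)
```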

Importantly, a Collector only indexes a reference to the original
knugget and does not copy or store it, so when a Broker queries a
Collector it only gets back a list of references and rankings.
The Broker must then also contact the original Factory and retrieve the
knuggets it's interested in.  This second fetch primarily supplies the
titles and clippings that a Broker shows in its results, but critically
it also serves as a validation stage, confirming that the Collector
hasn't been poisoned and that the reputation of the Factory is intact.
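
The two-step lookup can be sketched roughly as follows.  All of the
names (`Ref`, `kid`, `fetch`, `query`, `search`) are hypothetical, the
ranking is a placeholder that just counts matching query terms, and
"validation" is assumed here to mean nothing more than "the Factory
still serves that knugget"; the real checks could be much richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    factory: str  # hypothetical Factory name
    kid: str      # hypothetical knugget id, unique within that Factory


class Factory:
    def __init__(self, name):
        self.name = name
        self._store = {}  # kid -> knugget text; the only copy of the text

    def add(self, kid, text):
        self._store[kid] = text
        return Ref(self.name, kid)

    def fetch(self, kid):
        return self._store.get(kid)  # None if this Factory never made it


class Collector:
    def __init__(self):
        self._index = {}  # term -> set of Ref; knugget text is not kept

    def index(self, ref, text):
        for term in set(text.lower().split()):
            self._index.setdefault(term, set()).add(ref)

    def query(self, q):
        scores = {}
        for term in q.lower().split():
            for ref in self._index.get(term, ()):
                scores[ref] = scores.get(ref, 0) + 1
        # Highest score first; ties broken by id to keep output stable.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0].kid))


class Broker:
    def __init__(self, factories):
        self.factories = factories  # Factory name -> Factory

    def search(self, collector, q):
        results = []
        for ref, score in collector.query(q):
            factory = self.factories.get(ref.factory)
            text = factory.fetch(ref.kid) if factory else None
            if text is None:
                continue  # the Factory doesn't back this reference: drop it
            results.append((score, text, ref))
        return results


f = Factory("f1")
c = Collector()
text = "Jabber is an open messaging protocol"
c.index(f.add("k1", text), text)
c.index(Ref("f1", "bogus"), "jabber jabber spam")  # a poisoned entry
hits = Broker({"f1": f}).search(c, "jabber protocol")
```

In this run the Collector returns two references, but only the one the
Factory can actually produce survives the second fetch.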

So, the pipeline looks like F -> C -> B -> F.  The knugget is the  
vehicle by which all data is moved within Atlas through this pipeline.

That's perhaps enough to chew on for now.  I'll follow up in a few
days with some wire-format ideas for what knuggets could look like in
practice.

Jer

