[atlas-l] Atlas - Internet Search Infrastructure

jer jeremie at jabber.org
Thu Jul 5 05:47:18 UTC 2007


This is a brief overview of a large vision: enabling search to become  
a part of the Internet's infrastructure.  Building on Atlas as an  
open protocol, search can become a fully distributed and  
interoperable world-wide community.  All of the participants can  
interact openly and in any role where they believe they can add value  
to the network.

A search engine can be constructed from many independent entities  
serving different roles instead of one monolithic system. These  
entities exchange aggregate information, or knowledge, and can
decide with whom they want to work. To design this working
economy based on knowledge, there must be balance between these
various entities.  Each actor must have an incentive to act both for
its own benefit and for the benefit of the whole, and enough
information to make and validate those decisions.  Reputations and  
relationships are the essential fabric of Atlas, just as they are in  
a real-world free market.

There are three primary roles within Atlas:

     Factory - Responsible to the content.
     Collector - Responsible to the keyword.
     Broker - Responsible to the searcher.

Each of these actors must interact with the others to complete any  
search request. Any two roles could be performed by a single entity  
(whereas if all three are performed by one entity, the result would  
be a traditional, monolithic search engine).

A Factory is akin to a crawler in today's search engines.  An Atlas  
Factory must fetch and process the content as intelligently as  
possible, performing analysis (such as Natural Language Processing)  
and normalizing it into distinct units.  A Factory shares its highly
refined and processed output with one or more Collectors, chosen
according to which of them it believes makes the best use of it.

A Collector absorbs and indexes output from one or more Factories,  
with one primary goal: ranking.  An Atlas Collector must provide the  
most intelligent ranking and relationship analysis possible.  A  
Collector has to compete for the output of a Factory, as well as  
compete to provide the best ranking quality for Brokers.

A Broker must provide a searcher with the best possible results. It  
does so by combining diverse ranking results from Collectors and also  
by retrieving content from the original Factories.  This last step, a  
Broker interacting with a Factory, is critical to maintaining a  
balanced ecosystem.  All Factories must be aware of and approve how  
their results are being used and by whom.
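
As a rough illustration only, the flow between the three roles might
be sketched as follows.  Nothing here is part of the Atlas protocol:
the class names, methods, term-frequency ranking, and the simple
approval set are all hypothetical stand-ins chosen to keep the
example small.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Factory:
    """Hypothetical Factory: processes content into normalized units."""
    name: str
    approved_brokers: set = field(default_factory=set)
    units: dict = field(default_factory=dict)  # unit id -> list of words

    def process(self, unit_id, raw_text):
        # Stand-in for real analysis (e.g. NLP): lowercase and split.
        self.units[unit_id] = raw_text.lower().split()
        return unit_id, self.units[unit_id]

    def retrieve(self, broker_name, unit_id):
        # The Factory decides who may use its results (assumed mechanism).
        if broker_name not in self.approved_brokers:
            raise PermissionError(f"{broker_name} not approved by {self.name}")
        return " ".join(self.units[unit_id])


@dataclass
class Collector:
    """Hypothetical Collector: indexes Factory output, ranks per keyword."""
    name: str
    index: dict = field(default_factory=lambda: defaultdict(dict))

    def absorb(self, factory, unit_id, words):
        # Toy ranking signal: how often the word occurs in the unit.
        key = (factory.name, unit_id)
        for word in words:
            self.index[word][key] = self.index[word].get(key, 0) + 1

    def rank(self, keyword):
        hits = self.index.get(keyword, {})
        return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


@dataclass
class Broker:
    """Hypothetical Broker: merges rankings, then asks Factories for content."""
    name: str
    collectors: list
    factories: dict  # factory name -> Factory

    def search(self, keyword):
        combined = defaultdict(int)
        for collector in self.collectors:
            for key, score in collector.rank(keyword):
                combined[key] += score
        ordered = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
        # The final step goes back to the originating Factory.
        return [(score, self.factories[fname].retrieve(self.name, unit_id))
                for (fname, unit_id), score in ordered]


factory = Factory("factory-1", approved_brokers={"broker-1"})
collector = Collector("collector-1")
collector.absorb(factory, *factory.process("u1", "Open search open protocol"))
collector.absorb(factory, *factory.process("u2", "Open market"))
broker = Broker("broker-1", [collector], {"factory-1": factory})
results = broker.search("open")
```

A Broker that the Factory has not approved would fail at the
retrieve step in this sketch, which is one simple way to picture the
Broker-Factory check described above.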

Reputation and reward are bi-directional between all parties
(Factory-Collector, Collector-Broker, and Broker-Factory).  Each
entity may choose to interact on principle (free, Commons), for
attribution (results provided by), or commercially (as a paid
service).  The Atlas protocol is purely a facilitator and does not
restrict how the relationships between any entities are formed.  In
considering these motives for the various entities, it's likely that
the free networks will tend to become more specialized, commercial
ones will compete on quality, and attribution-based networks will
mature in both directions.
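
One way to picture such a relationship, purely as a sketch: a record
holding the two parties, the terms they chose, and a reputation score
in each direction.  The names, the 0-to-1 score range, and the
threshold are assumptions made for illustration, not anything the
protocol defines.

```python
from dataclasses import dataclass
from enum import Enum


class Terms(Enum):
    # The three bases for interaction named in the text above.
    PRINCIPLE = "free, Commons"
    ATTRIBUTION = "results provided by"
    COMMERCIAL = "paid service"


@dataclass
class Relationship:
    """Hypothetical record of one pairing, e.g. Factory-Collector."""
    a: str
    b: str
    terms: Terms
    a_rates_b: float = 0.0  # reputation of b as seen by a (assumed 0..1)
    b_rates_a: float = 0.0  # reputation of a as seen by b (assumed 0..1)

    def rate(self, rater, score):
        # Each side records its own view of the other.
        if rater == self.a:
            self.a_rates_b = score
        elif rater == self.b:
            self.b_rates_a = score
        else:
            raise ValueError(f"{rater} is not part of this relationship")

    def healthy(self, threshold=0.5):
        # Bi-directional: both sides must be satisfied.
        return self.a_rates_b >= threshold and self.b_rates_a >= threshold


rel = Relationship("factory-1", "collector-1", Terms.ATTRIBUTION)
rel.rate("factory-1", 0.9)
one_sided = rel.healthy()
rel.rate("collector-1", 0.7)
two_sided = rel.healthy()
```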

This simple yet powerful division of roles, responsibilities, and  
relationships will result in a distributed economic foundation for an  
Internet Search Infrastructure.  The wire protocol and further
definition of the interactions between these entities are openly
evolving.  Anyone interested is welcome to join the discussions and
see the initial proposals at
http://lists.wikia.com/mailman/listinfo/atlas-l over the coming weeks.

Thanks, looking forward to a radically different search ecosystem in  
the coming years :)

Jer


