[Search-l] call to action....

peter burden peter.burden at gmail.com
Wed Jun 13 09:14:48 UTC 2007


Sami M wrote:
>
> I referred to the paper to clarify the scope. Once you start working
> on it, you realize that it is a very compact publication for the
> amount of work that was done. With the exception of PageRank, most of
> what they talk about is public domain knowledge covered in CS
> textbooks (for example *Managing Gigabytes* by Witten, Moffat, Bell).
> PageRank in its original form is not as useful anymore (thanks to the
> SEOs) and Google doesn't have a patent on link graph analysis. That
> was the basic premise of my work: design & implement a system +
> algorithm that is more effective and SEO resistant than PageRank,
> HITS etc. I believe Google results have gone down in quality post
> 2003 or so. Most of this is due to the pollution of the ranking
> metrics by SEOs. The opportunity to improve it by a great deal is
> still out there. However, the problem is a lot harder to solve as
> well. I have some ideas in that area that I'm in the process of
> evaluating. Given the right idea/technology & team, a big shift is
> possible in the search area (unlike the DBMS or OS area, for example).
> And it is more likely to come from someone reading this list than
> from the Microsofts or Yahoos of the world. That is my opinion. I saw
> it happen before me in 2000 when AltaVista enjoyed 80%+ market
> share. Anything is possible...
>
> I apologize if this wasn't the right forum to discuss this. I am just
> looking for feedback. I am a big supporter of open source software,
> without which this would've been a much harder project. I am at a
> point where I need help in moving forward. This is a task for a team,
> & a great one at that. There are several directions I can go. Here is
> what I am considering at the moment:
>
> - Grow the team & build a web scale search engine running on a cluster
> of a few hundred nodes, competing with the big guys. That would
> require me to go down the VC funding path though.
>
> - My original plan was to build this into a niche search engine 
> tweaked and marketed for superior subdomain performance (for example 
> .edu, sports etc.). The investments required are a lot smaller & it 
> can always be scaled up based on success.
>
> - Open-source it! I just got this idea & that is why I thought of
> posting here. However, since I didn't start off like that… what
> incentive do I have for it now? If this were a product I could go the
> JBoss/MySQL route.
>
You could build a commercial engine on an open source (OS) platform.
The distinguishing features (unique selling propositions) would be
(1) the corpus of data held by the engine and (2) the ranking algorithm.

Both of these could easily be ring-fenced from any OS
platform/infrastructure. MySQL is a good comparison: the code is OS,
but neither the data people store in MySQL DBs nor the SQL they use to
query it is OS.

There are obvious synergistic advantages to open source infrastructure,
but remember that most open source development goes where its
developers want it to go. If your commercial application needs some
feature urgently, you may have to fund its development yourself and
then have to give it away as a consequence of GPL licensing. [Again
MySQL illustrates the point with the development of clustering.]

BTW I'd agree with earlier remarks about the quality of Google results
not being what it used to be pre-2003. This may be a side effect of the
page ranking algorithm: people searching for information will find high
ranking sites and will then link to them to support whatever new pages
they're creating, which just reinforces the rank. New sites can find it
difficult to get known. A similar effect is observable in the world of
academic paper authoring, where "key" papers continue to attract
citations whereas new researchers find it hard to get those citations
that are so vital to academic career progression. I've some ideas on
how to handle this that are far from fully baked but do involve a
significant measure of community input coupled with some fairly heavy
processing that could be distributed non-real-time.
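
To make the reinforcement effect concrete, here is a rough sketch of a
textbook PageRank power iteration on a toy link graph. It is only an
illustration, not anything Google actually runs: the page names, the
damping factor of 0.85 and the iteration count are all assumptions made
up for this example.

```python
# Toy PageRank power iteration (illustrative sketch only).
# Page names, damping factor and iteration count are assumptions.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# "established" already has inbound links; "new_site" has none.
web = {
    "established": ["hub"],
    "hub": ["established"],
    "blog_a": ["established"],
    "blog_b": ["established"],
    "new_site": ["hub"],
}
before = pagerank(web)

# A new author finds "established" via search and links to it too.
web["blog_c"] = ["established"]
after = pagerank(web)

ratio_before = before["established"] / before["new_site"]
ratio_after = after["established"] / after["new_site"]
print(f"established vs new_site, before: {ratio_before:.1f}x")
print(f"established vs new_site, after:  {ratio_after:.1f}x")
```

In this toy graph each extra link to the already well-linked page widens
its lead over the page nobody links to yet, which is the feedback loop
described above.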
>
> All ideas are welcome. I can do some demos & go into technical details 
> of my implementation if needed. Thanks for all the feedback.
>
> Sami