[Search-l] Sorry to do this but its coming, yes a rant :-(

Mark (Markie) newsmarkie at googlemail.com
Mon Mar 31 20:54:03 UTC 2008


hmmm sent one reply to this, but forgot to comment :-p

so, comments below

mark

On Mon, Mar 31, 2008 at 2:04 AM, Dennis Kubes <kubes at apache.org> wrote:

> Hi Markie,
>
> First let me say that if anything has been missed, or promised and then
> not delivered, it was not intentional.


okay, so maybe they're not intentional, but any chance of them being sorted?
:-p


> Second, I would agree with you
> that while we have been working to make changes to improve the accuracy
> of the search results, we have not been doing a very good job of keeping
> the community informed about those or other changes and that is
> something we need to work on.
>
> For my part I will attempt to communicate more of what we are working on
>  in terms of the search engine internals, starting now.


excellent :-D many thanks


> Probably the
> biggest improvement we have seen in terms of relevancy is changing how
> inbound link text is indexed.
>
> Inbound link text is the text of anchors pointing to a page.  We currently
> index that text along with a given page.  So for example if page x links
> to page y and the anchor text reads "hotels" that text will get put into
> the index under page y.  The problem we were having was we would index
> the first N links pointing to a page without regard for what
> were the best links.  That provided for some weird results when we
> launched, for instance google.com would come up in a search for dallas
> hotels because it had one inbound link that said "dallas" and another
> that said "hotels".  To fix this we started looking at inbound links
> according to the score of their parent (pointing from) page.  The idea
> behind this was that higher scoring pages would have better outbound
> links.  In our current index we first determine what the *best* links
> are by their parent page's score and then index the first N best links.
> And what we have seen as a result is a big increase in the relevancy of
> the search results.
>
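
To illustrate the link selection Dennis describes above, here is a minimal
sketch in Python. It is not the Wikia or Nutch code; the names InboundLink
and select_best_anchors, the example URLs and the scores are all made up
for illustration. It only shows the idea of keeping the N inbound links
whose parent pages score highest, instead of the first N seen.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundLink:
    """One link pointing at a page (hypothetical structure)."""
    source_url: str      # the parent (pointing from) page
    anchor_text: str     # text of the anchor
    source_score: float  # score of the parent page


def select_best_anchors(links, max_links):
    """Return anchor texts of the max_links links with the highest parent score."""
    best = heapq.nlargest(max_links, links, key=lambda link: link.source_score)
    return [link.anchor_text for link in best]


# Made-up inbound links for one target page, in crawl order.
links = [
    InboundLink("http://a.example/", "dallas", 0.1),
    InboundLink("http://b.example/", "search engine", 9.5),
    InboundLink("http://c.example/", "hotels", 0.2),
    InboundLink("http://d.example/", "google", 8.0),
]

# Old behaviour: the first N links regardless of quality.
first_n = [link.anchor_text for link in links[:2]]

# New behaviour: the N links from the best scoring parent pages.
best_n = select_best_anchors(links, 2)

print(first_n)
print(best_n)
```

With these made-up numbers the low scoring "dallas" and "hotels" anchors
are dropped from the indexed anchor text once the limit is 2.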

excellent, as this has been one of our major problems, so I'm glad to hear
that work is being done to sort the problem


>
> Here is a list of the things I see that could help improve search
> relevancy going forward:
>
> - Being able to score elements of web pages.  For example determine if a
>  piece of text is an h1, h2, div, etc.  Currently our web page parsers
> don't support that.


are these codes available anywhere in wikia's svn?
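
As an illustration of the element scoring idea above (not the Nutch
parsers, which per Dennis do not support this yet), here is a minimal
sketch using Python's standard html.parser. The class ElementTextParser,
the function score_fragments and the weights in TAG_WEIGHTS are
hypothetical; real weights would need tuning.

```python
from html.parser import HTMLParser

# Hypothetical weights: text in headings counts more than body text.
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.0, "div": 1.0}
DEFAULT_WEIGHT = 1.0

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ElementTextParser(HTMLParser):
    """Record each piece of text together with its enclosing element."""

    def __init__(self):
        super().__init__()
        self._stack = []     # currently open tags, innermost last
        self.fragments = []  # list of (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            tag = self._stack[-1] if self._stack else None
            self.fragments.append((tag, text))


def score_fragments(fragments):
    """Attach a weight to each text fragment based on its enclosing tag."""
    return [(text, TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT)) for tag, text in fragments]


parser = ElementTextParser()
parser.feed("<div><h1>Dallas Hotels</h1><p>Cheap rooms<br>downtown</p>footer</div>")
print(parser.fragments)
print(score_fragments(parser.fragments))
```

An indexer could then boost terms by the weight of the element they
appear in.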


>
>
> - Better integration of the star system into the rankings and better
> ability for the community to tag pages as spam.  This is part of the KT
> stuff Jer has been working on.


:-D


>
>
> - Overall improvement in the search algorithm.  Currently the algorithm
> is based on Nutch's OPIC implementation.  Long story short this
> algorithm is unstable after a few iterations because web page scores keep
> increasing exponentially.  This is more of a Nutch problem and has
> already been discussed on the Nutch lists but essentially we need a new
> process for scoring and probably a new algorithm that is more
> pagerank-like and has some type of convergence.
>
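
To show what "pagerank-like with some type of convergence" can mean, here
is a minimal sketch of the textbook PageRank power iteration in Python. It
is not Nutch code and not a proposal from this thread; the function name,
the toy graph and the parameter values are assumptions for illustration.
It assumes every link target also appears as a key in the graph. Because
the scores are redistributed rather than accumulated, they always sum to 1
and the iteration settles down instead of growing.

```python
def pagerank(graph, damping=0.85, tol=1e-9, max_iter=200):
    """Power iteration over graph: {page: [pages it links to]}."""
    pages = list(graph)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(max_iter):
        # Score held by pages without outlinks is spread over all pages.
        dangling = sum(scores[page] for page in pages if not graph[page])
        new_scores = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        # Each page passes an equal share of its score to every outlink.
        for page in pages:
            outlinks = graph[page]
            if outlinks:
                share = damping * scores[page] / len(outlinks)
                for target in outlinks:
                    new_scores[target] += share

        # Stop once the scores barely change between iterations.
        delta = sum(abs(new_scores[page] - scores[page]) for page in pages)
        scores = new_scores
        if delta < tol:
            break

    return scores


# Toy graph: x links to y, y links to x and z, z has no outlinks.
graph = {"x": ["y"], "y": ["x", "z"], "z": []}
scores = pagerank(graph)
print(scores)
print(sum(scores.values()))
```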
> There are other items as well but I think these things would help show a
>  dramatic improvement in search quality.
>
> Last let me say that anybody should feel free to email me at any time.
> If something isn't being done fast enough or something seems to be
> getting left out, give me a nudge. :)
>

/me adds you to contacts :-p


>
> Dennis
>
>
> Mark (Markie) wrote:
> > resending in case it was missed, from 4/5 days ago, maybe the people
> > copied in (wikia staff/founders) would be willing to give a small amount
> > of time to reply?!?
> >
> > mark
> >
> > ---------- Forwarded message ----------
> > From: Mark (Markie) <newsmarkie at googlemail.com>
> > Date: Wed, Mar 26, 2008 at 11:13 PM
> > Subject: Sorry to do this but its coming, yes a rant :-(
> > To: Mailing list for Search Wikia <search-l at wikia.com>, Search Wiki
> > <searchwiki at wikia.com>, Jimmy Wales <jwales at wikia.com>, jer
> > <jeremie at jabber.org>, dennis at igfoo.com
> >
> >
> > Right, I'm afraid the time has come once again where I have been
> > wondering to myself again, and I feel that things need to be said, so
> > here they are.
> >
> > *What's happening with the project?  AFAIK overall (and I know some things
> > have happened) *very* little seems to have happened since the
> > launch.  Now I know that things are probably happening with the team,
> > but any chance of actually telling the users about this, cos it's not
> > looking good from here atm.
> >
> > I've copied in the so-called pillars of search
> >
> >    1. *Transparency* - riiiiiiight :-(
> >    2. *Community* - hmmm contribute to stale projects?
> >    3. *Quality* - well....
> >    4. *Privacy <http://search.wikia.com/wiki/search:Privacy>* - hmm yes
> >       that seems to have been done to an extent (by the community, mind)
> >
> >
> > I've been on the project since Dec 2006, and so have been waiting a long
> > time for this to happen, so it's not purely a case of I want everything
> > to happen NOW, I just want it to look like SOMETHING will happen SOON.
> >
> > *This brings me onto the next topic: where is the project going???
> > There has been practically no progress, and frankly I can't see much
> > being done from my point of view.  The launch has happened, many people
> > were interested, contributed but have now left, because NOTHING has
> > happened.  So overall the net gain of launching the project?? bad press
> > and a few (relative to the web) minis.
> >
> > *Many things have been promised by various people, which haven't
> > happened. Most specifically this has come from a certain member of
> > staff, one specifically, that has said that they will do many things,
> > but even the most basic of tasks seem to have not happened. So:
> > broken/missed promises. Well iirc (name here) said he would make sure
> > that the about pages etc were created, hmm...
> > (http://alpha.search.wikia.com/about.html in case you forgot where those
> > were).  This is a wikia project, any chance of getting ANY
> > involvement/input/co-ordination from the team who, ultimately, want us
> > to make them more successful and a profit (if we're being frank).
> >
> > Now I know I haven't been that active recently on the wiki, but I have
> > been reading the mailing lists and talking in irc. The main reason
> > for me not being active on the wiki is that I just don't
> > have the motivation to do anything because of the above.  Frankly atm
> > it's a stale project, but hopefully this rant (which I hate doing) will
> > mean that the project becomes better.
> >
> > If I have offended anyone above then I am sorry, but I feel that certain
> > things need to be said right now, in order to make the project better,
> > which is my aim.
> >
> > Many thanks and look forward to the responses to this, especially from
> > wikia staff
> >
> > Regards
> >
> > mark
> >
> > (user:Markie)
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Wikia Search mailing list
> > http://alpha.search.wikia.com/
> > Change options or unsubscribe:
> http://lists.wikia.com/mailman/options/search-l