hmmm sent one reply to this, but forgot to comment :-p<br><br>so, comments below <br><br>mark<br><br><div class="gmail_quote">On Mon, Mar 31, 2008 at 2:04 AM, Dennis Kubes <<a href="mailto:kubes@apache.org">kubes@apache.org</a>> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi Markie,<br>
<br>
First let me say that if anything has been missed, or promised and then<br>
not delivered, it was not intentional. </blockquote><div><br>okay so maybe they're not intentional, but any chance of them being sorted? :-p<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Second, I would agree with you<br>
that while we have been working to make changes to improve the accuracy<br>
of the search results, we have not been doing a very good job of keeping<br>
the community informed about those or other changes and that is<br>
something we need to work on.<br>
<br>
For my part I will attempt to communicate more of what we are working on<br>
in terms of the search engine internals, starting now. </blockquote><div><br>excellent :-D many thanks<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Probably the<br>
biggest improvement we have seen in terms of relevancy is changing how<br>
inbound link text is indexed.<br>
<br>
Inbound link text is the text of anchors pointing to a page. We currently<br>
index that text along with a given page. So for example if page x links<br>
to page y and the anchor text reads "hotels" that text will get put into<br>
the index under page y. The problem we were having was we would index<br>
the first N number of links pointing to a page without regard for what<br>
were the best links. That provided for some weird results when we<br>
launched, for instance <a href="http://google.com" target="_blank">google.com</a> would come up in a search for dallas<br>
hotels because it had one inbound link that said "dallas" and another<br>
that said "hotels". To fix this we started looking at inbound links<br>
according to the score of their parent (pointing from) page. The idea<br>
behind this was that higher scoring pages would have better outbound<br>
links. In our current index we first determine what the *best* links<br>
are by their parent page's score and then index the first N best links.<br>
And what we have seen as a result is a big increase in the relevancy of<br>
the search results.<br>
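</blockquote><div><br>a rough sketch of the "best N links" selection as described above, just to make the idea concrete. this is not the actual wikia/nutch code: the function name, the (source, target, anchor) tuples and the example urls and scores are all made up for illustration.<br>
<pre>
```python
from collections import defaultdict
import heapq


def best_anchor_texts(inlinks, scores, n):
    """Keep, per target page, the anchor texts of its n best-scored parent pages.

    inlinks: iterable of (source_url, target_url, anchor_text) tuples (hypothetical shape)
    scores:  dict mapping source_url to that page's score (hypothetical shape)
    """
    by_target = defaultdict(list)
    for source, target, anchor in inlinks:
        # Unknown parent pages are treated as score 0.0 (an assumption).
        by_target[target].append((scores.get(source, 0.0), anchor))

    best = {}
    for target, candidates in by_target.items():
        # Highest parent score first, cut off at n links.
        top = heapq.nlargest(n, candidates, key=lambda pair: pair[0])
        best[target] = [anchor for _score, anchor in top]
    return best


# Made-up example data: three pages link to page y.
inlinks = [
    ("http://a.example/", "http://y.example/", "hotels"),
    ("http://b.example/", "http://y.example/", "dallas"),
    ("http://c.example/", "http://y.example/", "cheap hotels"),
]
scores = {"http://a.example/": 0.9, "http://b.example/": 0.1, "http://c.example/": 0.5}

print(best_anchor_texts(inlinks, scores, 2))
```
</pre>
with n = 2 the anchor from the low scoring page b is dropped, so only "hotels" and "cheap hotels" would be indexed under page y.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">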
</blockquote><div><br>excellent, as this has been one of our major problems, so im glad to hear that work is being done to sort the problem<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
Here is a list of the things I see that could help improve search<br>
relevancy going forward:<br>
<br>
- Being able to score elements of web pages. For example determine if a<br>
piece of text is an h1, h2, div, etc. Currently our web page parsers<br>
don't support that.</blockquote><div><br>are these codes available anywhere in wikia's svn?<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
<br>
- Better integration of the star system into the rankings and better<br>
ability for the community to tag pages as spam. This is part of the KT<br>
stuff Jer has been working on.</blockquote><div><br>:-D<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<br>
- Overall improvement in the search algorithm. Currently the algorithm<br>
is based on Nutch's OPIC implementation. Long story short, this<br>
algorithm is unstable after a few iterations because web page scores keep<br>
increasing exponentially. This is more of a Nutch problem and has<br>
already been discussed on the Nutch lists but essentially we need a new<br>
process for scoring and probably a new algorithm that is more<br>
pagerank-like and has some type of convergence.<br>
<br>
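</blockquote><div><br>a toy illustration of the stability point above, as far as i understand it. this is not nutch's OPIC code: the accumulate rule below is an assumption picked only to show what happens when scores are added up without any normalising, and the three page graph is made up. the second update is the usual damped pagerank style step for comparison.<br>
<pre>
```python
def accumulate_step(scores, links):
    """Toy update (assumption): every page keeps its score and adds its inlink shares."""
    new = dict(scores)
    for src, targets in links.items():
        share = scores[src] / len(targets)
        for dst in targets:
            new[dst] += share
    return new


def pagerank_step(scores, links, damping=0.85):
    """One damped PageRank-style step; the total score stays at 1 on this graph."""
    n = len(scores)
    new = {page: (1.0 - damping) / n for page in scores}
    for src, targets in links.items():
        share = damping * scores[src] / len(targets)
        for dst in targets:
            new[dst] += share
    return new


# Made-up graph: every page has at least one outlink.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
start = {page: 1.0 / len(links) for page in links}

grown = dict(start)
ranked = dict(start)
for _ in range(50):
    grown = accumulate_step(grown, links)
    previous, ranked = ranked, pagerank_step(ranked, links)

total_grown = sum(grown.values())
total_ranked = sum(ranked.values())
last_change = max(abs(ranked[p] - previous[p]) for p in ranked)

print("accumulating total:", total_grown)
print("pagerank-like total:", total_ranked)
print("pagerank-like change in last step:", last_change)
```
</pre>
in this toy the accumulating total doubles on every pass, while the pagerank style scores stop moving after a few dozen passes, which is the kind of convergence being asked for.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">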
There are other items as well but I think these things would help show a<br>
dramatic improvement in search quality.<br>
<br>
Last let me say that anybody should feel free to email me at any time.<br>
If something isn't being done fast enough or something seems to be<br>
getting left out, give me a nudge. :)<br>
</blockquote><div><br>/me adds you to contacts :-p<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
Dennis<br>
<div class="Ih2E3d"><br>
<br>
Mark (Markie) wrote:<br>
> re sending in case it was missed, from 4/5 days ago, maybe the people<br>
> copied in (wikia staff/founders) would be willing to give a small amount<br>
> of time to reply?!?<br>
><br>
> mark<br>
><br>
> ---------- Forwarded message ----------<br>
> From: *Mark (Markie)* <<a href="mailto:newsmarkie@googlemail.com">newsmarkie@googlemail.com</a><br>
</div><div class="Ih2E3d">> <mailto:<a href="mailto:newsmarkie@googlemail.com">newsmarkie@googlemail.com</a>>><br>
> Date: Wed, Mar 26, 2008 at 11:13 PM<br>
> Subject: Sorry to do this but its coming, yes a rant :-(<br>
> To: Mailing list for Search Wikia <<a href="mailto:search-l@wikia.com">search-l@wikia.com</a><br>
</div><div class="Ih2E3d">> <mailto:<a href="mailto:search-l@wikia.com">search-l@wikia.com</a>>>, Search Wiki <<a href="mailto:searchwiki@wikia.com">searchwiki@wikia.com</a><br>
> <mailto:<a href="mailto:searchwiki@wikia.com">searchwiki@wikia.com</a>>>, Jimmy Wales <<a href="mailto:jwales@wikia.com">jwales@wikia.com</a><br>
> <mailto:<a href="mailto:jwales@wikia.com">jwales@wikia.com</a>>>, jer <<a href="mailto:jeremie@jabber.org">jeremie@jabber.org</a><br>
> <mailto:<a href="mailto:jeremie@jabber.org">jeremie@jabber.org</a>>>, <a href="mailto:dennis@igfoo.com">dennis@igfoo.com</a> <mailto:<a href="mailto:dennis@igfoo.com">dennis@igfoo.com</a>><br>
><br>
><br>
> Right, im afraid the time has come once again where i have been<br>
> wondering to myself again, and i feel that things need to be said, so<br>
> here they are.<br>
><br>
> *Whats happening with the project. AFAIK overall (and i know some things<br>
> have happened) but *very* little seems to have happened since the<br>
> launch. Now i know that things are probably happening with the team,<br>
> but any chance of actually telling the users about this, cos its not<br>
> looking good from here atm.<br>
><br>
> Ive copied in the so called pillars of search<br>
><br>
</div><div class="Ih2E3d">> 1. *Transparency* - riiiiiiight :-(<br>
> 2. *Community* - hmmm contribute to stale projects?<br>
> 3. *Quality* - well....<br>
</div>> 4. *Privacy <<a href="http://search.wikia.com/wiki/search:Privacy" target="_blank">http://search.wikia.com/wiki/search:Privacy</a>>* - hmm yes<br>
<div><div></div><div class="Wj3C7c">> that seems to have been done to an extent ( by the community mind)<br>
><br>
><br>
> Ive been on the project since dec 2006, and so have been waiting a long<br>
> time for this to happen, so its not purely a case of i want everything<br>
> to happen NOW, i just want it to look like SOMETHING will happen SOON.<br>
><br>
> *This brings me onto the next topic of where is the project going???<br>
> There has been practically no progress, and frankly i cant see much<br>
> being done from my point. The launch has happened, many people were<br>
> interested, contributed but have now left, because NOTHING has happened.<br>
> so overall the net gain of launching the project?? bad press and a few<br>
> (relative to the web) minis.<br>
><br>
> *Many things have been promised by various people, which havent<br>
> happened. Most specifically this has come from a certain member of<br>
> staff, one specifically, that has said that they will do many things,<br>
> but even the most basic of tasks seem to have not happened. so<br>
> Broken/missed promises. Well iirc (name here) said he would make sure<br>
> that the about pages etc were created, hmm...<br>
> (<a href="http://alpha.search.wikia.com/about.html" target="_blank">http://alpha.search.wikia.com/about.html</a> in case you forgot where those<br>
> were). This is a wikia project, any chance of getting ANY<br>
> involvement/input/co-ordination from the team who, ultimately, want us<br>
> to make them more successful and a profit (if we're being frank).<br>
><br>
> Now i know i havent been that active recently on the wiki, but i have<br>
> been reading the mailing lists and talking in irc, but the main reason<br>
> for me not being active on the wiki, is mainly the fact that i just dont<br>
> have the motivation to do anything because of the above. Frankly atm<br>
> its a stale project, but hopefully this rant (which i hate doing) will<br>
> mean that the project will hopefully become better.<br>
><br>
> If i have offended anyone above then i am sorry, but i feel that certain<br>
> things need to be said right now, in order to make the project better,<br>
> which is my aim.<br>
><br>
> Many thanks and look forward to the responses to this, especially from<br>
> wikia staff<br>
><br>
> Regards<br>
><br>
> mark<br>
><br>
> (user:Markie)<br>
><br>
><br>
</div></div>> ------------------------------------------------------------------------<br>
><br>
> _______________________________________________<br>
> Wikia Search mailing list<br>
<div class="Ih2E3d">> <a href="http://alpha.search.wikia.com/" target="_blank">http://alpha.search.wikia.com/</a><br>
</div><div class="Ih2E3d">> Change options or unsubscribe: <a href="http://lists.wikia.com/mailman/options/search-l" target="_blank">http://lists.wikia.com/mailman/options/search-l</a><br>
</div>_______________________________________________<br>
Wikia Search mailing list<br>
<div class="Ih2E3d"><a href="http://alpha.search.wikia.com/" target="_blank">http://alpha.search.wikia.com/</a><br>
</div><div><div></div><div class="Wj3C7c">Change options or unsubscribe: <a href="http://lists.wikia.com/mailman/options/search-l" target="_blank">http://lists.wikia.com/mailman/options/search-l</a><br>
</div></div></blockquote></div><br>