From jmcc at hackwatch.com  Tue Mar 11 18:08:23 2008
From: jmcc at hackwatch.com (John McCormac)
Date: Tue, 11 Mar 2008 18:08:23 +0000
Subject: [Search-l] LA Times On Wikia / Search Wikia
Message-ID: <47D6CA97.9020101@hackwatch.com>

The LA Times ran an article on Wikia and searchwikia.

http://www.latimes.com/features/magazine/la-tm-wikia.03march2,1,4786701.story

It namechecks a few people and even mentions Ayn Rand, though I think
that the whole concept of searchwikia and wikis in general would be a
bit altruistically socialist in nature.

The article is quite enlightening. The point made by Charlene Li of
Forrester Research that "Wikipedia worked really well because there
really wasn't anything else, but Wikia Search is up against very tough
competitors who are very, very good." is a very important one. And it is
one that has cropped up repeatedly on this list. In the search business,
search engines have to have a unique selling proposition. AltaVista had
it with the size of its index. Google had it with relevance and
PageRank. Searchwikia has its social network.

The article leaves the question of searchwikia's survival unanswered.
Searchwikia might have an impact on the search business but opinions on
that are split. One question not covered by the article is whether
searchwikia is just a social network based on the Wikipedia model with
an underlying search facility, or a real search engine based on
providing accurate and relevant results with a social networking
overlay. The answer to that question may decide the fate of searchwikia.
If it is the former, then searchwikia is, perhaps, a more up-to-date
version of Dmoz.

Ask.com had, I think, about 4.7% of the US search market in December
2007. It has recently given up the search angle. Microsoft is trying to
take over Yahoo. The market is still in a state of flux and there are
opportunities for well thought out search engine ventures. If
searchwikia could even get 1% of this search volume it would be doing
well.
A viable search engine has to have both relevant results and users. The
Catch-22 is that without the relevant results it will not attract users.
And that may be where the social networking element comes in. John
Palfrey of the Berkman Center for Internet & Society at Harvard Law
School, as quoted in the LA Times article, seems to be optimistic in
that he thinks that Jimbo is "leveraging many of the same things that
made Wikipedia a global force" and that "Wikia can have a huge impact on
search engines over time."

A good search engine is like a telephone directory in that it provides
the user with what they are searching for with the minimum of fuss.
Searchwikia, with its mini-articles, is more like a tourist guide than a
telephone directory. A classical search engine seems to be aimed at a
user who knows what they want to find. Searchwikia seems to be aimed at
a user who doesn't know what they want and is not sure if they even want
to find it - hence the mini-articles which interrupt the search process.
The deceptive simplicity of the search results interface of the major
search engines is one that was arrived at through a process of
evolution. The search engines that didn't evolve didn't survive.

Search engine development is a quest to put knowledge in context. It is
the process of turning information into knowledge. Wikipedians may be
driven by a need to explain whereas search engine developers may be
driven by a need to understand. This is the fundamental way, I think, in
which search engine developers differ in outlook from many Wikipedia
contributors.

Extending the idea, searchwikia is an attempt to impose a social
structure on the information of the web. Search engine development is
based, to a large part, on the idea that algorithms can be used to turn
the information of the web into knowledge. If you can understand, you
can measure. If you can measure, you can create an algorithm. If you can
create an algorithm, you can automate the search process.
This philosophy seems diametrically opposed to the searchwikia model.

However, searchwikia will have a hard battle to fight against Google and
a possible merged Yahoo/Microsoft. They are not going to willingly give
up market share to searchwikia or any other venture.

Regards...jmcc
-- 
******************************************************
John McCormac  *  e-mail: jmcc at whoisireland.com
MC2            *  voice: +353-51-873640
22 Viewmount   *  web: http://www.whoisireland.com/
Waterford      *  blog: http://blog.whoisireland.com
Ireland        *  Irish Domain Stats & Market Research
******************************************************

From newsmarkie at googlemail.com  Fri Mar 14 18:14:19 2008
From: newsmarkie at googlemail.com (Mark (Markie))
Date: Fri, 14 Mar 2008 18:14:19 +0000
Subject: [Search-l] Yahoo and semantics
Message-ID:

Not sure if this has already been posted, but an interesting story for
you all:

Yahoo makes semantic search shift ->
http://news.bbc.co.uk/1/hi/technology/7296056.stm

regards

mark

(User:Markie)

From jmcc at hackwatch.com  Mon Mar 17 09:25:24 2008
From: jmcc at hackwatch.com (John McCormac)
Date: Mon, 17 Mar 2008 09:25:24 +0000
Subject: [Search-l] An Interesting Social Search Application
Message-ID: <47DE3904.50507@hackwatch.com>

A very interesting social search application:

http://www.technologyreview.com/Infotech/20405/?a=f

Happy St Patrick's Day.
Regards...jmcc

From newsmarkie at googlemail.com  Wed Mar 26 22:13:24 2008
From: newsmarkie at googlemail.com (Mark (Markie))
Date: Wed, 26 Mar 2008 22:13:24 +0000
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
Message-ID:

Right, I'm afraid the time has come once again where I have been
wondering to myself, and I feel that things need to be said, so here
they are.

* What's happening with the project? AFAIK (and I know some things have
happened) *very* little seems to have happened since the launch. Now I
know that things are probably happening with the team, but any chance of
actually telling the users about this, cos it's not looking good from
here atm.

I've copied in the so-called pillars of search:

1. *Transparency* - riiiiiiight :-(
2. *Community* - hmmm contribute to stale projects?
3. *Quality* - well....
4. *Privacy* - hmm yes that seems to have been done to an extent (by
   the community, mind)

I've been on the project since Dec 2006, and so have been waiting a long
time for this to happen, so it's not purely a case of I want everything
to happen NOW, I just want it to look like SOMETHING will happen SOON.

* This brings me onto the next topic of where is the project going???
There has been practically no progress, and frankly I can't see much
being done from my point. The launch has happened, many people were
interested, contributed but have now left, because NOTHING has happened.
So overall the net gain of launching the project?? Bad press and a few
(relative to the web) minis.

* Many things have been promised by various people, which haven't
happened.
Most specifically this has come from a certain member of staff, who has
said that they will do many things, but even the most basic of tasks
seem to have not happened. So: broken/missed promises. Well, iirc (name
here) said he would make sure that the about pages etc were created,
hmm... (http://alpha.search.wikia.com/about.html in case you forgot
where those were). This is a Wikia project; any chance of getting ANY
involvement/input/co-ordination from the team who, ultimately, want us
to make them more successful and a profit (if we're being frank)?

Now I know I haven't been that active recently on the wiki, but I have
been reading the mailing lists and talking in IRC. The main reason for
me not being active on the wiki is that I just don't have the motivation
to do anything because of the above. Frankly atm it's a stale project,
but hopefully this rant (which I hate doing) will mean that the project
becomes better.

If I have offended anyone above then I am sorry, but I feel that certain
things need to be said right now, in order to make the project better,
which is my aim.

Many thanks and look forward to the responses to this, especially from
Wikia staff

Regards

mark

(user:Markie)

From jeremie at jabber.org  Thu Mar 27 01:49:55 2008
From: jeremie at jabber.org (jer)
Date: Wed, 26 Mar 2008 20:49:55 -0500
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
In-Reply-To:
References:
Message-ID: <01F26483-D8E8-475E-B20A-1DB658480B9A@jabber.org>

As my kids would say, "burrrrrn!" ;-)

I feel and share your frustration. I know at least for my part I've done
a poor job keeping up a steady stream of updates on the things I've been
playing with.
Of course, I've never personally been good at regularity, I tend to step
back and push things out in spurts... all I can do though is try to
catch up on the last month or so of various chaotic activities.

First, Grub is partially stalled right now as we began in Feb the
migration to the NG version, whose server side is back-ended by hbase, a
very alpha open source bigtable clone written atop hadoop. That process
has been painfully sputtery: a cluster of ~7 servers is maxing out at a
few mil docs, not nearly good enough to push GrubNG to grow
significantly. Just last week though I think I came up with a way to
partially work around the challenges of using hbase before it's ready,
so with luck in the next week or so we can get that distributed crawler
growing again :)

I mentioned Grub/hbase first because while working on it I spent the
time to really think about the bigtable model with hbase and how we can
use it. I even went so far as to design and build a system I called KT
(for Keyword Tuple, it's in the re.search svn repo) that should be able
to scale up while supporting a huge variety of really exciting features.
I even prototyped an implementation of both the server and a search
result interface with a bunch of features, which those that I IM with
regularly have seen. I've hesitated to show it off to a wider audience
as the prototype was both horribly insecure and unusable unless someone
showed you first.

There's a light at the end of the tunnel though: I've attached a pdf of
a ROUGH DRAFT that I hope we can actually let everyone start playing
with in as little as a week.
Here's a quick list of the various features that KT will be enabling in
our search results:

- add new result
- edit any result's title/summary (and see revisions, revert)
- highlight any single result as the best one
- star ratings (already exist)
- delete/trash any result (leaves the title but grey'd out, can be
  undeleted easily)
- full change history for that set of results, and rss feed to watch
  for changes
- add comments to any result
- select text, images, or input forms from the target site to be shown
  below its search result
- alternative related searches and did-you-mean (user driven)
- custom backgrounds in the header for given searches

I've found myself doing a bunch of these things because it's simply
*fun*, so if anyone just can't wait and is willing to suffer through
some UI or random breakage pain, just ping me on IM and I can
accommodate some you've-been-warned early access to play.

It's possible to do lots more than what's above as well; KT is a very
simple and flexible framework.

Besides helping with the things above, I know Dennis and Seth have been
working hard on improving the index. The result quality has improved
significantly and our infrastructure/systems are a lot healthier than
how they ended up in the mad rush to get it ready for that crazy first
week (we did 5mil queries the first day and didn't break, not bad).

As a reminder in general though, this is overall a long term project;
there are going to be lulls and storms. It's also a pretty insane
project, and insane people seem to be rather poor organizers and
communicators. Lots will be broken and go un-addressed for way too long,
guess we're well on track for something :)

Jer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Search Wikia - Web Search Results.pdf
Type: application/pdf
Size: 340992 bytes
Desc: not available
Url : http://lists.wikia.com/pipermail/search-l/attachments/20080326/3c411108/attachment.pdf

From newsmarkie at googlemail.com  Thu Mar 27 09:51:17 2008
From: newsmarkie at googlemail.com (Mark (Markie))
Date: Thu, 27 Mar 2008 09:51:17 +0000
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
In-Reply-To: <01F26483-D8E8-475E-B20A-1DB658480B9A@jabber.org>
References: <01F26483-D8E8-475E-B20A-1DB658480B9A@jabber.org>
Message-ID:

Many thanks for this post, this is the kind of stuff I like to hear :-D

I don't mind spurts and lulls, even expect them, but if we could have an
indication of spurts, or little tasters of what's to come.. :-D

And as for the lunatics, well, I'll join you there :-p /me will bring
the straitjackets?

thanks

mark

From aerik at thesylvans.com  Fri Mar 28 07:19:53 2008
From: aerik at thesylvans.com (Aerik Sylvan)
Date: Fri, 28 Mar 2008 00:19:53 -0700
Subject: [Search-l] Alpha version of Clucened - a CLucene based daemon that could be used for category intersections and other interesting search applications
Message-ID: <355a36af0803280019t78fce45ekcf9561aca4fe760@mail.gmail.com>

Hi All,

So, I went off and learned about compiling and linking and C++ and
makefiles, and I've got a very early version of a search daemon working.
It's got a couple of minor bugs, but I wanted to bring it up again as a
possible unencumbered search and indexing solution.

I set up a project page at clucened.com, and there is a pointer to my
test implementation. Currently it isn't pointed at the categories index,
but I'll do that shortly. I also posted the source code for the searcher
daemon.

I will work on fixing the bugs, implementing paging, etc., etc.,... I'd
love any feedback.
Yes, I already know my code is really rough and needs a lot of
improvement and more error checking, but it'll get there. ("Release
early and often").

Aerik

From newsmarkie at googlemail.com  Sun Mar 30 21:32:56 2008
From: newsmarkie at googlemail.com (Mark (Markie))
Date: Sun, 30 Mar 2008 22:32:56 +0100
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
In-Reply-To:
References:
Message-ID:

Re-sending in case it was missed, from 4/5 days ago; maybe the people
copied in (Wikia staff/founders) would be willing to give a small amount
of time to reply?!?

mark

---------- Forwarded message ----------
From: Mark (Markie)
Date: Wed, Mar 26, 2008 at 11:13 PM
Subject: Sorry to do this but its coming, yes a rant :-(
To: Mailing list for Search Wikia, Search Wiki <searchwiki at wikia.com>,
    Jimmy Wales, jer <jeremie at jabber.org>, dennis at igfoo.com

From jwales at wikia.com  Sun Mar 30 22:31:58 2008
From: jwales at wikia.com (Jimmy Wales)
Date: Sun, 30 Mar 2008 23:31:58 +0100
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
In-Reply-To:
References:
Message-ID: <47F014DE.4010306@wikia.com>

Mark (Markie) wrote:
> re sending in case it was missed, from 4/5 days ago, maybe the people
> copied in (wikia staff/founders) would be willing to give a small amount
> of time to reply?!?

I am actually on a (very rare) actual holiday at the moment, will be
back full speed on Wednesday...

From newsmarkie at googlemail.com  Sun Mar 30 22:33:30 2008
From: newsmarkie at googlemail.com (Mark (Markie))
Date: Sun, 30 Mar 2008 23:33:30 +0100
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
In-Reply-To: <47F014DE.4010306@wikia.com>
References: <47F014DE.4010306@wikia.com>
Message-ID:

Okay, no problem, many thanks and look forward to it then

enjoy your holidays

mark

From kubes at apache.org  Mon Mar 31 01:04:46 2008
From: kubes at apache.org (Dennis Kubes)
Date: Sun, 30 Mar 2008 20:04:46 -0500
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
In-Reply-To:
References:
Message-ID: <47F038AE.2000503@apache.org>

Hi Markie,

First let me say that if anything has been missed, or promised and then
not delivered, it was not intentional.
Second, I would agree with you
that while we have been working to make changes to improve the accuracy
of the search results, we have not been doing a very good job of keeping
the community informed about those or other changes and that is
something we need to work on.

For my part I will attempt to communicate more of what we are working on
in terms of the search engine internals, starting now. Probably the
biggest improvement we have seen in terms of relevancy is changing how
inbound link text is indexed.

Inbound link text is the text of anchors pointing to a page. We currently
index that text along with a given page. So for example, if page x links
to page y and the anchor text reads "hotels", that text will get put into
the index under page y. The problem we were having was that we would index
the first N links pointing to a page without regard for which were the
best links. That produced some weird results when we launched; for
instance, google.com would come up in a search for dallas hotels because
it had one inbound link that said "dallas" and another that said
"hotels". To fix this we started looking at inbound links according to
the score of their parent (pointing from) page. The idea behind this was
that higher scoring pages would have better outbound links. In our
current index we first determine what the *best* links are by their
parent page's score and then index the first N best links. And what we
have seen as a result is a big increase in the relevancy of the search
results.

Here is a list of the things I see that could help improve search
relevancy going forward:

- Being able to score elements of web pages. For example, determine if a
piece of text is an h1, h2, div, etc. Currently our web page parsers
don't support that.

- Better integration of the star system into the rankings and better
ability for the community to tag pages as spam. This is part of the KT
stuff Jer has been working on.

- Overall improvement in the search algorithm. Currently the algorithm
is based on Nutch's OPIC implementation. Long story short, this
algorithm is unstable after a few iterations because web page scores keep
increasing exponentially. This is more of a Nutch problem and has
already been discussed on the Nutch lists, but essentially we need a new
process for scoring and probably a new algorithm that is more
pagerank-like and has some type of convergence.

There are other items as well but I think these things would help show a
dramatic improvement in search quality.

Last, let me say that anybody should feel free to email me at any time.
If something isn't being done fast enough or something seems to be
getting left out, give me a nudge. :)

Dennis

Mark (Markie) wrote:
> re sending in case it was missed, from 4/5 days ago, maybe the people
> copied in (wikia staff/founders) would be willing to give a small amount
> of time to reply?!?
>
> mark
>
> ---------- Forwarded message ----------
> From: *Mark (Markie)*
> Date: Wed, Mar 26, 2008 at 11:13 PM
> Subject: Sorry to do this but its coming, yes a rant :-(
> To: Mailing list for Search Wikia, Search Wiki, Jimmy Wales, jer,
> dennis at igfoo.com
>
> Right, im afraid the time has come once again where i have been
> wondering to myself again, and i feel that things need to be said, so
> here they are.
>
> *Whats happening with the project. AFAIK overall (and i know some things
> have happened) but *very* little seems to have happened since the
> launch. Now i know that things are probably happening with the team,
> but any chance of actually telling the users about this, cos its not
> looking good from here atm.
>
> Ive copied in the so called pillars of search
>
>    1. *Transparency* - riiiiiiight :-(
>    2. *Community* - hmmm contribute to stale projects?
>    3. *Quality* - well....
>    4. *Privacy* - hmm yes
> that seems to have been done to an extent ( by the community mind)
>
> Ive been on the project since dec 2006, and so have been waiting a long
> time for this to happen, so its not purely a case of i want everything
> to happen NOW, i just want it to look like SOMETHING will happen SOON.
>
> *This brings me onto the next topic of where is the project going???
> There has been practically no progress, and frankly i cant see much
> being done from my point. The launch has happened, many people were
> interested, contributed but have now left, because NOTHING has happened.
> so overall the net gain of launching the project?? bad press and a few
> (relative to the web) minis.
>
> *Many things have been promised by various people, which havent
> happened. Most specifically this has come from a certain member of
> staff, one specifically, that has said that they will do many things,
> but even the most basic of tasks seem to have not happened. so
> Broken/missed promises. Well iirc (name here) said he would make sure
> that the about pages etc were created, hmm...
> (http://alpha.search.wikia.com/about.html in case you forgot where those
> were). This is a wikia project, any chance of getting ANY
> involvement/input/co-ordination from the team who, ultimately, want us
> to make them more successful and a profit (if were being frank).
>
> Now i know i havent been that active recently on the wiki, but i have
> been reading the mailing lists and talking in irc, but the main reason
> for me not being active on the wiki, is mainly the fact that i just dont
> have the motivation to do anything because of the above. Frankly atm
> its a stale project, but hopefully this rant (which i hate doing) will
> mean that the project will hopefully become better.
>
> If i have offended anyone above then i am sorry, but i feel that certain
> things need to be said right now, in order to make the project better,
> which is my aim.
>
> Many thanks and look forward to the responses to this, especially from
> wikia staff
>
> Regards
>
> mark
>
> (user:Markie)
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Wikia Search mailing list
> http://alpha.search.wikia.com/
> Change options or unsubscribe: http://lists.wikia.com/mailman/options/search-l

From newsmarkie at googlemail.com Mon Mar 31 10:59:45 2008
From: newsmarkie at googlemail.com (Mark (Markie))
Date: Mon, 31 Mar 2008 11:59:45 +0100
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
In-Reply-To: <47F038AE.2000503@apache.org>
References: <47F038AE.2000503@apache.org>
Message-ID:

jer, dennis

i would like to say thank you very very much for the feedback that you
have sent. the point of my email was not to take/make personal digs at
people, the aim was to find out what was actually going on with the team,
as from a normal contributors view nothing was happening. my thoughts
were that lots was going on, but we didnt know about it, which is not
good as far as i was concerned.

thus i want to thank you again for all your great work, and hope that
this kinda thing, with more openness of all the work that is going on,
continues long into the future

many thanks, and apologies if i crossed the line

mark

On Mon, Mar 31, 2008 at 2:04 AM, Dennis Kubes wrote:

> Hi Markie,
>
> First let me say that if anything has been missed, or promised and then
> not delivered, it was not intentional. Second, I would agree with you
> that while we have been working to make changes to improve the accuracy
> of the search results, we have not been doing a very good job of keeping
> the community informed about those or other changes and that is
> something we need to work on.
>
> For my part I will attempt to communicate more of what we are working on
> in terms of the search engine internals, starting now.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20080331/5847245a/attachment.html

From newsmarkie at googlemail.com Mon Mar 31 20:54:03 2008
From: newsmarkie at googlemail.com (Mark (Markie))
Date: Mon, 31 Mar 2008 21:54:03 +0100
Subject: [Search-l] Sorry to do this but its coming, yes a rant :-(
In-Reply-To: <47F038AE.2000503@apache.org>
References: <47F038AE.2000503@apache.org>
Message-ID:

hmmm sent one reply to this, but forgot to comment :-p so, comments below

mark

On Mon, Mar 31, 2008 at 2:04 AM, Dennis Kubes wrote:

> Hi Markie,
>
> First let me say that if anything has been missed, or promised and then
> not delivered, it was not intentional.

okay so maybe there not intentional, but any chance of them being sorted? :-p

> Second, I would agree with you
> that while we have been working to make changes to improve the accuracy
> of the search results, we have not been doing a very good job of keeping
> the community informed about those or other changes and that is
> something we need to work on.
>
> For my part I will attempt to communicate more of what we are working on
> in terms of the search engine internals, starting now.

excellent :-D many thanks

> Probably the
> biggest improvement we have seen in terms of relevancy is changing how
> inbound link text is indexed.
>
> Inbound link text is the text of anchors pointing to a page. We currently
> index that text along with a given page. So for example, if page x links
> to page y and the anchor text reads "hotels", that text will get put into
> the index under page y. The problem we were having was that we would index
> the first N links pointing to a page without regard for which were the
> best links. That produced some weird results when we launched; for
> instance, google.com would come up in a search for dallas hotels because
> it had one inbound link that said "dallas" and another that said
> "hotels". To fix this we started looking at inbound links according to
> the score of their parent (pointing from) page. The idea behind this was
> that higher scoring pages would have better outbound links. In our
> current index we first determine what the *best* links are by their
> parent page's score and then index the first N best links. And what we
> have seen as a result is a big increase in the relevancy of the search
> results.

excellent, as this has been one of our major problems, so im glad to hear
that work is being done to sort the problem

> Here is a list of the things I see that could help improve search
> relevancy going forward:
>
> - Being able to score elements of web pages. For example, determine if a
> piece of text is an h1, h2, div, etc. Currently our web page parsers
> don't support that.

are these codes available anywhere in wikia's svn?

> - Better integration of the star system into the rankings and better
> ability for the community to tag pages as spam. This is part of the KT
> stuff Jer has been working on.

:-D

> - Overall improvement in the search algorithm. Currently the algorithm
> is based on Nutch's OPIC implementation. Long story short, this
> algorithm is unstable after a few iterations because web page scores keep
> increasing exponentially. This is more of a Nutch problem and has
> already been discussed on the Nutch lists, but essentially we need a new
> process for scoring and probably a new algorithm that is more
> pagerank-like and has some type of convergence.
>
> There are other items as well but I think these things would help show a
> dramatic improvement in search quality.
>
> Last, let me say that anybody should feel free to email me at any time.
> If something isn't being done fast enough or something seems to be
> getting left out, give me a nudge. :)

/me adds you to contacts :-p

> Dennis

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20080331/9a33f285/attachment.html
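
The inbound link change Dennis describes in this thread (index the anchor text of the N best links, ranked by the score of the page they point from, instead of the first N links encountered) can be sketched roughly as below. This is only an illustration of the idea and not the actual Wikia Search or Nutch code: the InboundLink type, the first_n and best_n functions, the example URLs and all of the scores are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundLink:
    """One link pointing at a target page (hypothetical structure)."""
    parent_url: str      # page the link points from
    parent_score: float  # score of that parent page (made-up values below)
    anchor_text: str     # text of the anchor


def first_n(links, n):
    """Old behaviour as described: keep the first N links in arrival order."""
    return [link.anchor_text for link in links[:n]]


def best_n(links, n):
    """Described fix: rank links by parent page score, keep the N best."""
    ranked = sorted(links, key=lambda link: link.parent_score, reverse=True)
    return [link.anchor_text for link in ranked[:n]]


# Illustrative inbound links for google.com; nothing here is real data.
links = [
    InboundLink("http://spam.example/a", 0.01, "dallas"),
    InboundLink("http://spam.example/b", 0.02, "hotels"),
    InboundLink("http://news.example/", 0.90, "Google"),
    InboundLink("http://blog.example/", 0.75, "search engine"),
]

print(first_n(links, 2))  # ['dallas', 'hotels']
print(best_n(links, 2))   # ['Google', 'search engine']
```

With the first-N rule the two low-scoring links put "dallas" and "hotels" into the index for the target page, which is the kind of odd match mentioned for the dallas hotels query. Ranking by parent score first keeps the anchor text from the higher-scoring pages instead. How N is chosen and how parent scores are computed are left open here, since the thread does not say.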