From newsmarkie at googlemail.com Sun Sep 2 13:38:23 2007
From: newsmarkie at googlemail.com (Wikinews Markie)
Date: Sun, 2 Sep 2007 14:38:23 +0100
Subject: [Search-l] Search Wikia Launch Date
Message-ID:

Read about a possible release date of Search Wikia on the un-official blog here.

http://searchwikia.wordpress.com/

Info comes from this article (The Times Online) about an interview with Jimmy talking about wikipedia and Search Wikia.

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070902/09799497/attachment.html

From peter.burden at gmail.com Sun Sep 2 15:10:51 2007
From: peter.burden at gmail.com (peter burden)
Date: Sun, 02 Sep 2007 16:10:51 +0100
Subject: [Search-l] Search Wikia Launch Date
In-Reply-To:
References:
Message-ID: <46DAD27B.9000507@gmail.com>

Wikinews Markie wrote:
> Read about a possible release date of Search Wikia on the un-official
> blog here.
>
> http://searchwikia.wordpress.com/
>
> Info comes from this article
>
> (The Times Online) about an interview with Jimmy talking about
> wikipedia and Search Wikia.

The article also appears in today's (2/9) Sunday Times Business Section as a half-page article under the heading "Wikipedia Aims to roll over Google". And the date; "In December, he (Jimmy Wales) will launch Wikia Search, a search engine to compete with giants such as Google and Yahoo".
[Doesn't say which December ;-)]

>
> Mark
> ------------------------------------------------------------------------
>
> _______________________________________________
> Search-l mailing list
> Search-l at wikia.com
> http://lists.wikia.com/mailman/listinfo/search-l
> Change options or unsubscribe: http://lists.wikia.com/mailman/options/search-l

From jmcc at hackwatch.com Sun Sep 2 22:59:37 2007
From: jmcc at hackwatch.com (John McCormac)
Date: Sun, 02 Sep 2007 23:59:37 +0100
Subject: [Search-l] Search Wikia Launch Date
In-Reply-To: <46DAD27B.9000507@gmail.com>
References: <46DAD27B.9000507@gmail.com>
Message-ID: <46DB4059.3000607@hackwatch.com>

peter burden wrote:
> Wikinews Markie wrote:
>
>>Read about a possible release date of Search Wikia on the un-official
>>blog here.
>>
>>http://searchwikia.wordpress.com/
>>
>>Info comes from this article
>>
>>(The Times Online) about an interview with Jimmy talking about
>>wikipedia and Search Wikia.
>
> The article also appears in today's (2/9) Sunday Times Business Section
> as a half-page article under
> the heading "Wikipedia Aims to roll over Google". And the date; "In
> December, he (Jimmy Wales) will
> launch Wikia Search, a search engine to compete with giants such as
> Google and Yahoo".
>
> [Doesn't say which December ;-)]

True. Nearly a year after the shock and awe of the press barrage started, there is finally a release date. Now if only there was a product. Wikia could be the next Microsoft...
;)

Regards...jmcc
--
******************************************************
John McCormac            * e-mail: jmcc at whoisireland.com
MC2                      * voice: +353-51-873640
22 Viewmount             * web: http://www.whoisireland.com/
Waterford                * blog: http://blog.whoisireland.com
Ireland                  * Irish Domain Stats & Market Research
******************************************************

From me at mark-elliott.net Wed Sep 5 16:33:36 2007
From: me at mark-elliott.net (Mark Elliott)
Date: Wed, 5 Sep 2007 17:33:36 +0100 (BST)
Subject: [Search-l] Invite from Mark Elliott (me@mark-elliott.net)
Message-ID: <20070905163337.2A2352B096D@mail2.quechup.com>

MarkElliott (me at mark-elliott.net) has invited you as a friend on Quechup...
...the social networking platform sweeping the globe

Go to: http://quechup.com/join.php/aT0wMDAwMDAwMDA5MzkzNDQyJmM9OTgyOTA%3D to accept Mark's invite

You can use Quechup to meet new people, catch up with old friends, maintain a blog, share videos & photos, chat with other members, play games, and more. It's no wonder Quechup is fast becoming 'The Social Networking site to be on'

Join Mark and his friends today: http://quechup.com/join.php/aT0wMDAwMDAwMDA5MzkzNDQyJmM9OTgyOTA%3D

------------------------------------------------------------------
You received this because Mark Elliott (me at mark-elliott.net) knows and agreed to invite you. You will only receive one invitation from me at mark-elliott.net. Quechup will not spam or sell your email address, see our privacy policy - http://quechup.com/privacy.php

Go to http://quechup.com/emailunsubscribe.php/ZW09c2VhcmNoLWxAd2lraWEuY29t if you do not wish to receive any more emails from Quechup.
------------------------------------------------------------------
Copyright Quechup.com 2007.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070905/6bccc6ef/attachment.html

From me at mark-elliott.net Wed Sep 5 22:18:39 2007
From: me at mark-elliott.net (Mark Elliott)
Date: Thu, 6 Sep 2007 08:18:39 +1000
Subject: [Search-l] Invite from Mark Elliott (me@mark-elliott.net)
In-Reply-To: <20070905163337.2A2352B096D@mail2.quechup.com>
References: <20070905163337.2A2352B096D@mail2.quechup.com>
Message-ID: <21e75e70709051518o18ddde65v616866a9b5ca6e34@mail.gmail.com>

sorry, this site seems to have a very annoying auto invite function that i didn't initiate. beware.

mark

On 9/6/07, Mark Elliott wrote:
>
> [image: Quechup.com]
> Trouble viewing this e-mail - click here
> *Mark
> Elliott* (me at mark-elliott.net)
> has invited you as a friend on Quechup...
> ...the social networking platform sweeping the globe
> Click here to accept Mark's invite
> You can use Quechup to meet new people, catch up with old friends,
> maintain a blog, share videos & photos, chat with other members, play games,
> and more. It's no wonder Quechup is fast becoming 'The Social Networking
> site to be on'.
> Join Mark and his friends today:
> http://quechup.com/join.php/aT0wMDAwMDAwMDA5MzkzNDQyJmM9OTgyOTA%3D
> You received this because Mark Elliott (me at mark-elliott.net) knows and
> agreed to invite you. You will only receive one invite from
> me at mark-elliott.net. Quechup will not spam or sell your email address - privacy
> policy . (c) Quechup 2007.
> Click here if you do not wish to receive any more emails from Quechup
>

--
-----
Mark Elliott
PhD Candidate
The Centre for Ideas
Victorian College of the Arts
The University of Melbourne
234 St Kilda Rd
SOUTHBANK 3006
Victoria, Australia

Mob: 0421 978 501
http://mark-elliott.net/, http://metacollab.net/
me at mark-elliott.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070906/f53b83f4/attachment.html

From jason at calacanis.com Thu Sep 6 00:01:33 2007
From: jason at calacanis.com (Jason Calacanis)
Date: Wed, 5 Sep 2007 17:01:33 -0700
Subject: [Search-l] Wikia social networking + MediaWiki
Message-ID: <70b3cf150709051701k232cee1awdd917d0d75ed6f74@mail.gmail.com>

Team Wikia,

Are you guys contributing back the social networking stuff on Wikia/ArmchairGM to MediaWiki? Seems like really cool evolution of MediaWiki.

Was reading about it here:
http://www.techcrunch.com/2007/09/04/wikias-airmchair-gm-wiki-meets-social-network/

best j
---------------------
Jason McCabe Calacanis
CEO, http://www.Mahalo.com
Mobile: 310-456-4900
My blog: http://www.calacanis.com
AOL IM/Skype: jasoncalacanis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070905/bbd5d072/attachment.html

From newsmarkie at googlemail.com Thu Sep 6 21:12:59 2007
From: newsmarkie at googlemail.com (Wikinews Markie)
Date: Thu, 6 Sep 2007 22:12:59 +0100
Subject: [Search-l] Interview with Jeremie Miller
Message-ID:

I'm currently writing a piece including an interview with Jeremie for the blog.
If you want to ask a question then please email it to me in the next few days (say 2) and depending on the numbers i will try to ask him.

Thanks

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070906/09809abf/attachment.html

From jeremie at jabber.org Fri Sep 7 21:49:13 2007
From: jeremie at jabber.org (jer)
Date: Fri, 7 Sep 2007 16:49:13 -0500
Subject: [Search-l] Search Wikia Launch Date
In-Reply-To: <46DB4059.3000607@hackwatch.com>
References: <46DAD27B.9000507@gmail.com> <46DB4059.3000607@hackwatch.com>
Message-ID: <28DC47FB-038F-4FD5-9805-D02482748056@jabber.org>

>> [Doesn't say which December ;-)]
>
> True. Nearly a year after the shock and awe of the press barrage
> started, there is finally a release date. Now if only there was a
> product. Wikia could be the next Microsoft... ;)

Heh, I think Facebook is trying hard for that, not in respect to vaporware but more so as an owned platform :)

As for having something to play with by December, that's a great goal and all, but it isn't going to be anything but experimental and rather tiny at best by then. The current thinking is to get the Grub crawling at least somewhat stable and start to have an experimental nutch/lucene based index atop it, along with testing some social feedback ideas in the result pages.

Jer

From newsmarkie at googlemail.com Wed Sep 26 15:37:07 2007
From: newsmarkie at googlemail.com (Wikinews Markie)
Date: Wed, 26 Sep 2007 16:37:07 +0100
Subject: [Search-l] Short interview with Jeremie Miller
Message-ID:

I've just published the long-awaited short interview with Jer. Unfortunately the interview was very short due to few questions, however i have also included a development post which highlights the progress of Grub's development. It's published on the UN-official blog at the link below.
http://searchwikia.wordpress.com/

Thanks

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070926/27e8ef21/attachment.html

From newsmarkie at googlemail.com Thu Sep 27 10:17:42 2007
From: newsmarkie at googlemail.com (Wikinews Markie)
Date: Thu, 27 Sep 2007 11:17:42 +0100
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To: <1527109668@web.de>
References: <1527109668@web.de>
Message-ID:

yes it is kinda spam but it's totally related to the project and many people want to read this article. also many media sources are attached to this list and the amount of articles that were started on the basis of stuff written on the blog was huge. if you think it's just spam then don't read it - i don't care who reads it but it's for the people who are interested.

mark

On 9/27/07, Franz Zimmer wrote:
>
> SPAM
>
> -----Original Message-----
> From: "Wikinews Markie"
> Sent: 26.09.07 17:39:18
> To: grub-dev at wikia.com
> Subject: [Search-l] Short interview with Jeremie Miller
>
> I've just published the long-awaited short interview with
> Jer. Unfortunately the interview was very short due to few questions,
> however i have also included a development post which highlights the
> progress of
> Grub's development. It's published on the UN-official blog at the link
> below.
>
>
> http://searchwikia.wordpress.com/
>
>
>
> Thanks
>
> Mark
>
> -----------------------------------------------------------------
>
>
>
> _______________________________________________________________________
> New! Protect your PC with McAfee and WEB.DE. Try it free for 3 months.
http://www.pc-sicherheit.web.de/startseite/?mc=022220
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070927/09f55334/attachment.html

From jmcc at hackwatch.com Thu Sep 27 10:52:12 2007
From: jmcc at hackwatch.com (John McCormac)
Date: Thu, 27 Sep 2007 11:52:12 +0100
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To:
References:
Message-ID: <46FB8B5C.4080207@hackwatch.com>

Wikinews Markie wrote:
> I've just published the long-awaited short interview with Jer.
> Unfortunately the interview was very short due to few questions, however
> i have also included a development post which highlights the progress of
> Grub's development. It's published on the
> UN-official blog at the link below.

Interesting interview. I didn't realise that Grub was quite that bad.

The repackaging of website data like that is going to create quite a backlash as there is nothing stopping a Made For Adsense (basically a site that scrapes the content of others and sticks it up with Adsense in an attempt to make money) using the content of real sites. As a direct consequence, any Wikiasearch affiliated bot will be banned from many websites in the same way that Grub and other troublesome scraper programs are banned.

On the search side, the Wikiasearch project (if we can call it that) doesn't seem to be doing anything beyond what hundreds of small search startups are doing. The management, bundling and repackaging aspects is, so far, perhaps the only innovative angles.
Regards...jmcc

From jason at calacanis.com Fri Sep 28 00:39:03 2007
From: jason at calacanis.com (Jason Calacanis)
Date: Thu, 27 Sep 2007 17:39:03 -0700
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To: <46FB8B5C.4080207@hackwatch.com>
References: <46FB8B5C.4080207@hackwatch.com>
Message-ID: <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com>

Some interesting feedback from Seth below... interested to hear the answers to these questions.

best j
------------------
Jason McCabe Calacanis
CEO, http://www.Mahalo.com

"Wikia Search" interview and state of Wikipedia-model search project

There's a brief interview with Jeremie Miller about the current status of what he's doing for "Wikia search", which is the for-profit Wikipedia-model search project.

I'd submitted a few suggested questions for this interview, but they were all rejected. I had wanted to know:

1) Roughly, how many people will be *paid* on the project?

1b) Can you specify whether at developed vs. developing economy pay scales?

2) Do you plan to hire anyone with search engine development expertise?

3) Do you think there's a cultural conflict between Wikipedia's model of operating, where in theory nobody owns any articles, and code development, where typically specific people "own" various subsystems? Which path do you plan to try to follow?

Note understanding 1b) requires some context. It was based on how the company Wikia had decided to offshore programmers - to Poland! That's definitely not something that's talked about a lot.

"Wales said he settled on Poland in part because software engineers there are simultaneously highly skilled and affordable, a combination that he said is hard to find, even elsewhere in Eastern Europe."

[Keep in mind, all you US programmers who are tempted to fall for the marketing, you're not affordable - everyone thinks it's going to be the other guy who works for free.]

Anyway, even though the interview only covers technical topics, it's still worth a read if you're interested in some details of what's behind the hype the audience is being fed.

For summary, given my position above, I'll just quote John McCormac's list-comment

Interesting interview. I didn't realise that Grub was quite that bad.

...

On the search side, the Wikiasearch project (if we can call it that) doesn't seem to be doing anything beyond what hundreds of small search startups are doing. The management, bundling and repackaging aspects is, so far, perhaps the only innovative angles.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070927/b1135f78/attachment.html

From jeremie at jabber.org Fri Sep 28 07:11:14 2007
From: jeremie at jabber.org (jer)
Date: Fri, 28 Sep 2007 02:11:14 -0500
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To: <46FB8B5C.4080207@hackwatch.com>
References: <46FB8B5C.4080207@hackwatch.com>
Message-ID:

> The repackaging of website data like that is going to create quite a
> backlash as there is nothing stopping a Made For Adsense (basically a
> site that scrapes the content of others and sticks it up with
> Adsense in
> an attempt to make money) using the content of real sites.

Access to the crawled content is just another wiki function, the download activity will be transparent on any user page along with some global summary reporting tools. If the community feels any user is misbehaving or acting inappropriately then they can take action, it's just a wiki.
> On the search side, the Wikiasearch project (if we can call it that)
> doesn't seem to be doing anything beyond what hundreds of small search
> startups are doing.

A great ecosystem to be part of by the way, if any of the work we do helps any of them succeed then I'll be very happy :)

> The management, bundling and repackaging aspects is,
> so far, perhaps the only innovative angles.

Awesome, I'll take that as a compliment and build on it!

Jer

From jeremie at jabber.org Fri Sep 28 07:28:08 2007
From: jeremie at jabber.org (jer)
Date: Fri, 28 Sep 2007 02:28:08 -0500
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To: <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com>
References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com>
Message-ID:

I'm not really sure who wrote what parts of the text below but it doesn't matter, there's some deeper history and context here that I don't have and will continue to ignore. These angles can be made to sound as highbrow as you want but I still find this rather immature and unhelpful.

To make this damn clear, I'm not here to build another search engine, I'm here to make sure that everyone else can. This isn't about one company or one person, it's just about making the tools and services for search more open and accessible to *everybody*.

Jer

On Sep 27, 2007, at 7:39 PM, Jason Calacanis wrote:
> Some interesting feedback from Seth below... interested to hear the
> answers to these questions.
>
> best j
> ------------------
> Jason McCabe Calacanis
> CEO, http://www.Mahalo.com
>
>
> "Wikia Search" interview and state of Wikipedia-model search project
>
> There's a brief interview with Jeremie Miller about the current
> status of what he's doing for "Wikia search", which is the for-
> profit Wikipedia-model search project.
>
> I'd submitted a few suggested questions for this interview, but
> they were all rejected.
I had wanted to know:
>
> 1) Roughly, how many people will be *paid* on the project?
>
> 1b) Can you specify whether at developed vs. developing economy
> pay scales?
>
> 2) Do you plan to hire anyone with search engine development
> expertise?
>
> 3) Do you think there's a cultural conflict between Wikipedia's
> model of operating, where in theory nobody owns any articles, and
> code development, where typically specific people "own" various
> subsystems? Which path do you plan to try to follow?
>
> Note understanding 1b) requires some context. It was based on how
> the company Wikia had decided to offshore programmers - to Poland!
> That's definitely not something that's talked about a lot.
>
> "Wales said he settled on Poland in part because software
> engineers there are simultaneously highly skilled and affordable, a
> combination that he said is hard to find, even elsewhere in Eastern
> Europe."
>
> [Keep in mind, all you US programmers who are tempted to fall for
> the marketing, you're not affordable - everyone thinks it's going
> to be the other guy who works for free.]
>
> Anyway, even though the interview only covers technical topics,
> it's still worth a read if you're interested in some details of
> what's behind the hype the audience is being fed.
>
> For summary, given my position above, I'll just quote John
> McCormac's list-comment
>
> Interesting interview. I didn't realise that Grub was quite
> that bad.
>
> ...
>
> On the search side, the Wikiasearch project (if we can call it
> that) doesn't seem to be doing anything beyond what hundreds of
> small search startups are doing. The management, bundling and
> repackaging aspects is, so far, perhaps the only innovative angles.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070928/dfb7e01e/attachment.html

From jmcc at hackwatch.com Fri Sep 28 11:12:24 2007
From: jmcc at hackwatch.com (John McCormac)
Date: Fri, 28 Sep 2007 12:12:24 +0100
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To:
References: <46FB8B5C.4080207@hackwatch.com>
Message-ID: <46FCE198.3060700@hackwatch.com>

jer wrote:
> Access to the crawled content is just another wiki function, the
> download activity will be transparent on any user page along with some
> global summary reporting tools. If the community feels any user is
> misbehaving or acting inappropriately then they can take action, it's
> just a wiki.

It is early here and I'm not sure if I understand this correctly.

a: What you are saying is that the snapshots of crawled webpages will be available for download by anyone?

b: Wikiasearch denies any responsibility for the end use or abuse of this freely available set of the (copyrighted) data of others?

c: Wikiasearch hopes that people will be nice with the data and not abuse it and if someone abuses it, the community will admonish them after the fact?

d: Wikiasearch will be repackaging the copyrighted works of others and making them available for download?

> A great ecosystem to be part of by the way, if any of the work we do
> helps any of them succeed then I'll be very happy :)

The whole "ecosystem" term is one that is greatly abused. The search industry is more like a group of medieval warring city states and countries. Each one is desperately fighting for its own survival in a highly competitive and dangerous environment.
Many will remain at the warring city state level for years and only a few will rise to the empire stage of Google or Yahoo.

I think that you still do not understand the mentality of a search engine developer. Most of us would consider the market in which we operate in terms of threats and opportunities. These search startups are all busy working on search products. Some will succeed and many will fail. It is a business with a very high attrition rate. Those who survive tend to be somewhat cynical about new projects. We have to develop a kind of survival instinct that enables us to quickly determine what will work in the market and what will not.

>> The management, bundling and repackaging aspects is,
>> so far, perhaps the only innovative angles.
>
>
> Awesome, I'll take that as a compliment and build on it!

A compliment as dreadful as my poor grammar. :) Perhaps the manage, crawl and repackage aspect is the only innovative angle.

But in the end, what makes Wikiasearch different from those who compile spam lists? One of the main problems that search engines have with their indices is dealing effectively with spam. Your Wikiasearch project will, realistically, add to that search engine spam problem by providing the MFAs with the copyright content of others. Has Wikiasearch really thought about the implications both for the search ecosystem and from a legal point of view?
Regards...jmcc

From jmcc at hackwatch.com Fri Sep 28 12:05:34 2007
From: jmcc at hackwatch.com (John McCormac)
Date: Fri, 28 Sep 2007 13:05:34 +0100
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To:
References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com>
Message-ID: <46FCEE0E.8030600@hackwatch.com>

jer wrote:
> To make this damn clear, I'm not here to build another search engine,

So the Google Killer is a thing of the past then? :) It looks like Wikiasearch will have to come up with a new angle - perhaps "the commoditisation of search". Or even "ubiquitisation of search". Or sticking with the biological examples, "grow your own search engine with Wikia". :) Don't worry though - technology journalists rarely have a clue about technology and will perpetually be in awe of the latest well crafted marketing slogan.

> I'm here to make sure that everyone else can. This isn't about one
> company or one person, it's just about making the tools and services for
> search more open and accessible to *everybody*.

Seth's point about Wikia programming jobs being outsourced to Poland is an important one. I wonder how American programmers feel about having their jobs "Open Sourced" to Poland in the name of Wikia's profit. Despite all your happy clappy cheer leading Jer, that is a sad betrayal of American programmers by Wikia.

But getting back to the core issues - what tools and services (beyond the repackaged content and managed/targeted crawling) is Wikia going to provide? Nutch, Mnogosearch, Datapark and a few others already offer the search engine software.
It is possible to get a relatively high spec server for a few hundred Dollars a month these days and the price of bandwidth has fallen dramatically in the last few years. So what will Wikia offer that adds to what is already available?

And what happens if *everybody* does not want to build their own search engine? The tools are there for everyone to build their own webpages and yet most do not do it.

The more I look at this project and its spiral from Google Killer to search toolbox, the more I wonder if Wikia is a transplant from the dot.bomb era. Still though, let's be optimistic and hope that something does appear in December.

Regards...jmcc

From jeremie at jabber.org Fri Sep 28 13:56:13 2007
From: jeremie at jabber.org (jer)
Date: Fri, 28 Sep 2007 08:56:13 -0500
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To: <46FCE198.3060700@hackwatch.com>
References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com>
Message-ID: <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org>

> It is early here and I'm not sure if I understand this correctly.
>
> a: What you are saying is that the snapshots of crawled webpages
> will be available for download by anyone?
>
> b: Wikiasearch denies any responsibility for the end use or abuse
> of this freely available set of the (copyrighted) data of others?
>
> c: Wikiasearch hopes that people will be nice with the data and not
> abuse it and if someone abuses it, the community will admonish them
> after the fact?
>
> d: Wikiasearch will be repackaging the copyrighted works of others
> and making them available for download?
Wow, you really tried hard to make it sound so awful :)

Yes to all of the above, it'll of course follow all the rules like current cached results and the Wayback Machine do as far as copyright goes, and anyone can already crawl, I just want to lower the barrier and increase the collaboration between those trying to do good with it, be it search or research.

>> A great ecosystem to be part of by the way, if any of the work we
>> do helps any of them succeed then I'll be very happy :)
>
> The whole "ecosystem" term is one that is greatly abused. The
> search industry is more like a group of medieval warring city
> states and countries. Each one is desperately fighting for its own
> survival in a highly competitive and dangerous environment. Many
> will remain at the warring city state level for years and only a
> few will rise to the empire stage of Google or Yahoo.

And what a horrible state to be in, if an open economy can't make a dent here to improve this situation in a few years then so be it, I still feel it really important to try (and it's obviously needed).

> I think that you still do not understand the mentality of a search
> engine developer. Most of us would consider the market in which we
> operate in terms of threats and opportunities. These search
> startups are all busy working on search products. Some will succeed
> and many will fail. It is a business with a very high attrition
> rate. Those who survive tend to be somewhat cynical about new
> projects. We have to develop a kind of survival instinct that
> enables us to quickly determine what will work in the market and
> what will not.

If any of the open source tools or services don't provide a competitive advantage then they don't deserve to succeed either. Ultimately, I'd rather see more search startups all enjoyably working on adding unique value to the whole network, not trying to re-build another entire empire.

> A compliment as dreadful as my poor grammar.
:) Perhaps the manage,
> crawl and repackage aspect is the only innovative angle.
>
> But in the end, what makes Wikiasearch different from those who
> compile spam lists? One of the main problems that search engines
> have with their indices is dealing effectively with spam. Your
> Wikiasearch project will, realistically, add to that search engine
> spam problem by providing the MFAs with the copyright content of
> others. Has Wikiasearch really thought about the implications both
> for the search ecosystem and from a legal point of view?

If an open community of enthusiasts can't collectively add value here, and can't monitor for abuse, then we've done something wrong... perhaps only time will tell.

Jer

From jmcc at hackwatch.com Fri Sep 28 15:13:06 2007
From: jmcc at hackwatch.com (John McCormac)
Date: Fri, 28 Sep 2007 16:13:06 +0100
Subject: [Search-l] Short interview with Jeremie Miller
In-Reply-To: <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org>
References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org>
Message-ID: <46FD1A02.70301@hackwatch.com>

jer wrote:
> Wow, you really tried hard to make it sound so awful :)
>
> Yes to all of the above, it'll of course follow all the rules like
> current cached results and the Wayback Machine do as far as copyright
> goes, and anyone can already crawl, I just want to lower the barrier
> and increase the collaboration between those trying to do good with it,
> be it search or research.

The Wayback Machine does tend to limit access to website only. But then you have to know the website you want to check. The Wikiasearch project wants to provide unfettered access to the data. That's the disturbing part for search engine operators - what is to stop an MFA using this data to flood search engines?

> And what a horrible state to be in,

Yeah. But that's reality, Jer.
:) But in order to survive we all have to have some warrior spirit (and the idea that we can win) and the empires and city states is the best way to describe the search market at the moment. Search empires rise and fall (altavista etc). Country level and niche search engines tend to dominate if successful and they will fight ruthlessly to protect their markets. Alliances are formed and sometimes, the big empires like Google, Yahoo and Microsoft can be defeated on small battlegrounds. Unless you've gone head to head against the major search engines, it is difficult to understand the mentality. Happy clappy we are not. Everything we do is geared towards survival. > If an open community of enthusiasts can't collectively add value here, > and can't monitor for abuse, then we've done something wrong... perhaps > only time will tell. Well it should be interesting to see. How exactly can a bunch of enthusiasts monitor for abuse? The fundamental flaw in this is that search and the facilities to detect and remove spam and abuse are highly automated. Wikiasearch seems to be going back to the Infinite Monkeys approach of replacing highly automated systems with manual ones. 
Regards...jmcc -- ****************************************************** John McCormac * e-mail: jmcc at whoisireland.com MC2 * voice: +353-51-873640 22 Viewmount * web: http://www.whoisireland.com/ Waterford * blog: http://blog.whoisireland.com Ireland * Irish Domain Stats & Market Research ****************************************************** From fred.benenson at gmail.com Fri Sep 28 15:34:52 2007 From: fred.benenson at gmail.com (Fred Benenson) Date: Fri, 28 Sep 2007 11:34:52 -0400 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <46FD1A02.70301@hackwatch.com> References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org> <46FD1A02.70301@hackwatch.com> Message-ID: <8e447b720709280834r6540e63chfe830328c04e275e@mail.gmail.com> > Well it should be interesting to see. How exactly can a bunch of > enthusiasts monitor for abuse? This, of all the questions, is least worrying to me about Wikiasearch. Wikipedia, one of the most highly trafficked websites, and probably the most likely to be abused of all the highly trafficked websites, manages to handle this problem pretty well. Jimmy will tell you that vanity spam / clueless users are the biggest offenders, not so much automated abuse bots. If a system is open enough and has enough of a dedicated user base, all abuses are shallow. F -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://lists.wikia.com/pipermail/search-l/attachments/20070928/485f0eaa/attachment.html From jeremie at jabber.org Fri Sep 28 15:46:18 2007 From: jeremie at jabber.org (jer) Date: Fri, 28 Sep 2007 10:46:18 -0500 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <46FD1A02.70301@hackwatch.com> References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org> <46FD1A02.70301@hackwatch.com> Message-ID: <081F2C69-A3AC-4B46-9076-10D3C2D50896@jabber.org> > The Wayback Machine does tend to limit access to website only. But > then you have to know the website you want to check. The > Wikiasearch project wants to provide unfettered access to the data. Alexa has been selling access to this for years, fettered by money. We'll have open access, fettered by the social dynamics of a wiki. > That's the disturbing part for search engine operators - what is to > stop an MFA using this data to flood search engines? Us. >> And what a horrible state to be in, > > Yeah. But that's reality, Jer. :) But in order to survive we all > have to have some warrior spirit (and the idea that we can win) and > the empires and city states is the best way to describe the search > market at the moment. Search empires rise and fall (altavista etc). > Country level and niche search engines tend to dominate if > successful and they will fight ruthlessly to protect their markets. > Alliances are formed and sometimes, the big empires like Google, > Yahoo and Microsoft can be defeated on small battlegrounds. Unless > you've gone head to head against the major search engines, it is > difficult to understand the mentality. Happy clappy we are not. > Everything we do is geared towards survival. I'm not sure in what secret meeting you were appointed minister of all small-search-operators, but I've met lots of others and feel bad for how you're portraying them. 
I will only speak for myself though, I'm just trying to do something helpful for the whole search industry, it's time to evolve beyond your ruthless empires. Search is too important to humanity to be left to waste like this. >> If an open community of enthusiasts can't collectively add value >> here, and can't monitor for abuse, then we've done something >> wrong... perhaps only time will tell. > > Well it should be interesting to see. How exactly can a bunch of > enthusiasts monitor for abuse? The fundamental flaw in this is that > search and the facilities to detect and remove spam and abuse are > highly automated. Wikiasearch seems to be going back to the > Infinite Monkeys approach of replacing highly automated systems > with manual ones. It will be interesting, fun and useful I sure hope, and it's not just people, it's a combination of both open source tools/automation *and* people helping to guide/correct it. Jer From jmcc at hackwatch.com Fri Sep 28 15:58:33 2007 From: jmcc at hackwatch.com (John McCormac) Date: Fri, 28 Sep 2007 16:58:33 +0100 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <8e447b720709280834r6540e63chfe830328c04e275e@mail.gmail.com> References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org> <46FD1A02.70301@hackwatch.com> <8e447b720709280834r6540e63chfe830328c04e275e@mail.gmail.com> Message-ID: <46FD24A9.5060402@hackwatch.com> Fred Benenson wrote: > > Well it should be interesting to see. How exactly can a bunch of > enthusiasts monitor for abuse? > > > This, of all the questions, is least worrying to me about Wikiasearch. > Wikipedia, one of the most highly trafficked websites, and probably the > most likely to be abused of all the highly trafficked websites, manages > to handle this problem pretty well. Jimmy will tell you that vanity spam > / clueless users are the biggest offenders, not so much automated abuse > bots. 
If a system is open enough and has enough of a dedicated user > base, all abuses are shallow. Fred, This might work for a community site such as Wikipedia or a bulletin board but a site that provides raw dumps of segments of the web will be abused by MFA operators and others. The monitoring of search indices for spam is something that is done by the search engine company itself. It does provide a feedback url where users can help. But the systems are highly automated. How will the community detect the spam on search engines and link it back to someone from the community? The number of search engine developers in any country is going to be a small figure anyway so that would limit the size of the community. The problem is that once the repackaged content has been taken and reused by an MFA, the damage is done. Regards...jmcc -- ****************************************************** John McCormac * e-mail: jmcc at whoisireland.com MC2 * voice: +353-51-873640 22 Viewmount * web: http://www.whoisireland.com/ Waterford * blog: http://blog.whoisireland.com Ireland * Irish Domain Stats & Market Research ****************************************************** From jmcc at hackwatch.com Fri Sep 28 16:32:30 2007 From: jmcc at hackwatch.com (John McCormac) Date: Fri, 28 Sep 2007 17:32:30 +0100 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <081F2C69-A3AC-4B46-9076-10D3C2D50896@jabber.org> References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org> <46FD1A02.70301@hackwatch.com> <081F2C69-A3AC-4B46-9076-10D3C2D50896@jabber.org> Message-ID: <46FD2C9E.6030709@hackwatch.com> jer wrote: > Alexa has been selling access to this for years, fettered by money. > We'll have open access, fettered by the social dynamics of a wiki. Yes. And Alexa has been banned by many site owners as a result. 
>> That's the disturbing part for search engine operators - what is to >> stop an MFA using this data to flood search engines? > > Us. And you will do this how? > I'm not sure in what secret meeting you were appointed minister of all > small-search-operators, but I've met lots of others and feel bad for > how you're portraying them. I will only speak for myself though, I'm I take it you are not a member of the Illuminati (Small Search Engine working group) then? :) Perhaps I am somewhat more vocal than others in the small search engine business but I'd guess that there are other small search engine operators on the list. I have a lot of respect for people in this business who have managed to survive because I know how tough it is. Therefore I consider it somewhat patronising that Wikiasearch would tell everyone that we should Open Source all our expertise, knowledge and data for the financial benefit of Wikiasearch and its investors without making it clear that the chief beneficiary of all this "Open Sourcing" is not the search engine business or the search engine users or humanity in general but rather Wikiasearch and its investors. Maybe I am being too cynical. > just trying to do something helpful for the whole search industry, it's > time to evolve beyond your ruthless empires. Search is too important > to humanity to be left to waste like this. Tell that to Google. Tell that to Yahoo and Microsoft. Most search engines don't set out to become ruthless empires. They become ruthless empires when they achieve significant market share. But they have to survive and grow first. Search is a business. > It will be interesting, fun and useful I sure hope, and it's not just > people, it's a combination of both open source tools/automation *and* > people helping to guide/correct it. But the damage done by the MFAs will be inflicted on other search engines. So how does Wikiasearch intend to help repair that damage?
Regards...jmcc -- ****************************************************** John McCormac * e-mail: jmcc at whoisireland.com MC2 * voice: +353-51-873640 22 Viewmount * web: http://www.whoisireland.com/ Waterford * blog: http://blog.whoisireland.com Ireland * Irish Domain Stats & Market Research ****************************************************** From jmcc at hackwatch.com Fri Sep 28 19:17:55 2007 From: jmcc at hackwatch.com (John McCormac) Date: Fri, 28 Sep 2007 20:17:55 +0100 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <182351cf0709281150y7cb349aei8cfedeb84d562a75@mail.gmail.com> References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org> <46FD1A02.70301@hackwatch.com> <081F2C69-A3AC-4B46-9076-10D3C2D50896@jabber.org> <46FD2C9E.6030709@hackwatch.com> <182351cf0709281150y7cb349aei8cfedeb84d562a75@mail.gmail.com> Message-ID: <46FD5363.9010907@hackwatch.com> Jason Pump wrote: > I don't see how this is any more of a problem than ODP or Wikipedia. The ODP dump is just a set of URLs with descriptions and categories. What Wikiasearch seems to want to do is to make the URLs *and* their content available in a package.
Regards...jmcc -- ****************************************************** John McCormac * e-mail: jmcc at whoisireland.com MC2 * voice: +353-51-873640 22 Viewmount * web: http://www.whoisireland.com/ Waterford * blog: http://blog.whoisireland.com Ireland * Irish Domain Stats & Market Research ****************************************************** From jmcc at hackwatch.com Fri Sep 28 21:31:38 2007 From: jmcc at hackwatch.com (John McCormac) Date: Fri, 28 Sep 2007 22:31:38 +0100 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <182351cf0709281302q3423ba8dtf7e7677232cc9d9a@mail.gmail.com> References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org> <46FD1A02.70301@hackwatch.com> <081F2C69-A3AC-4B46-9076-10D3C2D50896@jabber.org> <46FD2C9E.6030709@hackwatch.com> <182351cf0709281150y7cb349aei8cfedeb84d562a75@mail.gmail.com> <46FD5363.9010907@hackwatch.com> <182351cf0709281302q3423ba8dtf7e7677232cc9d9a@mail.gmail.com> Message-ID: <46FD72BA.80206@hackwatch.com> Jason Pump wrote: > I'm unclear, in terms of search engine bait, what is the difference? The difference is between an MFAer having to build a bow and arrow from scratch and being handed a fully loaded M60 machine gun with almost unlimited ammunition. > This is a huge problem on the net right now. I believe the solution to > that problem is to force/teach the monkeys to actually do something > useful with their time. Google has done nothing but exacerbate the > problem with their business model. This or something similar could be a > solution to the problem that Google has caused. That would be useful. However it could also increase the level of duplicate content on the web because of the way that the harvested content would be fed back into the webscape. Grub is a crawler and detecting duplicate content will take a lot of backend processing.
> One thing to consider is that Google's AdSense product has generated a > barrier to entry to smaller web search engines by making it profitable > to propagate search engine spam on the net. Google's pageweight map > cannot be recreated at this point because they have previous knowledge > about the state of the web before their changes to their business model > caused the web graph to change. Do you really think that (fairly large > and evil thing) was an accident? Google's original idea was great right up to the moment that people figured out how to game it. The monetisation of the web would have happened with or without Google AdSense. However AdSense did accelerate that change and brought more players into the same market. There was a terrible inevitability about it all but to attribute it to Google being evil might be going a bit far. The web is not a static model - it is continually changing. Regards...jmcc -- ****************************************************** John McCormac * e-mail: jmcc at whoisireland.com MC2 * voice: +353-51-873640 22 Viewmount * web: http://www.whoisireland.com/ Waterford * blog: http://blog.whoisireland.com Ireland * Irish Domain Stats & Market Research ****************************************************** From sethf at sethf.com Fri Sep 28 23:46:57 2007 From: sethf at sethf.com (Seth Finkelstein) Date: Fri, 28 Sep 2007 19:46:57 -0400 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com> Message-ID: <20070928234657.GA21065@sethf.com> On Fri, Sep 28, 2007 at 02:28:08AM -0500, jer wrote: > I'm not really sure who wrote what parts of the text below but it doesn't > matter, there's some deeper history and context here that I don't have and > will continue to ignore. These angles can be made to sound as highbrow as you > want but I still find this rather immature and unhelpful.
I've been pondering whether to send a defense, since I've been asked not to do buzz-harshing posts (my term), and I don't want to antagonize the Wikia people *unnecessarily*. The original is a blog post of mine: http://sethf.com/infothought/blog/archives/001262.html I think I posed reasonable questions. There is of course no obligation to answer them. But I do not see them as in any way "immature and unhelpful" in terms of fair comment. Indeed, one of my overall critiques could be phrased in terms of opposing attempts to foster an "immature", in the sense of "naive and trusting", attitude. And the very exploitative nature of exactly where that's "helpful". Actually, I didn't even think they were particularly tough questions. Maybe they seem so compared to typical journalistic cliches ("Can you really work miracles?" "I'm going to give it 110%"). Note: You are an employee (assuming the word "hired" is accurate) of Wikia, which is a company backed with $14 million dollars of venture capital investment. That's a simple fact. You may have stock options. That's not known for a fact, but it's a reasonable speculation. There's nothing wrong with either of those, quite the opposite. Though you may love the project you're working on with all your heart and soul, but you're still being paid (and more?) in compensation for your effort. And that's a different relationship, as far as is known, than anyone else. > To make this damn clear, I'm not here to build another search engine, I'm > here to make sure that everyone else can. This isn't about one company or > one person, it's just about making the tools and services for search more > open and accessible to *everybody*. That's a laudable sentiment. But, nonetheless, money must come from somewhere, and we shouldn't pretend otherwise. 
-- Seth Finkelstein Consulting Programmer http://sethf.com/ Infothought blog - http://sethf.com/infothought/blog/ Interview: http://sethf.com/essays/major/greplaw-interview.php From jwales at wikia.com Sat Sep 29 00:14:18 2007 From: jwales at wikia.com (Jimmy Wales) Date: Sat, 29 Sep 2007 09:14:18 +0900 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <20070928234657.GA21065@sethf.com> References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com> <20070928234657.GA21065@sethf.com> Message-ID: <46FD98DA.2070206@wikia.com> Though I agree with Jeremie that the questions are a bit silly, I think they are easy enough to answer: > 1) Roughly, how many people will be *paid* on the project? Like any organization, this will depend on the level of success at each stage of the project. As we ramp up, we will hire more and more people. Google has more than 10,000 employees. An open source search engine will create its own ecosystem of competition so even if we are successful, we would not expect to achieve Google's level of market share, and so I doubt if we would end up with that many employees. And if we are unsuccessful, then we will eventually end up with zero employees. I will have to get a job flipping burgers. Jeremie will pursue his basketball career. Gil, of course, will be doing his stand up comedy routine. :) What I am saying: I really don't understand what the question means, other than that. Our number of employees will be dependent on the course of the project and on the success. Just like every other organization on the planet. > 1b) Can you specify whether at developed vs. developing economy pay >scales? We attempt to follow ethical hiring practices around the world. Like Google, Yahoo, and I suppose every company, we try to pay to get the best people... for us, this means trying to pay on the high end of the scale necessary to attract the best talent...
One thing we have a moral commitment to doing is offering stock options even in countries and cultures where this is not the norm. > 2) Do you plan to hire anyone with search engine development >expertise? Yes, of course. What a strange question. > 3) Do you think there's a cultural conflict between Wikipedia's >model of operating, where in theory nobody owns any articles, and code >development, where typically specific people "own" various subsystems? >Which path do you plan to try to follow? I do not anticipate that we would attempt a "wikipedia model" for code development. Traditional open source development models are well-tested and proven. One objective for the search project is to find ways to push editorial judgments (and even in the selection of algorithms and parameters for algorithms there are editorial judgments) into the public for transparency and community control. Just exactly how to do that is one of the interesting questions we will need to solve. --Jimbo From sethf at sethf.com Sat Sep 29 01:14:58 2007 From: sethf at sethf.com (Seth Finkelstein) Date: Fri, 28 Sep 2007 21:14:58 -0400 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <46FD98DA.2070206@wikia.com> References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com> <20070928234657.GA21065@sethf.com> <46FD98DA.2070206@wikia.com> Message-ID: <20070929011458.GA23181@sethf.com> On Sat, Sep 29, 2007 at 09:14:18AM +0900, Jimmy Wales wrote: > Though I agree with Jeremie that the questions are a bit silly, I > think they are easy enough to answer: [Thanks for the reply - Jeremie, learn from the master! 1/2 :-)] >> 1) Roughly, how many people will be *paid* on the project? > ... > What I am saying: I really don't understand what the question means, > other than that. Our number of employees will be dependent on the > course of the project and on the success. Just like every other > organization on the planet.
I had elaborated in my original that it was based on this news report; I probably should have included that in the post: http://www.pcworld.com/article/id,136082-c,sites/article.html "Actually, we might spend *a quarter million dollars this year* on the whole search project, *on the engineers we pay*, the search we run. ..." I was wondering, e.g. how many engineers were meant in that statement. There's no "gotcha" hiding there. It was in "good faith". I didn't phrase it in a lawyer-like manner because I didn't expect that to be necessary. >> 1b) Can you specify whether at developed vs. developing economy pay >>scales? > > We attempt to follow ethical hiring practices around the world. Like > Google, Yahoo, and I suppose every company, we try to pay to get the best > people... for us, this means trying to pay on the high end of the scale > necessary to attract the best talent... > > One thing we have a moral commitment to doing is offering stock options > even in countries and cultures where this is not the norm. And every company faces issues of outsourcing and offshoring. >> 2) Do you plan to hire anyone with search engine development >>expertise? > > Yes, of course. What a strange question. Not a strange question at all, given some of the list discussion, and many of the weaknesses John McCormac has pointed out. >> 3) Do you think there's a cultural conflict between Wikipedia's >> model of operating, where in theory nobody owns any articles, and code >> development, where typically specific people "own" various subsystems? >> Which path do you plan to try to follow? > > I do not anticipate that we would attempt a "wikipedia model" for code > development. Traditional open source development models are well-tested > and proven. OK. I'm honestly relieved to know there are no plans to treat the codebase as if it were a Wikipedia article. I wasn't sure.
> One objective for the search project is to find ways to push editorial > judgments (and even in the selection of algorithms and parameters for > algorithms there are editorial judgments) into the public for transparency > and community control. Just exactly how to do that is one of the > interesting questions we will need to solve. Yes, understood, but there's a huge amount of infrastructural support work that needs to be operational and maintained before that becomes a limiting factor in result quality. -- Seth Finkelstein Consulting Programmer http://sethf.com/ Infothought blog - http://sethf.com/infothought/blog/ Interview: http://sethf.com/essays/major/greplaw-interview.php From jwales at wikia.com Sat Sep 29 07:29:22 2007 From: jwales at wikia.com (Jimmy Wales) Date: Sat, 29 Sep 2007 16:29:22 +0900 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <46FCEE0E.8030600@hackwatch.com> References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com> <46FCEE0E.8030600@hackwatch.com> Message-ID: <46FDFED2.6010609@wikia.com> John McCormac wrote: > Seth's point about Wikia programming jobs being outsourced to Poland is > an important one. I wonder how American programmers feel about having > their jobs "Open Sourced" to Poland in the name of Wikia's profit. > Despite all your happy clappy cheer leading Jer, that is a sad betrayal > of American programmers by Wikia. All of my work, and indeed my entire world view, is global in scope. I make no apologies for Wikia being a global company with a global outlook from day one.
--Jimbo From jwales at wikia.com Sat Sep 29 07:32:36 2007 From: jwales at wikia.com (Jimmy Wales) Date: Sat, 29 Sep 2007 16:32:36 +0900 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <46FD1A02.70301@hackwatch.com> References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org> <46FD1A02.70301@hackwatch.com> Message-ID: <46FDFF94.80803@wikia.com> John McCormac wrote: > Well it should be interesting to see. How exactly can a bunch of > enthusiasts monitor for abuse? The fundamental flaw in this is that > search and the facilities to detect and remove spam and abuse are highly > automated. Wikiasearch seems to be going back to the Infinite Monkeys > approach of replacing highly automated systems with manual ones. I think you are really not getting what our thinking is here. Of course many aspects of what is needed in search are automatable and automated. The things that computers do best, computers should do. And those things are subject to human oversight and judgment. The point is not to replace computers with humans doing tedious work, but to recognize that automated systems are not value-neutral, and that the editorial decision making in search should be done in an open, transparent, and participatory way. --Jimbo From jwales at wikia.com Sat Sep 29 07:35:40 2007 From: jwales at wikia.com (Jimmy Wales) Date: Sat, 29 Sep 2007 16:35:40 +0900 Subject: [Search-l] Search Wikia Launch Date In-Reply-To: <46DB4059.3000607@hackwatch.com> References: <46DAD27B.9000507@gmail.com> <46DB4059.3000607@hackwatch.com> Message-ID: <46FE004C.4070208@wikia.com> John McCormac wrote: > True. Nearly a year after the shock and awe of the press barrage > started, there is finally a release date. Now if only there was a > product. Wikia could be the next Microsoft...
;) John, I know you enjoy being snarky and all, but the idea of a launch in December has been what I have been saying to the press for months and months and months now. There has been no new announcement. --Jimbo From sethf at sethf.com Sat Sep 29 11:38:52 2007 From: sethf at sethf.com (Seth Finkelstein) Date: Sat, 29 Sep 2007 07:38:52 -0400 Subject: [Search-l] Search Wikia Launch Date In-Reply-To: <46FE004C.4070208@wikia.com> References: <46DAD27B.9000507@gmail.com> <46DB4059.3000607@hackwatch.com> <46FE004C.4070208@wikia.com> Message-ID: <20070929113852.GA25469@sethf.com> On Sat, Sep 29, 2007 at 04:35:40PM +0900, Jimmy Wales wrote: > John McCormac wrote: > > True. Nearly a year after the shock and awe of the press barrage > > started, there is finally a release date. Now if only there was a > > product. Wikia could be the next Microsoft... ;) > > John, I know you enjoy being snarky and all, but the idea of a launch in > December has been what I have been saying to the press for months and > months and months now. There has been no new announcement. Now, now, this is no trick at all: February 5, 2007 http://www.informationweek.com/news/showArticle.jhtml?articleID=197003494 He said Monday that he believes the first beta will launch within a few months. March 9, 2007 http://www.infoworld.com/article/07/03/09/HNwikiasearch_1.html "Probably what we'll do is launch something in the fourth quarter of this year ..." July 27, 2007 http://www.reuters.com/article/internetNews/idUSN272470320070727 ... when the company launches a public version of the search site toward the end of 2007, Wales said in a phone interview. September 2, 2007 http://business.timesonline.co.uk/tol/business/industry_sectors/media/article2367254.ece In December, he will launch Wikia Search ... It's OK. Just as code is 90% done for 90% of the time, product launch dates can be 3 to 6 months away for many years. 
-- Seth Finkelstein Consulting Programmer http://sethf.com/ Infothought blog - http://sethf.com/infothought/blog/ Interview: http://sethf.com/essays/major/greplaw-interview.php From jmcc at hackwatch.com Sat Sep 29 15:24:36 2007 From: jmcc at hackwatch.com (John McCormac) Date: Sat, 29 Sep 2007 16:24:36 +0100 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <46FDFF94.80803@wikia.com> References: <46FB8B5C.4080207@hackwatch.com> <46FCE198.3060700@hackwatch.com> <4E1A0A4C-F30A-4BBD-8F5E-B50A77D32CD8@jabber.org> <46FD1A02.70301@hackwatch.com> <46FDFF94.80803@wikia.com> Message-ID: <46FE6E34.1080701@hackwatch.com> Jimmy Wales wrote: > John McCormac wrote: > >> Well it should be interesting to see. How exactly can a bunch of >> enthusiasts monitor for abuse? The fundamental flaw in this is that >> search and the facilities to detect and remove spam and abuse are >> highly automated. Wikiasearch seems to be going back to the Infinite >> Monkeys approach of replacing highly automated systems with manual ones. > > > I think you are really not getting what our thinking is here. Which is the reason for most of my questions here Jimmy, Some of the stuff like the commoditisation of search where Wikiasearch repackages web content for use by anyone is one of the hardest things to understand given the potential misuse of the data. > The point is not to replace computers with humans doing tedious work, > but to recognize that automated systems are not value-neutral, and that > the editorial decision making in search should be done in an open, > transparent, and participatory way. It sounds like something that is similar to web directories where people can vote for particular sites. Is the web content that Wikiasearch harvests going to be made available in bulk to everyone? The problem with successful search engines is that their algorithms, once people know and understand them, will be gamed. It therefore becomes a continual process of upgrade and modification. 
This is where I wonder if you and Jer have any idea of the process of running a search engine in real time. With success comes greater responsibilities, opportunities, competition and threats. What safeguards will there be to stop its abuse of the repackaged web content? What is to stop Wikiasearch from ending up like ODP with its editorial structure? Is there any chance that you and Jer can create something like a Wikiasearch Manifesto or even a FAQ on the whole project? Regards...jmcc -- ****************************************************** John McCormac * e-mail: jmcc at whoisireland.com MC2 * voice: +353-51-873640 22 Viewmount * web: http://www.whoisireland.com/ Waterford * blog: http://blog.whoisireland.com Ireland * Irish Domain Stats & Market Research ****************************************************** From terry at jon.es Sun Sep 30 00:07:23 2007 From: terry at jon.es (Terry Jones) Date: Sun, 30 Sep 2007 02:07:23 +0200 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: Your message at 13:05:34 on Friday, 28 September 2007 References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com> <46FCEE0E.8030600@hackwatch.com> Message-ID: <18174.59579.173065.418132@terry.local> Hi John I really appreciate your presence, persistence, and attitude on this mailing list (in fact, without you I might already have unsubscribed), but I find this a bit odd: >>>>> "John" == John McCormac writes: John> Seth's point about Wikia programming jobs being outsourced to Poland John> is an important one. I wonder how American programmers feel about John> having their jobs "Open Sourced" to Poland in the name of Wikia's John> profit. Despite all your happy clappy cheer leading Jer, that is a John> sad betrayal of American programmers by Wikia. You're talking about positions that are (I gather) not yet filled. 
It's not as though a bunch of Americans are happily working for Wikia and are about to be made redundant by a sudden decision to outsource their jobs to Poland. If what I've just written is accurate (I should know more about Wikia, I guess), then in what sense are those potential positions "their" jobs? Did investors Bessemer (with offices in Bangalore, Mumbai, and Shanghai) and Amazon, see a business plan that included hiring people globally? Do they regard their money as being reserved for the hiring of Americans? Do they feel betrayed? If they saw such a business plan, and nevertheless invested, maybe you'd claim that those investors betrayed American programmers too? In summary, I don't think those to-be-created jobs belong to Americans (i.e., are not "their jobs") in any sense, I don't see any betrayal, and I don't see anything at all sad. So to me this comment (unlike your others) feels like a bit of a knee-jerk reaction. Regards, Terry From newsmarkie at googlemail.com Sun Sep 30 14:19:24 2007 From: newsmarkie at googlemail.com (Wikinews Markie) Date: Sun, 30 Sep 2007 15:19:24 +0100 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <46FCEE0E.8030600@hackwatch.com> References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com> <46FCEE0E.8030600@hackwatch.com> Message-ID: John wrote: But getting back to the core issues - what tools and services (beyond > the repackaged content and managed/targeted crawling) is Wikia going to > provide? Nutch, Mnogosearch, Datapark and a few others already offer the > search engine software. It is possible to get a relatively high spec > server for a few hundred Dollars a month these days and the price of > bandwidth has fallen dramatically in the last few years. So what will > Wikia offer that adds to what is already available? 
well if you look at this page - http://search.wikia.com/wiki/Lab_Servers you will see that there are servers hosted by wikia for use by people wishing to contribute to the project. you can request access by adding your name to the list at the bottom and then emailing jer with a helpful hint to give you access. thanks Mark From jmcc at hackwatch.com Sun Sep 30 16:07:28 2007 From: jmcc at hackwatch.com (John McCormac) Date: Sun, 30 Sep 2007 17:07:28 +0100 Subject: [Search-l] Short interview with Jeremie Miller In-Reply-To: <18174.59579.173065.418132@terry.local> References: <46FB8B5C.4080207@hackwatch.com> <70b3cf150709271739y58a1ebb4uda104a0a2caa6b0@mail.gmail.com> <46FCEE0E.8030600@hackwatch.com> <18174.59579.173065.418132@terry.local> Message-ID: <46FFC9C0.4090108@hackwatch.com> Terry Jones wrote: >>>>>>"John" == John McCormac writes: > > John> Seth's point about Wikia programming jobs being outsourced to Poland > John> is an important one. I wonder how American programmers feel about > John> having their jobs "Open Sourced" to Poland in the name of Wikia's > John> profit. Despite all your happy clappy cheer leading Jer, that is a > John> sad betrayal of American programmers by Wikia. > > You're talking about positions that are (I gather) not yet filled. It's not > as though a bunch of Americans are happily working for Wikia and are about > to be made redundant by a sudden decision to outsource their jobs to > Poland.
If what I've just written is accurate (I should know more about I guess it was all this talk of Open Source and things being done for the good of humanity that triggered it Terry, It is the gulf between the idea that all this search stuff is being done for the betterment of the human condition and the reality of jobs being outsourced because the same job can be done cheaper elsewhere just so that Wikia and its investors can make a profit. I don't have any objections to Open Source being used to make a profit. It would be better if it was made clear how this venture intended to make a profit. > In summary, I don't think those to-be-created jobs belong to Americans > (i.e., are not "their jobs") in any sense, I don't see any betrayal, and I > don't see anything at all sad. Perhaps. But all the other major search engines seem to grow outwards, starting small, hiring people locally and then expanding into other markets and hiring people there. Wikiasearch seems to wrap itself in the flag of Open Source and community while its primary objective is profit. It seems to want to use the work of the community of enthusiasts as unpaid sweatshop labour to make itself and its investors rich or richer. The sad part is that the decision seems to have already been made and the (notional) American programmers hadn't even a chance. A lot of people here in Europe would know people who have lost their jobs through outsourcing. It can be quite an emotive topic. The ironic thing is that with the falling US Dollar and the rising Euro, the Polish programmers may turn out to be more expensive than American programmers. But that is business. My core objection is to the hypocrisy of claiming that this is all being done for the betterment of humanity when it is really just a business that intends to use the time, wisdom and expertise of the crowd for its own enrichment and, most importantly, for free. 
And those Polish programmers might, in turn, find themselves being replaced because Indian programmers could be even cheaper still to employ. Regards...jmcc -- ****************************************************** John McCormac * e-mail: jmcc at whoisireland.com MC2 * voice: +353-51-873640 22 Viewmount * web: http://www.whoisireland.com/ Waterford * blog: http://blog.whoisireland.com Ireland * Irish Domain Stats & Market Research ******************************************************