From jeremie at jabber.org Mon Oct 1 06:28:27 2007
From: jeremie at jabber.org (jer)
Date: Mon, 1 Oct 2007 01:28:27 -0500
Subject: [Search-l] the concept of a wiki mini article for search results
Message-ID: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org>

I take no credit for this idea, it's actually Jimmy's and has been bouncing around in my head since he mentioned it:

The concept is quite simple, what if there were a placeholder for miniature wiki articles above any search result? These would be very small and could serve as a very simple human-powered search function. The idea isn't to create a summary article for every single popular keyword, in fact it's almost the opposite, to create articles for only the search terms that are the most difficult and generally don't work well.

These mini articles would be treated like any normal wiki text and managed via the same customs everyone is already familiar with. Since they are just a short guide there would only need to be a few lines per article, and there may be a class of common ones that become almost templates (like misspellings an automated system misses, or easy double-meaning disambig articles).

It's also important that these articles aren't search results, they are just a special fixture intended to guide a searcher to the right results. Therefore the only real restriction is that they can't link to anything but other search terms. It's a big restriction though, and one well worth debating, as it would lower the attraction as a spamming target, but also lower the value when there is no search term to link to that will really help the searcher.

I hope I didn't do the idea any injustice by explaining it poorly and Jimmy do jump in if so.
I was hoping to have a little test area to play with this concept while we discuss it, but I think anything resembling a search in any kind of experimental form right now might get the wrong kind of attention :)

Jer

From emili.sapena at gmail.com Mon Oct 1 09:21:29 2007
From: emili.sapena at gmail.com (Emili)
Date: Mon, 1 Oct 2007 11:21:29 +0200
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org>
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org>
Message-ID: <6573c9290710010221q6b58466ax874f5f5f2f75b09d@mail.gmail.com>

I like the idea and I agree with the linking policy to avoid spam. However, some links to Wikipedia or other reference sites (OpenStreetMap for geographical searches, for example, or some other site for images or videos, etc.) could be good.

On 10/1/07, jer wrote:
> I take no credit for this idea, it's actually Jimmy's and has been
> bouncing around in my head since he mentioned it:
>
> The concept is quite simple, what if there were a placeholder for
> miniature wiki articles above any search result? These would be very
> small and could serve as a very simple human-powered search
> function. The idea isn't to create a summary article for every
> single popular keyword, in fact it's almost the opposite, to create
> articles for only the search terms that are the most difficult and
> generally don't work well.
>
> These mini articles would be treated like any normal wiki text and
> managed via the same customs everyone is already familiar with.
> Since they are just a short guide there would only need to be a few
> lines per article, and there may be a class of common ones that become
> almost templates (like misspellings an automated system misses, or
> easy double-meaning disambig articles).
>
> It's also important that these articles aren't search results, they
> are just a special fixture intended to guide a searcher to the right
> results. Therefore the only real restriction is that they can't link
> to anything but other search terms. It's a big restriction though,
> and one well worth debating, as it would lower the attraction as a
> spamming target, but also lower the value when there is no search
> term to link to that will really help the searcher.
>
> I hope I didn't do the idea any injustice by explaining it poorly and
> Jimmy do jump in if so. I was hoping to have a little test area to
> play with this concept while we discuss it, but I think anything
> resembling a search in any kind of experimental form right now might
> get the wrong kind of attention :)
>
> Jer
>
> _______________________________________________
> Search-l mailing list
> Search-l at wikia.com
> http://lists.wikia.com/mailman/listinfo/search-l
> Change options or unsubscribe: http://lists.wikia.com/mailman/options/search-l
>

From tsuckow at gmail.com Mon Oct 1 16:01:48 2007
From: tsuckow at gmail.com (Thomas Suckow)
Date: Mon, 1 Oct 2007 09:01:48 -0700
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To: <41dbfd970710010900q2ce89caelfce2baddb4a0be7c@mail.gmail.com>
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org> <6573c9290710010221q6b58466ax874f5f5f2f75b09d@mail.gmail.com> <41dbfd970710010900q2ce89caelfce2baddb4a0be7c@mail.gmail.com>
Message-ID: <41dbfd970710010901j7e6a724bmb294ebf8e65b97fa@mail.gmail.com>

I like the idea, especially the idea of pointing to something like a Wikipedia article for encyclopedic search terms, maps for cities, etc. Also, what would be awesome is if the search engine could use words in that area to improve the search results.

--
Thomas Suckow
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20071001/7991633e/attachment.html

From wsurowiec at gmail.com Mon Oct 1 17:46:38 2007
From: wsurowiec at gmail.com (William Surowiec)
Date: Mon, 01 Oct 2007 13:46:38 -0400
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org>
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org>
Message-ID: <4701327E.3070005@gmail.com>

Jer,

I would profit if you indicate how you see the interaction in the knugget food chain (factory, collector, broker). It sounds like this is something living at the broker, but how strongly is it influenced by the factory and collector? Is it some type of add-on that depends on potential ambiguity discovered in query interpretation or result grouping (and who does the heavy lifting)?

Bill

jer wrote:
...
> The concept is quite simple, what if there were a placeholder for
> miniature wiki articles above any search result? These would be very
> small and could serve as a very simple human-powered search
> function.
> ...
> It's also important that these articles aren't search results, they
> are just a special fixture intended to guide a searcher to the right
> results. Therefore the only real restriction is that they can't link
> to anything but other search terms. It's a big restriction though,
> and one well worth debating, as it would lower the attraction as a
> spamming target, but also lower the value when there is no search
> term to link to that will really help the searcher.
>
>
> ...

From jeremie at jabber.org Mon Oct 1 18:34:44 2007
From: jeremie at jabber.org (jer)
Date: Mon, 1 Oct 2007 13:34:44 -0500
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To: <4701327E.3070005@gmail.com>
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org> <4701327E.3070005@gmail.com>
Message-ID: <6DBFC71C-6BBB-473B-9610-B6FC5A343AEF@jabber.org>

I see this as an external utility, entirely outside of the Atlas stack. It's a feature of a search result listing and tied only to the search terms, not to the pages, index, or ranking in any way.

There's two potential discussion threads here, one is purely the idea itself - what would a mini wiki article above any search result be like, and the other is distribution - how do lots of different entities support the idea together. Getting a good sense of the first one is necessary to know if it's worth figuring out the second :)

Jer

From jason at calacanis.com Tue Oct 2 17:37:31 2007
From: jason at calacanis.com (Jason Calacanis)
Date: Tue, 2 Oct 2007 10:37:31 -0700
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To: <6DBFC71C-6BBB-473B-9610-B6FC5A343AEF@jabber.org>
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org> <4701327E.3070005@gmail.com> <6DBFC71C-6BBB-473B-9610-B6FC5A343AEF@jabber.org>
Message-ID: <70b3cf150710021037w77b2a3c7v7bbc7f21743feed9@mail.gmail.com>

On 10/1/07, jer wrote:
> There's two potential discussion threads here, one is purely the idea
> itself - what would a mini wiki article above any search result be
> like, and the other is distribution - how do lots of different
> entities support the idea together. Getting a good sense on the
> first one is necessary to know if it's worth figuring out the second :)

We've been doing this at Mahalo with our "Guide Notes" for the past year. We're totally open to licensing these to folks for the standard link back/credit. Some examples:

http://www.mahalo.com/Darfur
http://www.mahalo.com/Inter-Korean_Summit
http://www.mahalo.com/In_Rainbows

Jer: How would we participate in the Atlas stack? Just add this info and license terms in a Freebase style way?
best j

---------------------
Jason McCabe Calacanis
CEO, http://www.Mahalo.com
Mobile: 310-456-4900
My blog: http://www.calacanis.com
AOL IM/Skype: jasoncalacanis
My admin: admin at calacanis.com

From beesley at gmail.com Wed Oct 3 03:06:54 2007
From: beesley at gmail.com (Angela)
Date: Wed, 3 Oct 2007 13:06:54 +1000
Subject: [Search-l] the concept of a wiki mini article for search results
Message-ID: <8b722b800710022006j20a40603sfb635f908b53af93@mail.gmail.com>

I'm forwarding this for Hua Fang. Please remember to remove the content of the digest when replying to posts, or else the post will be blocked for being too large.

---------- Forwarded message ----------
From: "Hua Fang"
To: search-l at wikia.com
Date: Tue, 2 Oct 2007 17:21:59 -0400
Subject: Re: Search-l Digest, Vol 11, Issue 1

To all,

Jimmy's mini article idea does look like the "half-way" point of Codonology, which gives you the "SubSum" of concepts and {P} in terms of the English language without clarifying the identities of the members from Law{ }. What is missing is that you won't have spontaneous reasoning capability. When two opposite search results each claim to be right, the true search result should be able to reflect such a conflict, for instance by saying "Two kinds of results that are contradictory to each other", and so on and so forth...
...

Anyway, just an input from a codonologist (an expert in a specific field + IT programming service...)

Thanks.
Hua

From jeremie at jabber.org Wed Oct 3 20:11:52 2007
From: jeremie at jabber.org (jer)
Date: Wed, 3 Oct 2007 15:11:52 -0500
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To: <70b3cf150710021037w77b2a3c7v7bbc7f21743feed9@mail.gmail.com>
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org> <4701327E.3070005@gmail.com> <6DBFC71C-6BBB-473B-9610-B6FC5A343AEF@jabber.org> <70b3cf150710021037w77b2a3c7v7bbc7f21743feed9@mail.gmail.com>
Message-ID: <9948789A-3AD9-4DC1-8074-512FBE20DBFA@jabber.org>

Actually Jason, the "Also try:" notes that Mahalo articles have are much closer to what I was thinking for the purpose of mini articles, just simple small blurbs that link to other keywords. The Guide Notes look like they are very much real content, small editorials on the topic. That's the kind of stuff that should be in the normal search results for keywords.

Have you considered licensing dumps of your content under a CC license that still protects your commercial use (which I know you've expressed needs to be protected in your case) like: http://creativecommons.org/licenses/by-nc-sa/2.0/ ?

Jer

From jason at calacanis.com Wed Oct 3 20:27:50 2007
From: jason at calacanis.com (Jason Calacanis)
Date: Wed, 3 Oct 2007 13:27:50 -0700
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To: <9948789A-3AD9-4DC1-8074-512FBE20DBFA@jabber.org>
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org> <4701327E.3070005@gmail.com> <6DBFC71C-6BBB-473B-9610-B6FC5A343AEF@jabber.org> <70b3cf150710021037w77b2a3c7v7bbc7f21743feed9@mail.gmail.com> <9948789A-3AD9-4DC1-8074-512FBE20DBFA@jabber.org>
Message-ID: <70b3cf150710031327v25499f9apea471821d08c3d1c@mail.gmail.com>

On 10/3/07, jer wrote:
> Have you considered licensing dumps of your content under a CC
> license that still protects your commercial use (which I know you've
> expressed needs to be protected in your case) like:
> http://creativecommons.org/licenses/by-nc-sa/2.0/ ?
> Jer

I think the license you're pointing to is where we will wind up (still having internal discussion about which license best suits our needs).

If anyone wants to play with data now just ping me and feel free to do so in a non-commercial way (or if you want to do commercial let me know and we can discuss). We're open to anything right now.

Question: Is it possible to pull search results from Grub currently? We would consider putting them on pages where we don't have a result. Not sure they're ready for primetime yet.

Suggestion: I think the formation of a global spam URL/not-trusted URL system between Wikia, Mahalo, and other interested parties would be a great project.
Basically have trusted parties insert untrusted links into a database and parties can then use that data to block things like comment spam on their blogs, submitted URLs on social news/bookmarking sites (digg, propeller, delicious), or a search engine/service (Mahalo, Ask, etc).

best j

---------------------
Jason McCabe Calacanis
CEO, http://www.Mahalo.com
Mobile: 310-456-4900
My blog: http://www.calacanis.com
AOL IM/Skype: jasoncalacanis
My admin: admin at calacanis.com

From jeremie at jabber.org Wed Oct 3 21:03:57 2007
From: jeremie at jabber.org (jer)
Date: Wed, 3 Oct 2007 16:03:57 -0500
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To: <70b3cf150710031327v25499f9apea471821d08c3d1c@mail.gmail.com>
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org> <4701327E.3070005@gmail.com> <6DBFC71C-6BBB-473B-9610-B6FC5A343AEF@jabber.org> <70b3cf150710021037w77b2a3c7v7bbc7f21743feed9@mail.gmail.com> <9948789A-3AD9-4DC1-8074-512FBE20DBFA@jabber.org> <70b3cf150710031327v25499f9apea471821d08c3d1c@mail.gmail.com>
Message-ID:

> I think the license you're pointing to is where we will wind up (still
> having internal discussion about which license best suits our needs).
>
> If anyone wants to play with data now just ping me and feel free to do
> so in a non-commercial way (or if you want to do commercial let me
> know and we can discuss). We're open to anything right now.

Awesome, it's cool to see that happening.

> Question: Is it possible to pull search results from Grub currently?
> We would consider putting them on pages where we don't have a result.
> Not sure they're ready for primetime yet.

Not quite ready, doing lots of learning and playing with hadoop and nutch yet to digest the data :)

> Suggestion: I think the formation of a global spam URL/not-trusted URL
> system between Wikia, Mahalo, and other interested parties would be a
> great project. Basically have trusted parties insert untrusted links
> into a database and parties can then use that data to block things
> like comment spam on their blogs, submitted URLs on social
> news/bookmarking sites (digg,propeller, delicious), or a search
> engine/service (Mahalo, Ask, etc).

Yeah, that'd rock. After seeing all of the issues that the various IP blacklisting group efforts have had, it's pretty daunting... not to say it isn't worth taking on though. It's definitely something that the wiki-style administration of Grub could become though, I just need to get some of those ideas half-way implemented so we can all start to see how it feels in use.

Jer

From aerik at thesylvans.com Wed Oct 3 23:55:32 2007
From: aerik at thesylvans.com (Aerik Sylvan)
Date: Wed, 3 Oct 2007 16:55:32 -0700
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To:
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org> <4701327E.3070005@gmail.com> <6DBFC71C-6BBB-473B-9610-B6FC5A343AEF@jabber.org> <70b3cf150710021037w77b2a3c7v7bbc7f21743feed9@mail.gmail.com> <9948789A-3AD9-4DC1-8074-512FBE20DBFA@jabber.org> <70b3cf150710031327v25499f9apea471821d08c3d1c@mail.gmail.com>
Message-ID: <355a36af0710031655m599b0cd4s36e8c5d2a71ca1e5@mail.gmail.com>

On 10/3/07, jer wrote:
>
> > Suggestion: I think the formation of a global spam URL/not-trusted URL
> > system between Wikia, Mahalo, and other interested parties would be a
> > great project. Basically have trusted parties insert untrusted links
> > into a database and parties can then use that data to block things
> > like comment spam on their blogs, submitted URLs on social
> > news/bookmarking sites (digg,propeller, delicious), or a search
> > engine/service (Mahalo, Ask, etc).
>
> Yeah, that'd rock. After seeing all of the issues that the various
> IP blacklisting group efforts have had, it's pretty daunting... not
> to say it isn't worth taking on though. It's definitely something
> that the wiki-style administration of Grub could become though, I
> just need to get some of those ideas half-way implemented so we can
> all start to see how it feels in use.
>
> Jer
>
>

Have you already read the stuff at MeatBall (http://www.usemod.com/cgi-bin/mb.pl?SharedAntiSpam) and the chongqed list (http://blacklist.chongqed.org/)? I really like the idea of a decentralized P2P type system. I'd love to see a list like Chongqed's as a standard but maybe with some more meta-data (date added, maybe? as described in "data format" on the meatball page). Then an aggregator script could collect them and apply the regexes as needed, subscribing only to trusted providers. Maybe this is already being done in some places. It looks like chongqed's format is the de facto standard (I think wikipedia is the same format - don't know or care who did it first).

Perhaps the next step is a list of people who publish their spam blacklists. Also, modifying the blacklist code so it tells you whose list is causing the block would be helpful, in case of false positives.

Aerik

--
http://www.wikidweb.com - the Wiki Directory of the Web
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20071003/ca7be4db/attachment.html

From renaud at oslutions.com Thu Oct 4 13:07:24 2007
From: renaud at oslutions.com (Renaud Richardet)
Date: Thu, 04 Oct 2007 15:07:24 +0200
Subject: [Search-l] the concept of a wiki mini article for search results
In-Reply-To:
References: <1DC13198-C70D-4F94-AD16-066C42856D48@jabber.org> <4701327E.3070005@gmail.com> <6DBFC71C-6BBB-473B-9610-B6FC5A343AEF@jabber.org> <70b3cf150710021037w77b2a3c7v7bbc7f21743feed9@mail.gmail.com> <9948789A-3AD9-4DC1-8074-512FBE20DBFA@jabber.org> <70b3cf150710031327v25499f9apea471821d08c3d1c@mail.gmail.com>
Message-ID: <4704E58C.1020004@oslutions.com>

jer wrote:
>> Question: Is it possible to pull search results from Grub currently?
>> We would consider putting them on pages where we don't have a result.
>> Not sure they're ready for primetime yet.
>>
>
> Not quite ready, doing lots of learning and playing with hadoop and
> nutch yet to digest the data :)
>

(How) can I help with Nutch/Hadoop?

thanks,
Renaud

From b0ef at esben-stien.name Wed Oct 10 15:59:04 2007
From: b0ef at esben-stien.name (Esben Stien)
Date: Wed, 10 Oct 2007 17:59:04 +0200
Subject: [Search-l] GIT/BZR Repository
Message-ID: <87abqrrmev.fsf@esben-stien.name>

Wouldn't a GIT or BZR repository be more sensible for this project?

--
Esben Stien is b0ef at e s a
http://www. s t n m
irc://irc. b - i . e/%23contact
sip:b0ef@ e e
jid:b0ef@ n n

From vptes11 at gmail.com Fri Oct 19 02:14:49 2007
From: vptes11 at gmail.com (Peter)
Date: Thu, 18 Oct 2007 19:14:49 -0700
Subject: [Search-l] Getting Search Results from Existing Search Engines
Message-ID: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com>

Hi everyone - I've been a lurker on this list for quite a while...just thought I'd shoot out a couple of questions...

First, for Jason...I've noticed how Mahalo includes Google results on every results page. How does this work? Did you have to do a private deal with Google, are you using some API that they've made available to the public, or are you simply sending queries to their webservers and scraping the results off their results pages?

Also, do you guys envision using search results generated by existing search engines, such as Google, in Search Wikia? (or at least in the very beginning, before the community has a chance to develop the complex algorithms necessary to generate good search results in-house).

Thanks!

-Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.wikia.com/pipermail/search-l/attachments/20071018/3a936fe4/attachment.html

From jason at calacanis.com Fri Oct 19 02:22:36 2007
From: jason at calacanis.com (Jason McCabe Calacanis)
Date: Fri, 19 Oct 2007 02:22:36 +0000
Subject: [Search-l] Getting Search Results from Existing Search Engines
In-Reply-To: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com>
References: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com>
Message-ID: <1482243501-1192760604-cardhu_decombobulator_blackberry.rim.net-531559842-@bxe126.bisx.prod.on.blackberry>

We don't have google on every page, we default to them when we *don't* have a hand crafted page.

When wikia search gets better results than google we'll switch to that. I'm thinking wikia will take another 90 days to beat google... So soon. :)

Best
J
---------------
Jason at Calacanis.com | Mobile: 310-456-4900
http://www.calacanis.com | http://www.mahalo.com
Executive Assistant: admin at calacanis.com

From borboleta at gmail.com Fri Oct 19 02:56:09 2007
From: borboleta at gmail.com (Bani)
Date: Thu, 18 Oct 2007 23:56:09 -0300
Subject: [Search-l] Getting Search Results from Existing Search Engines
In-Reply-To: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com>
References: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com>
Message-ID: <35b94d690710181956x306a7530o2248564c70ea2670@mail.gmail.com>

> First, for Jason...I've noticed how Mahalo includes Google results on
> every results page. How does this work? Did you have to do a private
> deal with Google, are you using some API that they've made available
> to the public, or are you simply sending queries to their webservers
> and scraping the results off their results pages?

I am not Jason, but since he decided to skip the technical part of the question here is what I think: Google provides several APIs to allow people to embed their results into their sites. Mahalo probably uses the SOAP* or AJAX* APIs

* http://code.google.com/apis/soapsearch/
* http://code.google.com/apis/ajaxsearch/

Now, about Wikia Search using Google results at first, that doesn't seem to make sense to me considering the direction development seems to be taking.
Vanessa

From jason at calacanis.com Fri Oct 19 03:25:06 2007
From: jason at calacanis.com (Jason McCabe Calacanis)
Date: Fri, 19 Oct 2007 03:25:06 +0000
Subject: [Search-l] Getting Search Results from Existing Search Engines
In-Reply-To: <35b94d690710181956x306a7530o2248564c70ea2670@mail.gmail.com>
References: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com> <35b94d690710181956x306a7530o2248564c70ea2670@mail.gmail.com>
Message-ID: <2095772896-1192764354-cardhu_decombobulator_blackberry.rim.net-357242095-@bxe126.bisx.prod.on.blackberry>

Yes, there are a bunch of ways to syndicate google/ask/yahoo/etc. Sorry, overlooked that. From what I've read they are flexible about results up until the point you edit them or do metasearch (two big issues for them from what I've heard).

So, the idea of voting up and down google, or things like aftervote will get shut down if they get any traction (which they don't due to the fact that they are not noticeably different than google/yahoo).

Best
J
---------------
Jason at Calacanis.com | Mobile: 310-456-4900
http://www.calacanis.com | http://www.mahalo.com
Executive Assistant: admin at calacanis.com

From jwales at wikia.com Fri Oct 19 04:57:56 2007
From: jwales at wikia.com (Jimmy Wales)
Date: Thu, 18 Oct 2007 21:57:56 -0700
Subject: [Search-l] Getting Search Results from Existing Search Engines
In-Reply-To: <1482243501-1192760604-cardhu_decombobulator_blackberry.rim.net-531559842-@bxe126.bisx.prod.on.blackberry>
References: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com> <1482243501-1192760604-cardhu_decombobulator_blackberry.rim.net-531559842-@bxe126.bisx.prod.on.blackberry>
Message-ID: <47183954.7070505@wikia.com>

Jason McCabe Calacanis wrote:
> When wikia search gets better results than google we'll switch to
> that. I'm thinking wikia will take another 90 days to beat google...
> So soon. :)

Nah, no sooner than 100 days... don't forget the holidays. ;-)

From dhart at atlantisblue.com.au Fri Oct 19 05:45:43 2007
From: dhart at atlantisblue.com.au (David Hart)
Date: Fri, 19 Oct 2007 15:45:43 +1000
Subject: [Search-l] blog bit: Radar's Twine: A semantic Google killer?
Message-ID: <60c9cc1c0710182245s72c9d4b5ke6e5e2fb18489f0e@mail.gmail.com>

Radar's Twine: A semantic Google killer?
from VentureBeat by Chris Morrison

More than a year of secrecy spawned rumors about Radar Networks. The most popular: It's a "Google killer."

Tomorrow morning, Radar will surprise a few people by launching Twine, a tool for collecting and organizing information that's very different from Google.
But it's potentially just as ambitious. An example of how Twine works: A user uploads a text document to their Twine account. Twine then parses the document to find the words with meaning: names, places, concepts and so forth. Those terms become tags, which the person can use to access related information. Twine's underlying technology gives the computer a measure of intelligence. Using tags, a computer can distinguish between, say, a reference to the kind of bird that flies and the kind that flips people off. Once it has, it can give users a wealth of other information, drawn from their own accumulated knowledge base, other users and the outside internet. Where Google crawls the entire web and ultimately pollutes your search results with different kinds of "birds," Radar picks from a smaller universe of sources and tries to automatically discard the ones you don't want. That could help a marketer collect all the information about a particular product, or help a group of analysts aggregate information on a subject. The "documents" gathered will include, among many others, text, PDFs, or even videos on YouTube (Twine simply draws on pre-existing tags and descriptions of visual media to do its tagging work). The information that helps Twine make decisions on its own about what content to pull in for you comes both from a user's accumulated information and from their actions, which means that, as the user pulls more info into their account on their own, Twine will begin to work cooperatively, providing more content where it's needed and even assisting groups or teams of people with collaborative research and knowledge-building. Young companies with a limited ability to do similar selection tricks (for instance, Jiglu, which we posted about a few days ago) are increasingly common, and tend to obscure the companies that truly have a chance of becoming market leaders. That's too bad, because there's no question that intelligent computer handling of data,
a first step toward artificial intelligence, will be an important part of the internet in coming years. Helping Radar is the breadth of its underlying technology and the strong scientific and engineering team, now 30 strong, that has been working on the platform for years. Radar does, however, have competitors. The winning bet will boil down to which company will be able to throw enough scientific brilliance at the difficult problem of teaching computers to understand human information. The winner will likely dominate, as Google does with search. To explain the differences between these competing startups, it's easiest to separate them by the particular types of technology they utilize. Broadly speaking, those technologies fall into three categories. The first is statistical analysis, in which Google reigns supreme. Terms are examined for their frequency, placement and outside links to determine their apparent relevancy, and then ranked. Google's algorithms have gotten better over the years, and it has incrementally added on other technologies and services. Natural language search is the second category. Teaching computers to understand human language is a complex process which involves breaking sentences down to their component parts (nouns, verbs, adjectives and so forth), which can then take on symbolic meaning for computers. Powerset (previous coverage), which is dribbling out its technology in stages, is a prime example of this approach. The third, semantic search, is much-hyped, but little understood. Simply put, people attach markers to human-generated content, whether a paragraph of text or a picture, to outright tell computers in a special machine language what's meaningful. In the databases of these companies, for example, I might be identified as "Chris Morrison," with the markers "writer," "venturebeat," "male," "technology," "charming" and "goodlooking." (All true, of course.)
If applied to the entire internet, the result could be thought of as a giant, interrelated Wikipedia. Metaweb, which recently launched Freebase, is attempting to create just that. For the most part, each company is betting on its own core technology to win the race. Radar hopes its own special combination of all three will take the day, much like another secretive startup, Franz Inc. To be fair, there's also a fourth, less glamorous approach which relies almost entirely on humans. ChaCha and a forthcoming startup from Wikipedia founder Jimmy Wales are two examples. First, though, the viability of any technology must be proven. To return to Twine, it's the horse Radar is betting on, just as Powerset hopes to take the approach of slowly beating out Google at searching the internet. What matters is how well Twine can perform at helping humans organize the avalanche of information that is modern life. So while there are other features we could mention, from adding content through an innovative bookmarklet to finding related content through a "social graph" of similar users, it's more useful to give our reaction to Twine. Having sat through a demo by founder Nova Spivack, we can say that we're excited to try out Twine. The interface is simple, yet powerful. While the use of tagging resembles tag-lists that have been around for years, their application is clearly more useful. And Twine was obviously capable of completing some complex tasks, like distinguishing the person's name J.P. Morgan from the company with the same name. The site is just as obviously still in development. A wealth of other features could obviously be useful, from a fuller array of choices for communicating with other users (Spivack says instant messaging is coming) to adding more possibilities for linking information. However, the Twine team won't have to do the work alone.
Sometime after the current beta launch, which will be limited to a few thousand people, Twine plans on opening up several APIs to allow outside developers to work with the platform. For now, the site is geared toward people who use the internet heavily, primarily knowledge professionals, like the marketers and analysts mentioned above. Students, prosumers (people with a strong interest in a particular thing) and companies will also likely find uses for Twine. For more discussion of Radar's idea of the future, including what could go wrong, we'll post a Q&A with Nova Spivack on Saturday. From jmcc at hackwatch.com Fri Oct 19 08:42:23 2007 From: jmcc at hackwatch.com (John McCormac) Date: Fri, 19 Oct 2007 09:42:23 +0100 Subject: [Search-l] Getting Search Results from Existing Search Engines In-Reply-To: <47183954.7070505@wikia.com> References: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com> <1482243501-1192760604-cardhu_decombobulator_blackberry.rim.net-531559842-@bxe126.bisx.prod.on.blackberry> <47183954.7070505@wikia.com> Message-ID: <47186DEF.4010806@hackwatch.com> Jimmy Wales wrote: > Jason McCabe Calacanis wrote: > >>When wikia search gets better results than google we'll switch to >>that. I'm thinking wikia will take another 90 days to beat google... >>So soon. :) > > > Nah, no sooner than 100 days... don't forget the holidays. ;-) 100 Days To Wikiasearch - sounds like a movie title. :) It might be interesting to do a few podcasts as it progresses.
Regards...jmcc -- ****************************************************** John McCormac * e-mail: jmcc at whoisireland.com MC2 * voice: +353-51-873640 22 Viewmount * web: http://www.whoisireland.com/ Waterford * blog: http://blog.whoisireland.com Ireland * Irish Domain Stats & Market Research ****************************************************** From ray.slakinski at gmail.com Fri Oct 19 13:00:59 2007 From: ray.slakinski at gmail.com (Ray Slakinski) Date: Fri, 19 Oct 2007 09:00:59 -0400 Subject: [Search-l] Getting Search Results from Existing Search Engines In-Reply-To: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com> References: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com> Message-ID: As a side note, on some of our (Mahalo's) pages we use Google News RSS feeds. But it is up to the guide to figure out which terms to query Google with and where to place those RSS items. Ray On 18-Oct-07, at 10:14 PM, Peter wrote: > Hi everyone - > I've been a lurker on this list for quite a while...just thought I'd > shoot out a couple of questions... > > First, for Jason...I've noticed how Mahalo includes Google results on > every results page. How does this work? Did you have to do a private > deal with Google, are you using some API that they've made available > to the public, or are you simply sending queries to their webservers > and scraping the results off their results pages? > > Also, do you guys envision using search results generated by existing > search engines, such as Google, in Search Wikia? (or at least in the > very beginning, before the community has a chance to develop the > complex algorithms necessary to generate good search results > in-house). > > Thanks!
> > -Peter From jwales at wikia.com Fri Oct 19 19:59:42 2007 From: jwales at wikia.com (Jimmy Wales) Date: Fri, 19 Oct 2007 12:59:42 -0700 Subject: [Search-l] Getting Search Results from Existing Search Engines In-Reply-To: <47186DEF.4010806@hackwatch.com> References: <88e38f390710181914p1c3f22d7gde1a09b140120d19@mail.gmail.com> <1482243501-1192760604-cardhu_decombobulator_blackberry.rim.net-531559842-@bxe126.bisx.prod.on.blackberry> <47183954.7070505@wikia.com> <47186DEF.4010806@hackwatch.com> Message-ID: <47190CAE.7060407@wikia.com> John McCormac wrote: > Jimmy Wales wrote: >> Jason McCabe Calacanis wrote: >> >>> When wikia search gets better results than google we'll switch to >>> that. I'm thinking wikia will take another 90 days to beat google... >>> So soon. :) >> >> >> Nah, no sooner than 100 days... don't forget the holidays. ;-) > > 100 Days To Wikiasearch - sounds like a movie title. :) > It might be interesting to do a few podcasts as it progresses. Just to be realllllly clear in case my humor was too much... I took Jason to be sarcastic, and I was just going along with the joke good-naturedly. For the record, Wikia search will not be google quality in 90 days or 100 days... or probably a year or three. This is a big project, and a long slog. Wikia Search on day 1 will be like Wikipedia on day 1... it will be *free* in the sense of GNU, but it will be basically just a starting point for further work and a vision that I hope can unite people and organizations of various kinds. --Jimbo