From jeremie at jabber.org  Wed Jan 30 08:12:53 2008
From: jeremie at jabber.org (jer)
Date: Wed, 30 Jan 2008 02:12:53 -0600
Subject: [search-ui] first steps beyond the current re.search.wikia.com
Message-ID: <576EC55A-82C1-44FA-AFE8-F8E1F74D8E32@jabber.org>

I have my own priorities and things I'd like to do here, but don't let that keep anyone else from speaking up, suggesting their own ideas, and voicing what they feel is important :)

One of the first things I want to figure out, before I start restructuring the existing JavaScript and HTML, is what layers are needed, what parts need to be isolated from each other, and what needs to be solidified so that extensions have something stable to work against. In thinking about this I've come up with four major groups:

Themes: styling, customization
Macros: interacting with the search terms and the logic of getting answers
Widgets: given a search term, display something in a box
Apps: deep access, different result visualizations

A rough outline of the parts of the page, to start identifying what's going to be involved:

Header
  Logo
  Input area + search button
  Link bar, social links, login
Results header (number of results)
Mini Article
Search Results
  Title
  Clipping
  URL - cache - score - star rating
Right side
  People Matching
  Other Indexes
Footer
  More results
  Another input area
  Links (about/contact/terms/etc)

Looking at that list, the easiest thing to tackle is the widgets, since they are self-contained. Almost as non-committal would be saying that apps have access to everything, however it's done :) The more immediate questions are how themes are done and what macros are. What parts are themeable? Just style, or layout too?

I think I'll tackle my ideas around macros, with some examples, in another thread tomorrow. I'd love input on anything here, though; otherwise I'll just dive in and start scrambling the HTML/JS *grin*.
Jer

From tat.wright at googlemail.com  Wed Jan 30 18:24:21 2008
From: tat.wright at googlemail.com (Tom Wright)
Date: Wed, 30 Jan 2008 18:24:21 +0000
Subject: [search-ui] first steps beyond the current re.search.wikia.com
In-Reply-To: <576EC55A-82C1-44FA-AFE8-F8E1F74D8E32@jabber.org>
References: <576EC55A-82C1-44FA-AFE8-F8E1F74D8E32@jabber.org>
Message-ID: <2813c4970801301024x107ebf0dn80e2be7607afd280@mail.gmail.com>

Hi. I pretty much agree with your analysis. Here is some random thinking aloud; I hope it isn't too long.

==MACROS==

===From a user's perspective:===

What is a macro?
A way of specifying changes to the output. I quite like the idea of a "command line for the internet".

Classification of types of macros:
1. Commands: perform this calculation, find definitions of these words, do such and such.
   a) Commands that present additional information.
   b) Commands that fundamentally alter what the search does.
2. Additional options for searching: for example, search only this site, order results in a different way, search only for these file types.

Qualities the syntax should have from the user's point of view:
(i) Human-readable and in plain English.
(ii) Redundant, i.e. most ways of asking for a particular command should work.
(iii) Usable without reading any documentation, to as large an extent as possible. (I like the idea of being able to guess at a syntax, try it, and have it work.)

I would favour treating certain search terms as commands rather than having a specific command syntax. So one might write "search wikipedia for maxwell" or "find videos about open source software on youtube and google videos".

Obviously, there is a limit to how good a computer can be at reading English. We might also want "statistical" commands, i.e. rather than checking whether a command matches a particular regular expression, check whether the query contains certain words and, if so, carry out the command.
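A minimal sketch of the two matching styles described above, written in JavaScript since that is what the page uses. The command names, phrasings and keyword lists are hypothetical illustrations, not an existing Wikia Search API, and the code uses modern JavaScript features (such as named capture groups) that browsers of the time did not support. Each command lists several regular expressions (the "redundant" syntax), and a separate keyword-scoring function stands in for the "statistical" commands.

```javascript
// Hypothetical command table: several phrasings per command, so that most
// ways of asking for it work. Named groups keep the arguments consistent
// whatever word order the user chose.
const commands = [
  {
    name: "siteSearch",
    patterns: [
      /^search (?<site>\S+) for (?<terms>.+)$/i,
      /^find (?<terms>.+) on (?<site>\S+)$/i,
      /^(?<terms>.+) on site (?<site>\S+)$/i,
    ],
  },
  {
    name: "define",
    patterns: [
      /^define (?<terms>.+)$/i,
      /^what does (?<terms>.+) mean\??$/i,
      /^(?<terms>.+) definition$/i,
    ],
  },
];

// Regex matching: the first phrasing that matches wins. Returning null means
// "I don't understand this input", so the query is treated as plain search.
function matchCommand(query) {
  const q = query.trim();
  for (const command of commands) {
    for (const pattern of command.patterns) {
      const m = q.match(pattern);
      if (m) return { name: command.name, args: { ...m.groups } };
    }
  }
  return null;
}

// "Statistical" matching: score each command by how many of its
// (hypothetical) keywords appear in the query, and require a minimum score.
const keywordCommands = [
  { name: "weather", keywords: ["weather", "forecast", "temperature", "rain"] },
  { name: "flights", keywords: ["flight", "flights", "fly", "airfare", "cheapest"] },
];

function guessCommand(query, threshold = 2) {
  const words = new Set(query.toLowerCase().split(/\W+/));
  let best = null;
  for (const command of keywordCommands) {
    const score = command.keywords.filter((k) => words.has(k)).length;
    if (score >= threshold && (best === null || score > best.score)) {
      best = { name: command.name, score };
    }
  }
  return best;
}

console.log(matchCommand("search wikipedia for maxwell"));
console.log(matchCommand("maxwell equations")); // null: ordinary search
console.log(guessCommand("cheapest flights to france"));
```

Checking commands in a fixed order is the simplest way to resolve collisions between macros, but it makes the order of the table significant.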
Potential problems:
(a) Collisions between macros (might be exceedingly unlikely).
(b) Collisions between macros and actual search terms.
(c) German people don't, as a rule, speak plain English.

For (b), it might suffice to provide an "escape" command like "search:" that disables any commands after that point. But the user would have to be told about this without having to ask.

====A Strawman Model====

(a) All commands are specified as natural-language regular expressions that contain verbs. Each command has as many regular expressions describing it as one can think of.
(b) Options for searching are expressed at the end of the search term in plain English, again matched using regular expressions with as much latitude as possible. Example: transparent proxy on www.squid.org of file type pdf created before 2001 modified last tuesday.
(c) The default behaviour for a macro specified in plain English is "if I don't understand my input, assume I am not meant to be called".
(d) Some macros can be statistical, but only ever macros that present additional information.
(e) Each language recognises an "English macro language" as well as a "local macro language".

===From the engineering perspective:===

====Strawman model:====

We have separate "query handlers" that are given the query term and are responsible for checking whether the query is meant for them. If so, they perform the relevant actions and present results directly. Normal searching is just a particular type of query handler, one that happens to swallow any sort of input.

====Potential Problems:====

(a) How slow is matching 1000 regular expressions (10 for each command that exists)? Matching distinct regular expressions to see whether each type of command is being used represents a massive amount of additional computation compared to having a clearly defined syntax. Then again:
(i) Computer power is made to be used.
(ii) This is client-side, so who cares.
(iii) As long as it isn't noticeably slow, it shouldn't matter.
(iv) Computers get faster.
(Aside: I wonder if anyone has done work on matching many regular expressions simultaneously... I suppose there is indirect work from parsing theory.)

===Limits that should be placed on macros===

There is probably a natural tendency to want to create macros to do everything that can possibly be done, since a search engine is trying to be a tool that answers any question that can possibly be asked.

Questions:
What sort of limits would be advisable on what the search engine should and shouldn't do?
When isn't it worth presenting content from other websites, rather than providing a link to the website or using content rendered by it?

==THEMES==

One possible approach for themes would be to push as much of the presentation into JavaScript as possible. The responsibility of search.html would then only be to present boxes where JavaScript would place things. A theme would then consist of an HTML page specifying the layout of the main objects, plus a CSS file specifying the styling.

== Handling of the DOM model ==

The DOM is functioning as a kind of database shared between all the different widgets. So it seems that some care should be taken to protect it in some way, if only by a set of conventions.

== Problems with cross-site scripting ==

Many of the natural widgets one would want to write involve calling APIs on other websites. But at the moment this isn't really supported in browsers. (If you can make it work, it probably shouldn't work.)

For example, XMLHttpRequest in Firefox doesn't connect to resources on other hosts or other ports. There are currently plans to extend XMLHttpRequest to support cross-site requests if the URL your script is trying to talk to accepts connections from your domain name, so this would seem to be the correct mechanism to use in the long term. But what should be done in the medium term?
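Returning to the query-handler strawman from the engineering section above, a minimal sketch of the idea might look like this. It is an illustration under assumptions: the `QueryHandler` shape, the calculator handler, `webSearch` and `dispatch` are hypothetical names, and the "search:" escape is only recognised at the start of the query to keep things simple. Normal search is the fallback handler that swallows any input.

```javascript
// A handler decides for itself whether a query is meant for it.
// accepts(query) returns a truthy match, or null/false if not for it.
class QueryHandler {
  constructor(name, accepts, handle) {
    this.name = name;
    this.accepts = accepts;
    this.handle = handle; // (query, match) => result to display
  }
}

const OPS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

// Example macro: a tiny calculator for "number operator number" queries.
const calculator = new QueryHandler(
  "calculator",
  (q) => q.match(/^\s*(-?\d+(?:\.\d+)?)\s*([-+*\/])\s*(-?\d+(?:\.\d+)?)\s*$/),
  (q, m) => ({ type: "answer", value: OPS[m[2]](Number(m[1]), Number(m[3])) })
);

// Normal searching is just another handler, one that accepts everything.
const webSearch = new QueryHandler(
  "webSearch",
  () => true,
  (q) => ({ type: "search", terms: q.trim() })
);

const ESCAPE = "search:"; // prefix that switches macros off for this query

function dispatch(query, handlers, fallback) {
  const trimmed = query.trimStart();
  if (trimmed.toLowerCase().startsWith(ESCAPE)) {
    const terms = trimmed.slice(ESCAPE.length);
    return { handler: fallback.name, result: fallback.handle(terms, null) };
  }
  // Every handler sees the raw query; the first one that accepts it wins.
  for (const h of handlers) {
    const m = h.accepts(query);
    if (m) return { handler: h.name, result: h.handle(query, m) };
  }
  return { handler: fallback.name, result: fallback.handle(query, null) };
}

const handlers = [calculator];
console.log(dispatch("2 + 3", handlers, webSearch));
console.log(dispatch("search: 2 + 3", handlers, webSearch));
console.log(dispatch("transparent proxy", handlers, webSearch));
```

Because each handler only inspects the query, handlers stay independent of one another; the cost is that every query is checked against every handler, which is the performance concern raised under "Potential Problems".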
== Random Ideas (to be read with skepticism): ==

(i) One thing that is missing here is the clarification/disambiguation of search terms, which I think might be important. The idea is that after you enter a search term, popular options are offered to make the search term more specific. The suggested search terms might be generated by looking at the popularity of search terms, the existence of mini-articles and the general click-through rate, together with some manually entered human content.

Example:
travelling to holland ---> air flights to holland, accommodation in holland, travel guides in holland, tourist information holland.
accommodation in holland ---> hotels in holland, bed and breakfast in holland.

This might also open the door to specialized disambiguation:
accommodation in holland ---> Where in Holland would you like to visit? (Drop-down box of cities in Holland.)

The idea here is to remove some of the thought process involved in creating a search term; instead, the search engine does it for you. My normal process of searching is:

Enter a short search term ---> look at the first page of results ---> try to improve the search term ---> look at the first page of results ---> continue for a while ---> re-enter the first search term ---> look at more results ---> try a different method of searching.

This approach would mean that I don't have to work out what needs to be altered myself. It is also useful because it directs me to mini-articles. It might be argued that this functionality should just be provided in mini-articles.

(ii) Some sort of cache of recent search terms, with the ability to create mini-articles for them with a single click. Rationale: if a search term didn't work but you found the items you wanted by clarifying it, you want to add that information to the mini-article, but this is normally rather slow to do.

(iii) Third-party macros / apps / data:
"Find cheapest flights to france using easyjet in the next two months."

_______________________________________________
Search-UI mailing list
Search-UI at wikia.com
http://lists.wikia.com/mailman/listinfo/search-ui

From balinny at gmail.com  Wed Jan 30 22:38:58 2008
From: balinny at gmail.com (Balinny)
Date: Wed, 30 Jan 2008 23:38:58 +0100
Subject: [search-ui] first steps beyond the current re.search.wikia.com
In-Reply-To: <2813c4970801301024x107ebf0dn80e2be7607afd280@mail.gmail.com>
References: <576EC55A-82C1-44FA-AFE8-F8E1F74D8E32@jabber.org> <2813c4970801301024x107ebf0dn80e2be7607afd280@mail.gmail.com>
Message-ID: <47A0FC82.5010408@gmail.com>

Tom Wright wrote:
> Hi. I pretty much agree with your analysis. Here is some random
> thinking aloud; I hope it isn't too long.
>
> ==MACROS==
>
> ===From a user's perspective:===
>
> What is a macro?
> A way of specifying changes to the output. I quite like the idea
> of a "command line for the internet".
>
> Classification of types of macros:
> 1. Commands: perform this calculation, find definitions of these
> words, do such and such.
> a) Commands that present additional information.
> b) Commands that fundamentally alter what the search does.
> 2. Additional options for searching: for example, search only this site,
> order results in a different way, search only for these file types.
>
In the end, it depends on what the backend provides. If there's no syntax to restrict a search to PDF files, it's not feasible to do that client-side. So there should be a "master" grammar for talking to the server, which macros would translate into, and which should preferably be as simple as possible. IMHO the best way would be the key:value approach.

> == Handling of the DOM model ==
>
> The DOM is functioning as a kind of database shared between
> all the different widgets. So it seems that some care should
> be taken to protect it in some way, if only by a set of conventions.
>
And some explanation of what each id does.
It can be figured out, but it would be much easier to have a wiki page mapping each id to something like "Foo is shown here if X and Y were chosen". All those empty divs with ids are a bit scary.

> == Problems with cross-site scripting ==
>
> Many of the natural widgets one would want to write involve
> calling APIs on other websites. But at the moment this isn't really
> supported in browsers. (If you can make it work, it probably shouldn't
> work.)
>
> For example, XMLHttpRequest in Firefox doesn't connect to resources
> on other hosts or other ports. There are currently plans to
> extend XMLHttpRequest to support cross-site requests if the URL your
> script is trying to talk to accepts connections from your domain name,
> so this would seem to be the correct mechanism to use in the long
> term. But what should be done in the medium term?
>
There are ways to do that using frames on both domains that exchange information. A bit obscure, though.

> == Random Ideas (to be read with skepticism): ==
>
> (i) One thing that is missing here is the clarification/disambiguation
> of search terms, which I think might be important. The idea is that
> after you enter a search term, popular options are offered to make the
> search term more specific. The suggested search terms might be generated
> by looking at the popularity of search terms, the existence of
> mini-articles and the general click-through rate, together with some
> manually entered human content.
>
> Example:
> travelling to holland ---> air flights to holland, accommodation in
> holland, travel guides in holland, tourist information holland.
> accommodation in holland ---> hotels in holland, bed and breakfast in holland.
>
> This might also open the door to specialized disambiguation:
> accommodation in holland ---> Where in Holland would you like to
> visit? (Drop-down box of cities in Holland.)
>
This is a good place for mini-articles.
But the syntax for them should probably be simplified. Currently these equivalent search terms lead to different mini-articles:

*accommodation to holland
*accommodation to Holland
*accommodation at holland
*accommodation at Holland
*ACCOMODATION TO HOLLAND
*holland accomodation
*accommodation Holland
*accommodation holland
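One way to make variants like those listed above land on the same mini-article is to normalise the query into a lookup key first. This is a sketch under assumptions: the stop-word list and the misspelling table are hypothetical and deliberately tiny, and `miniArticleKey` is an illustrative name, not an existing function of the site.

```javascript
// Hypothetical, deliberately tiny tables for illustration only.
const STOP_WORDS = new Set(["to", "at", "in", "for", "the", "a", "of"]);
const SPELLING_FIXES = new Map([
  ["accomodation", "accommodation"],
  ["acommodation", "accommodation"],
]);

// Lower-case, drop stop words, fix known misspellings, then sort the
// remaining words so that word order no longer matters.
function miniArticleKey(query) {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w !== "" && !STOP_WORDS.has(w))
    .map((w) => (SPELLING_FIXES.has(w) ? SPELLING_FIXES.get(w) : w))
    .sort()
    .join(" ");
}

const variants = [
  "accommodation to holland",
  "accommodation to Holland",
  "accommodation at holland",
  "accommodation at Holland",
  "ACCOMODATION TO HOLLAND",
  "holland accomodation",
  "accommodation Holland",
  "accommodation holland",
];

// Every variant produces the same key, "accommodation holland".
console.log(new Set(variants.map(miniArticleKey)).size); // 1
```

Sorting the words makes word order irrelevant, which suits this example but would also merge queries where order matters (for example "flights from london to paris" and "flights from paris to london"), so a more cautious version might normalise only case, spelling and stop words.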