[Search-l] Wikia - Global focus or country level search?
Dennis Kubes
kubes at apache.org
Sun Aug 12 03:38:46 UTC 2007
Localization on the front end, so that the same search website can be
used and the appropriate content pages retrieved, yes. But being able to
restrict queries and search results to specific languages requires two
components:
1) A very good language identifier. This is usually run on content
while it is being fetched and stored, because you want the result to be
available to processing jobs, including indexing jobs. There are
different algorithms for this, but many current approaches use character
n-grams and their frequency distributions to identify language from text.
2) Fields within the index to store the language and restrict queries or,
as a better option, completely separate indexes and search servers per
language. One of the benefits of separate indexes is the ability to scale
capacity for a given language independently.
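
As a rough illustration of the two components above, here is a minimal
sketch in Python. It is not the identifier any particular crawler uses:
the tiny training samples, the cosine-similarity scoring, and the
"index_<lang>" naming are all assumptions made up for this example. A
real identifier would be trained on far more text per language.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count character n-grams in lowercased, whitespace-normalized text."""
    normalized = " " + " ".join(text.lower().split()) + " "
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, deliberately tiny training samples (assumption).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the "
          "language of the search index that we want to build",
    "de": "der schnelle braune fuchs springt über den faulen hund und das "
          "ist die sprache des suchindex den wir bauen wollen",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et "
          "ceci est la langue de l'index de recherche que nous voulons "
          "construire",
}
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def identify_language(text):
    """Return the language whose trigram profile is closest to the text."""
    query = char_ngrams(text)
    scores = {lang: cosine(query, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)


def index_for(text):
    """Pick a per-language index name (hypothetical naming scheme)."""
    return "index_" + identify_language(text)
```

At fetch time the crawler would call something like identify_language()
once per page and store the result with the content, so that indexing
jobs can either write it into a language field or route the page to the
matching per-language index, as index_for() does here.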
Some numbers we have seen suggest that about 35% of webpages crawled are
non-English. I would agree, though, that language needs to be designed in
from the start, even if we only start with English.
Dennis Kubes
Jimmy Wales wrote:
> I think good quality demands localization, and so it needs to be
> designed in from the start.
>
> John McCormac wrote:
>> Will Wikia have a more US focus with .com/net/org/biz/info being the
>> primary search targets? Or will each country have its own SE as in the
>> "pages from $country" thing with Google etc? Has such a question been
>> considered or is it way down the list?
>>
>> Regards...jmcc