[Search-l] Phase of the moon Idea On Search/clients
Mark (Markie)
newsmarkie at googlemail.com
Mon Jan 14 16:42:05 UTC 2008
answers below :-)
thanks
mark
On Jan 14, 2008 2:49 PM, Arne Müller <arne_c_mueller at web.de> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
> I've also got some probably very dumb questions. You guys here seem to
> be talking as if you know how this search engine works, how the
> index is structured and so on. I don't know how any of this works.
> As far as I understand it, there is something called grub, which
> follows links and adds the sites it finds to some kind of database.
> But that is actually everything I know about this thing.
>
> 1.) How does this grub thingi work? Can people download it and thus
> give wikia:search the ability to know what kinds of sites people are
> really looking at? I think that would give you loads of good data very
> fast (maybe too fast).
>
grub collects work units from a central server and then crawls them
using the grub client (download from
http://grub.org/html/downloads.php :-). all pages are loaded specifically
for the crawl; the client does not currently crawl the sites that you visit
yourself. for example, i am currently crawling theatlantic.com, and i can
honestly say that i've never visited (let alone heard of) that site. :-)
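
to make the work-unit idea a bit more concrete, here is a rough Python
sketch of what a client of this kind could look like. this is not grub's
actual code or protocol: `WorkUnit`, `process_work_unit` and `fake_fetch`
are names made up for illustration, and the fetch function is a stand-in
that never touches the network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WorkUnit:
    """Hypothetical work unit: a batch of URLs handed out by a central server."""
    unit_id: int
    urls: List[str]


def process_work_unit(unit: WorkUnit, fetch: Callable[[str], str]) -> Dict[str, int]:
    """Fetch every URL in the unit and return the page size (in characters) per URL."""
    results: Dict[str, int] = {}
    for url in unit.urls:
        try:
            body = fetch(url)
        except OSError:
            # Unreachable page: skip it and carry on with the rest of the unit.
            continue
        results[url] = len(body)
    return results


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request (assumption: no network access here)."""
    if "down" in url:
        raise OSError("unreachable")
    return "<html>" + url + "</html>"


# The client only crawls what the server assigned, not what the user browses.
unit = WorkUnit(unit_id=1, urls=["http://example.com/a", "http://down.example.com/"])
report = process_work_unit(unit, fake_fetch)
print(report)
```

in a real client the fetch callable would make an HTTP request and the
report would be sent back to the central server; both steps are left out
of this sketch.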
>
> 2.) How does this index work? Is e.g. regexp search really possible,
> or would such a search take far too long to scan the
> database?
i'm not sure about this, sorry :-(
>
>
> 3.) Is the index only located at one place, or is it possible to
> distribute it over the whole net (maybe even via p2p), so that attacks
> can't do much damage. That would be a real nice feature :D
the index is currently (as far as i know) located in a (secret) underground
bunker somewhere in Ohio and on ISC servers (not sure where these are). it
may also be on wikia servers; i'm not sure.
>
>
> so maybe there is some place where I can read about all that stuff;
> if so, please post a link somewhere (I haven't found one at search.wikia.com)
i'm not currently aware of such a page, although there will probably be one soon :-)
>
>
> cheers, Arne
>
> aerik at thesylvans.com schrieb:
> > let me quickly ask a dumb question: in a typical small search engine,
> > with "average" usage and "average" hardware and bandwidth, what is the
> > major bottleneck? or maybe better, what costs the most as the engine
> > scales? bandwidth? I think as we have these discussions, we get into
> > some pretty nontypical territory where we need to guesstimate at
> > nontypical limits. (think of the regex search for example) and that
> > dynamic is good to keep in mind. And if we're putting forth ideas
> > that are likely to make the existing bottleneck even worse, we should
> > understand that too.
> >
> > but a really fun part of this process is challenging the current
> > assumptions :-)
> >
> >
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFHi3aT88dCJRGuqhoRApjtAJsGy6WqRL1WHbJbIIeMhL5qbw632wCdEeec
> wBh/KUSViN44Zk61sa3wFlM=
> =qEWS
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Wikia Search mailing list
> http://alpha.search.wikia.com/
> Change options or unsubscribe:
> http://lists.wikia.com/mailman/options/search-l
>