[Search-l] Interesting Article

Dan Lewis dan at wikia-inc.com
Mon Jul 28 14:44:00 UTC 2008


Assuming it is right (the 1 trillion pages, I mean), I find it amazing
that Google indexes only, say, 25% of them. What about the other 75%? It
can't all be duplicate content and machine-generated pages. Not even close.

I figured I'd post something on the blog about this, and ended up noticing
that Google does a pretty bad job of adding blog posts to its main search
engine. If you run the domains of blog hosting companies through Google's
blog search, it reports, e.g., almost 5 billion blogspot.com URLs. Run the
same domain through the main search engine? 340 million.

The numbers they provide are likely inaccurate, but the gap here is
7 billion (!) results once you include livejournal.com and wordpress.com
as well.
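For a rough sense of scale, the blogspot.com figures can be compared directly. This is a minimal sketch that treats "almost 5 billion" as exactly 5 billion (an assumption) and covers blogspot.com only, since the livejournal.com and wordpress.com counts behind the 7 billion total aren't given here.

```python
# Counts as quoted for blogspot.com; "almost 5 billion" is rounded up to 5e9 (assumption).
blog_search_count = 5_000_000_000   # URLs reported by the blog search
main_index_count = 340_000_000      # URLs reported by the main search engine

# URLs the blog search knows about that the main index apparently doesn't report.
gap = blog_search_count - main_index_count
# Fraction of the blog-search URLs that also appear in the main index count.
share_indexed = main_index_count / blog_search_count

print(f"gap: {gap:,} URLs")                          # gap: 4,660,000,000 URLs
print(f"share in main index: {share_indexed:.1%}")   # share in main index: 6.8%
```

On these numbers, only about 7% of the blogspot.com URLs the blog search reports show up in the main index.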

http://search.wikia.com/blog/2008/07/28/whats-the-other-75-percent-blogs/

Dan

On Sun, Jul 27, 2008 at 11:03 AM, Jimmy Wales <jwales at wikia.com> wrote:

> Dennis Kubes wrote:
> > http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html
>
> Weird.
>
> 1 trillion unique web pages. I am skeptical. That's 166 pages per
> person on Earth. Or, if we assume there are 1 billion people online,
> that's 1,000 pages for every person online. I don't know about you, but
> I haven't written 1,000 web pages yet.
>
> If they are data-driven pages, that's interesting and all, but
> "counting" pages from a data-driven site is a bit silly. Even the blog
> post acknowledges this when it notes that a calendar site has,
> theoretically, an infinite number of pages.
>
> --Jimbo
>
> _______________________________________________
> Wikia Search mailing list
> http://re.search.wikia.com/
> Change options or unsubscribe:
> http://lists.wikia.com/mailman/options/search-l
>


More information about the Search-l mailing list