When did Google last show 8 Billion Pages?

Discussion in 'Google' started by mvandemar, Jan 3, 2007.

  1. #1
    Anyone remember?

    Even though they aren't displaying the count anymore, it looks like (at the time of this post, anyway) they have 21,440,000,000 pages indexed:

    http://www.google.com/search?num=100&hl=en&lr=&safe=off&q=***+OR+***+AND+***&btnG=Search

    I was wondering how much that would mean the internet had grown, and over what time period, and whether anyone knows how that compares to Yahoo's index. It's hard to figure Yahoo's out; the most I can come up with is 8,780,000,000.

    Also, does anyone know what the "average" size of a webpage is, excluding images, and approximately how much hard drive space it would take to hold just one copy of 21,440,000,000 pages? :p

    -Michael
     
    mvandemar, Jan 3, 2007
  2. ScottFish

    ScottFish Peon

    Messages: 952 | Likes Received: 53 | Best Answers: 0 | Trophy Points: 0
    #2
    Interesting, I know there was some discussion on some of the SEO blogs about this.
     
    ScottFish, Jan 3, 2007
  3. adnan

    adnan Peon

    Messages: 1,614 | Likes Received: 82 | Best Answers: 0 | Trophy Points: 0
    #3
    The average size of a webpage is around 30 KB, excluding images.

    The search you did doesn't mean Google has 21 billion pages. It means that many matches were found.

    The 8 billion pages that they used to mention was ages ago, if I remember correctly. Like in '99 or 2000. I could be wrong.

    Hard disk space for 20 billion pages is easy to calculate.

    30 KB x 20 billion = 600 billion kilobytes.

    600 billion kilobytes = 600 billion / 1024 / 1024 / 1024 = 558 terabytes

    So you would need 744 + a few more (for clustering and filesystem) of the Seagate 750 GB SATA drives.

    For a RAID, you would need the above figure x 3.
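
    As a rough sketch of the arithmetic above, here is the same estimate in Python. The 30 KB average, the 20 billion page count, and the 750 GB drive size come from the post. The function name estimate_storage and the choice to treat a 750 GB drive as 0.75 TB (which is what 558 / 0.75 = 744 implies) are assumptions made for illustration.

```python
import math

def estimate_storage(pages, avg_page_kb=30, drive_gb=750):
    """Return (total_kb, total_tb, drives) for one uncompressed copy.

    Hypothetical helper: uses 1024-based steps from KB up to TB, as in
    the post, and treats drive_gb / 1000 as the drive size in TB.
    """
    total_kb = pages * avg_page_kb
    total_tb = total_kb / 1024 ** 3          # KB -> MB -> GB -> TB
    drives = math.ceil(total_tb / (drive_gb / 1000))
    return total_kb, total_tb, drives

total_kb, total_tb, drives = estimate_storage(20_000_000_000)
print(f"{total_kb:,} KB is about {total_tb:.1f} TB, or {drives} drives")
```

    Keeping the fraction that the post rounds away (roughly 558.79 rather than 558 TB) pushes the drive count slightly above 744, and none of this accounts for compression, redundancy, or filesystem overhead.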
     
    adnan, Jan 3, 2007
  4. mvandemar

    mvandemar Notable Member

    Messages: 2,409 | Likes Received: 307 | Best Answers: 0 | Trophy Points: 230
    #4
    With the search I did, the two should be synonymous. If there is an error (and I am pretty damn sure there is at least some), then it would be in how Google calculates result counts, not in the logic of the search.

    -Michael
     
    mvandemar, Jan 3, 2007
  5. adnan

    adnan Peon

    #5
    Well, it's not.

    Did you look at Google's help to see what the '*' character means?

    Plus, does it make a difference if you use 1, 2, or 3 '*'?

    Did you try that same query, or a similar one, on other search engines like Yahoo or MSN?
     
    adnan, Jan 3, 2007
  6. mvandemar

    mvandemar Notable Member

    #6
    I didn't look it up; it's a wildcard search. What is it about "match {anything} OR {anything} AND {anything}" that makes you think it is substantially different from "match everything"? I mean, if I'm missing something, fine, but I don't see what it is. Like I said, if it's off, then afaik it would be because of the way G was counting the results, not due to the logic of the search.

    Yahoo doesn't support wildcard searches, and MSN has cooties.

    I found what looks to be a good indicator of the index growth, by the way. Again, not sure how accurate the dates are, because I think sometimes archive.org shows the same cache for multiple dates, but it's the closest thing I've found so far:

    Oct 9, 2001: Searching 1,610,476,000 web pages.

    Aug 3, 2002: Searching 2,073,418,204 web pages.

    Feb 2, 2003: Searching 3,083,324,652 web pages.

    Oct 6, 2003: Searching 3,307,998,701 web pages.

    Oct 19, 2004: Searching 4,285,199,774 web pages.

    Dec 7, 2004: Searching 8,058,044,651 web pages.

    Aug 6, 2005: Searching 8,168,684,336 web pages.

    -Michael

    Edit: If you're saying that there might be more pages than that, then I'm not disagreeing with you. I'm saying there are at least that many (if the result count is accurate), but would have no clue how to elicit the rest.
     
    mvandemar, Jan 3, 2007
  7. adnan

    adnan Peon

    #7
    I'm saying that there are FEWER pages than what your wildcard query is showing.

    Going by the links you showed, according to the Wayback Machine, in Aug 2005 Google claimed to be searching around 8 billion.

    21 billion is almost 3 times as much as 8 billion.
     
    adnan, Jan 3, 2007
  8. adnan

    adnan Peon

    #8
    *** OR *** AND ***

    This query is now down from 21 billion to 19 billion.

    * OR * AND * shows 16 billion.

    I think if Google wanted to tell people how many pages are in their index, they would do it outright.

    The query, even with a wildcard, is not like a "select count(*) from google index" type of query.

    Anyways,

    I don't even care.
     
    adnan, Jan 4, 2007
  9. mvandemar

    mvandemar Notable Member

    #9
    Are you saying that the logic behind the query should show fewer pages, or that 21 billion is just too big to be right?

    It is 2.625 times the size of what it last showed, and between 2001 and 2004 it increased over 5 times the starting size. Why do those numbers surprise you?
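
    A quick way to check those two ratios, using only the figures quoted earlier in this thread (the variable names are made up for the example, and the 21.44 billion figure is a result-count estimate, not a confirmed index size):

```python
# Figures quoted earlier in this thread.
wildcard_estimate = 21_440_000_000   # result count for the wildcard query
aug_2005 = 8_168_684_336             # last archived homepage figure
oct_2001 = 1_610_476_000             # earliest archived homepage figure
dec_2004 = 8_058_044_651             # archived figure from late 2004

# Ratio of the current estimate to the last figure Google displayed.
since_last_shown = wildcard_estimate / aug_2005

# Growth of the displayed figure between Oct 2001 and Dec 2004.
growth_2001_2004 = dec_2004 / oct_2001

print(f"vs. Aug 2005: {since_last_shown:.3f}x")
print(f"2001 to 2004: {growth_2001_2004:.3f}x")
```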

    -Michael
     
    mvandemar, Jan 4, 2007
  10. adnan

    adnan Peon

    #10
    I'm saying both.

    21 billion is too high.

    There is no query that will give you the number of pages in Google's index.

    Look at the index at the back of a book.

    Certain words are indexed many times.

    Count those and you won't get the number of pages in the book; you will get far more.
     
    adnan, Jan 4, 2007
  11. hextraordinary

    hextraordinary Well-Known Member

    Messages: 2,171 | Likes Received: 115 | Best Answers: 0 | Trophy Points: 105
    #11
    My query just came up with 22,650,000,000. I think you can game it with all sorts of queries...
     
    hextraordinary, Jan 4, 2007
  12. mihd

    mihd Peon

    Messages: 136 | Likes Received: 5 | Best Answers: 0 | Trophy Points: 0
    #12
    Wonder how many of them are due to MFA sites?
     
    mihd, Jan 4, 2007
  13. mvandemar

    mvandemar Notable Member

    #13
    You are right, but I think you are looking at it wrong, or slightly so...

    If I do a query on the word "the", I'm not looking at how many times the word "the" appears in the index, I'm counting (in theory) how many pages contain the word "the".

    If I do "the OR and", I should be looking at the number of pages that contain either of those words, not the number of pages that contain the first word plus the number of pages that contain the second word...
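
    To illustrate the difference between the two ways of counting, here is a minimal sketch with a made-up inverted index. The page IDs and the index dictionary are hypothetical, and this says nothing about how Google actually stores or estimates its counts.

```python
# Toy inverted index: each word maps to the set of page IDs containing it.
index = {
    "the": {1, 2, 3, 5},
    "and": {2, 3, 4},
}

# Adding up index entries counts pages 2 and 3 twice (the book-index case).
sum_of_entries = sum(len(pages) for pages in index.values())

# "the OR and" is a set union: each matching page is counted once.
pages_with_either = len(index["the"] | index["and"])

print(sum_of_entries, pages_with_either)
```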

    You are right... doing what I did is not exactly like doing a count(*)...but it should be, according to design, almost exactly like doing a "WHERE {anything} LIKE '%'", don't you think?
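
    For what it's worth, the SQL side of that analogy can be tried with Python's built-in sqlite3 module. The pages table and its rows are invented for the example. One caveat in SQL itself: LIKE '%' skips rows where the column is NULL, so it is close to COUNT(*) but not always identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("a.example", "the quick brown fox"),
        ("b.example", ""),     # empty string still matches '%'
        ("c.example", None),   # NULL does not match LIKE '%'
    ],
)

# Count every row, regardless of content.
total = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]

# Count rows whose body matches the "anything" wildcard.
wildcard = conn.execute(
    "SELECT COUNT(*) FROM pages WHERE body LIKE '%'"
).fetchone()[0]

print(total, wildcard)
conn.close()
```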

    I know there are discrepancies, and I know this is because G uses an algo to estimate the number of results, not count them. This is why the site: operator was busted: they had the algo wrong, according to them. However, also according to them (or according to Matt, anyways) they have fixed that.

    Hey, Matt, you lurk here... feel free to jump in if you see this. :p Is this close, or would these different results we get depending on how we query be one of those bugs you want us to report*? :D

    -Michael

    *Anyone know if he is still reading that list...?
     
    mvandemar, Jan 4, 2007