Google losing thousands of pages?

Discussion in 'Google' started by Owlcroft, Oct 25, 2004.

  1. #1
    Does anyone know if the site-page counts returned by the site: inquiry are to be considered reliable indicators of what Google really has indexed?

    I have been checking ten datacenters every day at midnight Pacific Time to see how a large new site of mine is doing in getting indexed. For simplicity of presentation, I will here just track one of those datacenters, 216.239.59.104, but will also show the average for all ten centers:

    Sunday, 17 October: 78,100 [10-center average = 74,230]
    Monday, 18 October: 80,200 [10-center average = 76,840]
    Tuesday, 19 October: 80,200 [10-center average = 76,830]
    Wednesday, 20 October: 82,400 [10-center average = 78,070]
    Thursday, 21 October: 70,800 [10-center average = 67,230]
    Friday, 22 October: 70,800 [10-center average = 66,790]
    Saturday, 23 October: 64,300 [10-center average = 61,390]
    Sunday, 24 October: 71,900 [10-center average = 66,470]
    Monday, 25 October: 71,900 [10-center average = 67,760]

    Notice that from Wednesday to Saturday--72 hours exactly--Google "lost" over 18,000 pages (22%) from this one site (and, as the 10-center averages indicate, this was not a phenomenon confined to some one datacenter).
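    The arithmetic behind that figure can be checked with a short script. This is only a sketch: the counts are copied from the list above, while the function name `peak_to_trough` and the choice to measure from the highest count to the lowest count after it are assumptions made for illustration.

```python
# Daily site: counts for datacenter 216.239.59.104 and the 10-center average,
# copied from the figures posted above (17-25 October 2004).
counts = [
    ("Sun 17 Oct", 78_100, 74_230),
    ("Mon 18 Oct", 80_200, 76_840),
    ("Tue 19 Oct", 80_200, 76_830),
    ("Wed 20 Oct", 82_400, 78_070),
    ("Thu 21 Oct", 70_800, 67_230),
    ("Fri 22 Oct", 70_800, 66_790),
    ("Sat 23 Oct", 64_300, 61_390),
    ("Sun 24 Oct", 71_900, 66_470),
    ("Mon 25 Oct", 71_900, 67_760),
]


def peak_to_trough(series):
    """Return (peak, trough, pages_lost, percent_lost) for a list of counts.

    The trough is the lowest value on or after the day of the peak.
    (Hypothetical helper, not something Google provides.)
    """
    peak_index = max(range(len(series)), key=series.__getitem__)
    peak = series[peak_index]
    trough = min(series[peak_index:])
    lost = peak - trough
    return peak, trough, lost, 100.0 * lost / peak


single_dc = peak_to_trough([row[1] for row in counts])
average = peak_to_trough([row[2] for row in counts])

print("216.239.59.104: lost %d pages (%.1f%%)" % (single_dc[2], single_dc[3]))
print("10-center avg:  lost %d pages (%.1f%%)" % (average[2], average[3]))
```

    For the single datacenter this gives 18,100 pages, about 22% of the Wednesday peak, which matches the figure quoted above. The 10-center average falls by 16,680 pages over the same days.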

    I can assure you 100% that the pages of the site were all fully present throughout this period. And this is not the first time this has happened in the two months or so that the site has been up, though it's the first I have documented so exactly.

    Does anyone have any idea in the world what this is about? I feel like I'm trying to climb a glass mountain.
     
    Owlcroft, Oct 25, 2004 IP
  2. Smyrl

    Smyrl Tomato Republic Staff

    #2
    Owlcroft, I have not been watching the SERPs that closely in over a year, but during big dance days I have seen Google revert to an old index while they shuffled the new data. I wonder if this could be happening now in your area?

    I hope that is what is going on in your case and new pages reappear. The fact that you are observing this on multiple data centers is problematic.

    I hope someone who has been watching the site command recently can shed some light with their observations.

    Shannon

    -------------

    I just checked several of my sites and site command appears to be fairly accurate.

    s
     
    Smyrl, Oct 25, 2004 IP
  3. schlottke

    schlottke Peon

    #3
    Are you noticing a loss in traffic? I wonder if the report could just be leaving some out for whatever reason.
     
    schlottke, Oct 25, 2004 IP
  4. fryman

    fryman Kiss my rep

    #4

    They aren't reliable. I get loads of session urls indexed for a few days, and then they die. One day my site can have 2,500 pages indexed, later it will show 2,000...
     
    fryman, Oct 25, 2004 IP
  5. Michael

    Michael Raider

    #5
    Not at all reliable and when you look at it across datacenters, which is essentially 'work in progress', the numbers are meaningless imo.

    If a client wants me to measure the long term trend of pages indexed in Google I use http://services.google.com/cobrand/free_select

    The results are as they say approximate but they seem to be less erratic than the site: command.

    - Michael
     
    Michael, Oct 25, 2004 IP
  6. john_loch

    john_loch Rodent Slayer

    #6
    Owlcroft,

    I too see the same thing (I have a large site that typically saw a reduction in figures by up to 15,000 pages) - the site has around 130,000 pages indexed.

    I've found (for me) that the site command becomes more reliable when I use an arg with it. Here's how I do it:

    'site:mydomain.com.au mydomain'

    I won't even try to explain why this works (I don't know or care), but it does provide more consistent results for me. The longest sampling I have for this is one month, sampled daily, one DC only (2 months back).

    There were variations, but at a reduced percentage.

    Cheers,

    JL.
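
    One way to put a number on "more consistent" is the coefficient of variation of the daily samples (standard deviation as a percentage of the mean). The sketch below is only an illustration: the one-month sample mentioned above was not posted, so the input is Owlcroft's series from the first post as a stand-in, and the helper name `relative_spread` is made up. To compare a plain site: query against site: plus an extra term, swap in two series sampled on the same days from the same datacenter.

```python
from statistics import mean, pstdev


def relative_spread(daily_counts):
    """Coefficient of variation: population std dev as a percentage of the mean.

    Lower values mean the reported count wobbles less from day to day.
    (Hypothetical helper for illustration.)
    """
    return 100.0 * pstdev(daily_counts) / mean(daily_counts)


# Stand-in input: Owlcroft's counts from the first post (17-25 October 2004).
# Replace these with one series per query form being compared.
single_dc = [78_100, 80_200, 80_200, 82_400, 70_800, 70_800, 64_300, 71_900, 71_900]
ten_center_avg = [74_230, 76_840, 76_830, 78_070, 67_230, 66_790, 61_390, 66_470, 67_760]

for label, series in [("216.239.59.104", single_dc), ("10-center avg", ten_center_avg)]:
    print("%-15s spread = %.1f%% of mean" % (label, relative_spread(series)))
```

    Whichever query form gives the smaller spread over the same period is the steadier one, though a steadier number is not necessarily a more accurate one.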
     
    john_loch, Oct 25, 2004 IP
  7. jfontestad

    jfontestad Well-Known Member

    #7
    If you have an AWS generated website then Google is dropping them COMPLETELY left and right like crazy.... if not disregard this... :\
     
    jfontestad, Oct 25, 2004 IP
  8. Owlcroft

    Owlcroft Peon

    #8
    I am at least reassured that others have seen analogous patterns. A few comments on the many replies so far:


    "I have seen Google revert to an old index while they shuffled the new data"

    It has happened at least twice within a couple of months (I wasn't monitoring closely at first), and each time the numbers only climbed slowly from the dropped-to level rather than jumping back up at some point; so I doubt it's just index shuffling.


    "Are you noticing a loss in traffic?"

    The first time, there was a correspondingly terrible drop-off. This time, there was a drop-off that has, to my amazement, been followed by a huge increase (based on AdSense impressions); it's too recent for me to have good channel data yet--just a couple of days so far--but it does seem to be an increase in actual impressions (not CTR or CPM).


    "They [site: reports] aren't [reliable]. I get loads of session urls indexed for a few days, and then they die."

    These are not sessions, but they are PHP-generated dynamic pages. But the same pages are always available at the same (mod_rewritten static) URLs.

    "I've found (for me) that the site command becomes more reliable when I use an arg with it . . . ."

    I'll have to give that a try. Strange and wonderful are the ways of The Great And Powerful Google.


    "If you have an AWS generated website then Google is dropping them COMPLETELY left and right like crazy . . . ."

    Not this one. I do have a bunch of others that are, and I will have to look at those to see what is happening with them, but this one is quite unrelated to Amazon. (The AWS-derived URLs are all mod_rewritten to static; I wonder if that will matter . . . .)


    "Not at all reliable and when you look at it across datacenters, which is essentially 'work in progress', the numbers are meaningless imo."

    I wonder: the datacenters all seem to move in lockstep, or close to it: they don't have the same totals (though a few match up with one another), but they seem to gain--or lose--at almost the same pace.
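
    The "lockstep" impression can be tested roughly by correlating day-over-day changes. The sketch below uses only the two series posted in the first message (one datacenter and the 10-center average); the per-datacenter figures were not posted, so this is a partial check at best, and the average includes the single datacenter itself, which inflates the agreement somewhat. The function names are made up for illustration.

```python
from math import sqrt

# Counts from the first post, 17-25 October 2004.
single_dc = [78_100, 80_200, 80_200, 82_400, 70_800, 70_800, 64_300, 71_900, 71_900]
ten_center_avg = [74_230, 76_840, 76_830, 78_070, 67_230, 66_790, 61_390, 66_470, 67_760]


def daily_changes(series):
    """Difference between each day's count and the previous day's."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


changes_single = daily_changes(single_dc)
changes_avg = daily_changes(ten_center_avg)
r = pearson(changes_single, changes_avg)

print("daily changes, 216.239.59.104:", changes_single)
print("daily changes, 10-center avg: ", changes_avg)
print("correlation of daily changes: %.2f" % r)
```

    A correlation near 1 means the two series gain and lose on the same days by proportionate amounts, which is what these nine days show. With the full ten-datacenter table, the same function could be run on each pair of datacenters.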

    "If a client wants me to measure the long term trend of pages indexed in Google I use
    http://services.google.com/cobrand/free_select"


    Curiouser and curiouser: that gives about *half* the number that site: gives.


    I can understand why Google might want to be coy with PageRank values, but why this bizarre mystery about something so simple and innocuous as page counts?
     
    Owlcroft, Oct 26, 2004 IP
  9. lowrider14044

    lowrider14044 Raider

    #9
    Interesting. I get about 11% fewer pages in the customizable search compared to the site: search. Site: seems to be the least accurate in my case. Neither is right, though.
     
    lowrider14044, Oct 26, 2004 IP
  10. Jan

    Jan Peon

    #10
    All my online books / pdf files dropped out of the G index.
     
    Jan, Oct 26, 2004 IP