The three levels of being in the Google index

Discussion in 'Google' started by stereolab, Sep 13, 2005.

  1. #1
    I posted this question in another forum, but maybe it was the wrong forum.

    Still trying to learn the mechanics behind Google, but it seems there are 3 different levels for an indexed page:

    1.) Only the URL, with no other info (nothing cached, no additional meta info, just the URL link to the page)

    2.) 'Supplemental Results' (if you do a 'site:domain.com' query, some pages have their titles indexed and some additional meta info from the page, but no cache, and the term 'Supplemental Result' next to them)

    3.) Fully cached and indexed with everything in it

    Now, I'm pretty sure I have seen pages go from step 1 to step 3. But what's up with step 2? Is that a kind of purgatory for pages that Google will admit exist but give no credence to? Or do all pages move slowly from one step to the next?
     
    stereolab, Sep 13, 2005 IP
  2. digitalpoint

    digitalpoint Overlord of no one Staff

    Messages:
    38,334
    Likes Received:
    2,613
    Best Answers:
    462
    Trophy Points:
    710
    Digital Goods:
    29
    #2
    1. Typically this means Google knows about the URL (from a link to it), but has not spidered/indexed it yet.

    2. This is Google's secondary index. Only Google knows for sure why some are there, but from what I've observed, in most cases, they have a cache date that is months old (so maybe when a page hasn't been respidered in a certain amount of time it moves there).

    So typically you will see pages go from 1 to 3 to 2 (if they are never reindexed).
     
    digitalpoint, Sep 13, 2005 IP
  3. stereolab

    stereolab Peon

    Messages:
    151
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #3
    How odd. I had a site (not on Digital Point) that had all its pages created on the same day. Googlebot did its thing, and it seemed to create these two classes of pages immediately: some were a-class cached pages, and others were b-class supplemental pages.
     
    stereolab, Sep 13, 2005 IP
  4. disgust

    disgust Guest

    Messages:
    2,417
    Likes Received:
    133
    Best Answers:
    0
    Trophy Points:
    0
    #4
    I've actually seen a huge rise in the number of supplemental results from newer domains as of late. I know for sure certain errors can get you into this separate index, but that certainly isn't the only thing.

    It sure seems to me like supplemental results spring up when there are issues of duplicate content, or not enough content variation between pages (just a few things changing per page, rather than full paragraphs of text, etc.).
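    The "not enough content variation" case can be checked mechanically. Below is a rough Python sketch (the sample page texts are invented) that scores two pages with word shingles and Jaccard overlap; near-duplicate template pages score far higher than unrelated ones.

```python
# Rough similarity check between two pages' text, using word shingles
# (overlapping runs of k words) and Jaccard overlap of the shingle sets.
# Near-duplicate pages share most shingles; unrelated pages share few.

def shingles(text, k=3):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    if not (sa | sb):
        return 1.0
    return len(sa & sb) / len(sa | sb)

# Two "template" pages where only a couple of words change:
page_a = "acme widget model 100 red steel frame free shipping on all orders over fifty dollars"
page_b = "acme widget model 200 blue steel frame free shipping on all orders over fifty dollars"

print(jaccard(page_a, page_b))   # high overlap: mostly the same shingles
print(jaccard(page_a, "an entirely unrelated page about something else"))
```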
     
    disgust, Sep 15, 2005 IP
  5. jlawrence

    jlawrence Peon

    Messages:
    1,368
    Likes Received:
    81
    Best Answers:
    0
    Trophy Points:
    0
    #5
    I know that many people will say that meta descriptions don't matter any more, but I've had under-construction sites get large numbers of supplementals when the pages have the same meta descriptions (or none at all). Change the descriptions so that they are unique, and the pages become cached as 'normal' pages.
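    A quick way to audit this is to extract each page's meta description and group pages that share one (or have none). A rough Python sketch, with made-up HTML snippets standing in for real fetched pages:

```python
# Group pages by meta description so duplicates (or missing ones) stand out.
from collections import defaultdict
from html.parser import HTMLParser

class MetaDescriptionParser(HTMLParser):
    """Grabs the content of <meta name="description" ...>, if present."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "description":
            self.description = a.get("content")

def description_of(html):
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description

# Stand-ins for pages you would actually fetch:
pages = {
    "/red.html":   '<meta name="description" content="Cheap widgets">',
    "/blue.html":  '<meta name="description" content="Cheap widgets">',
    "/about.html": "<title>About us</title>",   # no description at all
}

by_description = defaultdict(list)
for url, html in pages.items():
    by_description[description_of(html)].append(url)

for desc, urls in by_description.items():
    if desc is None or len(urls) > 1:
        print("worth fixing:", urls, "->", desc)
```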
     
    jlawrence, Sep 15, 2005 IP
  6. Pammer

    Pammer Notable Member

    Messages:
    3,417
    Likes Received:
    397
    Best Answers:
    0
    Trophy Points:
    260
    #6
    Ohh, well! Does that really work? I'll just try it. :rolleyes:
     
    Pammer, Sep 15, 2005 IP
  7. aeiouy

    aeiouy Peon

    Messages:
    2,876
    Likes Received:
    275
    Best Answers:
    0
    Trophy Points:
    0
    #7
    I think factors like duplication and relevance can possibly put pages in the supplemental index.

    I had some pages on a few sites that ended up in the supplemental index; after I made some changes to reduce duplication with other, less relevant pages, they moved to the regular index.

    Obviously nothing scientific about it, but all these pages were less than 6 weeks old at the time.
     
    aeiouy, Sep 16, 2005 IP
  8. stereolab

    stereolab Peon

    Messages:
    151
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #8
    OK, so it doesn't necessarily have to be a permanent state. Weird. I've been looking on other boards, and nobody really has a good answer for what this is. I read the Google reply, but that was far from useful. Thanks.
     
    stereolab, Sep 16, 2005 IP
  9. iconrate

    iconrate Well-Known Member

    Messages:
    457
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    138
    #9
    Supplemental results to me seem to be pages that 1. haven't been cached in a long time and 2. have no internal or external links pointing to them. For example, a page that you no longer link to from your site, such as a links.php that is no longer in use. A page that no longer exists but is still in the index seems to make it into the supplementals as well.
    I've got some 50,000+ pages in supplemental due to a restructuring of one of my sites :)
    The pages no longer exist but are still cached, and they moved to supplemental.
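    The "nothing links to it any more" situation is easy to detect on your own site: crawl the link graph from the homepage and list known pages that are unreachable. A rough sketch, with an invented link graph standing in for a real crawl:

```python
# Find pages that exist (or used to) but that no current page links to.
from collections import deque

# doc -> pages it links to (a stand-in for a real crawl of your site)
links = {
    "/":              ["/products.html", "/contact.html"],
    "/products.html": ["/widget.html"],
    "/widget.html":   [],
    "/contact.html":  [],
    "/links.php":     ["/"],   # links out, but nothing links to it
}

def reachable(start, graph):
    """Breadth-first search over the link graph from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

orphans = set(links) - reachable("/", links)
print(orphans)   # the disused links.php
```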
     
    iconrate, Sep 18, 2005 IP
  10. nohaber

    nohaber Well-Known Member

    Messages:
    276
    Likes Received:
    18
    Best Answers:
    0
    Trophy Points:
    138
    #10
    Google's major data structures related to your page are:
    1. Google learns about the existence of your page. Your page gets in the list of URLs to crawl (crawling queue). The anchor text of the link may get in the index (probably depends on the quality of the link).
    2. Google reads your page and puts it in the repository. The repository is NOT the index. Google may keep more than one version of your page in the repository. The snippets in the SERPs come from the repository, NOT the index. If your snippet shows your latest page version, that does not mean the page is indexed.
    3. Google puts your page in the index. The index is made up of hit lists for every word in the lexicon. Basically, the hit list for a word contains a list of all documents containing it, plus hit info like the position of the word, font size, etc. The index contains on-page hits for only one version of your page. On-page hits are put in the index after your page gets crawled and put in the repository.
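    The repository/index split described above can be sketched in a few lines. This is a toy illustration, not Google's actual code: each word maps to a hit list of (document, word position) pairs, while the raw text lives separately in the repository (real hit info also records font size, capitalization, etc.).

```python
# Toy inverted index: word -> hit list of (doc_id, position) pairs,
# kept separate from the repository of raw page text.
from collections import defaultdict

index = defaultdict(list)   # the "hit lists"
repository = {}             # doc_id -> raw text (snippets come from here)

def add_document(doc_id, text):
    repository[doc_id] = text
    for pos, word in enumerate(text.lower().split()):
        index[word].append((doc_id, pos))

add_document("page1.html", "cheap red widgets for sale")
add_document("page2.html", "red widgets are the best widgets")

print(index["widgets"])   # hits in both documents, with positions
print(index["red"])
```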
     
    nohaber, Sep 18, 2005 IP