I posted this question in another forum, but maybe it was the wrong forum. I'm still trying to learn the mechanics behind Google, but it seems there are three different levels for an indexed page:
1.) Only the URL, with no other info (nothing cached, no additional meta info, just the URL link to the page).
2.) 'Supplemental Results' (if you do a 'site:domain.com' query, some pages have their titles indexed and some additional meta info from the page, but no cache, and a term called 'supplemental results' next to them).
3.) Fully cached and indexed with everything in it.
Now, I'm pretty sure I have seen pages go from step 1 to step 3. But what's up with step 2? Is that a kind of purgatory for pages that Google will admit exist, but give no credence to? Or do all pages slowly go from one step to the other?
1. Typically this means Google knows about the URL (from a link to it), but has not spidered/indexed it yet.
2. This is Google's secondary index. Only Google knows for sure why some pages are there, but from what I've observed, in most cases they have a cache date that is months old (so maybe when a page hasn't been respidered in a certain amount of time, it moves there). So typically you will see pages go from 1 to 3 to 2 (if they are never reindexed).
How odd. I had a site (not on Digital Point) that had all its pages created on the same day. Googlebot did its thing, and it seemed to create these two classes of pages immediately: some were A-class cached pages, and others were B-class supplemental pages.
I've actually seen a huge rise in the number of supplemental results from newer domains as of late. I know for sure certain errors can get you into this separate index, but that certainly isn't the only cause. It sure seems to me like supplemental results spring up when there are issues of duplicate content or not enough content variation between pages (just a few things changing, rather than full paragraphs of text, etc.).
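Purely to illustrate what "not enough content variation" can look like (this is not how Google measures it, and the threshold and function names below are my own assumptions), here's a minimal sketch that compares two templated pages' visible text using word-set overlap:

```python
# Hypothetical sketch: flag page pairs whose visible text overlaps heavily.
# This is NOT Google's algorithm -- just a rough way to see how little
# actually varies between templated pages.

def word_set(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())

def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Share of words the two pages have in common (0.0 to 1.0)."""
    a, b = word_set(text_a), word_set(text_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

page_one = "Blue widgets for sale in Denver. Call us today for a quote."
page_two = "Blue widgets for sale in Boulder. Call us today for a quote."

score = jaccard_similarity(page_one, page_two)
print(f"similarity: {score:.2f}")  # high score -> only a few words differ
if score > 0.8:  # threshold is an arbitrary assumption
    print("pages are nearly identical -- possible duplicate-content trouble")
```

With only the city name changing between pages, the score comes out very high, which is exactly the "just a few things changing" pattern described above.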
I know that many people will say that meta descriptions don't matter any more, but I've had under-construction sites get large numbers of supplemental results when the pages had the same meta descriptions (or none at all). I changed the descriptions so that they were unique, and the pages became cached as 'normal' pages.
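To make the duplicate/missing meta description check concrete, here is a small hypothetical audit; the page dictionary and filenames are made up, and this only sketches the kind of check I do by hand:

```python
from collections import defaultdict

# Hypothetical site: maps each URL to its meta description (None = missing).
pages = {
    "/widgets/blue.html":  "Widgets and more widgets",
    "/widgets/red.html":   "Widgets and more widgets",   # duplicate
    "/widgets/green.html": None,                         # missing entirely
    "/about.html":         "About our widget company",   # unique -- fine
}

# Group URLs by their description so duplicates and gaps stand out.
by_description = defaultdict(list)
for url, description in pages.items():
    by_description[description].append(url)

for description, urls in by_description.items():
    if description is None:
        print("missing description:", ", ".join(urls))
    elif len(urls) > 1:
        print(f"duplicate description ({description!r}):", ", ".join(urls))
```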
I think factors like duplication and relevance can put pages in the supplemental index. I had some pages on a few sites that ended up in the supplemental index; after I made some changes to reduce duplication with other, less relevant pages, they moved to the regular index. Obviously nothing scientific about it, but all these pages were less than six weeks old at the time.
OK, so it doesn't necessarily have to be a permanent state. Weird. I've been looking on other boards, and nobody really has a good answer for what this is. I read the Google reply, but that was far from useful. Thanks.
Supplemental results, to me, seem to be pages that 1. haven't been cached in a long time and 2. have no internal or external links pointing to them. For example, a page that you no longer link to from your site, such as a links.php that is no longer in use. A page that no longer exists but is still in the index seems to make it into supplementals as well. I've got some 50,000+ pages in supplemental due to a restructuring of one of my sites: the pages no longer exist but are still cached, and they moved to supplemental.
Google's major data structures related to your page are:
1. Google learns about the existence of your page. Your page gets into the list of URLs to crawl (the crawling queue). The anchor text of the link may get into the index (probably depending on the quality of the link).
2. Google reads your page and puts it in the repository. The repository is NOT the index. Google may keep more than one version of your page in the repository. The snippets in the SERPs come from the repository, NOT the index. If your snippet shows your latest page version, that does not mean it is indexed.
3. Google puts your page in the index. The index is made up of hit lists for every word in the lexicon. Basically, a hit list for a word contains a list of all documents containing it, plus hit info such as the position of the word, font size, etc. The index contains on-page hits for only one version of your page. On-page hits are put in the index after your page gets crawled and put in the repository.
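For anyone who finds the repository/index distinction abstract, here is a toy inverted index in the spirit of that description (a lexicon plus per-word hit lists with positions). It is heavily simplified and the names are mine, not Google's:

```python
from collections import defaultdict

# Toy inverted index: for each word in the lexicon, keep a "hit list" of
# (document_id, position) pairs. Real hit lists also record font size,
# capitalization, anchor-text hits, etc. -- omitted here.

repository = {
    1: "cheap blue widgets",
    2: "blue widgets on sale",
}

index = defaultdict(list)          # word -> [(doc_id, position), ...]
for doc_id, text in repository.items():
    for position, word in enumerate(text.split()):
        index[word].append((doc_id, position))

# The lexicon is just the set of words that have hit lists.
lexicon = set(index)

print(index["widgets"])   # [(1, 2), (2, 1)] -> which docs contain it, and where
print("blue" in lexicon)  # True
```

The point of the sketch: the repository holds full page text (what the cache and snippets are built from), while the index only holds per-word hits, which is why a page can show a fresh snippet without its latest version being fully indexed yet.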