I know what you mean, but from what I can gather, an awful lot of the sites affected by the Supplemental page problem are large database-driven stores using a standard template for thousands of pages. With very little change between the pages (apart from variations in price, title and a short description), you can see why such pages have been dropped from the main index. I have one content site that has gone 99% Supplemental. It is a database-driven site that uses a standard template across 150 or so pages, but apart from the main navigation, the main components on each page are totally unique. I cannot understand why it is still in the Supplemental index, even after the heavy spidering it has been getting for the last couple of weeks. Did that help you, Chris? Have you noticed whether the page or directory that got the new incoming link has come out of the Supplemental index?
For my largest site (500+ pages), most pages follow a similar naming format (name + serial number), though the content is unique. All of these pages have been put into the supplemental results.
Couldn't it just be that you notice these sites more because of the number of pages they generate? For example, if I had a site with 30 pages of products and only 20 pages now show up, that's a 33% drop in the index. If I have a database-driven site with 10,000 pages and 3.5K pages are suddenly dropped, that's roughly the same ratio (35%), but it sounds more serious because of the initial size. I'm also not a believer in the template theory, and there have been quite a few posts on WMW about forums losing thousands of pages. You can't really get more unique, fresh content than forums, where maybe 90% of the text is distinct to each page. Nearly all sites have similar navigational elements, so there would be no reason to penalise on that alone. Microsoft.com uses a very similar navigational layout across its pages, and how many pages does it have in the index? 463 million. Personally, I think it's related to the Big Daddy changes, where different bots were being used to calculate rankings for different datacenters and it all just got screwed up. There were rollbacks to cache dates from Sept 2005, which points to something wrong with Google rather than with the thousands of webmasters' sites that had done nothing different from the day before they were dropped. Although it's annoying and costly, I really do think it's a case of sitting and waiting.
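To put numbers on the ratio argument, here is a quick sketch using the two hypothetical sites from the post (30 pages with 10 dropped, and 10,000 pages with 3,500 dropped). The percentages come out close (about 33% vs 35%) even though the absolute losses are very different. The function name is just for illustration.

```python
def drop_percentage(total_pages: int, dropped_pages: int) -> float:
    """Return the share of a site's pages dropped from the index, in percent."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * dropped_pages / total_pages


# Hypothetical sites from the post
small_site = drop_percentage(30, 10)         # 30 product pages, 20 still showing
large_site = drop_percentage(10_000, 3_500)  # 10,000-page database-driven site

print(f"small site: {small_site:.1f}% dropped (10 pages lost)")
print(f"large site: {large_site:.1f}% dropped (3500 pages lost)")
```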
No, Andy, the new page still has not been indexed. The other thing I have noticed is that the site is being crawled less often: there are about 7 days between crawls. However, the code may not be catching the new bot and notifying me of it. I'll have to check that out.
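If the worry is that the tracking code misses the new bot, one thing worth checking is how it matches the User-Agent. This is a minimal sketch, assuming the code keys off the User-Agent header. Googlebot has identified itself both as "Googlebot/2.1 (+http://www.google.com/bot.html)" and in a Mozilla-compatible form. A case-insensitive substring match catches both, whereas a check that only looks at the start of the string misses the second. The helper names are hypothetical. User-Agent strings can also be spoofed, so this only tells you that a request claims to be Googlebot.

```python
import re

# Case-insensitive search anywhere in the User-Agent string (hypothetical helper)
GOOGLEBOT_RE = re.compile(r"googlebot", re.IGNORECASE)


def is_googlebot(user_agent: str) -> bool:
    """True if the User-Agent claims to be Googlebot, whatever its format."""
    return bool(GOOGLEBOT_RE.search(user_agent or ""))


def naive_is_googlebot(user_agent: str) -> bool:
    """Fragile check for comparison: only matches UAs that start with 'Googlebot'."""
    return (user_agent or "").startswith("Googlebot")


OLD_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
NEW_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

for ua in (OLD_UA, NEW_UA):
    print(naive_is_googlebot(ua), is_googlebot(ua))
# True True
# False True
```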
For the first time in weeks, and after some extensive spidering, I've actually had 5 main category pages added to the index. However, the large number of Supplemental pages is still there. Interestingly, only 2 of the new pages added to the index are visible using the site: command; to see the other 3, I have to repeat the search with omitted results included. It also looks like 72.14.207.107 is showing just the newly added pages with the site: command, which is a very recent development.
One of my newer pages finally got indexed. macdar shows this page indexed on 4 of 6 DCs. Hopefully that continues and improves. It is currently on page 2 of the Google SERPs for my keyword phrase. I also noticed that the number of pages indexed is changing. It is dropping, but that is OK: I am hoping some of the pages I put in robots.txt are being removed from the index, since I'm trying to eliminate some of the duplicate content.
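To double-check which URLs a set of robots.txt rules actually blocks, Python's standard urllib.robotparser can evaluate them offline. This is a rough sketch, assuming the duplicate content lives in /print/ and /sort/ directories. Those rules and the example.com URLs are made up for illustration. Keep in mind that robots.txt controls crawling. How quickly pages that are already indexed drop out is up to Google.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block print-friendly and sorted duplicates of product pages
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /sort/
"""

URLS = [
    "http://www.example.com/widgets/blue-widget.html",
    "http://www.example.com/print/blue-widget.html",
    "http://www.example.com/sort/price/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from a string, no network fetch

# Map each URL to whether Googlebot may crawl it under these rules
results = {url: parser.can_fetch("Googlebot", url) for url in URLS}
for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```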
I've just done a site: search and found a supplemental with the oldest cache date I've seen so far, Oct 2004. It's a dinosaur! I keep thinking about what Doc might say in 'Back to the Future'... if the current version of that page encountered the 2004 version in Google's index, would it create a paradox, opening up a rift in space and time and destroying the universe? ...Or does it just mean a dup content penalty and more sh*t results for searchers?
While monitoring the supplemental results (yes, still many, many of my pages), I noticed something new today. Checking my stats and error logs, I found a bunch of 404 errors for truncated URLs. I haven't checked yet whether these are coming from Google, but I suspect they are.
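A quick way to test the suspicion about the 404s is to filter the access log for 404 responses whose User-Agent mentions Googlebot. This sketch assumes the server writes Apache's "combined" log format. The sample lines, with documentation-range IPs, made-up paths and a made-up date, are hypothetical. The User-Agent can be faked, so treat any matches as requests that claim to be Googlebot.

```python
import re
from collections import Counter

# Apache "combined" format (assumed): host ident user [time] "request" status bytes "referer" "agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 paths requested by clients whose User-Agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if not m:
            continue  # skip lines that don't fit the assumed format
        if m.group("status") == "404" and "googlebot" in m.group("agent").lower():
            counts[m.group("path")] += 1
    return counts


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

# Hypothetical log lines (documentation-range IPs, made-up paths)
SAMPLE_LOG = [
    f'192.0.2.10 - - [10/Apr/2006:10:15:32 +0000] "GET /products/blue-wid HTTP/1.1" 404 209 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [10/Apr/2006:10:15:40 +0000] "GET /products/blue-widget.html HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'198.51.100.7 - - [10/Apr/2006:11:02:05 +0000] "GET /old-page.html HTTP/1.1" 404 209 "-" "{BROWSER_UA}"',
]

hits = googlebot_404s(SAMPLE_LOG)
for path, n in hits.most_common():
    print(n, path)
```

Sorting the output by count makes it easy to see whether the same truncated paths keep coming back.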
Any updates? I am still stuck with ~220 supplementals, one even showing a cached version from 2004. No changes whatsoever; I'm pretty lost.
It's a bit sad to say, but I think those who are still in the supplemental index need to start accepting that their sites are not well adjusted to the current Google algo, whatever that is. Maybe Google has a bug, maybe Google is not crawling right, but either way, the fact remains that these sites don't fit the current algo. So it's about time to make some changes, IMO...
I have a site that dropped from 630,000 to 22,000 indexed pages in Google. Of course, this also affects visitor traffic. So users who need the information in the roughly 600,000 pages dropped from the index will not find it, or will find it only after digging deep in Google. That takes time, and time is money, so maybe another search engine will provide the information more quickly, and my target visitors and I will switch to it. And so on. If my reasoning is correct, it may be time to sell our Google stock ASAP.
GoogleGuy posted on WebmasterWorld with regard to changes in Google's indexing with the new BigDaddy infrastructure. I think I'll be emailing them. Shoot them a message before they drop that email address; Matt Cutts had a similar one set up a while ago (at the last SES, maybe?), and it became obsolete after a few weeks.
I emailed them a day or so ago. Why not? I thought I was going to recover last night: I did a site:mydomainhere.com search and it showed only 14 results in total, so I thought maybe the site was being reindexed. This morning I see 4380 results, and from page 7 to the end of my Google listings, everything is a supplemental result. I checked a couple of keyword phrases and do see a small improvement for this site, so maybe something is finally happening.
I stated earlier that we need to make some adjustments on our end, but I'm not sure about that now. I briefly went over discussions related to this at WMW, and I also checked several sites. Many more sites seem to be affected by this supplemental/de-indexing problem than a couple of months ago, and I'm quite certain that many of them have nothing to do with massive link exchanges, Coop, Link Vault or paid links. It seems that Google does have some kind of problem.