Why would a 301 redirect be interpreted as duplicate content and penalized? If Google is doing that, it's just one more proof of a major screwup.
Most of the time you don't know what Google will do with your website in the next update, at least for small sites.
Yesterday I found that if I search for a certain phrase that I know I should find in my site, the old URL appears in first place in the SERPs as a supplemental result and the new one in the last position. I redirected the site more than six months ago and now I have more old URLs indexed than new ones. I used to have 400,000 URLs indexed. Now I have 50,000 old URLs and 15,000 new ones.
By the way, I just verified that the caches of the old URLs showing in the SERPs are from last August. Almost one year old.
This is almost exactly the situation I find myself in. This simply should not be happening, but it is. There is no way Google could have intended that year-old pages that no longer exist should be supplanting newer pages, but that has been true ever since Big Daddy. That is my point: Big Daddy was and is a failure. And it is the boondoggle that finally, really did break Google.
Minstrel, I have three 'penalized' sites. Two were using the co-op, but all three of them were redirected with a 301, one from a domain to a subdomain and two from domain.com to www.domain.com. Now I am not so sure that the current problem is the co-op.
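For anyone not familiar with the setup, a domain.com to www.domain.com 301 of the sort I mean usually looks something like this in .htaccess (assuming Apache with mod_rewrite; example.com is just a stand-in for the real hostname):

    RewriteEngine On
    # Send every request for the bare domain to the www version with a permanent (301) redirect
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Nothing exotic there, which is why it is hard to see how it could earn a penalty on its own.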
As I said in another thread, I don't think looking for "the cause" will work with Big Daddy. Big Daddy was a HUGE mistake... err... update. I don't think there was only one single issue they were trying to fix, and I don't think there was one single thing that went wrong. I believe they tried to address several ongoing issues and in the process broke the index/SE in several different ways. Link schemes like the co-op were one of the things they tried to address. I don't know what the issue is with 301s, but with some of what is happening currently I also find it difficult to believe there isn't something messed up there too.
Google is and has been storing duplicate content for the same URL for some time now. Look at the supplemental index almost like a "backup". There can be data in the RI (regular index) pointing to www.mysite.com, and there can also be data in the SI (supplemental index) pointing to www.mysite.com. The supplemental data is old data, as you can see by the cache. There can also be data pointing to the URL-only list of URLs that were yet to be crawled at some point in time.

If for some reason Google is not able to "go live" with the data in the RI for a particular URL, AND they find that data in the SI, albeit old data, they will go live with it instead. It is also possible, depending on the number of results returned for a particular query, that if they do not find data in the RI or the SI, a URL-only result is returned for the very same URL.

Many of the problems folks are seeing are with Google not being able (willing?) to go live with data from the RI.

Dave
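If it helps to picture the order of preference Dave is describing, here is a rough Python sketch of it. Every name in it is made up by me, and this is only a guess at the fallback logic, not anything Google has documented:

    # Rough sketch of the fallback order described above -- all names are hypothetical.
    def result_for(url, regular_index, supplemental_index, url_only_list):
        entry = regular_index.get(url)
        if entry is not None and entry.get("can_go_live"):
            return entry                              # fresh data from the regular index
        if url in supplemental_index:
            return supplemental_index[url]            # old "backup" data, hence the stale cache dates
        if url in url_only_list:
            return {"url": url, "url_only": True}     # bare URL-only listing, no title or snippet
        return None

    # Example: fresh data exists but is blocked from going live, so the old supplemental copy is served.
    ri = {"www.mysite.com/page.html": {"title": "New page", "can_go_live": False}}
    si = {"www.mysite.com/page.html": {"title": "Old page", "cached": "Aug 2005"}}
    print(result_for("www.mysite.com/page.html", ri, si, []))

That would explain why a page that was recrawled months ago can still show up with a year-old cache date.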
Somehow, and for some godly dumb reason, Google has screwed up big time this time and they do not know how to fix it! My site(s) went from being really good to almost nothing, and the cached pages, supplemental, are from August of last year.

I have said before, or tried to at least, that it seems like they are using these old pages to do their crawl, because I can see that they crawl old pages with links to other old pages that are still on my server but are obsolete. I have removed (renamed) some of them, and then I get errors in my sitemap (which is down for the moment if I want to look at indexing!!!) saying Google can't find them, or redirect errors, or some other garbage!!

Maybe we all should use NOCACHE in our meta tags? Or GOOGLE NOARCHIVE?

Another thing is that Google's crawler does not care about the robots.txt file! I have told them NOT to crawl subdirectories but they do not care!! They do it over and over.
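For reference, the kind of robots.txt rules I mean look something like this (the directory names here are just placeholders, not my real paths):

    # robots.txt -- directory names are placeholders
    User-agent: *
    Disallow: /oldpages/
    Disallow: /testdir/

And yet the crawler keeps hitting those subdirectories anyway.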
I just did a site: search of my domain and found tons of cached supplemental pages from August last year. I clicked on the cache link and it took me to where it was supposed to show the cached page, but then it went to another page saying:

Not Found
The requested URL /index.html?http://64.233.167.104/search?q=cache:kAmx-h2jFooJ:www.domain.com/search/index_Letter.php?catid=40

So even if the search says it is cached, it is not in the cache. The page still exists and should be indexed, but it is not. It's supplemental and cached according to Google, but it's not.... I am confused.

If I do a site: search, it seems that ALL my www.domain.com pages are supplemental right now.
I found another of my sites behaving in this funny way. In this case, it is a subdomain where I have never used a redirect. It is an affiliate catalog with over 100,000 products listed, each one having its own page. This affiliate catalog is duplicated on other sites, changing only the header and footer. I checked other sites hosting this catalog and all of them are in the same boat. Duplicated content, for one reason or another, is what is causing this huge problem, in my opinion.
Unfortunately you're not alone. Doesn't make it any easier to take, though. This has been a problem from the very beginning with BFD: the inability, or unwillingness, to get freshly crawled data to "go live". In its absence, old data gets served.

Dave
The correct instruction is robots noarchive (e.g. <META NAME="ROBOTS" CONTENT="NOARCHIVE">), not Google noarchive. I personally use this directive on all of my sites; I don't like third-party servers serving up cached versions of my pages.

I've never had a problem with Google not obeying the robots.txt file. What you might be seeing is crawlers pretending to be Googlebot. Do a WHOIS on the IP addresses and see if they belong to Google or to someone else. Also double-check that your robots.txt file is properly formatted.
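Besides WHOIS, a reverse-DNS check on the IP is another quick way to spot fakes: look up the hostname for the IP, make sure it is a Google hostname, and confirm it resolves back to the same IP. Here is a rough Python sketch of that check (my own approach, nothing official; the IP at the bottom is just a placeholder pulled from a log line):

    import socket

    def is_real_googlebot(ip):
        # Reverse-DNS the IP, then forward-confirm that the hostname resolves back to it.
        try:
            host = socket.gethostbyaddr(ip)[0]
        except socket.herror:
            return False
        if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
            return False
        try:
            forward_ips = socket.gethostbyname_ex(host)[2]
        except socket.gaierror:
            return False
        return ip in forward_ips

    # Placeholder IP from a log line -- replace with the address you are checking.
    print(is_real_googlebot("66.249.66.1"))

If that comes back False for a visitor claiming to be Googlebot, it is almost certainly someone else ignoring your robots.txt, not Google.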
I have been checking other competitors' sites, and every one that does NOT have database-generated pages is fine. No supplemental pages at all. My site is generated, since I have all my data in a database, and it seems like all generated pages (i.e. they have a ?catid=whatever) are supplemental. The pages that do NOT have a "redirect" are fine. Morons. Seems like they were trying to remove spam sites using "redirects" starting in August last year, and all other sites got caught in the middle.