One of my bigger sites dropped from 54,000 in weight to 8,900. PR remains the same. Pages indexed remain the same. I've refreshed validation a couple of times, but it's not budging. ???
Both, although the API shows about half what the www does - but it's been that way for about 3 weeks now.
And this is sort of off-topic, but how does Google apply its duplicate content filter? Say I decide to start selling widgets online. I go to Google and scope out the top sites for Widget. I build my site using the best practices, including some design and optimization that I lift from the top rankers. I embark on an aggressive yet above-board link building and marketing campaign to drive IBLs and traffic. Maybe I have some well-established, high-ranking sites I can link from. Maybe I pay for a few months to get IBLs from high PR sites with keyword-rich anchor text. Maybe I participate in the co-op.... Six months from now, whose site is considered the duplicate?
Go look for a Pacific Down Comforter. There are hundreds of merchants selling them. I mean, how much differentiation can you have when a bunch of people are all selling the same thing? In other words, how does Google decide A has too much content from B, rather than B having too much from A?
It's all moot, I guess, since 85% of my traffic comes from MSN and Yahoo anyway..... but Google's used to determine weight, so I need to be there in some capacity.
iShop, if I had to guess, you are probably seeing some flux in the API results (I had the same issue for the past two days with one site and today it is back to normal).
With regards to your down comforter example, I think G is starting to look at a number of things when it comes to "duplicate content." One piece is how much "real" content a site has overall. If most of the pages are the same shell with little to no original text relative to the number of links on the page (for example), they don't consider that to be something worthy of being in their index. I have had one site that had thousands of pages removed from their index (the site is still in there and receiving G search traffic...just not as much as it used to). This is not a "big deal" to me as the site is a test bed, and this is pretty much the conclusion I have arrived at as to why the pages were removed. IMHO, once a certain pattern develops on your site (over a certain % of pages, let's say), G will start to say, "hey, this is junk content, it is going bye bye."
The second area concerning duplicate content is essentially identical content. I don't think G minds this as much as some people (who are hyping the dup content issue) think they do. Google has very smart people working there, and they are well aware that many online retailers are selling the same products. What seems to me to be becoming more important is that your site has its own content added to the boilerplate info about the products. Or maybe there needs to be a certain percentage of pages on your site that are unique and not replicated elsewhere.
Third, and this is really what I believe the "duplicate content filter" is addressing, is where different hostnames (the part before the first period, i.e. www.domain, secure.domain, the bare domain, etc.) or multiple domain names point to the same exact data. So, you have domain.com and www.domain.com serving the same info (rather than domain.com 301 redirecting to www.domain.com).
One of our sites lost all of its search volume from G (for what appears to be this reason...time will tell, as we have remedied it now). The site displayed the same info under the three hostname examples I used above, and all three can be seen in the G index. All of the pages are still in the G index in this example, but no search traffic from G comes through. Appending &filter=0 to the end of a search query where the site ranked before shows the site ranked #4...without it, it is nowhere to be found. I have also experienced this with two domain names pointing to the same site. After 301'ing one to the other domain, the site came back to life after a couple of months. BTW, this information is simply my opinion based on what I have seen and concluded on our own sites.
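For anyone who wants to repeat the &filter=0 check without editing URLs by hand, here is a minimal sketch. Only the filter=0 parameter comes from the post above; the search URL and the example query are assumptions for illustration, and the helper name is made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_filter_disabled(search_url):
    """Return search_url with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(search_url)
    # Keep every existing parameter except a previous "filter" setting.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical query URL; compare the results of the two URLs by hand.
original = "https://www.google.com/search?q=down+comforter"
print(original)
print(with_filter_disabled(original))
```

If a site shows up only in the second URL's results, that matches the behavior described above.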
Thanks for the well-reasoned reply, chachi. You make some good points. I would still contend that there's only so much original and compelling content one can write about 'widgets' if that's what one is trying to sell online. And if you have a whole complement of widgets, each one having its own product page, and you happen to stock a lot of styles of widgets, you're going to end up with a large mass of product detail pages that are essentially identical except for a few descriptive phrases. My site is a perfect example: 400,000+ products from 40+ stores and one detail page that pulls product info from a database (Name, Description, Price, Store, Link). Compared to each other, I would imagine my product detail pages look 75%-80% identical no matter what product is being shown. Is that duplicate content to Google? I do think filtering based on xxx.widget and yyy.widget and zzz.widget, when they all go to the same place, is a good idea. I find it hard to believe that Google has yet to figure out that xxx.com and www.xxx.com are the same thing. Nevertheless, off I go to do my 301....
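To put a rough number on how alike two templated product pages look, one common text-similarity measure is Jaccard overlap on word shingles. This is only an illustrative sketch and is not a claim about how Google measures duplication; the template, the product data, and the function names are all hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical template standing in for a database-driven product detail page.
TEMPLATE = (
    "Welcome to our store. Name: {name} Description: {description} "
    "Price: {price} Store: {store} Link: {link} "
    "Free shipping on orders over fifty dollars. "
    "Returns accepted within thirty days."
)

page_a = TEMPLATE.format(
    name="Blue Widget", description="A sturdy blue widget.",
    price="$10", store="Store One", link="/go/1",
)
page_b = TEMPLATE.format(
    name="Red Gadget", description="A compact red gadget.",
    price="$12", store="Store Two", link="/go/2",
)

print(f"similarity: {jaccard(page_a, page_b):.2f}")
```

The more boilerplate surrounds the five database fields, the closer the score gets to 1.0, which is the situation described above.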
Did you check to see if there are pages for your site both with and without the www in the G index? I would be curious if there were. Also, I completely understand your issue with the huge number of products you have on your site. That is why I am not sure G sees the same information on 200 websites as being duplicate. I think there are other factors involved, which is why I think adding a blog or some other area of your site that can serve as a "fresh content area" is a good idea. I think that consistently fresh and relevant content (maybe just a paragraph a day) is probably going to play a bigger role in future search algos. G has their news service, and MSN just announced that they improved their newsbot service inclusion before releasing MSN Search today. Might be something for you to look into as well.
Good suggestions, chachi. xxx.com and www.xxx.com for my site show different results. Close, but different. That's why tonight I'll probably do that 301.....
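For reference, the redirect logic itself is small. Below is a minimal sketch of deciding where a request should be 301'd so that only one hostname serves the content. The hostnames are placeholders and the function name is made up; a real setup would normally do this in the web server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, canonical_host):
    """Return the URL to 301 to, or None if url is already on canonical_host."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == canonical_host.lower():
        return None  # Already canonical: serve the page normally.
    # Keep scheme, path and query; swap only the hostname (and keep any port).
    netloc = canonical_host
    if parts.port is not None:
        netloc = f"{canonical_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Placeholder hostnames for illustration only.
for requested in (
    "http://example.com/widgets?id=7",
    "http://www.example.com/widgets?id=7",
):
    target = canonical_redirect(requested, "www.example.com")
    if target is None:
        print(f"{requested} -> serve as-is")
    else:
        print(f"{requested} -> 301 Location: {target}")
```

The point of preserving the path and query is that every old URL lands on its exact www equivalent instead of the home page.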
Well, hopefully that will clear things up for you. You may want to do the 301 and then email them to let them know what you did and that you hope it did not cause any problems for your site.