I have read the story. I have seen this happen with some directories and toplists as well. Sites that do not use actual links or redirections to the destination site sometimes get their own URL into Google while the destination site's URL is taken out. I think Google will have to fix this very soon or a lot of sites will have this problem. It seems to show that Google has not worked out the duplicate content problem; it should be taking those sites out of the index, not the actual site with the content.
Interesting read. Noticed this with a couple of my sites, one in particular, where proxified pages are being indexed while more and more of the real pages are being dropped from the index.
I discovered this exact thing a few months ago and reported it to Google immediately. I noticed it happening to one site I'm running, and Google fixed it for that site, but they didn't do it for the other sites I gave as examples. I asked why and got a long, boring explanation about the algorithm (probably taken straight from their FAQ). End of story: yes, you can do this and it works, but it has to be done quickly, before the target site can be properly indexed. Edit: after reading the article, I'm fairly sure a competitor did it deliberately. I'm certain that's the case most of the time, because there's no way Google could just pick up a proxied link to your page by accident.
This is sort of shocking, really. They are supposed to be the "top" SE with the most "honest" search results. They pay their top people literally hundreds of thousands of dollars to find bugs like this, and they should have been on top of it already.
This seems like an answer to why all of my pages disappeared from the main index, came back a few weeks later in the SERPs, then were gone again. By tracing back, I discovered the proxy site on a subdomain belonging to onlinehomeDOTus. Blocking the proxy site's data center seems to have solved it for now. Anyone got a better way?
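For anyone wanting to try the same thing, the .htaccess edit I mean looks roughly like this. The IP ranges below are made-up placeholders; you would substitute the ranges the proxy's data center actually uses:

    # Hypothetical netblocks: replace with the ranges the proxy host actually crawls from
    Order Allow,Deny
    Allow from all
    Deny from 203.0.113.0/24
    Deny from 198.51.100.0/24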
The method described in the article seems too complicated for a non-coder, plus it was developed for WordPress only. Either Google does something about it (nah, they won't) or someone out there comes up with a better way for us to defend ourselves.
Yeah, I agree, too much coding for me. I have tried simply blocking proxies using .htaccess, but so far that hasn't worked at all.
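In case it helps anyone, the kind of rule I mean is a mod_rewrite block like the one below, which refuses any request arriving with an X-Forwarded-For header (a header many web proxies add). My guess is the proxies hitting me simply don't send that header, and be warned it can also lock out legitimate visitors sitting behind corporate proxies:

    RewriteEngine On
    # Refuse any request that carries an X-Forwarded-For header
    RewriteCond %{HTTP:X-Forwarded-For} !^$
    RewriteRule .* - [F]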
To me the most troublesome part of this story is the fact that Google was informed a year ago and the problem still persists to this day. In fact, it is getting worse and will continue to get worse until Google fixes it.
Yes it's indeed getting a lot worse, as the number of proxies just keeps growing like crazy and very few proxy owners use robots.txt to prevent indexing of proxified pages.
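It only takes a couple of lines, too. Assuming the proxy script serves its proxified pages through a single entry point like /browse.php (the path varies from script to script, so that part is just an illustration), a proxy owner's robots.txt would look something like:

    # Keep proxified pages (served through the hypothetical /browse.php entry point) out of the index
    User-agent: *
    Disallow: /browse.php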
Google has never fixed problems like this: the 302 hijacks still exist, proxies and duplicate content can still remove websites (more easily now, in fact), and canonical triggers can still hurt. The only semi-fix was Webmaster Tools, which only works if you know to sign up for it, so the risk still exists for novice webmasters, which I assume is a very large number. I will grant them that this issue has only hit a small number of sites. I would not hold your breath for a fix, as I cannot recall one ever. Can you?
Hi, I'm a proxy owner, and I was wondering why my proxied pages were being indexed. This must be the reason, I guess. How can I edit my robots.txt to remove the proxied pages? Do I have to exclude them manually? Cheers, Paz.