I have found a site that seems to have hijacked other sites (almost 3,000 right now) and is listed in Google. I have checked a lot of the hijacked sites, and most of the ones I checked have been deindexed during this update. Right now I am checking whether they are doing something with AdSense fraud. To see if your site is hijacked, do this search in Google: http://www.google.com/search?q=site:cob-web.org&hl=en&lr=&start=420&sa=N&filter=0 (I can't use the link feature here, so just search for site:cob-web.org in Google.) If your site is hijacked, you'll see www.yourdomain.com.cob-web.org in the result set.
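If you have a lot of result URLs to go through, the check above can be scripted. This is only a rough sketch: the function names are made up for illustration, and it assumes the proxied copy always appears as your exact hostname with ".cob-web.org" appended, as described in the post. It does not query Google; it only builds the search URL and tests URLs you paste in.

```python
from urllib.parse import urlencode, urlsplit


def site_query_url(proxy_domain):
    """Build a Google 'site:' search URL for the proxy domain (filter=0 shows all results)."""
    query = urlencode({"q": "site:" + proxy_domain, "filter": "0"})
    return "http://www.google.com/search?" + query


def is_proxied_copy(result_url, your_host, proxy_domain="cob-web.org"):
    """Return True if result_url is served from <your_host>.<proxy_domain>."""
    host = urlsplit(result_url).hostname or ""
    return host == (your_host + "." + proxy_domain).lower()


print(site_query_url("cob-web.org"))
print(is_proxied_copy("http://www.yourdomain.com.cob-web.org/page.html", "www.yourdomain.com"))
```

Swap "www.yourdomain.com" for your own hostname and feed in the URLs you see in the result set.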
Take a look at http://www.zone-h.org/component/option,com_topatt/Itemid,48/ and you'll see how many sites get screwed over by defacers, etc.
I think it is causing Google to treat the pages as duplicate content, and that's why sites have been removed lately. Sites that used to be in the top spots aren't anymore, and the sites that dropped out of the top spots are ones I see on the cob-web site.
Submit a complaint to Google. They have a complaints form somewhere in the Google webmaster center.
I have over 1,000 pages of unique content that we've published over the past five years... all duplicated on cob-web.org. We're all sick to our stomachs over this. Bad news: today all of our pages dropped by 40+ positions in Google. Duplicate content problem? Oh yeah. Well done. My question now is: what next? How would you go about contacting Google, and would they even help? northweb.
I did not find their site when I searched for cob-web. I am curious about what they are doing and why they are doing it. It must be a Cornell project that has gone astray. If you do go to cob-web.org, it redirects. Now off to run a server header check. The header check returns: HTTP Status Code: HTTP/1.1 200 OK
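For anyone who wants to run the same kind of header check themselves, here is a minimal sketch using only the Python standard library. The function name is made up, and it deliberately does not follow redirects, so a 301/302 would show up as such. A 200 on a page that still redirects in the browser would suggest the redirect happens in the page itself (meta refresh or script) rather than in the HTTP headers, though that is a guess, not something verified against the site.

```python
import http.client
from urllib.parse import urlsplit


def check_headers(url, timeout=10):
    """Send a HEAD request and return (status, reason, headers) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.reason, dict(resp.getheaders())
    finally:
        conn.close()


# Example call (needs network access):
# status, reason, headers = check_headers("http://cob-web.org/")
# print(status, reason, headers.get("Location"))
```
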
It seems like they are some kind of proxy server with cached content, and it seems that Google is somehow using the results from this cache. Yes, even Google itself is in there, and if you look at the AdSense code, the links are rewritten to the Google ad server. I don't know yet whether this is the cause of our problems, but it looks like a duplicate content penalty to me. I hadn't seen this site before, and now it has almost 3,000 pages indexed. If Google's crawler finds one of these sites and the site doesn't use fully qualified paths, all of its pages will look like they belong to a subdirectory of cob-web.org. Google's crawler will follow the links thinking they belong to that site. Then it follows other links out to other sites that also don't use fully qualified paths... and so on.
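To make the point about fully qualified paths concrete, here is a small sketch of how a crawler resolves links found on a proxied page. It assumes the proxy serves your HTML without rewriting the links in it (not verified for this proxy), and the hostnames and paths are placeholders.

```python
from urllib.parse import urljoin, urlsplit


def resolve_links(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on, as a crawler would."""
    return [urljoin(page_url, href) for href in hrefs]


# The page as seen through the proxy (placeholder hostname from the thread).
proxied_page = "http://www.yourdomain.com.cob-web.org/index.html"
links = [
    "/products/widget.html",                   # root-relative
    "about.html",                              # relative
    "http://www.yourdomain.com/contact.html",  # fully qualified
]

for resolved in resolve_links(proxied_page, links):
    print(urlsplit(resolved).hostname, resolved)
```

The two relative links resolve to the proxy's hostname, so the crawler stays inside the proxied copy. Only the fully qualified link leads back to the real site.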
Take a look at the site's robots.txt:
  User-agent: *
  Disallow: /
I don't think the sites are hijacked. There are worse proxy sites that do allow the proxied pages to be indexed (and that's really not good), e.g.:
  http://www.google.com/search?hl=en&lr=&q=site:250.com.au/&btnG=Search
  http://www.google.com/search?hl=en&lr=&q=site:ir-proxy.com&btnG=Search
  http://www.google.com/search?hl=en&lr=&q=site:turboprox.com&btnG=Search
  http://www.google.com/search?hl=en&q=site:myspaceproxy.gr&btnG=Google+Search
Maybe "hijacked" was the wrong word to use, but if you look at the SERPs, all the sites look like they are subdomains of that site. Also, if you don't use fully qualified paths on your site, it looks like all your pages belong to their subdomain. Looking at their robots.txt file, Google shouldn't have them indexed at all, should it? All user agents are disallowed...
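You can check what that robots.txt means for a crawler with Python's standard robotparser. This is just a sketch: the helper name is made up and the URL is the placeholder hostname from this thread. Keep in mind that robots.txt controls crawling, not indexing, so a disallowed URL can still be listed in results as a bare URL if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "Googlebot", "http://www.yourdomain.com.cob-web.org/"))
```
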
I don't believe Google would deindex sites that easily! If it does, that is so open to abuse it's unreal. If anyone disagrees, go and try to get the BBC News site off the first four pages of Google as a proof of concept. Maybe it is just coincidence that some sites have dropped down the index at the same time as getting mirrored by this site?
Some people believe this is a blackhat method for delisting a URL from the index. Then again, the same problem appeared last year on WMW, and one of the proxy owners came forward stating he had no clue that this was happening. If your site is suffering, you can contact Google, which has removed proxy sites from the index before. And if you find that the proxy is crawling your site, you can redirect its IP back to its own home page.
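As a rough illustration of the "redirect the IP back to its own home page" idea, here is a sketch written as Python WSGI middleware. Everything specific in it is an assumption: the IP address is a documentation placeholder (you would take the real one from your own server logs), the redirect target is assumed to be the proxy's home page, and your server setup may call for a rewrite rule in the web server config instead.

```python
# Hypothetical values: replace with the proxy IPs found in your own access logs.
BLOCKED_IPS = {"203.0.113.10"}
PROXY_HOME = "http://cob-web.org/"  # assumed home page of the proxy


def redirect_proxy(app, blocked_ips=BLOCKED_IPS, target=PROXY_HOME):
    """Wrap a WSGI app so requests from blocked_ips get a 302 to target instead of content."""
    def middleware(environ, start_response):
        if environ.get("REMOTE_ADDR") in blocked_ips:
            start_response("302 Found", [("Location", target), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"real content"]


application = redirect_proxy(site)
```

Note that REMOTE_ADDR is the address of the connecting machine, so this only works if the proxy fetches your pages from a stable IP and your server is not itself behind another proxy or load balancer.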
The sites only show up if you search for them ;-) Google doesn't have a "cache" of these sites. As soon as you enter a site, a connection is opened to www.starwishing.com.cob-web.org, as they save the logs there: http://www.starwishing.com.cob-web.org/summary.php Google will follow and index the links but not the content. And all of this belongs to http://www.coralcdn.org/ and they work with universities: http://planetlab1.cs.st-andrews.ac.uk/ So don't be worried ;-)
I don't know what's going on there either, but with a quick look I can see some of those sites are commercial sites with shopping carts. If you were one of those merchants and suddenly potential or existing customers started reporting that their credit card information was being hijacked, how happy would you be?