That is probably just a mistake in the code where the URL is injected into the links. Maybe it thinks there is a mistake on the page, or it is trying not to get caught in a loop.
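For anyone wondering what "injected into the links" means in practice: a rewriting proxy typically prefixes every link on a page with its own hostname so that follow-up clicks also go through the proxy. Here is a rough sketch of that idea in Python; the proxy hostname and the regex are mine, not cob-web's, so take it only as an illustration of where such a coding mistake could creep in.

import re

PROXY_PREFIX = "http://proxy.example.org/"  # hypothetical proxy host, not cob-web's real URL scheme

def rewrite_links(html):
    # Prefix every absolute href with the proxy URL so later clicks stay
    # inside the proxy. Bugs here (double-rewriting already-rewritten links,
    # resolving relative links against the wrong base, etc.) are exactly the
    # kind of mistake that produces broken, self-referencing URLs.
    return re.sub(
        r'href="(https?://[^"]+)"',
        lambda m: 'href="' + PROXY_PREFIX + m.group(1) + '"',
        html,
    )

print(rewrite_links('<a href="http://www.example.com/page.html">example</a>'))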
I got an email from a guy saying that he contacted them and they told him he could "opt out" and they would block his URL. I don't think this is right, and they might get a lot of problems with gov sites and other sites like the FBI. Update from me: I guess fbi.gov has an Access Denied page already... So that raises the question: do they KNOW they'll be in deep shit if fbi.gov showed up? whitehouse.gov is still there...
I noticed the same earlier. Some of those URLs have been removed, and I was thinking it was at the request of the site owners.
What is more interesting about the links to the left, and the title above them, is that it says TOP SUB DOMAINS, and then other sites are listed as sub-domains of cob-web. So it SEEMS that Quantcast is seeing them as sub-domains!
Yeah, I'll quote myself: The above search brings up the cob-web "sub-directory" of a site that is not even mentioned in Google's index! Wonder if this guy/girl is happy about that or not? Do they get more hits? Did they get hit by a penalty?
If we're looking at the same thing (cleanclothesconnection.org?), that is the cache of a domain that is parked with a 302 redirect to what looks to be a server for an internet consulting/service company. There is no content, and the proxy follows the redirect too. It may be a site that was taken down: http://www.google.com/search?q=site:cleanclothesconnection.org The cache of those pages is pretty old. 'south pole clothes' shows as the title because that is the anchor text on some page that links to the site; see the 'try it' example in my other post... I don't think they're worried.
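If anyone wants to verify the 302 for themselves, a HEAD request that does not follow redirects shows it directly. A quick sketch (assuming the domain is still parked the same way it was when I looked):

import http.client

# Ask for the front page but do not follow the redirect, so we can see the
# 302 and where the parked domain is pointing.
conn = http.client.HTTPConnection("www.cleanclothesconnection.org", timeout=10)
conn.request("HEAD", "/")
resp = conn.getresponse()
print(resp.status, resp.reason)       # expect something like: 302 Found
print(resp.getheader("Location"))     # the server the parked domain redirects to
conn.close()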
I did a post about this after reading through the thread, and was interested to see how Google is actually a partner in these projects. Feel free to take a look at the Google Blunder. If I am missing anything, drop me a line, or blog it and send me the URL, as I have lots more reading up to do on the subject.
The last time I contacted an edu about a student utilizing their servers to do well with a college affiliate program, it was shocking to get a response that the webmaster saw no issue with it... I finally gave up. At least the taxpayers' dollars were put to good work?
This must be something new, and I don't know if the new Google downgrade was a result of them indexing that site, or if those sites showed up because of the downgrade. Chicken and egg, huh? Still wondering if Google treats the site as having tons of sub-domains, and if Google treats it as duplicate content and penalizes sites for it...
Google can and will see sites with this issue as duplicate content. There are people out there who have been slammed by Google because of the proxy site. I still have more to read up on, so hopefully I can find as much info as possible and make more sense of it all. Google is very much involved in those projects, but how their bots are acting with the nodes is another thing.
STOP! Good Lord! Have any of you looked at http://www.cob-web.org and read, at least, the front page? It's a project WITH Google (and Yahoo!, and more) through Cornell. It is 'just' a distributed "web of caching proxy servers"... Not scrapers, not hackers, not aliens with green skin and a lousy voting record trying to steal your 2 pence of AdSense. Please, for the love of G*d, read up on stuff before you start running in circles and screaming, hands in the air...
Could you give examples, please? I cannot see how this would happen without Google caching the site...
Are they indexing caching servers? Is their algo clever enough to exclude "sites" on a caching server and not penalize sites for duplicate content? I have seen links where people actually use the cob-web URL instead of the real URL. How is this going to affect how Google and other SEs index it? I haven't seen things like this before, so I am wondering if this is an experiment that just went wrong, or is it the start of something new where Google won't crawl your site anymore? All they would need to do is go to a caching server and get your pages. Just speculating... Anyway, I don't expect to see results from a caching server in the SERPs. Do you?
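On the "keeping caching servers out of the index" question: a proxy operator can, at least in principle, serve every proxied copy with a noindex signal, so search engines drop the copies even if they stumble on them. Below is a minimal toy-proxy sketch of that idea. I have no idea whether cob-web sends anything like this; the hostnames and URL scheme here are made up for illustration.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class NoIndexProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumes the path carries the target URL, e.g. /http://example.com/
        target = self.path.lstrip("/")
        body = urlopen(target, timeout=10).read()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # The important part: tell crawlers not to index the proxied copy.
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), NoIndexProxy).serve_forever()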
I can tell from your article that we have read some of the same material. I am not saying that proxies are not a problem; I am saying that THIS proxy is not a problem... I would be more worried about this: http://72.14.209.104/search?q=cache:theopaqueproject.com/event/nph-proxy.pl/010110A/http/www.tv.com/ But Google has been better about such things of late.

I have found one cached cob-webbed page. My best guess is that there was a configuration error that allowed it. Since it seems to be so rare, my bet is that the error was on the target server, not the proxy server.

Now, I did mention that some of these URLs have taken the anchor text as the title rather than the URL. The site: search from the OP shows some of these. I can see where they may lead to an extra entry for a keyword phrase because of the way Google is handling things. But again, without the cache it cannot be seen as duplicate content, and it is not on the same domain, so there is no penalty to the site due to the cob-webbed URL. Google's use of the anchor text as the title of the page does, however, show a sign of Google reversing direction on the 302 hijack fixes. That fact bothers me; it is a sign of that redirect bug.

I can't find one ranking for any keywords of real value, but here is an example of a cob-web URL ranking: http://www.google.com/search?q=tight+shiny+clothes I can't see the advantage of the phrase "tight shiny clothes" to that site, or why it may have been anchored with those words. It would not rank for the phrase otherwise, and merely mentioning tight shiny clothes in this thread would probably cause this thread to outrank it. It is basically an extra entry. The proxy breaks forms, and I assume it breaks AdSense, so I don't understand the advantage of making that URL rank on purpose. I don't think it would rank at all if the phrase were more competitive, since the text does not include all the words, and there is no cache, no description, no keywords, etc. for Google to go by...
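For site owners who want to check whether any of their own traffic is actually coming through a caching proxy, the standard proxy headers are the first place to look. A small sketch below; it assumes the proxy sets Via or X-Forwarded-For, which I have not confirmed for cob-web, so treat it as a starting point rather than a rule.

def looks_proxied(headers):
    # headers: incoming request headers as a dict with lower-cased keys.
    # Well-behaved proxies usually add one of these standard headers;
    # whether cob-web's nodes do is an assumption, not something I've verified.
    return any(h in headers for h in ("via", "x-forwarded-for", "forwarded"))

# Typical proxied request vs. a direct one:
print(looks_proxied({"via": "1.1 some-proxy-node", "user-agent": "Mozilla/5.0"}))  # True
print(looks_proxied({"user-agent": "Mozilla/5.0"}))                                # False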
This would be interesting. I wonder how they would deal with staleness, as you mentioned. Yes, I have asked the Beehive project to give me some feedback on the concerns mentioned here.