I'm not sure what your comment means... Google doesn't ban people for duplicate content. They only "penalize" one of the copies (Mad4 hit it on the head in his post). As far as the spam point, how is it spam? Certainly you can see someone as big as Google having a need for the subdomain as well. If people want to get to Google Base, they are going to type in one of two variations: google.com/base or base.google.com. They aren't creating the two to manipulate their rankings.
I don't recall saying anything about "banning". They penalize by "de-indexing" the duplicate content. What do you do when every single page is identical? Define this for me: identical content being served up from a folder off the root AND a subdomain off the same root. Since when does anything have to be intended to manipulate rankings for it to be SPAM? How many identical sites in the index are permissible by your standards? How exactly does having two identical sites benefit the searcher? What need is there to have two sites of more than 2.3 million pages each, containing identical content, in the index? Take a look at craigslist.org. Do a Google search for "wedding forum" (minus the quotes) and jump to page 5 or 6. At what point does this become SPAM? Never? As far as my comment's meaning... if Google should decide that subdomains are going to be viewed differently than they currently are (i.e. as a separate site rather than an internal page), then they already have an identical site in the index as a folder. Dave
De-indexing is the same as a ban. If a site is removed from the index, it's banned. This is NOT what happens with the duplicate content filter. Sites filtered from the SERPs are NOT banned or de-indexed. They are still present in the index but are not shown in the results for certain queries.
No, they are not. Pages considered to be "duplicate", probably better described as "identical", can be and are removed from the index altogether all the time. Dave
Duplicate content does not equal automatic removal. This is an old theory that got twisted from one person's assumption, like two years ago, that everyone started repeating. Seriously, it's a theory that was proposed to someone who asked why Google wasn't indexing their site, and someone on another board replied with something along the lines of: "All your pages have very little content and look pretty much the same, why would Google want all of those copies?" This made sense, and people started to believe it. However, there are tons of examples of Google indexing the same content multiple times.

The dup content theory refers to a possible cause of why some people have trouble getting cookie-cutter sites, or sites with many pages that all have the same header and footer and little middle content, indexed well in Google. It used to be they wouldn't get indexed. Then it was that they'd get indexed but go supplemental. Then supplementals started happening to non-duplicate pages. Then non-duplicate pages started to get de-indexed as well. Even though these last two facts strongly point to the possibility that the supplemental and de-indexing issues have nothing to do with page content, period, people still keep spouting about a dup content penalty.

Even though the dup content theory made sense and did have a certain logic to it, it was then and remains today a theory, and one that seems to make less sense than it used to. If there were some sort of auto-penalty, then you wouldn't get this kind of stuff indexed. Just mho. -Michael
Michael, I don't disagree. Please note that I'm careful to term it "identical". It's no secret that "duplicated" content gets indexed. But "identical" content gets de-indexed all the time. The example I used with Google is a case of "identical content", and if any one of us were to do the same thing... POOF!... at least one of the sites/folders would be gone for spamming. There are folks serving "identical" content on a .com and a .co.uk who experience this first hand. What Google is doing is no different. Side note about duplicate content... http://www.google.com.my/support/bin/answer.py?answer=6805&query=duplicate+content&topic=0&type=f Now we both know what they say and what they do are two different things... at least part of the time. Dave
Right, I know what you meant, but I'm saying that they do index identical content all the time. The examples I gave were of identical content. Look at Craigslist. Tons and tons of pages with next to no content. Look at any site that has had 403 errors indexed. Honestly, it doesn't get much closer than that. Any site that has both the www and non-www versions indexed in Google has identical content indexed. It started as a rumor, and apparently got widespread enough that someone at Google support repeated it. I mean, honestly, it's bad SEO to try to rank two sites with identical content, and technically not something that's needed... but it's just not going to have the effect that people keep saying it is. The fact that it's easier to get unique content indexed has led to a widespread obsession with how close is too close. Here, check this article out; it's very concise on the subject. -Michael
Thanks Michael. Actually, I had already seen it. What folks keep forgetting is that republished articles are never identical content page-wise. Is the article duped? Certainly, but every page is different: different navs, etc. And you're right, the game is how much content actually needs to be unique. The 403 pages you linked are not identical. Actually, as a percentage of total content displayed (available), they're 15% unique in comparison to each other. Craigslist is a problem. Another problem with subdomains. I pointed to them a month or so ago as a deficiency with BD. A recent problem, BTW. But even all their forum pages are not identical, despite the body of the content being identical. And yes, I consider those pages of duplicate content SPAM. As far as Google serving up identical pages in a folder and a subdomain, AFAIC that's SPAM. Slice it, dice it, serve it up any way you want. Still the same stuff. Dave
Of course it is. There is no point having identical pages at two places on the same site. But maybe Google doesn't know how to use a 301 redirect.
LMAO!!! Should we email them instructions...

Dear Google,

In case you were unaware, you seem to be spamming yourself. The proper thing to do would be to use a 301 redirect. I'd be happy to send you detailed instructions if you would find it helpful.

You're welcome!

Dave
This topic is pretty useless. They don't exactly penalise anybody for duplicate pages -- they just use ONE of the URLs (presumably, the most referenced URL). Google sees my default.asp and the directory / as the exact same page... same PageRank, same backlinks. They are not spamming -- that's absurd. Spammers are people who send out unsolicited advertisements... not webmasters who LET search engines spider their sites.
Heh heh, you guys are killing me!!! Can anyone offer a reasonable response to this: "what is spam"? I am not sure, truly. I have many websites that are new and/or 5+ years old. With the new Google I almost think every one of them is spammy, even though I don't spam and all content is handwritten. So why then have they all suffered so severely in the recent updates? Hmmm, must be spam... no, no, possibly dup content; wait, maybe it's suspicious link bursts, or ummmmm... well, I give up, it's spam, ya, that's it. WTF is on-page SPAM?? Cheers, H
Your issue is canonicalization. Totally different thing. Yes, Google has gotten better at recognizing and correcting this. If you have www.yoursite.com AND www.yoursite.com/default.asp AND www.yoursite.com/directory.asp all indexed as your homepage, it is because you have links on your site pointing to your homepage under each of those URLs. All the SEs are doing is following the links you have put on your site.

You should use a 301 redirect to correct this, because it can hurt your indexing and ranking. You should also make sure that you use only one URL for your homepage throughout your site. The preferred one would be www.yoursite.com

Here's a link for you... Search Engine SPAM

Dave
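For anyone wondering what that 301 actually looks like in practice, here's a rough sketch, assuming classic ASP on IIS (since default.asp came up in this thread) and using www.yoursite.com purely as a placeholder for whichever hostname you decide to keep:

<%
' Rough sketch only: send a permanent (301) redirect whenever the
' request arrives on the bare hostname, so only the www version of
' each URL gets indexed. Adjust the hostnames for your own site.
Dim host, target
host = LCase(Request.ServerVariables("HTTP_HOST"))
If host = "yoursite.com" Then
    target = "http://www.yoursite.com" & Request.ServerVariables("URL")
    If Len(Request.ServerVariables("QUERY_STRING")) > 0 Then
        target = target & "?" & Request.ServerVariables("QUERY_STRING")
    End If
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", target
    Response.End
End If
%>

The same Response.Status / Response.AddHeader / Response.End pattern is what you'd use to point a duplicate folder, subdomain, or default.asp URL at the single address you want in the index.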