Hi, I have an articles directory with quite a few articles in it, and I am getting a little traffic to this section of the site. I also have another site on the same server as the articles directory, and this site uses some of those articles. In fact, it uses around 3,000-5,000 of them, which means that I have duplicate content on both sites. I then have a third site that uses around 200 articles from the articles directory. What do you think I should do? Would it be best to use a robots.txt file to stop the search engine robots from crawling the article pages on both of these sites, and only allow them on the articles directory? I ask because these articles are for my visitors and are not meant to increase my hits from the search engines. I also don't want to get banned for duplicate content. What do you think the best approach is? The site with around 3,000-5,000 articles has business-related articles only, and the site with 200 articles has hosting-related articles only. I have put these articles up for the visitors and not the search engines, but the search engines may see it differently and ban or penalize me in some way. What do you think I should do about this situation? Thanks!
Yeah, but by having this duplicate content all I am doing is making Google's robots do more work. The more work they do, the more likely I am to get banned or penalized for having this duplicate, useless content. It is useful content to my visitors, but useless to the search engines because it is duplicate, and they do mention in their webmaster guidelines that duplicate content on different domain names is not allowed. Would it be best to use a robots.txt file? I have had this duplicate content on my site for over a year now. But I just don't want to get banned, as they change the way they do things all the time. I don't want to leave it and find that my sites are gone from Google one day just because of it. What do you think I should do? Use a robots.txt file or just leave it? Would it be all right for me to just let the robots crawl it?
I have just checked and the actual articles are not listed. I guess they must be in the supplemental results or not listed at all. Do you think I should just leave it this way, or still use a robots.txt file on the actual article pages just in case?
Whoops! Sorry, I have just checked again, and all the pages with the actual articles on them are in the Google results, and they are not supplemental results. What shall I do? Shall I just leave it and not worry, or shall I try to get them out using the robots.txt file? So I have articles on this site and on the main articles directory that are listed in Google and are not supplemental results. These are the same articles as well.
There are no penalties, but duplicate content just doesn't help. In most cases Google will just ignore the duplicate content, or it just won't rank for anything.
So I am all right then. I don't need a robots.txt file to stop the robots from finding this duplicate content. I notice that you mention that it just won't rank for anything. Does this mean that it will affect the rankings of the articles on my articles directory just because I have some of the same articles on another site? If so, then I am being penalized here, since my rankings are being affected.
Worst case scenario, just block Googlebot via robots.txt and keep the articles listed in the other engines.
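For anyone wanting to try this, here is a rough sketch of what such a robots.txt might look like, with a quick check using Python's standard `urllib.robotparser`. The `/articles/` path and the `www.example.com` domain are assumptions made for illustration; the real paths on the sites in question are not given in this thread. Note that robots.txt only asks crawlers not to fetch the pages; it does not redirect anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for one of the secondary sites.
# "/articles/" is an assumed path, not one taken from this thread.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /articles/

User-agent: *
Disallow:
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for the real domain.
    article = "http://www.example.com/articles/some-article.html"
    print("Googlebot:", can_crawl("Googlebot", article))
    print("Bingbot:", can_crawl("Bingbot", article))
```

With these rules, Googlebot is asked to stay out of the article pages only, while other crawlers are left unrestricted. To keep every engine out instead, as the original question suggests, the `Disallow: /articles/` line would go under `User-agent: *`.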
The site that had the articles first will rank higher than the second site with the same articles... duplicate articles will rank lower or just be ignored (just the articles, not the whole site). In some cases the second site's articles could rank higher if the subject matter of the site is related to the article. This is what I have experienced.
Twin sites often get penalized. Cloaking often gets penalized. If it is only duplicate content, it might be OK, but it is better not to duplicate too much.
There isn't a duplicate content penalty. All that happens is that one of the pages gets dropped from the SERPs - usually the page with the lower PR or lowest number of links pointing to it.