I have a few pages on my website with duplicate content. These pages are not yet indexed, and I have blocked crawler access to them in robots.txt. Is there a chance that Google will still look at those pages and flag my website? I am confused. Please help!
Googlebot will follow the rules you set in your robots.txt file. Make sure the duplicate content is not included in your sitemap.xml file and all should be well! Also be sure your robots.txt file is set up correctly: http://tool.motoricerca.info/robots-checker.phtml
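For example, if your duplicate pages all lived under a /duplicates/ folder (those paths are just placeholders, swap in your own URLs), the robots.txt rules would look something like this:

# Block all crawlers from the duplicate pages (example paths only)
User-agent: *
Disallow: /duplicates/
Disallow: /old-copy-of-page.html

The checker linked above can confirm the syntax is valid.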
I suggest you follow Google's guidelines and avoid having duplicate content on your website. Blocking pages with robots.txt does not guarantee they will not be indexed. Use "noindex, nofollow" meta tags on those pages.
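As a rough sketch (the page itself is just a placeholder), the tag goes inside the head of each duplicate page:

<head>
  <title>Example duplicate page</title>
  <!-- Tells compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>

One thing to watch: Google has to be able to crawl the page to see this tag, so a page that is completely blocked in robots.txt will never have its noindex read.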
Can you explain why you have duplicate content in the first place? Why not just delete the duplicate content before it gets indexed? If you can't delete the duplicate pages, then I would suggest adding a canonical tag to their header, telling Google which one is the version you want indexed.
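A minimal example, assuming https://www.example.com/preferred-page/ stands in for whichever version you want Google to keep, would be to put this in the head of each duplicate page:

<!-- Points search engines at the version you want indexed -->
<link rel="canonical" href="https://www.example.com/preferred-page/">

Many people also add a self-referencing canonical tag on the preferred page itself.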
If the duplicate content is there to get more pages indexed, there are multiple ways to "spin" your content and make old content look new. Make sure not to take entire texts, but only sections from each, and rewrite them entirely.
Use noindex and nofollow, because if crawlers reach your pages through links from a third-party site, the URLs can still end up indexed even though robots.txt blocks crawling of the duplicate pages.
To be 100% sure, I recommend adding the robots meta tag to these pages: <META name="Robots" content="NOINDEX,NOFOLLOW">. Better still, I recommend deleting these pages (they will be problematic in the future).
Yes, you can hide the duplicate content by using robots.txt, but if a search engine does crawl your content and finds it is duplicate, your site will never rank highly; it will drop in the search results. So don't take this kind of risk.
Hi, having duplicate content on a site is not good; it will affect the site very badly in search engines.
Duplicate content on a site reduces search engine rankings. You can avoid it by adding <META name="Robots" content="NOINDEX,NOFOLLOW"> between the <head> and </head> tags of your HTML page. You could also put up a screenshot of your text instead of the text itself (if it does not contain any links), because Google does not like the nofollow directive much. And if you want to hide your links, i.e. you don't want search engine crawlers to follow a link to another site, just add rel="nofollow" to the link: <a href="siteurl" rel="nofollow">link</a>
Thank you atomicpage for giving this useful robots checker link. I checked my site and found some errors. Thank you again.
Google isn't going to "flag your website"; all that will happen is that one version will be picked over the other. Use rel="canonical" to tell Google which version you want it to pick.