I have a website, for example www.example.com, and I added a blog to it at www.example.com/etet/date/blog. I added duplicate articles to the blog. Will the search engines penalize www.example.com, even though www.example.com has no link pointing to www.example.com/etet/date/blog?
No, your site won't be penalized since it's all on the same domain name; however, the duplicated web pages probably won't be indexed, and if they are, they're unlikely to show up in search results. Just block your blog via the robots.txt file.
Yes, duplicate blog content does dilute your PR. It's a big problem. Here's my WordPress 2.5 robots.txt, which I use to block my duplicate content. Works like a charm. Just copy the entire contents of the file below into a blank text file called robots.txt and upload it to the root directory (the main directory) of your site:

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /feed/
Disallow: /tag/
Disallow: /author
Disallow: /comments/
Disallow: /category/*/*
Disallow: /trackback
Disallow: /*trackback
Disallow: /*trackback*
Disallow: /*/trackback
Disallow: /*?*
Disallow: /*.html/$
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*feed*
Disallow: /wp-register.php
Disallow: /wp-login.php
Disallow: /2007/
Disallow: /2008/
Disallow: /stats/

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /
I do have a robots.txt file. The real domain is http://www.bestcreditrates.net/ and the blog is at http://www.bestcreditrates.net/b2evolution/blogs/blog5.php, so that is what I need to block.
http://www.bestcreditrates.net/robots.txt gives me an error, and that is where the robots.txt file should be located. You want to block the /b2evolution/ directory:

Disallow: /b2evolution/
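To illustrate the point about location: crawlers look for robots.txt only at the root of the host, never inside a subdirectory such as /b2evolution/. Here is a minimal Python sketch that derives that location from any page URL. The helper name `robots_url` is made up for this illustration; it is not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the URL where crawlers would look for robots.txt for this page.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the page's path, query, and fragment,
    # and point at /robots.txt in the root directory.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.bestcreditrates.net/b2evolution/blogs/blog5.php"))
# prints http://www.bestcreditrates.net/robots.txt
```

If opening that address in a browser returns an error, the file is either missing or was uploaded somewhere other than the root directory.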
Actually, I didn't have a robots.txt file, but I have added one now. So I just create a blank text file and add Disallow: /b2evolution/ to it? http://www.bestcreditrates.net/robots.txt
User-agent: *
Disallow: /b2evolution/
sitemap: <absolute path to XML Sitemap>

Remove the sitemap: line if you don't have an XML sitemap. Actually, leave it in instead: create an XML sitemap and upload it to the same directory as the robots.txt. See my XML Sitemap FAQ that's stickied in the sitemap sub-forum on digitalpoint.
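If you want to check the rules before uploading them, here is a rough sketch using Python's standard `urllib.robotparser` module. It parses the two rule lines from the post above (the sitemap: line is left out because its value is only a placeholder) and asks whether a given URL may be crawled. The `RULES` constant and the `is_allowed` helper are names invented for this example, and real search engines may interpret robots.txt somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, minus the placeholder sitemap: line.
RULES = """\
User-agent: *
Disallow: /b2evolution/
"""


def is_allowed(url: str, agent: str = "*") -> bool:
    """Return True if the rules in RULES let `agent` crawl `url`.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes the file's contents as an iterable of lines,
    # so no network request is made here.
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed("http://www.bestcreditrates.net/b2evolution/blogs/blog5.php"))  # False
print(is_allowed("http://www.bestcreditrates.net/"))  # True
```

The blog URL is blocked because its path starts with /b2evolution/, while the home page stays crawlable.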