Hello, I am having problems blocking auto-generated pages that create duplicate content on my site. I have a question-and-answer feature that generates a new page for each question and answer. However, it also adds several additional copies of that page. For example, the following URLs are created for the same page:

www.mysite.com/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company/
www.mysite.com/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company?show=6301

I would like to make sure I block these properly with robots.txt. Of course, I want

www.mysite.com/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company/

to remain allowed (not disallowed), and I want to BLOCK

www.mysite.com/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company?show=6301

If I add

Disallow: /ask-a-trustee/?show

to my robots.txt file, will that properly block the extra page with ?show=6301, while still allowing /ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company/ to be crawled by Google and other search engines?

Thanks in advance!
Hi, this is simple with Google (and with Bing too) - see https://support.google.com/webmasters/answer/156449?hl=en and http://moz.com/learn/seo/robotstxt

From Google's support page on the * wildcard: to match a sequence of characters, use an asterisk (*). For instance, to block access to all subdirectories that begin with "private":

User-agent: Googlebot
Disallow: /private*/

In your case, that would be:

Disallow: /ask-a-trustee/*?show=*

Your proposed Disallow: /ask-a-trustee/?show would not work, because robots.txt rules match the URL from the left: the ?show parameter comes after the question slug, not directly after /ask-a-trustee/, so you need the wildcard in between.

Note that you can actually test your robots.txt settings using Google Webmaster Tools. To test a site's robots.txt file:

1. On the Webmaster Tools Home page, click the site you want.
2. Under Crawl, click Blocked URLs.
3. If it's not already selected, click the Test robots.txt tab.
4. Copy the content of your robots.txt file and paste it into the first box.
5. In the URLs box, list the site to test against.
6. In the User-agents list, select the user-agents you want.
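If you want to sanity-check the rule outside of Webmaster Tools, here is a minimal Python sketch that approximates Google's wildcard matching as described above (left-anchored prefix match, with * matching any sequence of characters and a trailing $ anchoring the end of the URL). The helper names are mine, and this is a simplified illustration of the matching rules, not Google's actual implementation. The standard library's urllib.robotparser follows the original prefix-only spec and, as far as I know, does not understand Google's * wildcard, which is why the sketch translates the rule by hand:

```python
import re

def rule_to_regex(rule):
    """Translate a Google-style robots.txt path rule into a regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    end of the URL; otherwise the rule is a left-anchored prefix match.
    (Hypothetical helper for illustration, not Google's implementation.)
    """
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile(pattern)

def is_blocked(url_path, disallow_rule):
    """True if the path (including query string) matches the Disallow rule."""
    return rule_to_regex(disallow_rule).match(url_path) is not None

rule = "/ask-a-trustee/*?show=*"

# The clean URL is not matched, so it stays crawlable:
print(is_blocked(
    "/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company/",
    rule))  # False

# The duplicate with the ?show parameter is matched, so it is blocked:
print(is_blocked(
    "/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company?show=6301",
    rule))  # True
```

Running this prints False for the clean URL and True for the ?show duplicate, which is exactly the behavior you asked for.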