Blocking auto generated pages with robots.txt

Discussion in 'robots.txt' started by imaginexhacked, Jun 9, 2014.

  1. #1
    Hello,

    I am having problems blocking auto-generated pages that create duplicate content on my site. I have a question-and-answer feature that generates a new page for each question and answer. However, it also adds several additional copies of that page.

    For example, the following URLs are created for the same page:
    www.mysite.com/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company/
    www.mysite.com/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company?show=6301

    I would like to make sure I block this properly with robots.txt. Of course, I want www.mysite.com/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company/ to be allowed / not disallowed and BLOCK www.mysite.com/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company?show=6301

    If I add Disallow: /ask-a-trustee/?show to my robots.txt file, will that properly block the extra page with ?show=6301 while still allowing /ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company/ to be crawled by Google and other search engines?

    Thanks in advance!
     
    imaginexhacked, Jun 9, 2014 IP
  2. O-D-T

    #2
    Hi,

    It is simple with Google (and Bing too). See https://support.google.com/webmasters/answer/156449?hl=en and http://moz.com/learn/seo/robotstxt

    Google supports the * wildcard:
    • To match a sequence of characters, use an asterisk (*). For instance, to block access to all subdirectories that begin with private:
      User-agent: Googlebot
      Disallow: /private*/
    In your case, that would be
    Disallow: /ask-a-trustee/*?show=*
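    To see why the wildcard is needed, here is a rough Python sketch of Google-style rule matching (prefix match, * for any run of characters, $ for end of URL). The helper names rule_to_regex and is_blocked are made up for illustration, and this is only an approximation of how a crawler evaluates a single Disallow line, not any search engine's actual code. It compares the rule from the question with the wildcard rule above against the two example paths.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a robots.txt path pattern into a regex (supports * and a trailing $)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with ".*" where a * stood.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path: str, disallow_rule: str) -> bool:
    """True if the path (including any query string) matches the Disallow pattern."""
    # re.match anchors at the start of the string, which gives prefix matching.
    return rule_to_regex(disallow_rule).match(path) is not None

clean = "/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company/"
dup = "/ask-a-trustee/6300/rights-unsecured-creditor-court-appointed-bankruptcy-company?show=6301"

for rule in ("/ask-a-trustee/?show", "/ask-a-trustee/*?show=*"):
    print(rule)
    print("  clean page blocked:    ", is_blocked(clean, rule))
    print("  duplicate page blocked:", is_blocked(dup, rule))
```

    Under these assumptions, Disallow: /ask-a-trustee/?show blocks neither URL, because rules match from the start of the path and ?show does not come directly after /ask-a-trustee/. The wildcard version blocks only the ?show=6301 copy and leaves the clean URL crawlable.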



    Note that you can test your robots.txt settings using Google Webmaster Tools:

    Test a site's robots.txt file:
    1. On the Webmaster Tools Home page, click the site you want.
    2. Under Crawl, click Blocked URLs.
    3. If it's not already selected, click the Test robots.txt tab.
    4. Copy the content of your robots.txt file, and paste it into the first box.
    5. In the URLs box, list the site to test against.
    6. In the User-agents list, select the user-agents you want.
     
    O-D-T, Jun 9, 2014 IP