Blocking duplicate content, wildcards

Discussion in 'robots.txt' started by droog, Aug 12, 2008.

  1. #1
    Hi All,

    I'm using Drupal as a CMS. My partner and I blog on it. Our blog posts appear together on the index page and are also duplicated together at "/blog".

    I would like to block "/blog" along with its pagination links:
    /blog?page=1
    /blog?page=2
    /blog?page=3
    etc.

    However, my blog and her blog appear separately at these URLs:
    /blogs/me
    /blogs/me?page=1
    /blogs/me?page=2
    /blogs/me?page=3
    etc.

    /blogs/her
    /blogs/her?page=1
    /blogs/her?page=2
    /blogs/her?page=3
    etc.

    So I don't want to block our individual blogs, which is what I imagine would happen if I just used:

    Disallow: /blog

    The way I understand it, that rule would also block "/blogs", "/blogx", etc., since Disallow rules match URL prefixes.

    So is this the best solution:

    Disallow: /blog$
    (the way I understand "$", in this case it would block only "/blog" itself, but not "/blogs", "/blogx", etc.)

    Plus this to block the pagination links of "/blog":

    Disallow: /blog?page=
    (would this block any URL like /blog?page=1, /blog?page=2, etc., or is the asterisk in /blog?page=* necessary to match them all?)

    I understand the original robots.txt standard does not recognize "$" or "*", but it looks like Google does:
    http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367
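
    To make sure I'm picturing it right, here is the combined robots.txt I have in mind. It's just a sketch, assuming "$" and the default prefix matching behave the way that Google help page describes, and I've addressed it to Googlebot specifically since the extensions aren't part of the original standard:

    User-agent: Googlebot
    # "$" anchors the match at the end of the URL, so this should block only "/blog" itself,
    # not "/blogs/me" or "/blogs/her"
    Disallow: /blog$
    # plain prefix match, so this should cover /blog?page=1, /blog?page=2, etc.
    # without needing a trailing "*"
    Disallow: /blog?page=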

    Am I correct or do I have to go do my homework again?

    Thanks!
     
    droog, Aug 12, 2008
  2. #2
    Currently, I don't think wildcards have been implemented yet.
     
    catanich, Aug 21, 2008