Help with using wildcards for dynamic URLs

Discussion in 'robots.txt' started by AlfaNet, Apr 29, 2008.

  1. #1
    Hi all,
    I run a script which generates plenty of pages at mydomain.tld/?g_XXXXX, which are duplicate content.

    I think these could be blocked with the following lines in the robots.txt file:
    
    User-agent: *
    Disallow: /*?
    
    Code (markup):
    But I need to allow bots to crawl only pages starting with /?g_page=XX. So I'm thinking of a robots.txt as below:
    
    User-agent: *
    Disallow: /*?
    Allow: /?g_page=*
    
    Code (markup):
    Is the above order correct, or do I need to put "Allow" first?

    Will the above lines tell bots to crawl ONLY URLs having "/?g_page" in them, and not any other URLs with /?g_ ?

    Any suggestion will be appreciated.
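
    A minimal sketch of what could work, assuming the parameter is really g_page (the post mixes g_page and g_pages) and assuming Googlebot's documented precedence, where the most specific (longest) matching rule wins regardless of order:

    User-agent: *
    Allow: /?g_page=
    Disallow: /*?

    Code (markup):
    Under that longest-match rule, /?g_page= (9 characters) beats /*? (3 characters), so the paginated URLs stay crawlable while every other query-string URL is blocked. Some older parsers are reported to process rules first-match in file order, so listing the Allow line first is the safer ordering either way.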
     
    AlfaNet, Apr 29, 2008 IP
  2. #2
    Google Webmaster Tools has a facility called Analyze robots.txt. There you can paste your robots.txt rules and check whether the crawler will crawl a given page or not.
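
    For an offline sanity check as well, a small script can approximate the wildcard matching that Google and Yahoo document, where * matches any run of characters, $ anchors the end, and the longest matching rule wins (an Allow winning ties). This is only a sketch of that documented behavior, not an official parser:

    import re

    def rule_to_regex(rule):
        """Translate a robots.txt path rule with * and $ wildcards
        into an anchored regular expression (Google/Yahoo semantics)."""
        anchored = rule.endswith("$")
        if anchored:
            rule = rule[:-1]
        # Escape regex metacharacters, then turn * back into .*
        pattern = re.escape(rule).replace(r"\*", ".*")
        return re.compile("^" + pattern + ("$" if anchored else ""))

    def is_allowed(path, allows, disallows):
        """Longest matching rule wins; an Allow of equal length beats a Disallow."""
        best_allow = max((len(r) for r in allows if rule_to_regex(r).match(path)), default=-1)
        best_block = max((len(r) for r in disallows if rule_to_regex(r).match(path)), default=-1)
        return best_allow >= best_block

    # The rules from the question above (assuming the g_page spelling):
    allows = ["/?g_page="]
    disallows = ["/*?"]
    for path in ["/?g_page=2", "/?g_12345", "/index.html"]:
        print(path, "->", "allowed" if is_allowed(path, allows, disallows) else "blocked")

    Code (markup):
    The Analyze robots.txt tool remains the authoritative check for how Googlebot itself reads the file; the script only mirrors the published matching rules.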
     
    mistoovrool, May 13, 2008 IP
  3. #3
    From what I understand, wildcards cannot be used in robots.txt files. If I am wrong please let me know, but that is what I have read.

    Zelo
     
    mamina, Jun 12, 2008 IP
  4. #4
    Wildcards can be used in robots.txt: Googlebot and Yahoo's bot support and follow wildcards in robots.txt, but MSN's bot doesn't support them. :)
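
    One caveat worth spelling out: a crawler that ignores wildcards would read Disallow: /*? as the literal prefix /*?, which matches no real URL, so the duplicate pages would remain crawlable for it. A hedged workaround, assuming msnbot honors plain prefix rules and the common Allow extension, is to give it its own wildcard-free group:

    User-agent: msnbot
    Allow: /?g_page=
    Disallow: /?

    User-agent: *
    Allow: /?g_page=
    Disallow: /*?

    Code (markup):
    Here Disallow: /? works by simple prefix matching, so it blocks every query-string URL on the root path without needing a wildcard.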
     
    manish.chauhan, Jun 13, 2008 IP