How to match patterns in robots.txt

Discussion in 'robots.txt' started by kiransarv, Nov 2, 2008.

  1. #1
    Hi all,

    I have two dynamic URL patterns:

    1. http://mydomain.com/index?id=(.*)&query=(.*)
    2. http://mydomain.com/index?id=(.*)&query=(.*)&start=10&pager.offset=(.*)

    I want to allow robots to crawl the first pattern, but I don't want them to crawl URLs that contain "&start". How can I do this?

    If I use "Disallow: /index?id", it will block both URL patterns, because the rule matches every URL that starts with that prefix. So how can I be more specific?

    Please help me.

    Regards,
    kiran
     
    kiransarv, Nov 2, 2008
  2. #2
    If I am correct, I believe you can use * as a wildcard, so I think:
    
    Disallow: /index?*&start=
    
    However, someone please correct me if I am wrong. I do know that if you have your website set up with Google Webmaster Tools, Google lets you enter a URL and tells you whether or not it can crawl it, based on the robots.txt it fetched :)
     
    Aldo, Nov 8, 2008
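
The two rules discussed in this thread can be checked offline with a small script. What follows is only a sketch: it assumes Google-style matching, where `*` stands for any run of characters, a trailing `$` anchors the end of the URL, and everything else is a prefix match. Not every crawler honors these wildcards. The helper names `rule_to_regex` and `is_blocked`, and the two sample URLs, are made up for illustration.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: Google-style extensions, where '*' matches any run of
    characters and a trailing '$' anchors the end of the URL.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(disallow_rule: str, url_path: str) -> bool:
    """Return True if the Disallow pattern matches the URL path."""
    # re.match anchors at the start of the string, giving prefix semantics.
    return rule_to_regex(disallow_rule).match(url_path) is not None


# Hypothetical sample URLs that follow the two patterns in the question.
page = "/index?id=7&query=shoes"
paged = "/index?id=7&query=shoes&start=10&pager.offset=10"

for rule in ("/index?id", "/index?*&start="):
    print(rule, is_blocked(rule, page), is_blocked(rule, paged))
```

Under these assumptions, the plain prefix rule `/index?id` matches both sample URLs, while `/index?*&start=` matches only the one that carries `&start=`. It is still worth confirming the result with the search engine's own robots.txt testing tool, since real crawlers may differ from this sketch.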