Questions about asterisks in robots.txt

Discussion in 'robots.txt' started by Mark Mac, Feb 19, 2009.

  1. #1
    Two questions:

    1) Does an asterisk in a Disallow pattern also match slashes? In other words, would the line

    Disallow: /*/test

    restrict access to "/abc/def/test.html"?

    It seems to me that it should, but I've seen some robots.txt files with "Disallow: /*/*/foobar", which makes me wonder.

    2) Does an asterisk match an empty string? Would the line

    Disallow: /*?

    match on "xyz.com/?var=blah"?

    Thanks.
     
    Mark Mac, Feb 19, 2009
  2. shailendra

    #2
    1) Disallow: /*/*/test will stop robots from crawling URLs that have the word "test" in them.

    2) To restrict "xyz.com/?var=blah", you will have to use "Disallow: /?"
     
    shailendra, Feb 19, 2009
  3. Mark Mac

    #3
    Umm... I'm pretty sure your response to 1) isn't quite right. For instance, xyz.com/test has the word "test" in it, but "Disallow: /*/*/test" wouldn't match on it.

    And I realize that the best way to restrict "xyz.com/?var=blah" would be to use "Disallow: /?", but that's not what I asked. I want to know whether "Disallow: /*?" would match on it too.

    I'm coming from the perspective of someone trying to write some code that will do a robots obedience check, not someone just trying to write a robots.txt file.

    To be more specific, what I'd really like to know is if you were to translate a disallow line from a robots file to a regular expression, would you change a * to a (.*?) or a (.+?)? Can a * represent nothing at all or does it have to represent at least one character?
     
    Mark Mac, Feb 20, 2009
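
The regex translation asked about in #3 can be sketched as follows. This is only a minimal illustration. It assumes the wildcard semantics that Google documents for its crawler and that RFC 9309 later wrote down: "*" stands for zero or more of any character (slashes included), and a trailing "$" anchors the end of the path. The original robots.txt convention did not define wildcards at all, so other crawlers may behave differently. The function names pattern_to_regex and is_disallowed are made up for this example.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' means zero or more of any character (including '/'),
    and a trailing '$' anchors the match to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' so that
    # every '*' in the pattern becomes "zero or more characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(pattern, path):
    """Return True if `path` falls under the Disallow `pattern`."""
    if not pattern:
        # An empty Disallow value conventionally blocks nothing.
        return False
    # re.match anchors at the start only, so the rule acts as a prefix match.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(is_disallowed("/*/test", "/abc/def/test.html"))    # question 1
    print(is_disallowed("/*?", "/?var=blah"))                # question 2
    print(is_disallowed("/*/*/test", "/test"))               # the case from #3
```

Under these assumptions, "*" corresponds to (.*) rather than (.+), so both "Disallow: /*/test" against "/abc/def/test.html" and "Disallow: /*?" against "/?var=blah" count as matches, while "Disallow: /*/*/test" does not match "/test". For a plain yes/no check, greedy (.*) and lazy (.*?) give the same answer.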