Two questions: 1) Does an asterisk in a Disallow string count slashes among the characters it can match? In other words, would the line "Disallow: /*/test" block access to "/abc/def/test.html"? It seems to me that it should, but I've seen some robots.txt files with "Disallow: /*/*/foobar", which makes me wonder. 2) Does an asterisk match an empty string? Would the line "Disallow: /*?" match "xyz.com/?var=blah"? Thanks.
1) "Disallow: /*/*/test" will block robots from crawling URLs that contain the word "test". 2) To block "xyz.com/?var=blah", you will have to use "Disallow: /?".
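A rule with no wildcards, such as "Disallow: /?", is a plain prefix comparison. Below is a minimal Python sketch of that check. It assumes that rules are compared against the URL's path plus its query string, which is how major crawlers treat them. The helper names path_and_query and is_blocked_by_prefix are hypothetical.

```python
from urllib.parse import urlsplit


def path_and_query(url: str) -> str:
    """Return the path plus query string, the part of a URL that a
    robots.txt rule is assumed to be compared against."""
    # Prepend a scheme when it is missing so urlsplit treats "xyz.com" as
    # the host rather than as part of the path.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    path = parts.path or "/"
    # Simplification: an empty query ("xyz.com/?") is dropped here.
    return path + ("?" + parts.query if parts.query else "")


def is_blocked_by_prefix(rule: str, url: str) -> bool:
    """A Disallow value with no wildcards blocks every URL whose
    path-plus-query starts with it."""
    return path_and_query(url).startswith(rule)


print(is_blocked_by_prefix("/?", "xyz.com/?var=blah"))      # True
print(is_blocked_by_prefix("/?", "xyz.com/page?var=blah"))  # False
```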
Umm... I'm pretty sure your answer to 1) isn't quite right. For instance, xyz.com/test contains the word "test", but "Disallow: /*/*/test" wouldn't match it. And I realize that the best way to block "xyz.com/?var=blah" would be "Disallow: /?", but that's not what I asked. I want to know whether "Disallow: /*?" would match it too. I'm approaching this as someone writing code that performs a robots.txt compliance check, not as someone writing a robots.txt file. More specifically, if you were to translate a Disallow line from a robots.txt file into a regular expression, would you turn each * into (.*?) or (.+?)? Can a * represent nothing at all, or does it have to represent at least one character?
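Here is a minimal Python sketch of the translation being asked about. It assumes the semantics written down in RFC 9309 and in Google's robots.txt documentation:

* "*" matches zero or more instances of any character, slashes included, so it corresponds to .* rather than .+.
* A trailing "$" anchors the end.
* Every rule is otherwise a prefix match.

Under those rules, "/*/test" matches "/abc/def/test.html" and "/*?" matches "/?var=blah". The function names are hypothetical. The sketch also ignores percent-encoding normalization and Allow/Disallow precedence.

```python
import re


def disallow_to_regex(rule: str) -> "re.Pattern":
    """Compile a robots.txt Disallow value into a regex, assuming RFC 9309
    semantics: '*' matches zero or more of any character (including '/'),
    and a trailing '$' anchors the end of the URL path."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece and put '.*' (zero or more) where each '*' was.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored:
        pattern += r"\Z"  # end of string, unlike '$' which also allows a final newline
    return re.compile(pattern)


def is_disallowed(rule: str, path: str) -> bool:
    # re.match anchors only at the start, which gives the prefix-match
    # behavior of robots.txt rules when there is no trailing '$'.
    return disallow_to_regex(rule).match(path) is not None


print(is_disallowed("/*/test", "/abc/def/test.html"))  # True: '*' spans "abc/def"
print(is_disallowed("/*?", "/?var=blah"))               # True: '*' matches ""
print(is_disallowed("/*/*/test", "/test"))              # False: needs three '/'
```

For a yes/no obedience check, greedy .* and lazy .*? give the same answer, because the check only asks whether any match exists.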