We recently deployed a Drupal-based site in which Drupal is the document root. In order to serve some non-Drupal content, I've had to use aliases so that Drupal doesn't try to commandeer everything. When I updated the robots.txt file, which lives under Drupal, it occurred to me that I may have a problem. robots.txt should allow/disallow bots on the basis of the web-facing URL, shouldn't it? So if I disallow something like /faq, the bot should not follow it even if /faq is aliased to a location outside the document root, or do I have this wrong somehow? I'm concerned because I'm seeing Googlebot in places it shouldn't be, and I'm wondering whether these aliases (of which I have many) are going to let bots into everything.
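To make the setup concrete, here's a rough sketch of what I mean (the paths are hypothetical, and I'm assuming an Apache-style Alias since Drupal owns the document root):

```
# Apache configuration: /faq is served from outside the Drupal document root
Alias /faq /var/www/static/faq

# robots.txt (lives under the Drupal document root, served at /robots.txt)
User-agent: *
Disallow: /faq
```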
The bots should be reading robots.txt off the root of the site; you can check that it's there by loading it in a browser at yourdomain.com/robots.txt. As long as it's there, it tells the bots what to ignore under your domain. How the server is set up to provide content for a given folder is irrelevant: the bots simply request a URL unless it matches a Disallow rule in robots.txt. You do have to rely on a bot obeying the rules, though.
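If you want to double-check which URLs a compliant crawler would skip, here's a quick sketch using Python's standard urllib.robotparser (the domain and paths are placeholders for your own):

```python
from urllib.robotparser import RobotFileParser

# Fetch the live robots.txt exactly as a crawler would.
rp = RobotFileParser()
rp.set_url("https://yourdomain.com/robots.txt")
rp.read()

# The check is purely against the requested URL path; whether that path is
# aliased to a directory outside the document root never enters into it.
for path in ("/faq", "/faq/printing", "/node/1"):
    allowed = rp.can_fetch("Googlebot", "https://yourdomain.com" + path)
    print(path, "allowed" if allowed else "disallowed")
```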