Does Apache alias affect robots.txt?

Discussion in 'Apache' started by tsmori, Aug 19, 2010.

  1. #1
    We recently deployed a Drupal-based site in which Drupal is the document root. In order to serve some non-Drupal content, I've had to use aliases so that Drupal doesn't try to commandeer everything.

    When I updated the robots.txt file, which lives under Drupal, it occurred to me that I may have a problem.

    The robots.txt file should allow/disallow bots on the basis of web (URL) access, shouldn't it? So if I disallow something like /faq, even if /faq is aliased to a location outside of the document root, the bot should not follow it. Or do I have this wrong somehow?

    I'm concerned because I'm seeing Googlebot in places it shouldn't be, and I'm wondering if these aliases (of which I have many) are going to let bots into everything.
     
    tsmori, Aug 19, 2010
  2. tolra

    #2
    The bots should be reading robots.txt from the root of the site. You can check it's there by loading yourdomain.com/robots.txt in a browser. As long as it's there, it tells the bots what to ignore under your domain. How the server is set up to provide the content for a folder (an Alias, Drupal, or whatever else) is irrelevant, because the bots simply request a URL, and they will fetch it unless it matches a Disallow rule in robots.txt.
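
    To illustrate that only the URL path matters, here is a small sketch using Python's standard-library urllib.robotparser, which parses robots.txt text and answers can_fetch() for a given User-Agent and URL. The robots.txt content and the example.com URLs are made up. The parser, like a crawler, never sees the Apache configuration, so an Alias pointing /faq outside the Drupal document root makes no difference. Disallow is a prefix match, so /faq also blocks /faq/shipping.html and /faqs. robotparser implements only the basic rules (Google additionally supports * and $ wildcards and longest-match precedence), so treat this as an approximation of how Googlebot decides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at http://example.com/robots.txt.
# /faq stands in for a path that Apache serves via an Alias.
ROBOTS_TXT = """\
User-agent: *
Disallow: /faq
"""

def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Only the URL path is compared against the Disallow prefixes; where the
# file lives on disk (document root or Alias target) never enters into it.
for url in (
    "http://example.com/faq",
    "http://example.com/faq/shipping.html",
    "http://example.com/faqs",
    "http://example.com/about",
):
    print(url, parser.can_fetch("Googlebot", url))
```

    The first three URLs print False and /about prints True.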

    You do have to rely on a bot obeying the rules.
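
    Since compliance is voluntary, one way to check whether the "Googlebot" you're seeing is really ignoring robots.txt is to compare the access log against the rules. Below is a rough sketch, assuming Apache's "combined" LogFormat; the function name disallowed_bot_hits and the sample log lines are hypothetical. Keep in mind that any client can send a Googlebot User-Agent, so flagged hits may come from impostors (Google documents verifying the real Googlebot with a reverse DNS lookup), and that Google generally caches robots.txt for up to a day, so a recent edit may not take effect immediately.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumes Apache's "combined" LogFormat:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Only GET/HEAD request lines are matched.
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def disallowed_bot_hits(log_lines, robots_txt, agent_token="Googlebot"):
    """Yield request paths that a client calling itself agent_token fetched
    even though robots.txt disallows them for that agent (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for line in log_lines:
        m = COMBINED.match(line)
        # Skip unparseable lines and requests from other user agents.
        if not m or agent_token not in m.group("agent"):
            continue
        path = m.group("path")
        if not parser.can_fetch(agent_token, path):
            yield path

# Made-up sample data (192.0.2.x is a documentation-only address range).
robots = "User-agent: *\nDisallow: /faq\n"
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'192.0.2.10 - - [19/Aug/2010:10:00:00 -0500] "GET /faq/shipping.html HTTP/1.1" 200 512 "-" "{UA}"',
    f'192.0.2.10 - - [19/Aug/2010:10:00:05 -0500] "GET /about HTTP/1.1" 200 1024 "-" "{UA}"',
]
print(list(disallowed_bot_hits(sample, robots)))  # ['/faq/shipping.html']
```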
     
    tolra, Aug 20, 2010