Simple robots.txt question.

Discussion in 'robots.txt' started by cpuhlp, Oct 23, 2008.

  1. #1
    I have forums installed to work with Drupal. There are two ways of calling the forums installation.

    1.) http://forums.computerquestionhelp.com
    2.) http://www.computerquestionhelp.com/modules/phpBB3/

    I want to block (modules/phpBB3/*) from showing, but I would like (forums.computerquestionhelp.com) to be spidered instead.

    I only have one robots.txt file, in the root folder of the domain in question, and I just pointed the subdomain (forums) to the same document root.

    How should I set up my robots.txt so that the above URLs are blocked from being indexed unless they are reached through the subdomain?

    Here is what I was thinking about doing if someone can confirm:

    User-agent: *
    Crawl-delay: 10
    # Directories
    Disallow: /database/
    Disallow: /includes/
    Disallow: /misc/
    Disallow: /modules/
    Disallow: /sites/
    Disallow: /themes/
    Disallow: /scripts/
    Disallow: /updates/
    Disallow: /profiles/
    # Files
    Disallow: /xmlrpc.php
    Disallow: /cron.php
    Disallow: /update.php
    Disallow: /install.php
    Disallow: /INSTALL.txt
    Disallow: /INSTALL.mysql.txt
    Disallow: /INSTALL.pgsql.txt
    Disallow: /CHANGELOG.txt
    Disallow: /MAINTAINERS.txt
    Disallow: /LICENSE.txt
    Disallow: /UPGRADE.txt
    # Paths (clean URLs)
    Disallow: /admin/
    Disallow: /comment/reply/
    Disallow: /contact/
    Disallow: /logout/
    Disallow: /node/add/
    Disallow: /search/
    Disallow: /user/register/
    Disallow: /user/password/
    Disallow: /user/login/
    # Paths (no clean URLs)
    Disallow: /?q=admin/
    Disallow: /?q=comment/reply/
    Disallow: /?q=contact/
    Code (markup):
    With the above setup, anything under /modules/ should be blocked, since the file includes the line (Disallow: /modules/).
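
    One way to sanity-check rules like these offline is Python's standard urllib.robotparser. The sketch below is only an illustration: it uses a trimmed copy of the rules above, and the third URL (the phpBB path reached through the subdomain) is an assumption about how the shared document root behaves, not something confirmed in this thread. Keep in mind that crawlers fetch robots.txt separately for each hostname and match rules against the URL path only, so one shared file applies the same path rules to both hosts.

```python
from urllib.robotparser import RobotFileParser

# A trimmed copy of the rules from the post (not the full file).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /modules/
Disallow: /includes/
# Paths (clean URLs)
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The two entry points from the post, plus the same phpBB path on the
# subdomain (hypothetical: assumes that path is reachable there too).
urls = [
    "http://www.computerquestionhelp.com/modules/phpBB3/",
    "http://forums.computerquestionhelp.com/",
    "http://forums.computerquestionhelp.com/modules/phpBB3/",
]

for url in urls:
    # can_fetch() looks only at the path part of the URL, not the hostname.
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{verdict:8} {url}")

print("crawl delay:", parser.crawl_delay("*"))
```

    If the forum is served from the root of the subdomain, its pages never start with /modules/ there, so the single shared file should be enough under that assumption.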

    Maybe there is a better way of doing this, or maybe I should add or remove some of the entries? Any suggestions would be much appreciated. I love giving rep to people who give accurate and good information, as it's well deserved! Thanks again.
     
    cpuhlp, Oct 23, 2008
  2. Khadaji

    #2
    I don't see anything wrong with what you're doing, but I must confess that I struggle through creating robots.txt files and have to Google extensively to see what others have done.

    What you could do is go to your account on Google (you do have one, right?) and click on "Webmaster Tools", then "Diagnostics", then "Web Crawl". It'll show you what the Google spider is and isn't allowed to view.

    But I think you've got everything correct.
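
    For a rough local check before looking in Webmaster Tools, here is a minimal sketch that reports which Disallow line blocks a given path. The helper names (disallow_prefixes, blocking_rule) and the sample paths are made up for illustration, and the rule set is a trimmed copy of the one above. It only does plain prefix matching under "User-agent: *"; Google's crawler also understands Allow lines and wildcards, so treat this as an approximation, not as what Google actually runs.

```python
from typing import List, Optional

# Trimmed, illustrative rule set; only plain "Disallow:" prefixes are handled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /modules/
Disallow: /admin/
Disallow: /?q=admin/
"""


def disallow_prefixes(robots_txt: str) -> List[str]:
    """Collect the Disallow prefixes listed under 'User-agent: *'."""
    prefixes = []
    in_star_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = value == "*"
        elif field == "disallow" and in_star_group and value:
            prefixes.append(value)
    return prefixes


def blocking_rule(path: str, prefixes: List[str]) -> Optional[str]:
    """Return the first Disallow prefix that matches path, or None."""
    for prefix in prefixes:
        if path.startswith(prefix):
            return prefix
    return None


prefixes = disallow_prefixes(ROBOTS_TXT)
# Hypothetical sample paths, chosen only to exercise the rules.
for path in ["/modules/phpBB3/", "/node/1", "/?q=admin/settings"]:
    rule = blocking_rule(path, prefixes)
    print(path, "->", f"blocked by Disallow: {rule}" if rule else "allowed")
```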
     
    Khadaji, Oct 30, 2008
  3. uzair21

    #3
    That says everything. +rep given.
     
    uzair21, Nov 6, 2008
  4. zinghana

    #4
    Thanks for the read. I always see robots.txt and never knew its point. I understand now, ty.
     
    zinghana, Nov 8, 2008