I have forums installed to work with Drupal. There are two ways of reaching the forums installation:

1.) http://forums.computerquestionhelp.com
2.) http://www.computerquestionhelp.com/modules/phpBB3/

I want to block (modules/phpBB3/*) from showing, but I would like (forums.computerquestionhelp.com) to be spidered instead. I only have one robots.txt file, in the root folder of the domain in question, and I just pointed the subdomain (forums) to the same document root.

How should I set up my robots.txt to keep the URLs above from being indexed unless they are reached through the subdomain? Here is what I was thinking about doing, if someone can confirm:

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/

With the above setup, anything in /modules/ should be blocked, since I listed it in the file (Disallow: /modules/). Maybe there is a better way of doing this, or I should add or remove some of the entries? Any suggestions would be very much appreciated. I love giving rep to people who give accurate and good information, as it's well deserved! Thanks again.
I don't see anything wrong with what you're doing, but I must confess that I struggle through creating robots.txt files and have to Google extensively to see what others have done. What you could do is go to your Google account (you do have one, right?) and click "Webmaster Tools", then "Diagnostics", then "Web Crawl"; it'll show you what the Google spider is and isn't allowed to view. But I think you've got everything correct.
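As a local complement to the Webmaster Tools check, here is a minimal sketch using Python 3's standard-library `urllib.robotparser` to see which URLs a trimmed copy of the proposed rules would block. It is not Google's parser, only an approximation, and the example URLs (such as `index.php` under each hostname) are illustrative assumptions. Crawlers request /robots.txt separately for each hostname; because both hostnames here share one document root, they are served the same file, so testing one copy covers both.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the proposed robots.txt (add the remaining Disallow
# lines to test the full file). Keep blank lines out of the record:
# Python's parser treats a blank line as the end of a User-agent group.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
# Directories
Disallow: /modules/
Disallow: /includes/
# Paths (clean URLs)
Disallow: /admin/
Disallow: /user/login/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


PARSER = build_parser(ROBOTS_TXT)


def is_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the parsed rules.

    Only the path (and query) of the URL is compared against the rules,
    so the same path gets the same answer on either hostname.
    """
    return PARSER.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.computerquestionhelp.com/modules/phpBB3/index.php",
        "http://forums.computerquestionhelp.com/index.php",
        "http://forums.computerquestionhelp.com/modules/phpBB3/index.php",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
    print("Crawl-delay:", PARSER.crawl_delay("*"))
```

Since matching is purely path-based, `Disallow: /modules/` blocks /modules/phpBB3/ on both hostnames, while pages that sit outside the disallowed paths stay crawlable.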