Hi, I need to use the robots.txt file to block search engines from crawling a part of my site. How do I use robots.txt to block robots from crawling any page that uses the file articlescodingonline.php? Do I block the articlescodingonline.php file by listing it in robots.txt, or do I block the robots from crawling the page by using the URL instead? The URL is: http://domainname.com/articles/articlescoding/1.html Would I take the articlescoding part of the URL and put that into robots.txt, or do I have to put the actual articlescodingonline.php into the text file? I want to get this right, as I don't want to find out that all my pages have vanished from Google. Also, from the URL above, I don't want any pages indexed that come after articlescoding, nor any pages that use the filename articlescodingonline.php.
All you need is:

User-agent: *
Disallow: /articlescodingonline.php
Disallow: /articlescodingonline
Do I need both of them or can I just have the one? So I don't use the one in the url then, I just use the one in the filename?
You need what I pasted. One to cover the directory and everything further down and one to cover the filename. With some fancy stuff you might be able to combine them but this is a simple and convenient way of achieving what you need.
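One way to sanity-check rules like these before uploading them is to run them through Python's standard `urllib.robotparser`, which matches each `Disallow` value as a prefix of the URL path starting at the site root. The sketch below is only an illustration: the helper name `is_blocked` and the extra rule `Disallow: /articles/articlescoding/` are assumptions added here, not something posted in the thread. It suggests that the two rules above cover the script name but may not cover the rewritten URL from the question, since that path starts with `/articles/`.

```python
from urllib.robotparser import RobotFileParser

# Rules exactly as posted in the thread.
THREAD_RULES = """\
User-agent: *
Disallow: /articlescodingonline.php
Disallow: /articlescodingonline
"""

# Assumption for illustration: one extra rule for the rewritten URL path.
EXTENDED_RULES = THREAD_RULES + "Disallow: /articles/articlescoding/\n"


def is_blocked(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `rules` disallow `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    urls = [
        "http://domainname.com/articlescodingonline.php",
        "http://domainname.com/articles/articlescoding/1.html",
    ]
    for label, rules in (("thread", THREAD_RULES), ("extended", EXTENDED_RULES)):
        for url in urls:
            print(label, url, "blocked" if is_blocked(rules, url) else "allowed")
```

Real crawlers may interpret rules slightly differently from this parser, so treat the result as a rough check rather than a guarantee of how Google will behave.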
OK! Thanks very much. Also, is it true that there is no point in putting filenames in the robots.txt file if your site doesn't link to them? For example, I have some PHP files that are just scripts to use with cron. My site doesn't link to them in any way, so others shouldn't really know about these files, and they shouldn't get into the search engines with or without the robots.txt file. Is that true, or should I still list these scripts in the robots.txt file?