Robots, including search indexing tools and intelligent agents, should check a special plain text file (not HTML) called robots.txt in the root of each server. The robots.txt file implements the Robots Exclusion Protocol (REP), which lets the web site administrator define which parts of the site are off-limits to specific robot user-agent names. For example, web administrators can Allow access to their web content and Disallow access to cgi, private, and temporary directories if they do not want pages in those areas indexed.
About the robots.txt file

The robots.txt file is divided into sections by the robot crawler's user-agent name. Each section names a user agent (robot) and lists the paths it may not follow. Remember that robots may access any directory path in a URL that is not explicitly disallowed in this file: every path not forbidden is allowed.

Note that disallowing robots is not the same as creating a secure area on your site, as only honorable robots will obey the directives, and there are plenty of dishonorable ones. Anything you do not want to show to the entire World Wide Web, you should protect with at least a password.

You can usually read this file by just requesting it from the server in a browser (for example, www.searchtools.com/robots.txt). If you open it, you'll see that it's a text file with many entries, which I generated by looking at my server's error reports, because I wanted to keep robots from requesting those paths even occasionally.

The older version is documented in the Robots Exclusion Protocol (REP), and all robots should recognize and honor the rules in the robots.txt file. The newer 2008 REP has additional features and may not be recognized by all robot crawlers. More here: http://www.searchtools.com/robots/robots-txt.html
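The allow-by-default behavior described above can be checked with Python's standard-library robots.txt parser. This is a minimal sketch using a hypothetical rule set and the placeholder domain example.com; the only directive assumed is a single Disallow for /private/.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one section for all robots,
# disallowing only the /private/ directory.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# Any path not explicitly disallowed is allowed by default.
print(rp.can_fetch("*", "http://www.example.com/public/page.html"))    # True
print(rp.can_fetch("*", "http://www.example.com/private/secret.html")) # False
```

The same `can_fetch` call works for a named user agent (for example `"Googlebot"`); it falls back to the `*` section when no section matches that name.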
Permissions for search engines look like this:

User-Agent: *
Allow: /
Disallow: /admin/
Disallow: /other-folder-or-pages/
The part that deals with sitemaps allows you to link to your sitemap (even if it's hosted on another domain) from within your robots.txt file. The syntax is the following directive, on its own line; replace the URL with the actual URL of your sitemap:

Sitemap: http://www.example.com/sitemap.xml

This tells the search engines where they can find your XML sitemap.
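Python's standard-library parser can also read the Sitemap directive. A minimal sketch, again using the placeholder example.com URLs; note that `site_maps()` requires Python 3.8 or later:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining access rules with a Sitemap directive.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: http://www.example.com/sitemap.xml",
]

rp = RobotFileParser()
rp.parse(rules)

# site_maps() (Python 3.8+) returns the Sitemap URLs, or None if there were none.
print(rp.site_maps())  # ['http://www.example.com/sitemap.xml']
```

Sitemap lines are independent of the User-agent sections, so a crawler picks them up no matter which section applies to it.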
robots.txt, otherwise known as "The Robots Exclusion Protocol" file, simply gives instructions about your site to web robots. You have to create robots.txt with all your Allow and Disallow paths and place it at the root of your web server (http://www.example.com/robots.txt):

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

The example above excludes both listed folders (/cgi-bin/ and /tmp/) for all robots.