Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.

It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt.

Note: not all spiders (also known as robots) will follow the exclusion rules.

For example:

    User-agent: *
    Disallow: /

would tell robots and spiders not to crawl any part of the site.

This example:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /temp/
    Disallow: /~images/

would tell robots and spiders to crawl the whole site except the /cgi-bin/, /temp/ and /~images/ directories.
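
To see how a well-behaved crawler applies these rules in practice, here is a minimal sketch using Python's standard-library urllib.robotparser. The URLs are the example ones above; the user-agent name "MyBot" is a hypothetical crawler name chosen for illustration.

    import urllib.robotparser

    # Fetch and parse the site's robots.txt before crawling anything.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()  # downloads and parses the file

    # Ask whether a given user-agent may fetch a given URL.
    # "MyBot" is a hypothetical crawler name used for illustration.
    if rp.can_fetch("MyBot", "http://www.example.com/welcome.html"):
        print("Allowed to crawl welcome.html")
    else:
        print("robots.txt disallows crawling welcome.html")

A crawler that checks can_fetch() before every request honors the rules; a crawler that skips this step will still receive the pages, which is why robots.txt is a convention, not an enforcement mechanism.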