Robots.txt is a plain text file that you upload to the root directory of your site. Spiders look for this file first and process it before indexing your site. Robots.txt tells spiders which pages they may crawl. Examples:

1. Allow everything:
User-agent: *
Disallow:

2. Block the /faq/ directory:
User-agent: *
Disallow: /faq/

3. Block several directories:
User-agent: *
Disallow: /faq/
Disallow: /info/about/

In this way we can keep spiders out of certain directories, and by naming a specific spider in the User-agent line we can even restrict individual crawlers. I hope this helps.
Having one stops your error log from filling up with 404s. Robots.txt is also the best way to announce your sitemap to all search engines at once.
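For instance, a minimal robots.txt that allows all crawlers and also announces a sitemap could look like the sketch below (the sitemap URL is just a placeholder; substitute your own):

User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml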
Generally, the robots.txt file is used to restrict crawlers such as Googlebot from crawling and indexing unnecessary files or folders on your server.
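As a sketch, rules aimed only at Google's crawler might look like this (the /tmp/ and /scripts/ folders are hypothetical examples of content you might not want crawled):

User-agent: Googlebot
Disallow: /tmp/
Disallow: /scripts/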
Yes, it is necessary! Check out Wikipedia's and Facebook's robots.txt files:

http://facebook.com/robots.txt
http://en.wikipedia.org/robots.txt

They can be as detailed as these, or as simple as this:

User-agent: *
Allow: /

If you don't know how to write one, you can just go to Google Webmaster Tools, where you can easily generate a robots.txt file for your site.
If you don't have a robots.txt file, your web server will return a 404 error page to the search engine instead. If you have customized your 404 error document, that custom page will be sent to the spider repeatedly throughout the day, and chances are it's bigger than the standard "404 File Not Found" server message (since you want your error page to say more than the default). In other words, failing to create a robots.txt file causes the search engine spider to use up more of your bandwidth through its repeated retrieval of your large 404 error page. (How much more depends, of course, on the size of your 404 error page.)