robots.txt is a file you can create yourself and upload to your site's root folder to tell bots what they should and should not crawl. If you want to stop them from crawling something, simply add a Disallow line followed by the URL path or directory name you want to restrict. The easiest way to set it up is through your webmaster tools, in the robots.txt section; if you are not using Google Webmaster Tools, create a new file by hand.
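To make the Disallow idea concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The site www.example.com, the paths /private/ and /tmp/draft.html, and the helper name is_allowed are made up for illustration; real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the paths are examples only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/draft.html
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "http://www.example.com/index.html"))      # True
    print(is_allowed(ROBOTS_TXT, "http://www.example.com/private/a.html"))  # False
    print(is_allowed(ROBOTS_TXT, "http://www.example.com/tmp/draft.html"))  # False
```

Checking your rules this way before uploading the file is a cheap way to catch a typo in a path.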
The robots.txt file contains instructions for search engine spiders. These instructions tell the search engines to ignore directories, files, and even directories/files containing specific character strings, although most people don't get involved enough to warrant explaining specific character strings.

Think of the robots.txt file as a container for a set of instructions based upon what you might normally add to an individual webpage using the robots meta tag. For example, if you don't want the search engines to follow links or index a particular webpage, you would add <meta name="robots" content="noindex, nofollow"> if you were using HTML. If you were using XHTML, add the / prior to the > (in HTML5 the slash is optional).

The advantages to the robots.txt file are:
* Reduced work because you're not adding the robots meta tag to each webpage
* Ability to tell the search engines to stay out of particular directories
* The search engines typically request the robots.txt file as they enter your website each day. When they enter multiple times per day, they normally only ask the first time, not every time.

The robots.txt file allows you to provide specific instructions to each spider. For example, you may want the image spiders to enter a particular directory and avoid all others. You may want the blog spiders to enter the blog directory and no others. You may want the standard spider to stay out of those areas.

How you use the robots.txt file helps search engines understand how you want them to index your website.

I hope this helps.
Johnny Mazuma
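As a rough illustration of the per-spider instructions described above, the sketch below gives an image spider access to one directory only and keeps every other spider out of it. The agent name Googlebot-Image is a real crawler name, but the /images/ and /blog/ paths, www.example.com, and the helper name spider_may_fetch are assumptions for the example. It is checked with Python's urllib.robotparser, which applies the first matching rule in a group, so the Allow line is placed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for the image spider, one for everyone else.
PER_SPIDER_ROBOTS_TXT = """\
User-agent: Googlebot-Image
Allow: /images/
Disallow: /

User-agent: *
Disallow: /images/
"""


def spider_may_fetch(agent: str, url: str) -> bool:
    """Check `url` for `agent` against the per-spider rules above."""
    parser = RobotFileParser()
    parser.parse(PER_SPIDER_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://www.example.com"
    # The image spider may enter /images/ and nothing else.
    print(spider_may_fetch("Googlebot-Image", site + "/images/logo.png"))  # True
    print(spider_may_fetch("Googlebot-Image", site + "/blog/post.html"))   # False
    # Any other spider falls back to the "*" group and stays out of /images/.
    print(spider_may_fetch("SomeOtherBot", site + "/images/logo.png"))     # False
    print(spider_may_fetch("SomeOtherBot", site + "/blog/post.html"))      # True
```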
robots.txt is the file that gives bots access to crawl your site. Also, if you have duplicate content, you can keep those pages away from crawlers by disallowing them in robots.txt.
This file should be present on your site because it tells search engines which content should be crawled and which should not.
Please see this link for a better understanding: http://en.wikipedia.org/wiki/Robots_exclusion_standard
There are two important considerations when using /robots.txt. First, robots can ignore your /robots.txt; in particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it. Second, the /robots.txt file is publicly available, so anyone can see which sections of your server you don't want robots to use.
It is great when search engines frequently visit your site and index your content, but there are often cases when indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk incurring a duplicate content penalty.
The robots.txt file is a plain text file used for search engine crawling. It is mainly used to tell crawlers which parts of your website to crawl, which can help how your site is indexed.
This is a simple text file, saved with the name robots.txt, that tells web crawlers which pages they should visit and which they should not. There are no disadvantages if it is implemented correctly. However, there is a big loss if it is implemented incorrectly: search engine crawlers will stop visiting your site if you have mistakenly disallowed all of its pages. See for more details: http://www.robotstxt.org/robotstxt.html
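The mistake described above usually comes down to a single character. In this small sketch (checked with Python's urllib.robotparser; the constant names, the helper name site_is_crawlable and www.example.com are made up for the example), "Disallow: /" blocks the whole site, while an empty "Disallow:" blocks nothing.

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" matches every path, so the whole site is blocked.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"
# An empty Disallow value matches nothing, so the whole site stays open.
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"


def site_is_crawlable(robots_txt: str, url: str = "http://www.example.com/") -> bool:
    """Return True if a generic crawler may fetch `url` (a hypothetical home page)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    print(site_is_crawlable(BLOCK_EVERYTHING))  # False
    print(site_is_crawlable(ALLOW_EVERYTHING))  # True
```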
The location of robots.txt is very important. It must be in the main directory because otherwise search engines will not be able to find it.
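To show what "main directory" means in practice, this sketch derives the address where a crawler would look for robots.txt from any page URL on the same site. The helper name robots_txt_url and the example URL are hypothetical; the point is only that the path is always /robots.txt at the top of the host, never inside a subfolder.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt URL for the site that hosts `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page deep inside a site.
    print(robots_txt_url("http://www.example.com/blog/2012/post.html?x=1"))
    # http://www.example.com/robots.txt
```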