Robots.txt controls how web bots (crawlers, spiders) access and index your website. It uses what we in the trade call the Robots Exclusion Protocol. In short, before visiting one of your site's pages, the bot fetches your robots.txt and checks the page against its rules. If it finds something like "User-agent: *" followed by "Disallow: /", this means that no robots are allowed to crawl any of your pages. Of course, this does not always work. For example, viruses and other malware ignore your robots.txt file. But it works for legitimate web bots such as Googlebot and other search crawlers. The instructions in the file will depend on what you are trying to accomplish.
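The check a well-behaved bot performs can be sketched with Python's standard urllib.robotparser module. This is only a minimal illustration, not how any particular crawler is implemented; the bot name "ExampleBot" and the example.com URL are made up.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, one directive per line.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)  # parse rules held in memory instead of fetching them

# "ExampleBot" and the URL are hypothetical values for illustration.
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/some/page.html")
print(allowed)  # False: every path is disallowed for every robot
```

A polite crawler would skip the page whenever this check comes back False.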
"Robots.txt" is a regular text file that through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots to not crawl and index certain files, directories within your site, or at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth . "Robots.txt" lets you tell Google just that.
Basically, a robots.txt file is used to keep web pages from being crawled and indexed by search engine crawlers or bots.
Sometimes content from your website gets copied to blog submission pages; you can find this out by checking for it. You can then disallow the duplicate copies of your content using robots.txt. There is also no need for search engine bots to visit your cached pages, so you can disallow those pages using robots.txt as well. To make a rule apply to all web spiders, use "User-agent: *"; following it with "Disallow: /" blocks them from the whole site.
You can create a robots.txt file if you do not want some of your site's web pages to be indexed by search engines.
A robots.txt file is a file that tells robots (meaning bots or crawlers) which parts of your site they may request. You can set it to allow robots, deny them, or allow only some of them. The instructions are addressed directly to the robots; those that understand them will follow them. The file has a specific format and must exist at this location only: www.websitename.com/robots.txt. If you want to create one for your website, simply upload a file with this name to this location. For the contents, you may refer to one of the online robots.txt generator tools. Many CMS-based websites (blogs, forums and so on, including WordPress, Blogger and Joomla) already provide a default or virtual robots.txt file, so you may not need to create one separately.
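Because the file is looked up under the name robots.txt at the root of the host, its address can be derived from any page URL on the same site. A small sketch, where robots_url is a hypothetical helper name and the page URL is made up:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical page on the placeholder site used above.
print(robots_url("https://www.websitename.com/blog/post.html?id=7"))
# https://www.websitename.com/robots.txt
```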
Web site owners use the /robots.txt file to give instructions about their site to web robots. There are two important considerations when using /robots.txt: 1. Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention. 2. The /robots.txt file is publicly available. Anyone can see what sections of your server you don't want robots to use. Read more on redalkemi dot com
You first need to know what robots are. Robots are automated software programs that check web pages and their contents and index the most relevant information with respect to keywords. Robots jump from one page to another through anchor tags, following each path to collect information. If you do not want robots to follow a page or folder, you need a robots.txt file, which instructs the crawler whether or not to follow it. The file is a plain text (Notepad) file named robots.txt. The syntax is as follows:
User-agent: *
Allow: /
Disallow: /Scripts/
Disallow: /HotelDetails/
Disallow: /flash/
Disallow: /FlashFiles/
For more details on robots.txt, see Google's robots instructions.
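How a crawler might weigh the Allow and Disallow lines above can be sketched as follows. This assumes the "longest matching rule wins, Allow wins ties" precedence that Google documents; other parsers may decide differently (Python's own urllib.robotparser, for example, applies the first matching rule). The is_allowed helper is hypothetical and ignores wildcards.

```python
def is_allowed(path, rules):
    """Return True if path may be crawled under (directive, prefix) rules."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            # Assumed precedence: longest prefix wins; on a tie, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed

# The rules from the post above, for "User-agent: *".
rules = [
    ("Allow", "/"),
    ("Disallow", "/Scripts/"),
    ("Disallow", "/HotelDetails/"),
    ("Disallow", "/flash/"),
    ("Disallow", "/FlashFiles/"),
]

print(is_allowed("/index.html", rules))      # True
print(is_allowed("/Scripts/app.js", rules))  # False
print(is_allowed("/scripts/app.js", rules))  # True: prefixes are case-sensitive
```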
You can create a robots.txt file to prevent search engine spiders from consuming excessive amounts of bandwidth on your server and also to help prevent potential copyright infringements. A robots.txt file provides the search engine spiders with information about which pages should be crawled and indexed and which should not. It is a text file that resides in the root directory of your web server. If you do not provide a robots.txt file, search engine spiders assume that the entire site may be crawled and indexed.
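The "no file means everything may be crawled" default can be illustrated with Python's standard urllib.robotparser by parsing an empty rule set, which stands in here for a site without a robots.txt. The bot name and URL are made up.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # no rules at all, standing in for a missing robots.txt

# Hypothetical bot name and URL.
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html")
print(allowed)  # True: with no rules, nothing is off limits
```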