I have a website with this robots.txt. Can someone please describe what is going on? Should I delete anything, like the Sitemap line?

User-agent: *
Allow: /
Disallow:
Sitemap: http://www.example.com/sitemap.xml
You are allowing all robots to access all of your pages. If you want to hide some pages, PHP files, or folders, each path needs its own Disallow line:

Disallow: /[page].html
Disallow: /[file].php
Disallow: /[folder]/
Robots.txt files (often erroneously called robot.txt, in the singular) are created by webmasters to mark (disallow) files and directories of a web site that search engine spiders (and other types of robots) should not access. This robots.txt checker is a "validator" that analyzes the syntax of a robots.txt file to see whether its format is valid as established by the Robot Exclusion Standard (please read the documentation and the tutorial to learn the basics) or whether it contains errors.
The ROBOTS referred to in the filename are web crawlers/spiders/bots. The robots.txt file is primarily used to REQUEST two things: WHICH robots have access to WHICH folders and files on your website. Sometimes you may not want all of your website's folders indexed by a search engine, and this file lets you disallow access to those specific files and directories. Likewise, you may not want your website crawled by specific bots, or by unknown or undesirable bots; you can express this too. It is important to understand that the robots.txt file cannot CONTROL which bots scan your files, because a bot can simply choose to ignore it. Also, the robots.txt file is freely visible to anyone, so someone reading it may infer your site structure from it and use that for nefarious activities.
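For instance, to ask one particular crawler to stay away entirely while leaving everyone else unrestricted, you give it its own User-agent group ("BadBot" here is just a made-up name; a real rule would use the bot's actual user-agent token):

```text
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
```

Remember that this is only a request: a well-behaved crawler will honor it, but a malicious one can ignore it completely.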
It is great when search engines frequently visit your site and index your content, but there are often cases where indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk incurring a duplicate content penalty. Also, if you happen to have sensitive data on your site that you do not want the world to see, you will prefer that search engines do not index those pages (although in this case the only sure way to keep sensitive data out of indexes is to keep it offline on a separate machine). Additionally, if you want to save some bandwidth by excluding images, stylesheets, and JavaScript from indexing, you also need a way to tell spiders to keep away from these items.
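Putting those cases into robots.txt form, each item you want crawlers to skip gets its own Disallow line (the paths below are hypothetical placeholders for where your print pages, images, stylesheets, and scripts might live):

```text
User-agent: *
Disallow: /print/
Disallow: /images/
Disallow: /css/
Disallow: /js/
```

One caveat: major search engines now recommend leaving CSS and JavaScript crawlable so they can render your pages correctly, so weigh the bandwidth savings against that.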
First of all, I would like to thank you for creating this thread.

User-agent: * (all robots)
Allow: / (everything is allowed)
Disallow: (nothing is disallowed)

These three lines tell all robots that they are allowed to access the entire site. There is nothing hidden or restricted from the robots: they can visit cgi-bin, trash files, pages, log files, everything. The Sitemap line simply tells crawlers where your XML sitemap lives; there is no need to delete it.
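If you want to confirm how a crawler will interpret those three lines, Python's standard urllib.robotparser module can evaluate them directly (a minimal sketch; the URLs are just the example.com address from the question):

```python
from urllib.robotparser import RobotFileParser

# The exact rules from the robots.txt in the question. Sitemap lines
# don't affect crawl permissions, so they are omitted here.
rules = [
    "User-agent: *",
    "Allow: /",
    "Disallow:",
]

rp = RobotFileParser()
rp.parse(rules)

# Every path is allowed for every robot under these rules.
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/test.cgi"))   # True
print(rp.can_fetch("Googlebot", "http://www.example.com/page.html"))  # True
```

This is handy when you change your robots.txt and want a quick sanity check before deploying it.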