What is Robots.txt?

Discussion in 'Google Sitemaps' started by mpreyesmr, Jul 26, 2009.

Thread Status:
Not open for further replies.
  1. #1
    What is robots.txt in relation to a sitemap? What exactly is its function?
     
    mpreyesmr, Jul 26, 2009 IP
  2. blue_angel (Well-Known Member)
    #2
    Robots, including search indexing tools and intelligent agents, should check a special file in the root of each server called robots.txt, which is a plain text file (not HTML). Robots.txt implements the REP (Robots Exclusion Protocol), which allows the web site administrator to define what parts of the site are off-limits to specific robot user agent names. Web administrators can Allow access to their web content and Disallow access to cgi, private and temporary directories, for example, if they do not want pages in those areas indexed.
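
    For example, a minimal robots.txt along those lines might look like this (the directory names here are just placeholders):

    User-agent: *
    # keep robots out of script, private and temporary areas
    Disallow: /cgi-bin/
    Disallow: /private/
    Disallow: /tmp/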
     
    blue_angel, Jul 27, 2009 IP
  3. blue_angel (Well-Known Member)
    #3
    About the Robots.txt file

    The robots.txt file is divided into sections by the robot crawler's User Agent name. Each section includes the name of the user agent (robot) and the paths it may not follow. You should remember that robots may access any directory path in a URL which is not explicitly disallowed in this file: every path not forbidden is allowed.
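    For instance, a file with separate sections for two user agents could look like this (Googlebot is a real crawler name; the paths are made up):

    User-agent: Googlebot
    Disallow: /drafts/

    User-agent: *
    Disallow: /drafts/
    Disallow: /cgi-bin/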

    Note that disallowing robots is not the same as creating a secure area in your site, as only honorable robots will obey the directives and there are plenty of dishonorable ones. Anything you do not want to show to the entire World Wide Web, you should protect with at least a password.

    You can usually read this file by simply requesting it from the server in a browser (for example, www.searchtools.com/robots.txt). If you open that link, you'll see that it's a text file with many entries, which I generated by looking at my server's error reports, because I wanted to keep robots from requesting those paths even occasionally.

    The older version is documented in the REP (Robots Exclusion Protocol), and all robots should recognize and honor the rules in the robots.txt file. The newer 2008 REP has additional features and may not be recognized by all robot crawlers.
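
    As I understand it, the 2008 extensions that the major engines announced support for include the Allow directive and wildcard patterns; a rough sketch (the paths are placeholders):

    User-agent: *
    # Allow and wildcard patterns (* and $) are newer-REP features
    Allow: /articles/
    Disallow: /*.pdf$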


    more here
    http://www.searchtools.com/robots/robots-txt.html
     
    blue_angel, Jul 27, 2009 IP
  4. freepost (Peon)
    #4
    Permissions for search engines look like this:
    User-Agent: *
    Allow: /
    Disallow: /admin/
    Disallow: /other-folder-or-pages/
     
    freepost, Aug 1, 2009 IP
  5. MaxPowers (Well-Known Member)
    #5
    The part that deals with sitemaps allows you to link to your sitemap (even if it's hosted on another domain) from within your robots.txt file.

    The syntax for this is the following line, on its own line; replace the URL with the actual URL of your sitemap...

    Sitemap: http://www.example.com/sitemap.xml

    This will tell the search engines where they can find your XML sitemap.
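
    For example, a complete robots.txt with a sitemap reference might look like this (example.com and the paths are placeholders, and you can list more than one Sitemap line if you have several):

    User-agent: *
    Disallow: /admin/

    Sitemap: http://www.example.com/sitemap.xml
    Sitemap: http://www.example.com/news-sitemap.xml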
     
    MaxPowers, Aug 1, 2009 IP
  6. wayneonweb (Peon)
    #6
    robots.txt, otherwise known as "The Robots Exclusion Protocol", is simply a way of giving instructions about your site to web robots. You create a robots.txt file with all your Allow and Disallow paths, and it needs to be placed in the root of your web server (http://www.example.com/robots.txt):

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/

    The above example excludes all robots from the two listed folders.
     
    wayneonweb, Aug 5, 2009 IP