A site map (or sitemap) is a list of the pages of a web site accessible to crawlers or users. It can be either a document in any form used as a planning tool for web design, or a web page that lists the pages of a web site, typically organized in hierarchical fashion.

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol. The line "User-agent: *" means the section applies to all robots, and "Disallow: /" tells the robot that it should not visit any pages on the site.

Both files usually live at the root of your site, e.g. site.com/sitemap.xml and site.com/robots.txt. Some platforms, such as WordPress, can generate a "virtual" robots.txt file. Now you know.
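To make that concrete, here is a minimal sketch of a robots.txt that allows crawling and also points robots at the sitemap (the site.com URL is just a placeholder; the Sitemap: line is an optional directive supported by the major search engines):

    # Allow all robots to crawl the whole site
    # (an empty Disallow allows everything; "Disallow: /" would instead block the entire site)
    User-agent: *
    Disallow:

    # Optional: tell crawlers where the sitemap lives
    Sitemap: http://site.com/sitemap.xml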
robots.txt and sitemap.xml are located in your root folder. robots.txt tells the Google, Yahoo, and Bing crawlers what not to crawl (take an inventory of) on your site, and it works in conjunction with your sitemap.xml file. The sitemap.xml tells the crawlers which pages you have and where they are located. If you do not have a robots.txt file, the crawler crawls your entire site. If your robots.txt restricts page A and you also declare page A in your sitemap.xml, you get a crawl error in your Google Webmaster Tools panel.

An example of how we use it: we have a test directory on our server where we upload sites to be tested before they go public. Obviously, we do not want the crawlers to take note of (inventory) this directory, since it is for testing only. Below are the contents of our robots.txt file:

    # www.robotstxt.org/
    User-agent: *
    Disallow: /testing/

Good luck Tuhin.
Braulio
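For reference, a bare-bones sitemap.xml for a hypothetical site might look roughly like this; note that nothing under /testing/ is listed, since that directory is disallowed in robots.txt and listing it would trigger the crawl error mentioned above (the example.com URLs are placeholders, not from Braulio's setup):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per public page; /testing/ pages are deliberately omitted -->
      <url>
        <loc>http://www.example.com/</loc>
      </url>
      <url>
        <loc>http://www.example.com/contact.html</loc>
      </url>
    </urlset>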