I have a website with this robots.txt. Can someone please describe what is going on? Should I delete anything, like the Sitemap line?

User-agent: *
Allow: /
Disallow:
Sitemap: http://www.example.com/sitemap.xml
You are allowing all robots to access all of your pages. If you want to hide some pages, PHP files, or folders, each path needs its own Disallow line:

Disallow: /[page].html
Disallow: /[file].php
Disallow: /[folder]/
Robots.txt files (often erroneously called robot.txt, in the singular) are created by webmasters to mark (disallow) files and directories of a web site that search engine spiders (and other types of robots) should not access. This robots.txt checker is a "validator" that analyzes the syntax of a robots.txt file to see whether its format is valid as established by the Robot Exclusion Standard (please read the documentation and the tutorial to learn the basics) or whether it contains errors.
The ROBOTS referred to in the filename are web crawlers/spiders/bots. The robots.txt file is primarily used to REQUEST two things: WHICH robots have access to WHICH folders and files on your website. Sometimes you may not want all of your website's folders indexed by a search engine, and this file lets you disallow access to those specific files and directories. Likewise, you may not want your website crawled by specific bots, or by unknown or undesirable bots; you can express this too. It is important to understand that the robots.txt file cannot CONTROL which bots scan your files, because a bot can simply choose to ignore it. Also, the robots.txt file is freely visible to anyone, so someone reading it may infer your site structure from it and use that for nefarious activities.
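For instance, to ask one particular crawler to stay away entirely while leaving everyone else unrestricted, you give it its own User-agent group ("BadBot" here is just a made-up name; a real rule would use the bot's actual user-agent token):

```text
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
```

Remember that this is only a request: a well-behaved crawler will honor it, but a malicious one can ignore it completely.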
It is great when search engines frequently visit your site and index your content, but there are often cases where indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling; otherwise you risk incurring a duplicate content penalty. Also, if you happen to have sensitive data on your site that you do not want the world to see, you will prefer that search engines do not index those pages (although in this case the only sure way to keep sensitive data out of indexes is to keep it offline on a separate machine). Additionally, if you want to save some bandwidth by excluding images, stylesheets, and JavaScript from indexing, you also need a way to tell spiders to keep away from these items.
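Putting those cases into robots.txt form, each item you want crawlers to skip gets its own Disallow line (the paths below are hypothetical placeholders for where your print pages, images, stylesheets, and scripts might live):

```text
User-agent: *
Disallow: /print/
Disallow: /images/
Disallow: /css/
Disallow: /js/
```

One caveat: major search engines now recommend leaving CSS and JavaScript crawlable so they can render your pages correctly, so weigh the bandwidth savings against that.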
First of all, I would like to thank you for creating this thread.

User-agent: * (all robots)
Allow: / (everything is allowed)
Disallow: (nothing is disallowed)

These three lines tell all robots that they are allowed to access the entire site. There is nothing hidden or restricted from the robots: they can visit cgi-bin, trash files, pages, log files, everything. The Sitemap line simply tells crawlers where your XML sitemap lives; there is no need to delete it.
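If you want to confirm how a crawler will interpret those three lines, Python's standard urllib.robotparser module can evaluate them directly (a minimal sketch; the URLs are just the example.com address from the question):

```python
from urllib.robotparser import RobotFileParser

# The exact rules from the robots.txt in the question. Sitemap lines
# don't affect crawl permissions, so they are omitted here.
rules = [
    "User-agent: *",
    "Allow: /",
    "Disallow:",
]

rp = RobotFileParser()
rp.parse(rules)

# Every path is allowed for every robot under these rules.
print(rp.can_fetch("*", "http://www.example.com/cgi-bin/test.cgi"))   # True
print(rp.can_fetch("Googlebot", "http://www.example.com/page.html"))  # True
```

This is handy when you change your robots.txt and want a quick sanity check before deploying it.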