What is the robots.txt file?

julietegecy Guest

Messages:: 10

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#1

Can anyone please explain what the robots.txt file is and what its correct format looks like?

julietegecy, Mar 24, 2010 IP

PhilipSEO Notable Member

Messages:: 467

Likes Received:: 48

Best Answers:: 4

Trophy Points:: 225

#2

Robots.txt controls how Web bots/crawlers/spiders access and index your website. It uses what we in the trade call the Robots Exclusion Protocol. In short, before visiting one of your site's pages, a bot fetches your robots.txt and checks whether that page is off limits. If it finds something like

User-agent: *
Disallow: /

-- this means that robots are not allowed to crawl your pages. Of course, this does not always work. For example, viruses and other malware ignore your robots.txt file. But it works for legitimate Web bots such as Googlebot and other search crawlers.

The instructions in the file will depend on what you are trying to accomplish.
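
As a rough illustration of the check described above, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot" and the example.com URL are made up for the example; a real crawler would download the file from the site rather than use a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the post: all robots, whole site off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical crawler name.
    print(may_crawl(ROBOTS_TXT, "ExampleBot", "http://www.example.com/page.html"))
```

A polite bot would run a check like this before every request and skip any URL for which it returns False.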

PhilipSEO, Mar 24, 2010 IP

addie1 Guest

Messages:: 46

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#3

"Robots.txt" is a regular text file that, through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or the site as a whole. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that.
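
The /images example might look something like the sketch below. The rule text is an assumption about how one could write it (the post does not give the exact lines), and "ExampleBot" and example.com are made-up names. Python's standard urllib.robotparser is used only to show how a crawler would read the rule.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: keep Googlebot out of /images/, leave everything else open.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("Googlebot", "http://www.example.com/images/logo.png"))
    print(allowed("Googlebot", "http://www.example.com/index.html"))
    # A robot with no matching User-agent group is not restricted by this file.
    print(allowed("ExampleBot", "http://www.example.com/images/logo.png"))
```

Note that the rule only applies to the robot named in the User-agent line; to cover every robot, the group would start with User-agent: * instead.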

addie1, Mar 24, 2010 IP

seo555 Peon

Messages:: 1,035

Likes Received:: 6

Best Answers:: 0

Trophy Points:: 0

#4

Use this to let all robots crawl everything:

User-agent: *
Disallow:
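
For an allow-everything file, the long-standing convention is a Disallow line with an empty value (Allow was a later extension to the original exclusion standard). A small sketch with Python's standard urllib.robotparser shows how such a file is read; "AnyBot" and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means "nothing is disallowed".
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

if __name__ == "__main__":
    # "AnyBot" is a hypothetical crawler name.
    print(parser.can_fetch("AnyBot", "http://www.example.com/any/page.html"))
```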

seo555, Mar 24, 2010 IP

james.parker Peon

Messages:: 631

Likes Received:: 2

Best Answers:: 0

Trophy Points:: 0

#5

Basically, the robots.txt file is used to keep search engine crawlers or bots from indexing any web page you want to protect.

james.parker, Mar 24, 2010 IP

sopheap Peon

Messages:: 18

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 0

#6

In short, robots.txt is used to exclude any page that we don't want search engines to index.

sopheap, Mar 24, 2010 IP

bogs Active Member

Messages:: 2,142

Likes Received:: 16

Best Answers:: 0

Trophy Points:: 80

#7

Read and learn more about it here: www.robotstxt.org

bogs, Mar 24, 2010 IP

datasol Active Member

Messages:: 122

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 53

#8

The robots.txt file tells robots whether they are allowed to crawl your site's pages.

datasol, Mar 24, 2010 IP

smsinhindi Peon

Messages:: 561

Likes Received:: 5

Best Answers:: 0

Trophy Points:: 0

#9

julietegecy said: ↑

Can anyone please explain what the robots.txt file is and what its correct format looks like?

Type http://www.yoursite.com/robots.txt into your browser and you will see the contents of your robots.txt file.
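
The same lookup can be done from a script. The sketch below, using only Python's standard library, derives the robots.txt address from any page URL on the site; www.yoursite.com is the placeholder from the post, and load_rules is a hypothetical helper that makes a real network request when called.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    # robots.txt always sits at the root of the host, whatever the page path is.
    return urljoin(page_url, "/robots.txt")


def load_rules(page_url: str) -> RobotFileParser:
    """Download and parse the site's robots.txt (performs a network request)."""
    parser = RobotFileParser()
    parser.set_url(robots_url(page_url))
    parser.read()
    return parser


if __name__ == "__main__":
    print(robots_url("http://www.yoursite.com/blog/post.html"))
```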

smsinhindi, Mar 25, 2010 IP

Jeff Collision Peon

Messages:: 1,020

Likes Received:: 4

Best Answers:: 0

Trophy Points:: 0

#10

Sometimes, a content from your website can be copied to any blog submission pages. you can able to know that by checking. So you can disallow the duplicate copy of your content using robots.txt Then, there is no need to visit your cached pages to be visited by search engine bots. You can also disallow those pages using robots.txt. Make changes for all web spiders
User-agent: *
Disallow: /

Jeff Collision, Mar 25, 2010 IP

PhilipSEO Notable Member

Messages:: 467

Likes Received:: 48

Best Answers:: 4

Trophy Points:: 225

#11

Jeff Collision said: ↑

Sometimes, a content from your website can be copied to any blog submission pages. you can able to know that by checking. So you can disallow the duplicate copy of your content using robots.txt Then, there is no need to visit your cached pages to be visited by search engine bots. You can also disallow those pages using robots.txt. Make changes for all web spiders
User-agent: *
Disallow: /

Gibberish of the week, makes no sense at all.

PhilipSEO, Mar 25, 2010 IP

meenka Peon

Messages:: 158

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 0

#12

It is a text file where you tell crawlers which pages they can crawl and which they can't.

meenka, Mar 25, 2010 IP

rashida Active Member

Messages:: 1,429

Likes Received:: 3

Best Answers:: 0

Trophy Points:: 80

#13

You can create a robots.txt file if you don't want some of your site's web pages to be indexed by search engines.

rashida, Mar 25, 2010 IP

ap09.com Guest

Messages:: 199

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#14

Check this article for more information:
http://www.webconfs.com/what-is-robots-txt-article-12.php

ap09.com, Mar 26, 2010 IP

freshware Peon

Messages:: 427

Likes Received:: 2

Best Answers:: 0

Trophy Points:: 0

#15

james.parker said: ↑

Basically, the robots.txt file is used to keep search engine crawlers or bots from indexing any web page you want to protect.

Hello friend,

Yes, I agree with your view.

freshware, Mar 26, 2010 IP

psharma Prominent Member

Messages:: 1,955

Likes Received:: 85

Best Answers:: 4

Trophy Points:: 345

#16

A robots.txt file gives instructions to robots (that is, bots or crawlers) about which parts of your site they may request. You can use it to allow robots, deny them, or allow only some of them. You can also address instructions to specific robots; if they understand them, they will follow them.

The file has a specific format and must be located at www.websitename.com/robots.txt only.
If you want to create one for your website, simply upload a file with this name to this location. For the contents, you may refer to one of the online robots.txt generator tools.
Many CMS-based websites (blogs, forums, and so on, including WordPress, Blogger, and Joomla) already provide a default or virtual robots.txt file, so you may not need to create one separately.

psharma, Mar 26, 2010 IP

tessflores Peon

Messages:: 5

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#17

Web site owners use the /robots.txt file to give instructions about their site to web robots.
There are two important considerations when using /robots.txt:
1. Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention.
2. The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use.
Read more on redalkemi dot com

tessflores, Apr 5, 2010 IP

neiljhonson Peon

Messages:: 315

Likes Received:: 4

Best Answers:: 0

Trophy Points:: 0

#18

You first need to know what robots are.
Robots are automated programs that visit web pages, read their contents, and index the most relevant information with respect to keywords. Robots jump from one page to another via anchor tags, following links to collect information.
If you don't want robots to follow a particular page or folder, use a robots.txt file, which instructs the crawler whether or not to follow it.

The file is a plain text file named robots.txt, which you can create in Notepad.
The syntax is as follows:

User-agent: *
Allow: /
Disallow: /Scripts/
Disallow: /HotelDetails/
Disallow: /flash/
Disallow: /FlashFiles/

For more details on robots.txt, see Google's robots.txt documentation.
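
When Allow and Disallow lines are mixed like this, the outcome depends on how the crawler resolves conflicts. Google's documentation and RFC 9309 describe a "most specific rule wins" approach: the longest matching path decides, with ties going to Allow. The sketch below is a simplified illustration of that idea only; is_allowed is a hypothetical helper that ignores wildcards and handles a single User-agent group. Parsers that apply rules in file order may read the same file differently, so listing the Disallow lines before Allow: / is the safer layout.

```python
# The rule set from the post, as (directive, path prefix) pairs.
RULES = [
    ("allow", "/"),
    ("disallow", "/Scripts/"),
    ("disallow", "/HotelDetails/"),
    ("disallow", "/flash/"),
    ("disallow", "/FlashFiles/"),
]


def is_allowed(rules, path: str) -> bool:
    """Longest-prefix match: the most specific matching rule wins, ties go to allow."""
    best_len = -1
    allowed = True  # no matching rule at all means the path is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    print(is_allowed(RULES, "/index.html"))
    print(is_allowed(RULES, "/Scripts/app.js"))
    # Path matching is case-sensitive, so /scripts/ is not covered by /Scripts/.
    print(is_allowed(RULES, "/scripts/app.js"))
```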

neiljhonson, Apr 5, 2010 IP

upshurcreative Guest

Messages:: 418

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 0

#19

You can create a robots.txt file to prevent search engine spiders from consuming excessive amounts of bandwidth on your server and also to prevent potential copyright infringements. A robots.txt file provides search engine spiders with information about which pages should be crawled and indexed and which should not. It is a text file that resides in the root directory of your Web server. If you do not provide a robots.txt file, search engine spiders assume that the entire site may be crawled and indexed.

upshurcreative, Apr 5, 2010 IP
