Help needed - robots.txt and sitemap

jack_sparrow Peon

Messages:: 314

Likes Received:: 11

Best Answers:: 0

Trophy Points:: 0

#1

Hi,

Can someone advise on a robots.txt file? My site has a sitemap in both XML and HTML, which works fine with Google, Yahoo and MSN. I do not have a robots.txt file. However, some search engines repeatedly look for this file.

I need help with a simple robots.txt file to direct all robots to the XML or HTML file.

Thanks in advance.

Jack

jack_sparrow, Aug 3, 2007 IP

trichnosis Prominent Member

Messages:: 13,785

Likes Received:: 333

Best Answers:: 0

Trophy Points:: 300

#2

Please visit robotstxt.org to learn more about robots.txt.

trichnosis, Aug 6, 2007 IP

adone Peon

Messages:: 190

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#3

hi jack,

I think you need to create a robots.txt file for your site. Every search engine looks for robots.txt among your files first.

Both the XML and HTML sitemaps are important, but from a search engine's point of view you must have the XML sitemap, because search engines can crawl it easily and will get your new pages indexed through it.

bye

adone, Oct 15, 2007 IP

Ladadadada Peon

Messages:: 382

Likes Received:: 36

Best Answers:: 0

Trophy Points:: 0

#4

There are now two purposes for a robots.txt file. The first (and main) one is to tell robots which parts of your site they should NOT view.

The second purpose is a more recent addition to the robots.txt standard and is to let robots know where your sitemap file is. If the robots are finding your sitemap file already, then there isn't much need to add its location to your robots.txt file, but it won't hurt.
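To make the two purposes concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The domain `www.example.com` and the `/private/` path are placeholders rather than anything from this thread, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one exclusion rule plus a sitemap pointer.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Purpose 1: which parts of the site robots should NOT view.
blocked = not rules.can_fetch("*", "http://www.example.com/private/page.html")
allowed = rules.can_fetch("*", "http://www.example.com/index.html")

# Purpose 2: where the sitemap file is.
sitemaps = rules.site_maps()

print(blocked, allowed, sitemaps)
```

This only shows how one parser reads the file; each search engine decides for itself how to honour these lines.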

Ladadadada, Oct 18, 2007 IP

visionfez Peon

Messages:: 84

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 0

#5

# /robots.txt file for http://www.wallpaperweb.org
User-agent: *
Sitemap: http://www.wallpaperweb.org/sitemap.xml
Disallow: /system_error.asp

visionfez, Oct 21, 2007 IP

Mr_Kumar Notable Member

Messages:: 2,561

Likes Received:: 374

Best Answers:: 1

Trophy Points:: 265

Articles:: 4

#6

Forget the robots.txt file. It is not important.

Learn more about sitemaps, especially if your site has thousands of pages. Make more than one sitemap if needed.

I guess I am a bit late to reply here.
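As a rough sketch of splitting a large site across several sitemaps, the Python below writes one `<urlset>` per chunk plus a `<sitemapindex>` that lists them. The 50,000-URL limit per file comes from the sitemaps.org protocol; the file names `sitemap1.xml`, `sitemap2.xml` and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file limit in the sitemaps.org protocol


def build_sitemap(urls):
    """Return one <urlset> document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document that points at each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")


def split_into_sitemaps(urls, base_url, chunk_size=MAX_URLS_PER_FILE):
    """Return ({file name: sitemap XML}, index XML) for a list of page URLs."""
    files = {}
    for n, start in enumerate(range(0, len(urls), chunk_size), start=1):
        files[f"sitemap{n}.xml"] = build_sitemap(urls[start:start + chunk_size])
    index_xml = build_sitemap_index([base_url + name for name in files])
    return files, index_xml
```

For example, `split_into_sitemaps(page_urls, "http://www.example.com/")` would return the individual files and the index to upload next to them.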

Mr_Kumar, Nov 13, 2007 IP

Kuldeep1952 Active Member

Messages:: 290

Likes Received:: 18

Best Answers:: 0

Trophy Points:: 60

#7

It is always a good practice to have a robots.txt file. If you have nothing
to enter in it, you can create a blank file. It will prevent the redundant
404 errors. Another file which you should have on the server to reduce
404 errors is favicon.ico.
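
A quick way to convince yourself that a blank robots.txt blocks nothing is to feed an empty file to a parser. This sketch uses Python's standard-library `urllib.robotparser`; the sample URLs are placeholders, and it only shows how this one parser behaves.

```python
from urllib.robotparser import RobotFileParser

# Placeholder URLs, not taken from any real site in this thread.
SAMPLE_URLS = [
    "http://www.example.com/",
    "http://www.example.com/any/page.html",
]


def allows_everything(robots_txt, sample_urls):
    """Return True if the given robots.txt text blocks none of the sample URLs."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch("*", url) for url in sample_urls)


# An empty string stands in for a blank robots.txt file.
blank_file_is_permissive = allows_everything("", SAMPLE_URLS)
print(blank_file_is_permissive)
```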

Kuldeep1952, Nov 15, 2007 IP

reza_24 Member

Messages:: 98

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 41

#8

visionfez said: ↑

# /robots.txt file for http://www.wallpaperweb.org
User-agent: *
Sitemap: http://www.wallpaperweb.org/sitemap.xml
Disallow: /system_error.asp

site is down?!

reza_24, Nov 15, 2007 IP

Ibrahim Al Mohanna Peon

Messages:: 101

Likes Received:: 3

Best Answers:: 0

Trophy Points:: 0

#9

I did not understand anything. Could you explain what I should type in it?

Ibrahim Al Mohanna, Nov 16, 2007 IP

Michael2007 Guest

Messages:: 15

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#10

If you need every page to be indexed, you can use the following in the robots.txt file:
User-agent: *
Disallow:
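
The difference between an empty `Disallow:` and `Disallow: /` is easy to miss, so here is a small check with Python's standard-library `urllib.robotparser`. The URL is a placeholder and the helper name `can_crawl` is made up for this sketch.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is excluded
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a single slash: everything is excluded


def can_crawl(robots_txt, url):
    """Return True if any robot ("*") may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


page = "http://www.example.com/some/page.html"  # placeholder URL
print(can_crawl(ALLOW_ALL, page))
print(can_crawl(BLOCK_ALL, page))
```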

Michael2007, Nov 26, 2007 IP

janwei Banned

Messages:: 161

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#11

trichnosis said: ↑

Please visit robotstxt.org to learn more about robots.txt.

He's right. robotstxt.org is really good.

janwei, Dec 7, 2007 IP

prlinker Peon

Messages:: 18

Likes Received:: 0

Best Answers:: 0

Trophy Points:: 0

#12

robots.txt has no link with the sitemap file.
Your sitemap should be sitemap.xml for Google.

For Yahoo it's a text file.

prlinker, Dec 24, 2007 IP

pssolanki86 Well-Known Member

Messages:: 905

Likes Received:: 11

Best Answers:: 0

Trophy Points:: 135

#13

Create a simple robots.txt file and a sitemap on your website.

If you want help, I can do it for you.

pssolanki86, Dec 26, 2007 IP

agrawat Banned

Messages:: 491

Likes Received:: 7

Best Answers:: 0

Trophy Points:: 0

#14

I think sitemap.xml is more acceptable and preferable for most SEs. robots.txt mainly protects your site from bad bots that consume your bandwidth, but if bandwidth is not an issue for your website then you do not need robots.txt.

agrawat, Dec 26, 2007 IP

shimon333 Guest

Messages:: 53

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 0

#15

In robots.txt you tell the search engine not to go to parts of your site, but we want Google to see all of our site, so I don't put robots.txt anywhere.

shimon333, Jan 7, 2008 IP

SwapsRulez Peon

Messages:: 32

Likes Received:: 1

Best Answers:: 0

Trophy Points:: 0

#16

Just create the robots.txt file in the root directory of your web space and put the following code in that text file to allow all robots to crawl your site:
User-agent: *
Disallow:

SwapsRulez, Jan 12, 2008 IP

chriszz Peon

Messages:: 233

Likes Received:: 7

Best Answers:: 0

Trophy Points:: 0

#17

I'm not sure what robots.txt is used for.

chriszz, Jan 19, 2008 IP

thetafferboy83 Active Member

Messages:: 312

Likes Received:: 72

Best Answers:: 0

Trophy Points:: 70

#18

It is used to exclude pages from bots, such as search engines, for instance if you wanted a specific page not to be shown in the search engines.

You can normally get answers to simple questions like this by Googling.

thetafferboy83, Jan 22, 2008 IP

catanich Peon

Messages:: 1,921

Likes Received:: 40

Best Answers:: 0

Trophy Points:: 0

#19

jack_sparrow said: ↑

Hi,

Can someone advise on a robots.txt file? My site has a sitemap in both XML and HTML, which works fine with Google, Yahoo and MSN. I do not have a robots.txt file. However, some search engines repeatedly look for this file.

I need help with a simple robots.txt file to direct all robots to the XML or HTML file.

Thanks in advance.

Jack

Jack, you do not need a robots.txt file. We use it to tell the SEs NOT to index a directory or file. It is also used to tell some SEs where to find the sitemap file.

This is mine:

# Robots.txt file created by 1/20/08
# For domain: http://www.catanich.com
#
# All other robots will spider the domain
User-agent: *
Disallow: /_common/
Disallow: /_private/
Disallow: /_ScriptLibrary/
Disallow: /_*/
Sitemap: http://www.catanich.com/sitemap.xml.gz

It also should be noted that a blank line in the robots.txt file will create an error.

catanich, Feb 2, 2008 IP

Ladadadada Peon

Messages:: 382

Likes Received:: 36

Best Answers:: 0

Trophy Points:: 0

#20

catanich said: ↑

It also should be noted that a blank line in the robots.txt file will create an error.

Does it? I have never heard anything about a blank line causing an error, but if it does it certainly could explain some of the strange behaviour that some crawlers exhibit.

Presumably, when it causes an error the crawler will ignore the rest of the file below the blank line. I guess some crawlers may even throw the whole file out if they get an error.
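
One way to probe this is to run the same rules through a parser with and without a blank line. The sketch below uses Python's standard-library `urllib.robotparser`, which treats a blank line as the end of a record, so a rule that follows it without a new `User-agent` line is silently dropped. That is one parser's behaviour, not evidence of what any particular search engine does; the directory names are borrowed from the file quoted earlier and the domain is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Same two rules, once split by a blank line and once kept together.
WITH_BLANK_LINE = """\
User-agent: *
Disallow: /_common/

Disallow: /_private/
"""

WITHOUT_BLANK_LINE = """\
User-agent: *
Disallow: /_common/
Disallow: /_private/
"""


def is_blocked(robots_txt, path):
    """Return True if robots ("*") are excluded from the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", "http://www.example.com" + path)


for label, text in [("blank line", WITH_BLANK_LINE), ("no blank line", WITHOUT_BLANK_LINE)]:
    print(label, is_blocked(text, "/_common/a.html"), is_blocked(text, "/_private/a.html"))
```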

Ladadadada, Feb 10, 2008 IP
