I'm looking into some SEO issues for my home business site, and I keep seeing that Google expects to find a robots.txt file. For SEO? But then I read that lots of robots just ignore it anyway, and of course we WANT Google to keep the site indexed, right? So why exactly do I need a robots.txt, and what should I put in it?
You're right that some bots will ignore your robots.txt. If you want Google to crawl your entire website, put this into your robots.txt:

    User-agent: *
    Disallow:

The truth is your website will get crawled without a robots.txt. It's mostly used to stop Google (and other bots) from crawling certain parts of a website.
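One detail worth knowing (example.com below is just a placeholder for your own domain): crawlers only look for the file in one fixed spot, the root of the host, so it has to be reachable at https://example.com/robots.txt. A robots.txt sitting in a subdirectory is ignored. The file is plain text, and lines starting with # are comments, so you can annotate it:

    # Served from https://example.com/robots.txt
    # The "#" lines are comments; crawlers skip them.
    User-agent: *
    Disallow:

You can check exactly what the bots see just by opening that URL in your browser.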
I was going without one completely, but I keep seeing that for SEO purposes, Google looks for one. So I just set this one up anyway, to exclude my test directory from being indexed:

    User-agent: *
    Disallow: /test/
If NOTHING in your test directory is EVER linked to the outside world, NO robot can find it or index it, in which case you do not need a robots.txt file for it.

However, be aware that ROGUE bots do NOT honor a robots.txt file; they index the pages anyway. Having a robots.txt entry for a test directory tells the rogue bots exactly where your test directory is located, AND THEY WILL INDEX IT IF THEY FIND YOUR robots.txt file. I know this because it has happened to me. I had to move everything, then create a fake page with fake information for the rogue bots to index, to repair the damage.

Your best bet is to remove your test directory from the robots.txt file before a rogue bot finds it and indexes it. Then make sure that you NEVER link your test directory to the outside world. You can also put noindex and nofollow values in your META tags for the test pages, which will stop all honorable bots from indexing them (see the snippet below). Finally, you can password protect your test directory, which keeps EVERYONE out except those who know the password.
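For reference, here is a minimal sketch of that META tag approach. This goes inside the <head> of each page you want honorable bots to skip:

    <!-- inside the <head> of each test page -->
    <meta name="robots" content="noindex, nofollow">

noindex tells compliant bots not to list the page in search results; nofollow tells them not to follow its links. Rogue bots ignore this just like they ignore robots.txt, which is why a password is the only real lock.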
If I want to disallow several directories in the main domain root, is it done this way?

    User-agent: *
    Disallow: /test/ /JPG/ /PHOT/ /VID/
Not sure what you mean by linked. The test directory is a subdirectory of my main domain, and the main domain is indexed. I'm not familiar with these terms yet. Rogue bots? Are they looking for things to steal? My test directory is just a way to test what I intend to put in the indexed part of the domain later. There's nothing in it that really needs securing, that I know of. And for what purpose would they index it? I have no secret or security info in it. As for password protection, I made a directory like that years ago, eventually forgot the password (and even what was in it), and it still lingers in my ISP's account. I don't even know if they can remove it. I've asked, to no avail.
No, the correct way is one Disallow line per directory:

    User-agent: *
    Disallow: /test/
    Disallow: /JPG/
    Disallow: /PHOT/
    Disallow: /VID/
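One thing to keep in mind (the file names below are made-up examples): each Disallow value is matched as a simple path prefix against the URL, so

    Disallow: /test/

blocks /test/page.html and /test/sub/photo.jpg, but it does NOT block /testing/page.html, because that path does not begin with /test/. The values are also case sensitive, so /JPG/ and /jpg/ are two different rules.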
Is there a link on ANY PUBLIC WEBPAGE ANYWHERE ON THE INTERNET that will open up your test pages? If so, you have outside links.

A rogue bot is a bot that DOES NOT FOLLOW THE RULES: it WILL index pages where you specifically DISALLOW indexing. Yahoo comes to mind here, as Yahoo bots indexed my private pages EVEN THOUGH I DISALLOWED THEM, and unfortunately my private information has spread across the web for anyone to use or abuse.

Indexing bots generally are not looking to steal anything, as they exist for only one purpose: to index pages. The problem lies in the fact that once your private pages are indexed, the WHOLE world can access them, and they can NEVER be unindexed. Should you have any private information on those pages (now or in the future), anyone else can see it or even steal it.

Bots do not need a reason to index. They just index EVERYTHING that they find. If you do not want something indexed, you MUST prevent them from reaching it; disallowing and noindexing are not enough. If bots can never find your pages, you will never need to password protect them to stop the bots.
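Since password protection has come up twice in this thread, here is a minimal sketch of the classic way to do it on an Apache server. Your host's control panel may offer a "password protect directory" button that does the same thing behind the scenes; all paths and names below are made-up examples:

    # /test/.htaccess -- protects everything under /test/
    # Assumes Apache with mod_auth_basic (common on shared hosting).
    AuthType Basic
    AuthName "Private test area"
    # Keep the password file OUTSIDE the web root so it can't be downloaded.
    AuthUserFile /home/youraccount/.htpasswd
    Require valid-user

Create the password file once with the htpasswd tool (yourname is a placeholder):

    htpasswd -c /home/youraccount/.htpasswd yourname

After that, every visitor, human or bot, hits a login prompt before seeing anything in the directory, which is exactly why this stops rogue bots when robots.txt and META tags do not.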