Problem with robots.txt

Coldfaun Active Member

#1

My robots.txt looks like this:

User-agent: *
Allow: /

Disallow: /page.html
Disallow: /page2.html

Basically, I want to remove those 2 pages from Google's index.
However, I've seen lots of my pages removed from the Google index lately, so is my robots.txt wrong?

Coldfaun, Jul 12, 2010

DoDo Me Peon

#2

User-agent: *
Disallow: /page.html
Disallow: /page2.html
Allow: /

is better.

By the way, paths in robots.txt are case sensitive.

Check mine: http://referer.us/robots.txt
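
Whether the order of Allow and Disallow lines matters depends on how a crawler resolves conflicting rules. Some older parsers take the first rule that matches, while Google documents using the most specific (longest) matching path. The sketch below compares the two strategies on the rules from this thread. The function names are made up for illustration, and real crawlers also handle wildcards and multiple user-agent groups, which this leaves out.

```python
# Sketch only: two simplified ways a crawler might pick the winning rule.
RULES_ORIGINAL = [("allow", "/"), ("disallow", "/page.html"), ("disallow", "/page2.html")]
RULES_REORDERED = [("disallow", "/page.html"), ("disallow", "/page2.html"), ("allow", "/")]


def first_match_allows(rules, path):
    """Return True if the first rule whose prefix matches `path` is an allow."""
    for kind, prefix in rules:
        if path.startswith(prefix):
            return kind == "allow"
    return True  # no rule matched, so crawling is allowed


def longest_match_allows(rules, path):
    """Return True if the longest matching prefix is an allow (ties go to allow)."""
    matches = [(len(prefix), kind == "allow") for kind, prefix in rules if path.startswith(prefix)]
    if not matches:
        return True
    return max(matches)[1]


for name, rules in [("original", RULES_ORIGINAL), ("reordered", RULES_REORDERED)]:
    print(name, first_match_allows(rules, "/page.html"), longest_match_allows(rules, "/page.html"))
# original True False
# reordered False False
```

Under first-match evaluation the original order lets /page.html through, because "Allow: /" matches first. Putting the Disallow lines first gives the same result under both strategies, so it is the safer order.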

DoDo Me, Jul 12, 2010

Coldfaun Active Member

#3

Do I have to add a trailing slash after .html, as in Disallow: /page2.html/?
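
For reference, robots.txt rules are generally matched as path prefixes, so a trailing slash changes what a rule covers. A minimal sketch, assuming plain case-sensitive prefix matching with no wildcard support (`rule_matches` is a hypothetical helper, not part of any library):

```python
def rule_matches(rule_path, url_path):
    """Plain, case-sensitive prefix match (no wildcard support in this sketch)."""
    return url_path.startswith(rule_path)


print(rule_matches("/page2.html", "/page2.html"))   # True: the rule covers the page
print(rule_matches("/page2.html/", "/page2.html"))  # False: the extra slash misses it
print(rule_matches("/page2.html", "/Page2.html"))   # False: paths are case sensitive
```

So for a single file, the rule without the trailing slash is the one that matches.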

Coldfaun, Jul 12, 2010

maltadude Peon

#4

The Disallow directive in robots.txt is not the best way of removing pages from Google's index. You are simply instructing Google not to visit those pages again, not to remove them. They might be de-indexed over a long period of time, but it's definitely not the best solution.

I would simply add <meta name="robots" content="noindex, nofollow"> in the <head> section of each page, unless you want to remove them manually using Google Webmaster Tools. Only once they are out of the index would I add the Disallow directive to robots.txt.
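
The reason for that order is that a crawler has to download a page to read its meta tags, and a Disallow rule stops it from downloading. The sketch below illustrates this under simplified assumptions: plain prefix matching for Disallow rules and a hypothetical helper `crawler_sees_noindex`. It is not how any particular search engine is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def crawler_sees_noindex(path, disallowed_prefixes, html):
    """True only if the page is crawlable AND carries a noindex directive."""
    if any(path.startswith(prefix) for prefix in disallowed_prefixes):
        return False  # blocked: the crawler never downloads the HTML
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'

print(crawler_sees_noindex("/page.html", [], PAGE))              # True
print(crawler_sees_noindex("/page.html", ["/page.html"], PAGE))  # False
```

With the Disallow rule in place from the start, the noindex tag is never read, which is why the tag should go in first.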

Hope this helps

maltadude, Jul 12, 2010

AirForce1 Peon

#5

Personally, I would use <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> inside the <head> of those 2 pages.

Have a nice day,

AirForce1, Jul 12, 2010

snow Member

#6

Basically, I want to know: what is robots.txt?

snow, Jul 12, 2010

makeit easy Active Member

#7

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> is better.

As for the robots.txt:

This is not necessary:
User-agent: *
Allow: /

The correct syntax for your robots.txt:

User-agent: *
Disallow: /page.html
Disallow: /page2.html

Last edited: Jul 12, 2010

makeit easy, Jul 12, 2010

ledb2b Guest

#8

If you want pages removed from the Google index, you can try Google Webmaster Tools.

ledb2b, Jul 12, 2010

Groovystar Peon

#9

The best way is to change the name of the page. Upload an identical copy of the page you want to deindex, set robots.txt to disallow that page, then change the links to it elsewhere on your site. Then delete the original.

Groovystar, Jul 12, 2010

Coldfaun Active Member

#10

Is there any difference if I use NoIndex or NOINDEX?
I know it's not case sensitive, but I'm not sure whether mixed case like NoInDeX works, or only all-uppercase NOINDEX and all-lowercase noindex.

Coldfaun, Jul 13, 2010

makeit easy Active Member

#11

Coldfaun said: ↑

Is there any difference if I use NoIndex or NOINDEX?
I know it's not case sensitive, but I'm not sure whether mixed case like NoInDeX works, or only all-uppercase NOINDEX and all-lowercase noindex.

It matters for HTML vs. XHTML validation. I don't know whether some search engines are case sensitive. I always write all of my code in lowercase. I think lowercase must be accepted by all of the search engines.
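
On the case question itself: Google documents the robots meta tag's name and content values as case-insensitive. Whether every other crawler does the same is an assumption here, not something confirmed in this thread. A small sketch of how a parser could normalize the value so that all three spellings compare equal (`normalize_robots_content` is a hypothetical helper):

```python
def normalize_robots_content(content):
    """Split a robots meta content value into a set of lowercase directives."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


for value in ("NOINDEX, FOLLOW", "noindex, follow", "NoInDeX, FoLLoW"):
    # Each spelling normalizes to the same set, so this prints True three times.
    print(normalize_robots_content(value) == {"noindex", "follow"})
```

Writing everything in lowercase, as suggested above, sidesteps the question entirely.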

makeit easy, Jul 13, 2010

rentacampervan Greenhorn

#12

You can add this meta tag to both page.html and page2.html: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">. Or you can use robots.txt and write:
# /robots.txt for your website name

User-agent: *
Disallow: /page.html
Disallow: /page2.html
Allow: /

Sitemap: your website name/sitemap.xml

Good luck.

rentacampervan, Jul 13, 2010

COLO Peon

#13

Changing the robots file applies to crawlers that hit your site and navigate and index the content from there.
But if a crawler arrives from an external referrer (a deep link from an off-domain page to your page.html or page2.html), the page will still get indexed.

You'll also want to add <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> to the page source of those pages.

COLO, Jul 14, 2010

kooner001 Peon

#14

Go to Google Webmaster Tools and simply remove and deindex them. Or delete them, make a small change to the slug, and apply the noindex tag mentioned above. That will help you a lot, because if you don't want them indexed in Google, it means you don't want people to see them, so deleting them will not create any problem.

kooner001, Jul 17, 2010

Groovystar Peon

#15

I thought robots.txt used "disallow" not "noindex"?

Groovystar, Jul 17, 2010