Hi, I have created a robots.txt file for one of my websites and added rules to block some pages. The lines I added to block the user pages are:

User-agent: *
Disallow: /user/

These lines should block the user pages, but Google has crawled them anyway. Can anyone tell me why Google has indexed these pages? What is wrong with this rule? The URL of the website is http://cp.wainhouse.com/robots.txt.
Are you sure Google didn't crawl the URL before you created this robots.txt file? I mean, if it had already visited and indexed the page, it would still be visible in search engines...
Yes, I am sure. I had blocked all the URLs before, due to site maintenance, and nothing was showing on Google. Then I changed the file, and after 10 days these URLs were listed. Do you have any idea why that page is listed? Have I done it the correct way?
Maybe Google crawled your page before you updated the robots.txt file. Wait for Google's next crawl and see what happens. Or you can use Google Webmaster Tools to remove your page from Google: go into Webmaster Tools, and under "Google Index" there is a "Remove URLs" option. Just add that page there, and after this your page will be de-listed from Google.
Actually, there is nothing wrong with Google's crawl and index mechanism. It seems your robots.txt is what caused the 'unexpected result'. If you look carefully, your robots.txt only blocks '/user/' and not '/user'. Please understand that '/user/' and '/user' are two different things: 'Disallow: /user/' only matches URLs whose path begins with '/user/', so the '/user' page itself is still allowed. To solve the problem, follow the advice already given in the comment above me.

Solution:
1. Add one more line:
Disallow: /user
2. Add rel="noindex" to your 'not for indexing' pages. If you do this, there will be no trace of those links in the search engine at all. If you use only the first method, the SE will show: "A description for this result is not available because of this site's robots.txt – learn more."

I prefer the second method over the first, as it is more efficient: all followable links can still be crawled (more authority, while keeping the pages out of the index). However, you should apply both methods to get the best possible result. For your already-indexed pages, you can request removal in Webmaster Tools.
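To see the matching difference for yourself, here is a small sketch using Python's standard urllib.robotparser (the example.com URLs and paths are just placeholders standing in for the OP's site):

```python
from urllib import robotparser

# Parse the same rules the OP currently has in robots.txt.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /user/",
])

# Paths under /user/ are blocked...
print(rp.can_fetch("*", "http://example.com/user/profile"))  # False
# ...but /user itself does NOT match "Disallow: /user/".
print(rp.can_fetch("*", "http://example.com/user"))          # True

# Adding "Disallow: /user" (no trailing slash) covers both cases.
rp2 = robotparser.RobotFileParser()
rp2.parse([
    "User-agent: *",
    "Disallow: /user/",
    "Disallow: /user",
])
print(rp2.can_fetch("*", "http://example.com/user"))         # False
```

This is only a local simulation of the rules; Google's crawler applies the same prefix matching, but anything already in the index stays there until removal is requested or the page is recrawled.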
It should be <meta name="robots" content="NOINDEX" />. For more info: a meta tag defines what the page is, while rel explains its relation to other pages.
Once again, you are correct about meta tags and the use of 'rel' (the relation between links). For the OP: you can use the meta tag posted by @sailvanetwork above, or this tag:

<meta name="robots" content="noindex,follow" />

Using this tag to 'noindex' a page while keeping it crawlable, with each of its links followable, is recommended.