Hi, I have created a robots.txt file for one of my websites and added rules to block some pages. The lines I added to block the user pages are:

User-agent: *
Disallow: /user/

These lines should block the user pages, but Google has crawled them anyway. Can anyone tell me why Google has indexed these pages? What is wrong with this rule? The URL of the website is http://cp.wainhouse.com/robots.txt.
Are you sure Google didn't crawl the URL before you created this robots.txt file? I mean, if it had already visited and indexed the page, it would still be visible in search engines...
Yes, I am sure. I had blocked all the URLs before, due to site maintenance, and nothing was showing on Google. Then I changed the file, and after 10 days these URLs were listed. Do you have any idea why that page is listed? Have I done it the correct way?
Maybe Google crawled your page before you updated the robots.txt file. Wait for Google's next crawl and see what happens. Or you can use Google Webmaster Tools to remove your page from Google: go into Webmaster Tools, and under "Google Index" there is a "Remove URLs" option. Just add that page there, and after this your page will be de-listed from Google.
Actually, there is nothing wrong with Google's crawl and index mechanism. It seems your robots.txt is what caused the 'unexpected result'. If you look carefully, your robots.txt only blocks '/user/' and not '/user'. Please understand that '/user/' and '/user' are two different things: 'Disallow: /user/' only matches URLs whose path begins with '/user/', so the '/user' page itself is still allowed. To solve the problem, follow the advice already given in the comment above me.

Solution:
1. Add one more line:
Disallow: /user
2. Add rel="noindex" to your 'not for indexing' pages. If you do this, there will be no trace of those links in the search engine at all. If you use only the first method, the SE will show: "A description for this result is not available because of this site's robots.txt – learn more."

I prefer the second method over the first, as it is more efficient: all followable links can still be crawled (more authority, while keeping the pages out of the index). However, you should apply both methods to get the best possible result. For your already-indexed pages, you can request removal in Webmaster Tools.
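To see the matching difference for yourself, here is a small sketch using Python's standard urllib.robotparser (the example.com URLs and paths are just placeholders standing in for the OP's site):

```python
from urllib import robotparser

# Parse the same rules the OP currently has in robots.txt.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /user/",
])

# Paths under /user/ are blocked...
print(rp.can_fetch("*", "http://example.com/user/profile"))  # False
# ...but /user itself does NOT match "Disallow: /user/".
print(rp.can_fetch("*", "http://example.com/user"))          # True

# Adding "Disallow: /user" (no trailing slash) covers both cases.
rp2 = robotparser.RobotFileParser()
rp2.parse([
    "User-agent: *",
    "Disallow: /user/",
    "Disallow: /user",
])
print(rp2.can_fetch("*", "http://example.com/user"))         # False
```

This is only a local simulation of the rules; Google's crawler applies the same prefix matching, but anything already in the index stays there until removal is requested or the page is recrawled.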
It should be <meta name="robots" content="NOINDEX" />. For more info: a meta tag defines what the page is, while rel explains its relation to other pages.
Once again, you are correct about meta tags and the use of 'rel' (the relation between links). For the OP: you can use the meta tag posted by @sailvanetwork above, or this tag:

<meta name="robots" content="noindex,follow" />

Using this tag to 'noindex' a page while keeping it crawlable, with each of its links followable, is recommended.