I have a robots.txt like this:

User-agent: *
Allow: /
Disallow: /page.html
Disallow: /page2.html

Basically I want those two pages removed from Google's index. However, I've seen lots of pages removed from Google's index lately, so is my robots.txt wrong?
This is better:

User-agent: *
Disallow: /page.html
Disallow: /page2.html
Allow: /

By the way, robots.txt is case sensitive; check mine: http://referer.us/robots.txt
The Disallow directive in robots.txt is not the best way of removing pages from Google's index. You would simply be instructing Google not to visit those pages again, not to remove them. They might be de-indexed over a long period of time, but it's definitely not the best solution. I would simply add <meta name="robots" content="noindex, nofollow"> in the <head> section of each page, unless you want to remove them manually using Google Webmaster Tools. Only once they are out of the index would I add the Disallow rules to robots.txt. Hope this helps.
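If it helps, here is a minimal sketch of where that tag goes in one of the two pages (the title and body content are just placeholders, not from the question):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <!-- keep this page out of the index and don't follow its links -->
  <meta name="robots" content="noindex, nofollow">
  <title>Page 1</title>
</head>
<body>
  ... page content ...
</body>
</html>

The same tag would go into page2.html as well.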
Personally, I would use <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> inside the <head> section of those two pages. Have a nice day,
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> is better. As for the robots.txt, this part is not necessary:

User-agent: *
Allow: /

The correct syntax for your robots.txt:

User-agent: *
Disallow: /page.html
Disallow: /page2.html
The best way is to change the name of the page. Upload an identical page under a new name, add a Disallow rule for the old page to robots.txt, update the links to it elsewhere on your site, and then delete the original.
Is there any difference between NoIndex and NOINDEX? I know it's not case sensitive, but I'm not sure whether it also works as NoInDeX, or only as NOINDEX and noindex.
It matters for HTML or XHTML validation. I don't know whether any search engines are case sensitive. I always write all of my code in lowercase; I think lowercase should be accepted by all search engines.
You can add this meta tag to each of page.html and page2.html:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Or you can use robots.txt and write:

# /robots.txt for your website
User-agent: *
Allow: /
Disallow: /page.html
Disallow: /page2.html
Sitemap: your website name/sitemap.xml

Good luck.
Changing the robots.txt file only applies to crawlers that hit your site and navigate/index the content from there. But if a crawler reaches the page from an external referrer (a deep link from an off-domain page to your page.html or page2.html), it can still get indexed. You'll also want to add <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> to the page source of those pages.
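One caveat worth repeating from the earlier answer: a crawler can only see that meta tag if it is allowed to fetch the page, so while you are relying on the meta tag you would leave the pages crawlable rather than keep them in Disallow. A rough sketch of the combined setup (the Allow line is optional, I'm including it only for clarity):

# robots.txt: leave the two pages crawlable so the noindex tag can be seen
User-agent: *
Allow: /

and in the <head> of page.html and page2.html:

<meta name="robots" content="noindex, nofollow">

Once the pages have dropped out of the index, you could add the Disallow rules back if you still want to block crawling.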
Go to Google Webmaster Tools and simply remove and de-index them, or delete them, make a small change to the slug, and apply the noindex tag mentioned above. That will help you a lot, because if you don't want them indexed in Google, you don't want people to see them anyway, so deleting them won't create any problem.