I need to stop search engines from indexing certain pages containing sensitive information that is not for the general public. Can someone please tell me how I can do this? Thanks
I would put sensitive material in a password-protected directory. ----------- mdvaldosta, I changed my advice regarding robots.txt since not all robots obey it. Shannon
Disallow the folders or pages in robots.txt like Shannon suggested, and I'd put nofollow condoms (rel="nofollow") on the links pointing to the sensitive information.
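For anyone unsure what a robots.txt rule looks like, here is a minimal sketch in Python that checks whether a rule set asks crawlers to stay out of a folder. The folder name /private/ and the example.com URLs are assumptions for illustration only, and as noted elsewhere in this thread, robots.txt is only a request that well-behaved robots honour, not access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "/private/" is an assumed folder name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt text asks this crawler not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Assumed example URLs, one inside and one outside the disallowed folder.
    print(is_blocked(ROBOTS_TXT, "https://example.com/private/report.html"))
    print(is_blocked(ROBOTS_TXT, "https://example.com/index.html"))
```

The two lines in ROBOTS_TXT are what would go into a robots.txt file at the root of the site; the rest only tests them.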
Thanks for the prompt feedback. The pages are password protected, but if you search MSN, for example, for a specific item that does not belong in the public domain, it comes straight up along with a number of other items in that directory!! No, you can't access the directory, but you can see a proportion of what is in there. As for robots.txt, that one is a bit beyond me, and as for condoms, well, I have a 6 month old boy at 42 so not sure what they are either!!
Well, I would suggest you follow mdvaldosta's advice with a robots.txt directive (do a search for a robots.txt tutorial) as well as using nofollow links. I have had good luck keeping information private as long as there is absolutely no page online with a link to the folder or page. I suggest you immediately change the name of the folder or pages so existing links in search engines will not work. I have read that search engines will index password-protected areas based on links into the area. Good luck. Shannon
Sorry, but robots.txt is NOT the way to protect confidential files. Never try to use it to protect sensitive information. If you do not want your directories to be visible (and listed by search engines), add an index.html in each of these directories. The content of these index.html files can be anything (for example, an invitation to go to the home page of the site). Jean-Luc
There's nothing to stop you from using ALL of the above methods, by the way, especially if you REALLY do not want those files indexed.
You can forcefully deny access to those pages by editing your .htaccess file on your Apache webserver.
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=93710&from=61050&rd=1
In short:
<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
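To confirm that a page really carries the robots meta tag, something like the following Python sketch could be used. It only uses the standard library; the class and function names are made up for this example, and the sample page is an assumption modelled on the snippet in this post.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the lowercased robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Assumed sample page for illustration.
PAGE = """<html><head><title>Private</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(robots_directives(PAGE))
```

Keep in mind that a crawler has to be able to fetch the page to see this tag at all, so it does nothing against robots that ignore it.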
Password protect your directory, and for peace of mind also block it in robots.txt and with a robots meta tag in the HTML. If it has already been indexed, remove it with Webmaster Central.
Why don't you restrict those pages with .htaccess? Allow your own IPs so that you can access those pages, and deny all other IPs. This would be quite a simple solution.
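As a rough sketch of the IP allow-list idea, the Python helper below validates a list of addresses and prints the text for an .htaccess file. It assumes Apache 2.4 with mod_authz_host, where "Require ip" is the directive (Apache 2.2 used Order/Deny/Allow instead), and it assumes the host permits .htaccess overrides. The function name and the addresses are hypothetical documentation examples.

```python
import ipaddress


def build_htaccess(allowed: list) -> str:
    """Build .htaccess text that only admits the given IPs or CIDR ranges.

    Assumes Apache 2.4 syntax (mod_authz_host). Raises ValueError if an
    entry is not a valid IP address or network.
    """
    if not allowed:
        raise ValueError("at least one allowed address is required")
    cleaned = []
    for entry in allowed:
        entry = entry.strip()
        ipaddress.ip_network(entry)  # validation only; raises ValueError
        cleaned.append(entry)
    lines = [
        "# Only the addresses below may access this directory.",
        "Require ip " + " ".join(cleaned),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses from the documentation ranges.
    print(build_htaccess(["203.0.113.10", "198.51.100.0/24"]))
```

The printed text would be saved as .htaccess inside the directory to protect; everyone outside the listed addresses, robots included, should then be refused by the server.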
In fact, only an .htaccess restriction works to block any robot or any access. Robots still crawl any links even if you set a noindex meta tag or disallow them in robots.txt. They still sniff our sensitive data; they're collecting it. When they say they don't index it, it only means that they don't show it in the index, BUT they still have it. Watch your access logs once in a while to get my point.
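A quick way to watch the logs is a small script like this Python sketch. It assumes the server writes the common Apache "combined" log format, that the sensitive pages live under a hypothetical /private/ prefix, and that a crawler can be spotted by words such as "bot" or "spider" in its user agent, which is only a heuristic.

```python
import re

# Assumed Apache "combined" log format.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)

# Heuristic substrings that often appear in crawler user agents.
BOT_MARKERS = ("bot", "crawl", "spider", "slurp")


def bot_hits(lines, prefix="/private/"):
    """Return (ip, path, status, agent) for crawler requests under prefix."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, _when, _method, path, status, agent = match.groups()
        is_bot = any(marker in agent.lower() for marker in BOT_MARKERS)
        if is_bot and path.startswith(prefix):
            hits.append((ip, path, status, agent))
    return hits


# Made-up sample log lines for illustration.
SAMPLE = [
    '66.249.66.1 - - [10/Oct/2008:13:55:36 -0700] "GET /private/report.html HTTP/1.1" 401 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.10 - - [10/Oct/2008:13:56:01 -0700] "GET /private/report.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/3.0"',
    '66.249.66.1 - - [10/Oct/2008:13:57:12 -0700] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for ip, path, status, agent in bot_hits(SAMPLE):
        print(ip, path, status, agent)
```

The status code is the part to watch: a 401 or 403 means the robot was turned away, while a 200 means it actually received the page.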
I think you need to add a robots.txt file to your site; this will tell Googlebot not to crawl the pages you don't want indexed. Otherwise, wait for more answers here; maybe we will get some more information.