Hello. At the moment I think my site is being overloaded by various bots. How can I block all web spiders and bots other than Googlebot, Yahoo, and Bing? My second question: how do I remove a subdirectory that is already indexed from the Google index?
You can also use your .htaccess to ban bad bots that ignore robots.txt. I have this in my .htaccess:

    BrowserMatchNoCase Ripper bad_bot
    Order Deny,Allow
    Deny from env=bad_bot

You can also use this in your robots.txt:

    User-agent: *
    Crawl-Delay: 10

(Substitute whatever number works best for you.)
Bots might not follow the instructions in the robots.txt file. Your best bet is to write code that checks the User-Agent header of each HTTP request: if it's not a regular visitor or one of the bots you allow, redirect it somewhere else.
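For what it's worth, the user-agent check described above might look something like this in Python. It's only a rough sketch: the allow-list substrings, the "looks like a bot" pattern, and the function name is_blocked are my own assumptions, not anything standard, and keep in mind that the User-Agent header is trivially spoofed, so this only stops bots that identify themselves honestly.

```python
import re

# Assumed allow-list: substrings the major crawlers include in their
# User-Agent strings ("slurp" is Yahoo's crawler). Adjust as needed.
ALLOWED_BOTS = ("googlebot", "bingbot", "slurp")

# Assumed heuristic for "this looks like a bot". Tune it against your logs.
BOT_PATTERN = re.compile(r"bot|spider|crawl|ripper", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if a request with this User-Agent should be turned away."""
    ua = (user_agent or "").lower()
    if not ua:
        # Assumption: real browsers always send a User-Agent header.
        return True
    if any(name in ua for name in ALLOWED_BOTS):
        # One of the crawlers we want to keep.
        return False
    # Anything else that identifies itself as a bot gets blocked.
    return bool(BOT_PATTERN.search(ua))
```

In your request handler you would call is_blocked() with the value of the User-Agent header and, when it returns True, send a redirect or a 403 instead of the page.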