I am new to this server administration art, so my apologies if I am asking something very basic... Anyway, here is my dilemma.

If I know the IP range that I want to block, the best option is to block it with iptables (thanks to @deathshadow for educating me on this). This works well when you want to block entire countries. But what happens when you want to block specific IPs rather than ranges? Is iptables still more effective than "deny from [IP]" in .htaccess? I read that you don't want iptables to grow too big because it slows performance, but I guess it is still more effective than having a big .htaccess..?

When it comes to blocking spam bots or referrers, robots.txt is just a suggestion for bots; when I looked at my traffic logs I noticed that most bots don't even request the robots.txt file. As far as I understand, the only option here is to use .htaccess.

1. I am currently using this in my .htaccess:

SetEnvIfNoCase User-Agent ahrefsbot bad_bot=yes
SetEnvIfNoCase Referer fbdownloader.com spammer=yes
...
SetEnvIfNoCase Referer social-buttons.com spammer=yes

Order allow,deny
Allow from all
Deny from env=spammer
Deny from env=bad_bot

2. Apparently, there is another approach, as per below:

# Deny domain access to spammers
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} queryseeker [OR]
RewriteCond %{HTTP_REFERER} ^(www\.)?.*(-|.)?adult(-|.).*$ [OR]
...
RewriteCond %{HTTP_REFERER} ^(www\.)?.*(-|.)?sex(-|.).*$
RewriteRule .* - [F,L]

Which approach is better, #1 or #2? Any better alternative?

Finally, somebody suggested that you need to have both (as per the example below). Is that true?

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^rogerbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^exabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^MJ12bot [OR]
RewriteCond %{HTTP_USER_AGENT} ^dotbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^AhrefsBot
RewriteRule .* - [F]

SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
SetEnvIfNoCase User-Agent .*exabot.* bad_bot
SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
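By the way, what I mean by blocking a range with iptables is something along these lines (203.0.113.0/24 is just a placeholder range, not one I actually block):

iptables -I INPUT -s 203.0.113.0/24 -j DROP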
htaccess use is inefficient, and if you care about performance (as you seem to), htaccess should be disabled entirely. Configuration settings can be made directly in the Apache configuration files (such as virtual hosts).

Blocking at the firewall level is the preferred method. However, IP addresses of bots can change, so if you block by IP address and that bot moves to a new IP, you will no longer be blocking it.

On shared hosting, I block (using htaccess) by user agent string, so even if the bot changes IPs, it will still be blocked. That will not prevent bad bots from using fake user agent strings (like a scraper intent on harvesting your content). But most of the bots you will want to block, like Majestic and Brandwatch, don't do that. They identify themselves in their user agent strings and are easy to block by that method regardless of what IP they are coming from.
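To give you an idea, here is roughly what a user-agent block looks like when it lives in the vhost instead of .htaccess. This is only a sketch: the domain, paths and bot names are placeholders, and it uses the same old-style Order/Allow/Deny syntax as your snippets (Apache 2.4 uses the Require directives instead):

<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example.com

    # flag unwanted crawlers/referrers via the request headers
    SetEnvIfNoCase User-Agent ahrefsbot bad_bot
    SetEnvIfNoCase User-Agent mj12bot bad_bot
    SetEnvIfNoCase Referer social-buttons\.com spammer

    <Directory /var/www/example.com>
        # Apache 2.2 style access control, as in the .htaccess examples above
        Order Allow,Deny
        Allow from all
        Deny from env=bad_bot
        Deny from env=spammer
    </Directory>
</VirtualHost>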
Resolved - see this thread here https://forums.digitalpoint.com/thr...th-htaccess-what-is-the-right-syntax.2752511/
The bad bot block in .htaccess or the Apache config files works, but you can also use an iptables string match if they get way out of hand. It is also useful when you need to analyze the logs: you won't have them constantly flooding your screen. You can do this with a command like this:

iptables -I INPUT -p tcp -m string --string "baidu" --algo bm -j DROP

If you use a firewall script like CSF, you would put this in /etc/csf/csfpre.sh and then restart. The string match is very useful for a variety of web server annoyances or attacks. For example, the last few months I have been dealing with WordPress brute-force attacks like crazy, and if you can get the client to change the admin page, you can add a string match for wp-login.php and so on.
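To give a concrete picture of what I mean by the CSF route (a rough sketch only: the port restriction is just an example, and the wp-login.php rule only makes sense once the client's real admin login has been moved somewhere else, otherwise you lock everyone out of it):

#!/bin/sh
# /etc/csf/csfpre.sh - rules loaded before CSF builds its own

# drop web requests whose payload contains a bad bot token
iptables -I INPUT -p tcp --dport 80 -m string --string "baidu" --algo bm -j DROP

# drop hits on the old WordPress login URL once the admin page has been renamed
iptables -I INPUT -p tcp --dport 80 -m string --string "wp-login.php" --algo bm -j DROP

Then csf -r to restart so the pre script gets picked up.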
Thanks @itbypros. I currently use vhost.conf instead of .htaccess (see my previous message). I guess my question is which is the more efficient method: blocking with iptables (as you suggested) or in the vhost? I currently use iptables to block rogue IPs and IP ranges, and it's also used by my fail2ban jails...
Interesting... I guess I can use iptables to block both bots and referrers with:

iptables -I INPUT -p tcp -m string --string "bot or referrer pattern" --algo bm -j LOG --log-prefix "badguys"
iptables -I INPUT -p tcp -m string --string "bot or referrer pattern" --algo bm -j DROP

Any issues with this approach? I understand the danger here is blocking legitimate requests that happen to match the pattern: in the case of vhost/.htaccess only the User-Agent (or Referer) header is checked, whereas here the entire request is checked for a match...
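I suppose one way to reduce that risk is to match a longer, more specific token and limit the rules to web traffic. A rough sketch of what I have in mind (AhrefsBot is just an example token, and I would have to verify it against the bot's actual user agent string; note the DROP is inserted first so that, with -I, the LOG rule ends up above it and actually fires before the packet is dropped):

iptables -I INPUT -p tcp --dport 80 -m string --string "AhrefsBot" --algo bm -j DROP
iptables -I INPUT -p tcp --dport 80 -m string --string "AhrefsBot" --algo bm -j LOG --log-prefix "badguys: "

That way a request that merely contains a short word like "bot" somewhere in it is much less likely to get caught.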