
What is the best approach to block bots and spam referrers?

Discussion in 'Apache' started by Jeffr2014, Apr 10, 2015.

  1. #1
    I am new to this server-administration art, so my apologies if I am asking something very basic...

    Anyway, here is my dilemma. If I know the IP range that I want to block, the best option is to block it with iptables (thanks to @deathshadow for educating me on this). That works well when you want to block entire countries. But what happens when you want to block specific IPs rather than ranges? Is iptables still more effective than "Deny from [IP]" in .htaccess? I have read that you don't want the iptables rule list to grow too big because it slows performance, but I guess it is still more effective than having a big .htaccess..?

    When it comes to blocking spam bots or referrers, robots.txt is just a suggestion: when I looked at my traffic logs, I noticed that most bots don't even request the robots.txt file. As far as I understand, the only option here is to use .htaccess.
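    For reference, a common way to keep many per-IP blocks fast is an ipset hash that a single iptables rule consults, so lookup cost stays flat as the list grows. A sketch, assuming the ipset package is installed (the set name blocked_ips and the addresses are examples):

```shell
# Create a hash-based set for individual addresses
ipset create blocked_ips hash:ip

# Add offending IPs; membership lookups stay fast however many entries exist
ipset add blocked_ips 203.0.113.15
ipset add blocked_ips 198.51.100.7

# One iptables rule checks the whole set
iptables -I INPUT -m set --match-set blocked_ips src -j DROP
```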

    1. I am currently using this in my .htaccess:
    SetEnvIfNoCase User-Agent ahrefsbot bad_bot=yes
    SetEnvIfNoCase Referer fbdownloader.com spammer=yes
    ...
    SetEnvIfNoCase Referer social-buttons.com spammer=yes
    Order allow,deny
    Allow from all
    Deny from env=spammer
    Deny from env=bad_bot


    2. Apparently, there is another approach as per below:
    # Deny domain access to spammers
    RewriteEngine on
    RewriteBase /
    RewriteCond %{HTTP_USER_AGENT} queryseeker [OR]
    RewriteCond %{HTTP_REFERER} ^(www\.)?.*(-|\.)?adult(-|\.).*$ [OR]
    ...
    RewriteCond %{HTTP_REFERER} ^(www\.)?.*(-|\.)?sex(-|\.).*$

    RewriteRule .* - [F,L]

    Which approach is better, #1 or #2? Is there a better alternative?
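    For reference, on Apache 2.4 the same checks can be written without the 2.2-era Order/Allow/Deny directives; a sketch, assuming mod_setenvif and mod_authz_core are loaded (the patterns are carried over from approach #1):

```apache
# Tag unwanted clients (patterns are case-insensitive regexes)
BrowserMatchNoCase ahrefsbot bad_bot
SetEnvIfNoCase Referer fbdownloader\.com spammer

# Apache 2.4 access control: allow everyone except tagged requests
<RequireAll>
    Require all granted
    Require not env bad_bot
    Require not env spammer
</RequireAll>
```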

    Finally, somebody suggested that you need both (as in the example below). Is that true?

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^rogerbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^exabot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MJ12bot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^dotbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^gigabot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^AhrefsBot
    RewriteRule .* - [F]

    SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
    SetEnvIfNoCase User-Agent .*exabot.* bad_bot
    SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
    SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
    SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
    SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
    SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
     
    Jeffr2014, Apr 10, 2015 IP
  2. billzo

    billzo Well-Known Member

    #2
    .htaccess use is inefficient, and if you care about performance (as you seem to), .htaccess should be disabled entirely. The same settings can be made directly in the Apache configuration files (such as virtual hosts).

    Blocking at the firewall level is the preferred method. However, the IP addresses bots use can change, so if you block by IP address and the bot moves to a new IP, you will no longer be blocking it. On shared hosting, I block by user-agent string (using .htaccess), so even if the bot changes IPs it stays blocked. That will not stop bad bots that send fake user-agent strings (like a scraper intent on harvesting your content), but most of the bots you will want to block, like Majestic and Brandwatch, don't do that: they identify themselves in their user-agent strings and are easy to block that way regardless of which IP they come from.
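    For example, the same user-agent block, moved out of .htaccess and into a virtual host, might look like this (a sketch using the thread's Apache 2.2-style directives; example.com and the paths are placeholders, and mj12bot/magpie-crawler are Majestic's and Brandwatch's crawlers):

```apache
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example

    # Block by user-agent string, so the block survives IP changes
    SetEnvIfNoCase User-Agent mj12bot bad_bot
    SetEnvIfNoCase User-Agent magpie-crawler bad_bot

    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot

    # With overrides off, Apache never even looks for .htaccess files
    <Directory /var/www/example>
        AllowOverride None
    </Directory>
</VirtualHost>
```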
     
    billzo, Apr 11, 2015 IP
  3. Jeffr2014

    Jeffr2014 Active Member

  4. DarkMatrix

    DarkMatrix Active Member

    #4
    Another easy option is to block them from cPanel.
     
    DarkMatrix, May 6, 2015 IP
  5. itbypros

    itbypros Greenhorn

    #5
    The bad-bot block in .htaccess or the Apache config files works, but you can also use iptables string matching if they get way out of hand. It is also useful when you need to analyze the logs: you won't have them constantly flooding your screen. You can do this with a command like this:

    iptables -I INPUT -p tcp -m string --string "baidu" --algo bm -j DROP

    If you use a firewall script like CSF, you would put this in /etc/csf/csfpre.sh and then restart.

    The string match is very useful for a variety of web-server annoyances and attacks. For example, for the last few months I have been dealing with WordPress brute-force attempts like crazy; if you can get the client to change the admin page URL, you can add a string match for wp-login.php, and so on.
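    Following that suggestion, the wp-login.php match might look like this (a sketch; only safe once the real login URL has been moved, since the string match applies to every inbound packet on the port):

```shell
# Drop any packet on port 80 whose payload contains the old login path
iptables -I INPUT -p tcp --dport 80 -m string --string "wp-login.php" --algo bm -j DROP

# With CSF, persist it in the pre-start hook and reload the firewall
echo 'iptables -I INPUT -p tcp --dport 80 -m string --string "wp-login.php" --algo bm -j DROP' >> /etc/csf/csfpre.sh
csf -r
```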
     
    itbypros, May 8, 2015 IP
  6. Jeffr2014

    Jeffr2014 Active Member

    #6
    Thanks @itbypros. I currently use vhost.conf instead of .htaccess (see my previous message). I guess my question is: which method is more efficient, blocking with iptables (as you suggested) or in the vhost? I currently use iptables to block rogue IPs and IP ranges, and it is also used by my fail2ban jails...
     
    Jeffr2014, May 9, 2015 IP
  7. billzo

    billzo Well-Known Member

    #7
    Blocking at the firewall level is generally preferred if possible.
     
    billzo, May 9, 2015 IP
  8. Jeffr2014

    Jeffr2014 Active Member

    #8
    Interesting... I guess I can use iptables to block both bots and referrers with:

    iptables -I INPUT -p tcp -m string --string "bot or referrer pattern" --algo bm -j DROP
    iptables -I INPUT -p tcp -m string --string "bot or referrer pattern" --algo bm -j LOG --log-prefix "badguys"

    (Since -I inserts at the top of the chain, running the DROP rule first and the LOG rule second leaves LOG above DROP, so matches are logged before being dropped.) Any issues with this approach? I understand the danger here is blocking legitimate requests that happen to match the pattern: vhost/.htaccess checks only the User-Agent (or Referer) header, whereas this checks the entire request for a match...
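    One way to reduce that risk is the string match's --to option, which limits how deep into each packet the Boyer-Moore search runs, so the rule mostly sees the request line and early headers rather than request bodies or page content; a sketch (the pattern and offset are examples):

```shell
# Scan only the first 512 bytes of each packet, roughly where the
# request line and the User-Agent/Referer headers sit
iptables -I INPUT -p tcp --dport 80 -m string --string "semrushbot" --algo bm --icase --to 512 -j DROP
```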
     
    Jeffr2014, May 10, 2015 IP