
What is the best approach to block bots and spam referrers?

Discussion in 'Apache' started by Jeffr2014, Apr 10, 2015.

  1. #1
    I am new to this server-administration art, so my apologies if I am asking something very basic...

    Anyway, here is my dilemma. If I know the IP range that I want to block, the best option is to block it with iptables (thanks to @deathshadow for educating me on this). That works well when you want to block entire countries. But what happens when you want to block specific IPs rather than ranges? Is iptables still more effective than "Deny from [IP]" in .htaccess? I have read that you don't want the iptables rule list to grow too big because it slows performance, but I guess it is still more effective than having a big .htaccess..?

    When it comes to blocking spam bots or referrers, robots.txt is just a suggestion: when I looked at my traffic logs, I noticed that most bots don't even request the robots.txt file. As far as I understand, the only option here is to use .htaccess.
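    For reference, a common way to keep many per-IP blocks fast is an ipset hash that a single iptables rule consults, so lookup cost stays flat as the list grows. A sketch, assuming the ipset package is installed (the set name blocked_ips and the addresses are examples):

```shell
# Create a hash-based set for individual addresses
ipset create blocked_ips hash:ip

# Add offending IPs; membership lookups stay fast however many entries exist
ipset add blocked_ips 203.0.113.15
ipset add blocked_ips 198.51.100.7

# One iptables rule checks the whole set
iptables -I INPUT -m set --match-set blocked_ips src -j DROP
```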

    1. I am currently using this in my .htaccess:
    SetEnvIfNoCase User-Agent ahrefsbot bad_bot=yes
    SetEnvIfNoCase Referer fbdownloader.com spammer=yes
    ...
    SetEnvIfNoCase Referer social-buttons.com spammer=yes
    Order allow,deny
    Allow from all
    Deny from env=spammer
    Deny from env=bad_bot


    2. Apparently, there is another approach as per below:
    # Deny domain access to spammers
    RewriteEngine on
    RewriteBase /
    RewriteCond %{HTTP_USER_AGENT} queryseeker [OR]
    RewriteCond %{HTTP_REFERER} ^(www\.)?.*(-|\.)?adult(-|\.).*$ [OR]
    ...
    RewriteCond %{HTTP_REFERER} ^(www\.)?.*(-|\.)?sex(-|\.).*$

    RewriteRule .* - [F,L]

    Which approach is better, #1 or #2? Is there a better alternative?
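    For reference, on Apache 2.4 the same checks can be written without the 2.2-era Order/Allow/Deny directives; a sketch, assuming mod_setenvif and mod_authz_core are loaded (the patterns are carried over from approach #1):

```apache
# Tag unwanted clients (patterns are case-insensitive regexes)
BrowserMatchNoCase ahrefsbot bad_bot
SetEnvIfNoCase Referer fbdownloader\.com spammer

# Apache 2.4 access control: allow everyone except tagged requests
<RequireAll>
    Require all granted
    Require not env bad_bot
    Require not env spammer
</RequireAll>
```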

    Finally, somebody suggested that you need both (as in the example below). Is that true?

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^rogerbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^exabot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^MJ12bot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^dotbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^gigabot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^AhrefsBot
    RewriteRule .* - [F]

    SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
    SetEnvIfNoCase User-Agent .*exabot.* bad_bot
    SetEnvIfNoCase User-Agent .*mj12bot.* bad_bot
    SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
    SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
    SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
    SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
     
    Jeffr2014, Apr 10, 2015 IP
  2. billzo

    billzo Well-Known Member

    #2
    .htaccess use is inefficient, and if you care about performance (as you seem to), .htaccess should be disabled entirely. The same settings can be made directly in the Apache configuration files (such as virtual hosts).

    Blocking at the firewall level is the preferred method. However, the IP addresses bots use can change, so if you block by IP address and the bot moves to a new IP, you will no longer be blocking it. On shared hosting, I block by user-agent string (using .htaccess), so even if the bot changes IPs it stays blocked. That will not stop bad bots that send fake user-agent strings (like a scraper intent on harvesting your content), but most of the bots you will want to block, like Majestic and Brandwatch, don't do that: they identify themselves in their user-agent strings and are easy to block that way regardless of which IP they come from.
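    For example, the same user-agent block, moved out of .htaccess and into a virtual host, might look like this (a sketch using the thread's Apache 2.2-style directives; example.com and the paths are placeholders, and mj12bot/magpie-crawler are Majestic's and Brandwatch's crawlers):

```apache
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example

    # Block by user-agent string, so the block survives IP changes
    SetEnvIfNoCase User-Agent mj12bot bad_bot
    SetEnvIfNoCase User-Agent magpie-crawler bad_bot

    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot

    # With overrides off, Apache never even looks for .htaccess files
    <Directory /var/www/example>
        AllowOverride None
    </Directory>
</VirtualHost>
```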
     
    billzo, Apr 11, 2015 IP
  3. Jeffr2014

    Jeffr2014 Active Member

  4. DarkMatrix

    DarkMatrix Active Member

    #4
    Another easy option is to block them from cPanel.
     
    DarkMatrix, May 6, 2015 IP
  5. itbypros

    itbypros Greenhorn

    #5
    The bad-bot block in .htaccess or the Apache config files works, but you can also use iptables string matching if they get way out of hand. It is also useful when you need to analyze the logs: you won't have them constantly flooding your screen. You can do this with a command like this:

    iptables -I INPUT -p tcp -m string --string "baidu" --algo bm -j DROP

    If you use a firewall script like CSF, you would put this in /etc/csf/csfpre.sh and then restart.

    The string match is very useful for a variety of web-server annoyances and attacks. For example, for the last few months I have been dealing with WordPress brute-force attempts like crazy; if you can get the client to change the admin page URL, you can add a string match for wp-login.php, and so on.
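    Following that suggestion, the wp-login.php match might look like this (a sketch; only safe once the real login URL has been moved, since the string match applies to every inbound packet on the port):

```shell
# Drop any packet on port 80 whose payload contains the old login path
iptables -I INPUT -p tcp --dport 80 -m string --string "wp-login.php" --algo bm -j DROP

# With CSF, persist it in the pre-start hook and reload the firewall
echo 'iptables -I INPUT -p tcp --dport 80 -m string --string "wp-login.php" --algo bm -j DROP' >> /etc/csf/csfpre.sh
csf -r
```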
     
    itbypros, May 8, 2015 IP
  6. Jeffr2014

    Jeffr2014 Active Member

    #6
    Thanks @itbypros. I currently use vhost.conf instead of .htaccess (see my previous message). I guess my question is: which method is more efficient, blocking with iptables (as you suggested) or in the vhost? I currently use iptables to block rogue IPs and IP ranges, and it is also used by my fail2ban jails...
     
    Jeffr2014, May 9, 2015 IP
  7. billzo

    billzo Well-Known Member

    #7
    Blocking at the firewall level is generally preferred if possible.
     
    billzo, May 9, 2015 IP
  8. Jeffr2014

    Jeffr2014 Active Member

    #8
    Interesting... I guess I can use iptables to block both bots and referrers with:

    iptables -I INPUT -p tcp -m string --string "bot or referrer pattern" --algo bm -j DROP
    iptables -I INPUT -p tcp -m string --string "bot or referrer pattern" --algo bm -j LOG --log-prefix "badguys"

    (Since -I inserts at the top of the chain, running the DROP rule first and the LOG rule second leaves LOG above DROP, so matches are logged before being dropped.) Any issues with this approach? I understand the danger here is blocking legitimate requests that happen to match the pattern: vhost/.htaccess checks only the User-Agent (or Referer) header, whereas this checks the entire request for a match...
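    One way to reduce that risk is the string match's --to option, which limits how deep into each packet the Boyer-Moore search runs, so the rule mostly sees the request line and early headers rather than request bodies or page content; a sketch (the pattern and offset are examples):

```shell
# Scan only the first 512 bytes of each packet, roughly where the
# request line and the User-Agent/Referer headers sit
iptables -I INPUT -p tcp --dport 80 -m string --string "semrushbot" --algo bm --icase --to 512 -j DROP
```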
     
    Jeffr2014, May 10, 2015 IP