How to detect scrapers / spambots - is there any professional tool / service?

Discussion in 'Security' started by bm24, Jun 18, 2010.

  1. #1
    Questions:
    - One of our sites gets a lot of traffic every day from IPs (many of them from the Amazon cloud) where the number of pages equals the number of hits (for example 45,000 pages and 45,000 hits in 2 hours, with 1 GB of traffic). A page viewed by a human user normally produces 10 or more hits per pageview, so I think this must be a bot or scraper. Is this correct?
    - This site shows maybe 50% more pageviews in AWStats than in Google Analytics. Why? Are these all from bots?
    - We get a lot of these "attacks" every day.
    - Is there any professional tool that blocks scrapers or bots, or that can detect, for example, too many requests from the same IP in a specific time, or a fake user agent, and then give alerts or block the IP automatically? Or a professional service?
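
To make the "pages equal hits" heuristic above concrete, here is a minimal sketch in Python. It assumes an Apache common/combined access log; the asset extension list, the thresholds, and the function names are all made up for illustration, not taken from any tool. It flags an IP only when it both exceeds a request count inside a sliding time window and has a low hits-per-page ratio.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Start of an Apache common/combined log line (assumed format):
# IP, identity, user, [timestamp], "METHOD path ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')

# Hypothetical list of extensions counted as page assets, not pages.
ASSET_EXT = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


def parse_line(line):
    """Return (ip, timestamp, path), or None if the line does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ip, ts, _method, path = m.groups()
    # %b expects English month names, which holds in the default C locale.
    when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
    return ip, when, path


def suspicious_ips(lines, max_per_window=1000,
                   window=timedelta(hours=1), min_hits_per_page=2.0):
    """Map each flagged IP to (peak requests in a window, hits per page)."""
    times = defaultdict(list)
    pages = defaultdict(int)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ip, when, path = parsed
        times[ip].append(when)
        # Anything that is not a known asset type counts as a page.
        if not path.split("?")[0].lower().endswith(ASSET_EXT):
            pages[ip] += 1

    flagged = {}
    for ip, stamps in times.items():
        stamps.sort()
        # Sliding window: most requests seen inside any span of `window`.
        peak, start = 0, 0
        for end, t in enumerate(stamps):
            while t - stamps[start] > window:
                start += 1
            peak = max(peak, end - start + 1)
        hits_per_page = len(stamps) / pages[ip] if pages[ip] else float("inf")
        # Bot-like: high volume AND almost no assets fetched per page.
        if peak > max_per_window and hits_per_page < min_hits_per_page:
            flagged[ip] = (peak, hits_per_page)
    return flagged
```

Something like `suspicious_ips(open("access.log"))` would then give a list of candidates to review by hand before blocking. The thresholds need tuning per site, and legitimate crawlers can look the same by these two measures.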

    Has somebody in this forum had similar problems and some tips?
     
    bm24, Jun 18, 2010
  2. hans

    #2
    Depending on how much control you have over your server, there are multiple ways to block spam bots.

    Blocking via iptables may be the most efficient one.
    Google for
    spambots ip list
    to find current and hopefully maintained blacklists to add to your iptables rules.
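
As a rough sketch of how such a blacklist could be turned into iptables rules: the Python below assumes one IP or CIDR range per line, with optional `#` comments, which is an assumption about the list format rather than any particular list's layout. The function name is made up for illustration. It only builds the command strings and does not run them.

```python
import ipaddress


def iptables_drop_rules(entries):
    """Turn blacklist entries (IPs or CIDR ranges) into iptables commands."""
    rules = []
    for raw in entries:
        entry = raw.split("#", 1)[0].strip()  # strip trailing comments
        if not entry:
            continue
        try:
            # strict=False accepts host bits, e.g. 198.51.100.5/24
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # skip anything that is not a valid address or range
        # IPv6 rules belong to ip6tables, not iptables.
        tool = "iptables" if net.version == 4 else "ip6tables"
        rules.append(f"{tool} -A INPUT -s {net} -j DROP")
    return rules


if __name__ == "__main__":
    # Documentation-range addresses, used here only as placeholders.
    for rule in iptables_drop_rules(["203.0.113.7", "198.51.100.0/24"]):
        print(rule)
```

Printing the commands first instead of executing them leaves a chance to review the list. For very long blacklists, one rule per address gets slow, so it may be worth looking at ipset instead.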

    Another method would be to use .htaccess and deny access to known spambot IPs.
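
For the .htaccess route, a minimal sketch that writes the deny lines could look like this. It assumes the Apache 2.2 style `Order` / `Allow` / `Deny` directives; Apache 2.4 uses `Require` directives instead, so adjust for your version. The function name is hypothetical.

```python
def htaccess_deny_block(ips):
    """Build an Apache 2.2 style access block denying the given IPs."""
    # Allow everyone first; Deny lines are evaluated after and win on a match.
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in ips]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-range addresses, used here only as placeholders.
    print(htaccess_deny_block(["203.0.113.7", "198.51.100.0/24"]), end="")
```

The output can be pasted into the site's .htaccess, provided the server's AllowOverride setting permits these directives there.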

    As said above, it depends on how much control you have over your server or web space. Google for spambot blacklists and you will find multiple solutions.

    I run snort + mod_security + iptables and also manually add spambots to .htaccess if needed.
    I prefer iptables as it reduces writing to log files and thus reduces wasteful use of server resources on spam. Blocking with .htaccess, however, creates an entry in your error_log for each denied access; iptables just drops the requests into null space ... or wherever ...
     
    hans, Jun 30, 2010