Stopping bots submitting forms - Top Tips

Discussion in 'Programming' started by Rogem, Sep 27, 2007.

  1. #1
    There are many ways to keep bots from attacking your website. I'm going to go over them :)

    CAPTCHA
    Personally I'm not a big fan of CAPTCHA, as it annoys users (they have to type random letters which are difficult to read) and lots of spammers have made software to read the images.


    Re-CAPTCHA
    Similar to CAPTCHA, but this one shows you two words. One is known to the computer, the other is from a book that has been scanned. If you type the word known to the computer correctly, you are allowed through. The scanned word makes the maker of Re-CAPTCHA a small amount of money (the computer can't read it... so you tell them the word).


    Crouching CSS, Hidden Form
    This technique involves having a field that is hidden (usually with CSS), and detecting whether something has been put in it. If it has content, a bot submitted the form.
    The only problem with this one is that some bots can read CSS and know if a field is hidden. If CSS is disabled in the user's browser, they may also fill out the field.
    Below is an example of the code you would use:
    
    <form id="form1" name="form1" method="post" action="">
      <label></label>
      <p>Field 1 -
        <input type="text" name="textfield" id="textfield" />
      </p>
      <p>
        <input type="text" name="textfield2" id="textfield2" style="visibility:hidden;" />
        <input type="submit" name="button" id="button" value="Submit" />
      </p>
      <p>&nbsp; </p>
    </form>
    
    HTML:
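
    The form above only sets the trap; the check itself happens on the server. A minimal sketch of that check follows, assuming the hidden field keeps the name textfield2 from the example. The function name is_bot_submission() is made up for illustration and is not part of PHP or any library.

    ```php
    <?php
    // Hypothetical helper (name is an assumption): returns true when the
    // hidden honeypot field from the form above contains anything at all.
    function is_bot_submission(array $post, $field = 'textfield2') {
        // A real browser leaves the hidden field empty; a missing field
        // is treated the same as an empty one.
        $value = isset($post[$field]) ? $post[$field] : '';
        if (!is_string($value)) {
            return true; // an array here is not what a normal browser sends
        }
        return trim($value) !== '';
    }

    // Usage when handling the request:
    // if (is_bot_submission($_POST)) { /* drop the submission */ }
    ```

    Dropping the submission quietly, rather than showing an error, avoids telling the bot which field gave it away.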

    Timestamp
    This one is a little tricky, but can stop software submitting to you. What you need to do for this one is generate a random number (say 999) and put it into both a hidden form field and the session. When the form has been submitted, compare to see whether the value posted matches the value stored in the session.
    One minor problem is that if the user has disabled session cookies, the session may not load. But you can overcome this by storing the value in MySQL instead.
    Below are some fancy PHP functions I have written:
    
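    # TimeStamp Functions # Made By Rogem Networks (http://www.rogem.net) # Do Not remove Link back.
    // session_start() must have been called before using these functions.
    function createtimestamp(){
        deletestamp(); // drop the token left over from a previous submission
        // Random token; only its md5 hash is stored in the session.
        $timestamp = md5(md5(rand(0, 9999)));
        // Two extra parts that make the session key unique for this form.
        $timeset = date("His").rand(0, 9999).rand(0, 9999).rand(0, 9999);
        $microtime = microtime().rand(0, 9999).rand(0, 9999).rand(0, 9999);
        $_SESSION["timestamp".$timeset.$microtime] = md5($timestamp);
        $timestamp = $timestamp."|||".$timeset."|||".$microtime;

        // now give the person two options (html or timestamp standalone).
        $return = array();
        $return[0] = $timestamp;
        $return[1] = '<input type="hidden" name="timestamp" value="'.$timestamp.'">';
        return $return;
    }

    function checktimestamp(){
        // Take the token from POST first, then fall back to GET.
        if(!empty($_POST['timestamp']) && is_string($_POST['timestamp'])){
            $timestamp = $_POST['timestamp'];
        } elseif(!empty($_GET['timestamp']) && is_string($_GET['timestamp'])){
            $timestamp = $_GET['timestamp'];
        } else {
            return "unsafe"; // nothing was sent at all
        }
        $posted = explode("|||", $timestamp);
        if(count($posted) != 3){
            return "unsafe"; // not in the token|||timeset|||microtime format
        }
        $key = "timestamp".$posted[1].$posted[2];
        if(isset($_SESSION[$key]) && md5($posted[0]) === $_SESSION[$key]){
            return "safe";
        }
        return "unsafe";
    }

    function deletestamp(){
        if(empty($_POST['timestamp']) || !is_string($_POST['timestamp'])){
            return;
        }
        $posted = explode("|||", $_POST['timestamp']);
        if(count($posted) != 3){
            return;
        }
        // Must be the same key createtimestamp() stored under (both parts).
        unset($_SESSION["timestamp".$posted[1].$posted[2]]);
    }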
    
    PHP:

    Scan what's sent
    This is more of an 'if all of the above pass' type of check, to catch someone manually submitting spam through your form. For example:
    
    $subject = "abcdef"; // the submitted text to scan
    // No ^ anchor and no offset, so the word is found anywhere in the text; /i ignores case.
    $pattern = '/porn/i';
    if(preg_match($pattern, $subject)){
    // Found spam
    } else {
    // Not found
    }
    
    PHP:
     
    Rogem, Sep 27, 2007 IP
  2. AstarothSolutions

    #2
    There are a couple more common ones:

    Time for response
    Set a minimum time between the form being sent out and it being accepted back. If you have a form with 30 fields, each of which requires a long answer, a human isn't going to be able to complete it within a second or two, whereas a bot can do it instantly. In that case, set a minimum response time of 30 seconds.
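
    As a rough sketch of the timing check, assuming the send time is kept in the session: the function names and the form_sent_at key below are made up for illustration, and the 30 second default simply follows the example above.

    ```php
    <?php
    // Hypothetical helpers; names and the session key are assumptions.

    // Call when the form is rendered: remember when it was sent out.
    function mark_form_sent(array &$session, $now) {
        $session['form_sent_at'] = $now;
    }

    // Call when the form comes back: true if it arrived too quickly.
    function submitted_too_fast(array $session, $now, $min_seconds = 30) {
        if (!isset($session['form_sent_at'])) {
            return true; // no record of the form ever being sent out
        }
        return ($now - $session['form_sent_at']) < $min_seconds;
    }

    // Usage with real sessions (after session_start()):
    //   mark_form_sent($_SESSION, time());
    //   if (submitted_too_fast($_SESSION, time())) { /* reject */ }
    ```

    Passing the time in as a parameter keeps the check easy to try out without waiting 30 seconds.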

    Javascript calculation
    Have a hidden field which requires a value to be calculated and inserted by JavaScript, as most bots ignore JavaScript. If the hidden field doesn't have the correct value, reject the form. This obviously doesn't work for the 0.5% of human users who don't use JavaScript.
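
    One possible shape for this, sketched in PHP with the script emitted inline: the field name js_answer, the session key and both function names are assumptions for illustration, and a simple sum stands in for whatever calculation you prefer.

    ```php
    <?php
    // Hypothetical helpers; names, field and session key are assumptions.

    // Builds the hidden field plus a small script that fills in the sum.
    function js_challenge_html(array &$session) {
        $a = mt_rand(1, 50);
        $b = mt_rand(1, 50);
        $session['js_answer'] = $a + $b; // expected value stays on the server
        return '<input type="hidden" name="js_answer" id="js_answer" value="" />'
             . '<script type="text/javascript">'
             . 'document.getElementById("js_answer").value = ' . $a . ' + ' . $b . ';'
             . '</script>';
    }

    // True only if the posted value matches what the script should produce.
    function js_challenge_passed(array $session, array $post) {
        if (!isset($session['js_answer'], $post['js_answer'])
            || !is_string($post['js_answer'])) {
            return false;
        }
        return $post['js_answer'] === (string) $session['js_answer'];
    }
    ```

    A bot that does not run scripts submits the field empty and fails the check; a bot that does run them passes, so this only filters the simpler ones.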

    IP limit
    Limit the number of responses per minute/hour/day, as appropriate, for each IP. If it is a contact form, a human is unlikely to be sending you 5 emails per hour, etc.
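
    A minimal sketch of a per-IP limit follows. It keeps the log in a plain array so the logic is easy to follow; a real site would store it in a database or cache. The function name and the default of 5 submissions per hour are placeholders, not recommendations.

    ```php
    <?php
    // Hypothetical sketch: $log maps an IP to a list of submission times.
    function allow_submission(array &$log, $ip, $now, $max = 5, $window = 3600) {
        $recent = array();
        if (isset($log[$ip])) {
            foreach ($log[$ip] as $t) {
                if ($now - $t < $window) {
                    $recent[] = $t; // keep only entries inside the window
                }
            }
        }
        if (count($recent) >= $max) {
            $log[$ip] = $recent;
            return false; // over the limit, reject this one
        }
        $recent[] = $now; // record the accepted submission
        $log[$ip] = $recent;
        return true;
    }

    // Usage: allow_submission($log, $_SERVER['REMOTE_ADDR'], time());
    ```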

    Question
    Similar to a CAPTCHA, however you have the human answer a random simple question (who is the UK prime minister?). Next to impossible for bots, but at the same time some humans won't know the answer either.
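
    A small sketch of the answer check, assuming a fixed list of questions. The questions, answers and function names below are placeholders chosen for illustration; easy, unambiguous ones reduce the chance of locking out real people.

    ```php
    <?php
    // Hypothetical question list; questions and answers are placeholders.
    function question_bank() {
        return array(
            'What colour is grass?' => array('green'),
            'How many days are in a week?' => array('7', 'seven'),
        );
    }

    // Compares the answer ignoring case and surrounding spaces.
    function answer_is_correct($question, $answer) {
        $bank = question_bank();
        if (!isset($bank[$question]) || !is_string($answer)) {
            return false;
        }
        return in_array(strtolower(trim($answer)), $bank[$question], true);
    }
    ```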
     
    AstarothSolutions, Sep 27, 2007 IP
  3. Loganet

    #3
    I wonder if you can use a combination of question/answer, but with a picture? For example: what brand of car is pictured below?

    Then the user would have to answer: BMW, Mercedes, Toyota, Ford, Chevrolet

    Or perhaps, what sport is the person in the picture below doing?

    A: tennis, skiing, basketball, football, etc.
     
    Loganet, Sep 27, 2007 IP
  4. AstarothSolutions

    #4
    You could, but again there is still a risk of users not being able to answer correctly. For example, if an American site used your sports example, you would have issues with the fact that football is two different sports depending on which country you are in: what the UK calls football, Americans call soccer, etc.
     
    AstarothSolutions, Sep 28, 2007 IP
  5. Rogem

    #5
    Ditto on the problem with asking people questions. The main problem with that is names change from country to country, and people may not know the answer in general.

    I like the 'time response' and 'IP limit' ideas. I'm not a big fan of blocking off that 0.5% of users who lack JavaScript, and the ones who plainly block it :)

    I think it's really a matter of finding a middle ground between usability and security.
     
    Rogem, Sep 28, 2007 IP
  6. nico_swd

    #6
    There's also the possibility of using animated images for CAPTCHA. Most bots can't read these... yet.
     
    nico_swd, Sep 28, 2007 IP
  7. Rogem

    #7
    The problem with CAPTCHA is the lack of usability. I don't know about you, but I get damn annoyed at having to type these in. Also, would it not be a pain for the server to make an animated image?
     
    Rogem, Sep 28, 2007 IP
  8. AstarothSolutions

    #8
    It isn't difficult to make an animated image (at least not with .NET).

    The one that I found amusing the other day was one where they had an audio option, always good for CAPTCHA, but when I clicked to listen to it, they had intentionally added a massive amount of background noise, obviously to stop voice recognition software. It made the audio as difficult to hear as the background noise on the CAPTCHA image made it to read.

    I think on hobby/semi-pro sites there is far too much bot prevention in place, well exceeding the actual threat that exists, but because these techniques are becoming so common, people writing the bots now account for these types of things.

    We tend to use the passive methods such as timing, hidden-field JavaScript and IP checking, and accept that we will still receive some spam, but to date not a single one of the sites we have developed or run ourselves has come under any attack.
     
    AstarothSolutions, Sep 28, 2007 IP
  9. Rogem

    #9
    I think passive methods are the best, as the user does not notice them as much.

    I think blocking off the IPs we know are bad is the best way forward.
     
    Rogem, Sep 28, 2007 IP