There are many ways to keep bots from attacking your website. I'm going to go over them below.

CAPTCHA

Personally I'm not a big fan of CAPTCHAs, as they annoy users (who have to type random letters that are difficult to read), and lots of spammers have written software to read the images.

reCAPTCHA

Similar to CAPTCHA, but this one shows you two words. One is known to the computer; the other is from a book that's been scanned. If the word known to the computer is typed correctly, you are allowed through. The scanned word makes the maker of reCAPTCHA a small amount of money (the computer can't read it, so you tell them the word).

Crouching CSS, Hidden Form

This technique involves having a field that is hidden (usually with CSS) and detecting whether something has been put in it. If it has content, a bot submitted the form. The only problem with this one is that some bots can read CSS and know when a field is hidden. If CSS is disabled in the user's browser, they may also fill out the field. Below is an example of the HTML you could use:

    <form id="form1" name="form1" method="post" action="">
      <p>Field 1 - <input type="text" name="textfield" id="textfield" /></p>
      <p>
        <!-- Hidden trap field: real users never see it, so it should arrive empty -->
        <input type="text" name="textfield2" id="textfield2" style="visibility:hidden;" />
        <input type="submit" name="button" id="button" value="Submit" />
      </p>
    </form>

Timestamp

This one is a little tricky, but it can stop software from submitting to you. What you need to do is generate a random number (say 999) and put it into a hidden form field and into the session. When the form has been submitted, check whether the value posted matches the value stored in the session. One minor problem is that if the user has disabled session cookies, the session may not load. But you can overcome this with MySQL. Below are some fancy PHP functions I have written:

    # TimeStamp Functions
    # Made By Rogem Networks (http://www.rogem.net)
    # Do Not remove Link back.
    # session_start() must have been called before using these functions.

    function createtimestamp(){
        // Clear the token of a previous submission. Call checktimestamp()
        // first if this runs on the page that also handles the POST.
        deletestamp();
        $timestamp = md5(md5(rand(0, 9999)));
        $timeset = date("His").rand(0, 9999).rand(0, 9999).rand(0, 9999);
        $microtime = microtime().rand(0, 9999).rand(0, 9999).rand(0, 9999);
        // The session holds a hash of the token under a key that is hard to guess.
        $_SESSION["timestamp".$timeset.$microtime] = md5($timestamp);
        $timestamp = $timestamp."|||".$timeset."|||".$microtime;
        // now give the person two options (html or timestamp standalone).
        $return = array();
        $return[0] = $timestamp;
        $return[1] = '<input type="hidden" name="timestamp" value="'.$timestamp.'">';
        return $return;
    }

    function checktimestamp(){
        // Accept the token from POST first, then GET.
        $timestamp = $_POST['timestamp'] ?? $_GET['timestamp'] ?? '';
        if(!is_string($timestamp)){
            return "unsafe";
        }
        $posted = explode("|||", $timestamp);
        if(count($posted) !== 3){
            return "unsafe";
        }
        $key = "timestamp".$posted[1].$posted[2];
        if(isset($_SESSION[$key]) && $_SESSION[$key] === md5($posted[0])){
            return "safe";
        }
        return "unsafe";
    }

    function deletestamp(){
        if(!isset($_POST['timestamp']) || !is_string($_POST['timestamp'])){
            return;
        }
        $posted = explode("|||", $_POST['timestamp']);
        if(count($posted) !== 3){
            return;
        }
        // Use the same key layout as createtimestamp().
        unset($_SESSION["timestamp".$posted[1].$posted[2]]);
    }

Scan what's sent

This is more of an 'if all of the above pass' type of thing, to detect whether someone is manually submitting a form to you. For example, in PHP:

    $subject = "abcdef";
    $pattern = '/porn/i'; // case-insensitive, matches anywhere in the text
    if(preg_match($pattern, $subject)){
        // Found spam
    } else {
        // Not found
    }
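For the hidden-field technique described above, the server-side check could look something like the following. This is only a minimal sketch: the function name `isHoneypotFilled` is made up for illustration, and it assumes the trap field keeps the name `textfield2` from the example form.

```php
<?php
// Hypothetical helper (not part of the original post): returns true when the
// hidden trap field arrived with content, which suggests a bot filled it in.
function isHoneypotFilled(array $post, string $field = 'textfield2'): bool
{
    // A missing field counts as empty.
    $value = $post[$field] ?? '';
    // Anything that is not a plain string (e.g. an array) is suspicious.
    if (!is_string($value)) {
        return true;
    }
    // Whitespace alone does not count as content.
    return trim($value) !== '';
}
```

In a form handler this would probably be called as `isHoneypotFilled($_POST)`, with the submission quietly discarded when it returns true.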
There are a few more common ones.

Time for response

Set a minimum time between the form being sent out and it being accepted back. If you have a form with 30 fields, each requiring a long answer, a human isn't going to be able to complete it within a second or two, whereas a bot can do it instantly. In that case, set a minimum response time of 30 seconds.

Javascript calculation

Have a hidden field whose value must be calculated and inserted by JavaScript, as most bots ignore JavaScript. If the hidden field doesn't have the correct value, reject the form. This obviously doesn't work for the 0.5% of human users who don't use JavaScript.

IP limit

Limit the number of responses per minute/hour/day, as appropriate, for each IP. If it is a contact form, a human is unlikely to be sending you 5 emails per hour.

Question

Similar to a captcha, except that you have the human answer a random simple question (e.g. who is the UK prime minister?). Next to impossible for bots, but at the same time some humans won't know the answer either.
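The 'time for response' and 'IP limit' checks above could be sketched roughly as below. Both function names are hypothetical, the 30-second and 5-per-hour defaults are simply the figures mentioned in the post, and the `$log` array stands in for whatever storage (session, file, database) a real site would use. The arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: true when the form came back sooner than $minSeconds
// after it was rendered. $renderedAt would normally be kept in the session.
function submittedTooFast(int $renderedAt, int $now, int $minSeconds = 30): bool
{
    return ($now - $renderedAt) < $minSeconds;
}

// Hypothetical helper: sliding-window limit per IP. $log maps an IP address
// to the Unix timestamps of its accepted submissions and is updated in place.
function overIpLimit(array &$log, string $ip, int $now, int $max = 5, int $window = 3600): bool
{
    // Keep only the submissions that are still inside the window.
    $recent = array_values(array_filter(
        $log[$ip] ?? [],
        fn (int $t): bool => $t > $now - $window
    ));
    if (count($recent) >= $max) {
        $log[$ip] = $recent;
        return true; // limit reached, reject this submission
    }
    $recent[] = $now; // record the accepted submission
    $log[$ip] = $recent;
    return false;
}
```

Passing the current time in as a parameter, rather than calling `time()` inside, is just a choice that makes the functions easier to try out; a handler would pass `time()` and `$_SERVER['REMOTE_ADDR']`.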
I wonder if you can use a combination of question/answer, but with a picture? For example: what brand of car is pictured below? Then the user would have to answer: BMW, Mercedes, Toyota, Ford, Chevrolet. Or perhaps: what sport is the person in the picture below playing? A: tennis, skiing, basketball, football, etc.
You could, but there is still a risk of users not being able to answer correctly. For example, if an American site used your sports example, you would have issues because "football" means two different sports depending on which country you are in: what the UK calls football, Americans call soccer.
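One possible way to soften the football/soccer problem is to accept several answers for the same question. The sketch below is only an illustration of that idea; the function name `answerAccepted` and the answer list are assumptions, not something anyone in this thread has used.

```php
<?php
// Hypothetical helper: compares a visitor's answer against every accepted
// answer, ignoring case and surrounding whitespace.
function answerAccepted(string $given, array $accepted): bool
{
    $normalised = strtolower(trim($given));
    return in_array($normalised, array_map('strtolower', $accepted), true);
}
```

For the sports picture, a call such as `answerAccepted($_POST['answer'] ?? '', ['football', 'soccer'])` would let visitors from either country through. It does nothing for people who simply don't know the answer.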
Ditto on the problem with asking people questions. The main problem is that names change from country to country, and people may not know the answer in general. I like the 'time for response' and 'IP limit' ideas. I'm not a big fan of blocking off that 0.5% of users who lack JavaScript, or the ones who deliberately block it. I think it's really a matter of finding a middle ground between usability and security.
There's also the possibility of using animated images for captcha. Most bots can't read these... yet.
The problem with CAPTCHA is the lack of usability. I don't know about you, but I get damn annoyed at having to type these in. Also, would it not be a pain for the server to make an animated image?
It isn't difficult to make an animated image (at least not with .NET). One that I found amusing the other day had an audio option, which is always good for a captcha, but when I clicked to listen to it, they had intentionally added a massive amount of background noise, obviously to stop voice recognition software. It made the audio as difficult to hear as the background noise on the image made it to read.

I think hobby and semi-pro sites have far too much bot prevention in place, well in excess of the actual threat that exists, and because these techniques are becoming so common, the people writing the bots now account for them. We tend to use passive methods such as timing, a hidden field filled by JavaScript, and IP checking, and accept that we will still receive some spam. To date, not a single one of the sites we have developed or run ourselves has come under any attack.
I think passive methods are the best, since the user does not notice them as much. I think blocking the IPs we know are bad is the best way forward.
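Blocking known-bad IPs could be as simple as the sketch below. The function name `isBlockedIp` is hypothetical, and the list is assumed to be a plain array of exact addresses; where that list comes from (a file, a database, a shared blocklist) is left open.

```php
<?php
// Hypothetical helper: true when $ip exactly matches an entry in a list of
// known-bad addresses. Ranges and wildcards are not handled here.
function isBlockedIp(string $ip, array $blocklist): bool
{
    return in_array($ip, $blocklist, true);
}
```

A handler would probably call it with `$_SERVER['REMOTE_ADDR']` before doing any other work, so that blocked visitors never reach the form processing at all.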