Hey all, I was wondering what steps some of you designers take to spam proof your website. By spam proofing, I mean preventing spam bots from sniffing email addresses on your page. If you haven't heard of this, one of the main reasons people receive spam emails is that their email address is displayed on a web page somewhere in text form. Email harvesters scour the internet (much like googlebot), rip any text from the source of the page which has an '@' in it, and add it to their email list. This is bad news if you are displaying your email address on your page. More dodgy still is if your website displays the email addresses of your members on the page somewhere. On my site, the email address of each member is displayed next to their articles/auctions. I feel I am morally obligated to prevent their email addresses ending up on some spam list, so I have taken steps to trip up the email harvesters. Basically, whenever I output an email address from my php code, I pass it through a function which breaks up the string into several chunks of text. These chunks are then joined together by client side Javascript, so the email is displayed normally but is not present anywhere in its full form in the page source. Anyway, I have done all this, but before I release my web site, I want to make sure that I haven't missed any parts of my site. Does anyone know of any free email harvesters that will crawl my web site on my development pc, and notify me of any addresses that it picks up? Cheers, Sham.
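The chunking approach described in this post could look roughly like the sketch below. The original function is PHP and was not posted, so this is only an illustration in JavaScript (run server side under Node); the names `splitEmail` and `renderObfuscated`, the chunk size of 3 and the element id are all assumptions, not the poster's code.

```javascript
// Hypothetical helper: cut an address into small pieces so the full
// string never appears contiguously in the page source.
function splitEmail(address, chunkSize = 3) {
  const chunks = [];
  for (let i = 0; i < address.length; i += chunkSize) {
    chunks.push(address.slice(i, i + chunkSize));
  }
  return chunks;
}

// Hypothetical helper: build the markup the server would emit in place
// of the address. The inline script joins the chunks in the browser.
function renderObfuscated(address, elementId) {
  const chunks = JSON.stringify(splitEmail(address));
  const id = JSON.stringify(elementId);
  return (
    `<span id="${elementId}"></span>` +
    `<script>document.getElementById(${id}).textContent = ` +
    `${chunks}.join("");</script>`
  );
}

// Example: the emitted markup contains "bob", "@ex", "amp", ... but
// never the whole address.
console.log(renderObfuscated("bob@example.com", "author-email"));
```

With a chunk size of 3, no single chunk can hold a complete `x@y.z` pattern, which is what a simple source-scanning harvester looks for. Visitors without JavaScript will see nothing, so a fallback such as a contact form is still needed.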
Your best option is to not show email addresses in the first place. I use a quick and dirty cgi-based form to handle it on my company's main website. We have given the various users a numeric code. For instance: http://www.statpub.com/cgi-bin/feedback.pl?19 http://www.statpub.com/cgi-bin/feedback.pl?28 where 19 might be Dick and 28 Jane. The script looks the identifying tag up in the database and emails the form contents to that person. Make sure you apply strict rules to what kind of content the form can contain. Among other things, crackers might test to see if the script can be tricked into sending email to a third party. You do not want this to happen. The net result is that your site has no email addresses which can be harvested, and people can still contact your authors. Many CMSes have similar features built in. You might want to investigate your system to see if it already has that capability.
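A minimal sketch of the lookup idea above, written in JavaScript rather than the Perl CGI the post uses. The recipient table, the example.com addresses, the function names and the length limit are all made up for illustration; the real script reads from a database that is not shown here.

```javascript
// Hypothetical recipient table. The visitor only ever supplies the
// numeric code, never an address.
const RECIPIENTS = new Map([
  [19, "dick@example.com"],
  [28, "jane@example.com"],
]);

const MAX_MESSAGE_LENGTH = 2000; // assumed limit, tune to taste

// Accept only a short run of digits; anything else is rejected outright.
function resolveRecipient(rawId) {
  if (!/^\d{1,6}$/.test(String(rawId))) return null;
  return RECIPIENTS.get(Number(rawId)) || null;
}

// One example of a strict content rule: a non-empty, bounded message.
function validateMessage(message) {
  return (
    typeof message === "string" &&
    message.trim().length > 0 &&
    message.length <= MAX_MESSAGE_LENGTH
  );
}

console.log(resolveRecipient("19"));               // a known code
console.log(resolveRecipient("19\nBcc: someone")); // rejected: not all digits
```

Because the recipient always comes from the server-side table, a visitor cannot redirect the mail to a third party by tampering with the query string.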
I've used a javascript that breaks the source code up so the crawlers are less likely to grab it. I also have the web site redirect from a bogus account so I can change it if I need to.
I have been using a simple html "character entity" translator. Displays as "blah @ blah. net. com," but looks like "& #098;& #108;& #097;& #104;& #064;& #098;& #108;& #097;& #104;&# 046;& #110;& #101;& #116;& #046;& #099;& #111;& #109;" to 'bots. [EDIT: Had to add spaces because the forum was turning the "& # 565" back into alpha characters!] Here's one site that discusses it: http://www.samisite.com/test-csb2nf/id129.htm
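For anyone who wants to generate the entities rather than use an online translator, here is a small sketch. The function name `toNumericEntities` is made up; the three-digit zero padding simply mirrors the format shown in the post and is not required by HTML.

```javascript
// Hypothetical helper: turn every character into a decimal HTML
// character reference, e.g. "@" becomes "&#064;".
function toNumericEntities(text) {
  return Array.from(text, (ch) => {
    const code = String(ch.codePointAt(0)).padStart(3, "0");
    return `&#${code};`;
  }).join("");
}

// The browser renders the result as the plain address; the page source
// contains only the numeric references.
console.log(toNumericEntities("blah@blah.net.com"));
```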
Hmm that seems like a nice idea, but surely harvesters would clock on to this trick, and scan pages for "& #064;" which is the @ symbol, and then get surrounding characters and then translate them back into plain text form?
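To show how little work the reverse translation would take, here is a sketch of what such a harvester might do: decode numeric entities, then run an ordinary pattern match. It can also serve as a crude self-check of saved page source, along the lines of what the original poster asked for. The function names and the pattern are assumptions; real harvesters are not documented in this thread and may well do more.

```javascript
// Convert a code point to a character, leaving invalid values untouched.
function codePointToChar(value, fallback) {
  const valid = Number.isInteger(value) && value >= 0 && value <= 0x10ffff;
  return valid ? String.fromCodePoint(value) : fallback;
}

// Decode decimal (&#064;) and hexadecimal (&#x40;) character references.
function decodeNumericEntities(html) {
  return html
    .replace(/&#x([0-9a-f]+);/gi, (m, hex) => codePointToChar(parseInt(hex, 16), m))
    .replace(/&#(\d+);/g, (m, dec) => codePointToChar(Number(dec), m));
}

// Deliberately simple pattern; it is not a full address grammar.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;

// Return the unique addresses visible after entity decoding.
function findEmails(pageSource) {
  const decoded = decodeNumericEntities(pageSource);
  return Array.from(new Set(decoded.match(EMAIL_PATTERN) || []));
}

const encoded =
  "&#098;&#108;&#097;&#104;&#064;&#098;&#108;&#097;&#104;&#046;&#110;&#101;&#116;";
console.log(findEmails(`<p>Contact: ${encoded}</p>`));
```

This only reads static source. It does not execute JavaScript, so addresses assembled in the browser would need a scanner that renders the page first.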
That still seems to work pretty well.
<script type="text/javascript">
<!--
// Assemble the address at runtime so it never appears whole in the page source.
var name = "crazy_rob";
var domain = "doman.com";
document.write('<a href="mailto:' + name + '@' + domain + '">');
document.write(name + '@' + domain + '</a>');
// -->
</script>
You make a good point. I have wondered that myself. The answer seems to be that yes, in theory, it could be reverse-translated by the 'bot, but that would take work. Spammers are lazy, and much too preoccupied with the tons of lower hanging fruit. Keeping your fruit from hanging too low is a good rule of thumb in any endeavor.
Create an image file with your email address on it or, if using a form, use a "human check" verification (generate a random code or word that the user has to enter correctly for the form to parse). If the code is wrong, the script doesn't run and no email address is ever seen.
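The "human check" flow can be sketched as follows, assuming a Node server with some kind of per-visitor session object. The names `generateCode`, `isHuman` and `handleSubmission`, the `session` and `form` shapes, and the alphabet are all hypothetical. Rendering the code as a distorted image, which is what actually slows bots down, is left out.

```javascript
const crypto = require("crypto");

// Characters that are easy to confuse (0/O, 1/I) are left out.
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

// Generate the random code that will be shown to the visitor.
function generateCode(length = 6) {
  let code = "";
  for (let i = 0; i < length; i++) {
    code += ALPHABET[crypto.randomInt(ALPHABET.length)];
  }
  return code;
}

// Compare what the visitor typed against the stored code.
function isHuman(expectedCode, submittedCode) {
  if (typeof submittedCode !== "string" || !expectedCode) return false;
  return submittedCode.trim().toUpperCase() === expectedCode;
}

// Only call sendMail when the check passes. The stored code is cleared
// first, so each code is good for a single attempt.
function handleSubmission(session, form, sendMail) {
  const expected = session.code;
  session.code = null;
  if (!isHuman(expected, form.code)) return false;
  sendMail(form.message);
  return true;
}
```

The recipient address stays inside `sendMail` on the server, so a failed check exposes nothing.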
Beware of "quick and dirty" script writing. Form injection is on the rise in a big way! JavaScript is a poor idea, given the number of users who will be inconvenienced by a method such as this (roughly 10-15% of the Web audience, and this number is rising with the continued growth of mobile Web surfing devices and more advanced surfing mechanisms for people with disabilities). Character entities are hit-or-miss. A bot can still be programmed to read the final page output (another reason JavaScript is not very effective). The same can be said about images - bots are harvesting images, and yes, bypassing image verification techniques as well. Plus, don't forget, not all email addresses are harvested by bots. Human harvesters will not be fooled by any of these techniques. The only safe way to keep from having an address harvested is to not post it. Use a good form processing script and only allow email communication via a form on your site. Assuming you're using a well-secured form, this is by far the best way to go. I personally will never post an email address on a site unless the client expressly requests it, and then I'll only do so after giving all due warnings about the dangers of being harvested. All it takes is one harvester to expose you to thousands of spam sources. All it takes is one not-so-lazy spammer to harvest right around these obfuscation techniques. Instead of counting on spammers to be lazy, be proactive and look for ways to keep addresses off the Web!
Yeah, when I went to look for the js script I posted above, I realized that I have forms everywhere except in one place. And I'm going to change that ASAP. As for visitors with JavaScript disabled, it's 3% for my sites. People keep saying it's 10-15% but I'm just not seeing that anywhere. Anyway, good post!
You won't see it, because JS-disabled visitors won't typically get beyond the first page, and traffic will tend to reflect more usage by people who are able to make use of the site. The number continues to rise because of technology advances and perceptions regarding security, particularly in large corporations, where IT departments might disable JS across tens of thousands of employees to help prevent having their internal networks compromised. Oddly enough, the people who tend to not have JavaScript enabled - people who use the latest mobile technology and professionals in the corporate environment - are one of the most affluent demographics on the Web, precisely the people who you don't want to lose from your typical Web site audience! You never know who your best customer will be, so why use a technique that will knowingly deny any of your visitors access to your site, regardless of how large or small that population might be? (I didn't get the impression you were disagreeing, I just figured I'd give a little more detail on my assertions for everyone in general).
We're purely talking about this being an issue if you use JavaScript as a means to obfuscate your email address from harvesters. Your server logs won't show the number of people who don't use JavaScript, because someone who leaves the front page of a site never has a chance to demonstrate whether they have JS enabled or not. So there's no real way to track this, other than to look at the sizable demographics that are known to not use JavaScript, and then choose not to use it for obfuscation as a result.
Easiest thing to do is just put your email with separated things like: Email me at: bob -at- cogeco.ca
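If the addresses come out of a database, the same substitution can be applied automatically at output time. A tiny sketch, with a made-up function name and the marker taken from the example above:

```javascript
// Hypothetical helper: swap the "@" for a human-readable marker.
function mungeForDisplay(address, marker = " -at- ") {
  const at = address.lastIndexOf("@");
  if (at < 1) return address; // nothing that looks like an address
  return address.slice(0, at) + marker + address.slice(at + 1);
}

console.log(mungeForDisplay("bob@cogeco.ca"));
```

Readers have to retype the address by hand, and a harvester that knows common markers such as "-at-" can undo this as easily as a person can.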
I ran across this elsewhere just yesterday. If your server is ASP compatible, it might be worth a look. planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=9151&lngWId=4 Another way is similar to the above: me[Remove]@mydomain.com. I always use a contact form and, when I am thinking, a validation gif like jnm mentioned. My form spam is generally limited to when I forget to use the gif.
http://www.animalcrossingwildworld.com/contactus.php http://www.pagerank.info/contactus.php I sell this script for contact pages. It's easy to implement into any site with php. It has human verification and can handle multiple email addresses. I sell it for $10, and for another $10 I can install it with minor site changes (your email addresses). The script is an excellent way to avoid spam, as it not only requires a human to read the code but also removes BCC from the forms, which has now become a big injection problem. http://securephp.damonkohler.com/index.php/Email_Injection PM me if interested. You only need to purchase 1 copy for multiple installs but you are NOT allowed any resale rights.
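The BCC problem mentioned here comes from line breaks smuggled into fields that end up in mail headers. The script being sold is not shown, so the sketch below is not its code; it illustrates one common defence, which is to reject (rather than strip) any header-bound field containing a carriage return or line feed. The function names and field names are hypothetical.

```javascript
// A header value must be a single line. A CR or LF would let the sender
// start a new header such as "Bcc:".
function hasHeaderInjection(value) {
  return /[\r\n]/.test(value);
}

// Check every field that will be copied into a mail header, then return
// the headers. Throws if any field is missing or suspicious.
function buildHeaders(form) {
  const fields = { replyTo: form.email, subject: form.subject };
  for (const [key, value] of Object.entries(fields)) {
    if (typeof value !== "string" || hasHeaderInjection(value)) {
      throw new Error(`Rejected field: ${key}`);
    }
  }
  return { "Reply-To": fields.replyTo, Subject: fields.subject };
}

// A tampered "email" field that tries to add a Bcc header is refused.
try {
  buildHeaders({
    email: "visitor@example.com\r\nBcc: victim@example.com",
    subject: "Hello",
  });
} catch (err) {
  console.log(err.message);
}
```

The message body is not a header, so it can contain line breaks. Only values that are copied into headers need this check.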