I'm developing a click tracking PHP script as a side project, and wanted to get ideas and opinions on the best way to minimise invalid clicks. What I've done so far (relating to my specific needs for the script) is:
1. Only 1 click per unique IP per 24 hours gets logged as being valid.
2. Only clicks coming from specified HTTP_REFERER domains are logged as valid.
3. Any open proxy usage (where HTTP headers reveal the source IP) is logged (to help with point 1).
4. User agents of known bots are checked, and if a match is found, the click is not listed as being valid.
If you have any experience in this area, what other ways do you use to flag up potentially invalid clicks, whether from bots or proxy users?
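For anyone following along, here is a rough sketch of how the four checks in the opening post might fit together. It is not the poster's actual code: every function name, the reason labels, and the bot signature list are assumptions made up for illustration, and the lookup of the last valid click per IP is left to whatever storage the script uses.

```php
<?php
// Illustrative sketch only: all names below are hypothetical.

/** Return the origin IP a proxy header claims to forward for, or null if none is present. */
function forwardedIp(array $server): ?string
{
    foreach (['HTTP_X_FORWARDED_FOR', 'HTTP_CLIENT_IP'] as $header) {
        if (!empty($server[$header])) {
            // X-Forwarded-For may hold a comma-separated chain; the first entry is the claimed origin.
            $candidate = trim(explode(',', $server[$header])[0]);
            if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
                return $candidate;
            }
        }
    }
    return null;
}

/** Case-insensitive substring match of the user agent against a list of bot signatures. */
function isKnownBot(string $userAgent, array $botSignatures): bool
{
    foreach ($botSignatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

/** True if the referrer host is in the allowed list; an empty list disables the check. */
function refererAllowed(?string $referer, array $allowedDomains): bool
{
    if ($allowedDomains === []) {
        return true; // the referrer restriction is an optional setting
    }
    $host = $referer ? parse_url($referer, PHP_URL_HOST) : null;
    return is_string($host) && in_array(strtolower($host), $allowedDomains, true);
}

/**
 * Apply the checks to one click.
 * $lastValidClickAt is the Unix time of this IP's last valid click, or null if there is none.
 */
function classifyClick(array $server, array $allowedDomains, array $botSignatures, ?int $lastValidClickAt, int $now): array
{
    $reasons = [];
    if ($lastValidClickAt !== null && $now - $lastValidClickAt < 86400) {
        $reasons[] = 'duplicate_ip';            // point 1: one valid click per IP per 24 hours
    }
    if (!refererAllowed($server['HTTP_REFERER'] ?? null, $allowedDomains)) {
        $reasons[] = 'referer_not_allowed';     // point 2
    }
    if (isKnownBot($server['HTTP_USER_AGENT'] ?? '', $botSignatures)) {
        $reasons[] = 'known_bot';               // point 4
    }
    return [
        'valid'        => $reasons === [],
        'reasons'      => $reasons,
        'forwarded_ip' => forwardedIp($server), // point 3: logged only, not a reason to reject
    ];
}
```

Returning the list of reasons rather than a bare true/false keeps the log useful later, since a click flagged for one reason can be reviewed differently from one flagged for three. Note that the forwarded IP comes from a client-supplied header, so it is only a hint.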
Add a cookie that's set to expire after a certain amount of time. Aside from that, and what you've already mentioned, there isn't much more you can do.
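A minimal sketch of the cookie idea, assuming a 24-hour window to match the IP rule from the first post. The cookie name `click_seen` and both function names are made up for the example. Since the cookie lives on the client, it can be deleted or forged, so it is best treated as one more signal rather than proof.

```php
<?php
// Hypothetical names; the 24-hour lifetime is an assumption.
const CLICK_COOKIE = 'click_seen';
const CLICK_COOKIE_TTL = 86400; // seconds

/** True if the request carries a click cookie stamped within the last 24 hours. */
function hasRecentClickCookie(array $cookies, int $now): bool
{
    $value = $cookies[CLICK_COOKIE] ?? null;
    // Only accept a plain run of digits; anything else is treated as absent.
    if (!is_string($value) || preg_match('/^\d+$/', $value) !== 1) {
        return false;
    }
    $setAt = (int) $value;
    return $setAt <= $now && $now - $setAt < CLICK_COOKIE_TTL;
}

/** Stamp the visitor with the current time; must be called before any output is sent. */
function markClickWithCookie(int $now): void
{
    setcookie(CLICK_COOKIE, (string) $now, $now + CLICK_COOKIE_TTL, '/');
}
```

In the tracking script this could be used as `if (!hasRecentClickCookie($_COOKIE, time())) { markClickWithCookie(time()); }`, with the result stored alongside the IP check instead of replacing it.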
Thanks, I forgot about the cookie option. I've been searching around for an hour or so, and haven't found anything else that stands out as being a good solution. Maybe doing checks at set intervals against a known stealth proxy list?
Not sure what you mean by "at intervals". You could check the IP of each click against a proxy list or blacklist, if that's what you mean.
By intervals I meant checking all click IPs made within (for example) the last 15 minutes, and then flicking their status from valid to invalid if a match is found. I was thinking that a live check on every click could slow things down, and it might be better to do it behind the scenes, independently of user interaction. On a related note, anyone have any suggestions for the best blacklists for stealth IPs?
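The interval approach described above could look roughly like this: a scheduled job loads the clicks from the last 15 minutes, compares their IPs against whatever blacklist is in use, and returns the IDs to flip from valid to invalid. The function name and the shape of each click row (`id`, `ip`, `clicked_at`, `valid`) are assumptions for the sake of the example, and loading the blacklist itself is left out.

```php
<?php
/**
 * Hypothetical batch check. Each click is assumed to be an array with
 * 'id', 'ip', 'clicked_at' (Unix time) and 'valid' (bool) keys.
 * Returns the IDs of recent, currently valid clicks whose IP is blacklisted.
 */
function findBlacklistedClicks(array $clicks, array $blacklist, int $now, int $windowSeconds = 900): array
{
    // Flip the list so each IP lookup is a cheap key check instead of a scan.
    $blocked = array_flip($blacklist);
    $ids = [];
    foreach ($clicks as $click) {
        $isRecent = $now - $click['clicked_at'] <= $windowSeconds;
        if ($isRecent && $click['valid'] && isset($blocked[$click['ip']])) {
            $ids[] = $click['id'];
        }
    }
    return $ids;
}
```

The returned IDs can then be passed to a single UPDATE that marks those rows invalid. Running this from cron keeps the blacklist lookup out of the request path, which is the point of doing it behind the scenes.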
You can't rely on HTTP_REFERER data: it's sent by the client, so it can easily be manipulated or, more often, not sent at all. Sure, it may help prevent invalid clicks, but it could also cause valid clicks to be ignored. I don't know if that's what you want; it may be safer to simply log referrers and use them to help find invalid clicks.
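As a sketch of the log-and-review approach, the referrer can be reduced to a host name when the click is stored and then tallied later to spot odd patterns, such as a sudden run of clicks with no referrer at all. The function names and the `(none)` and `(unparseable)` labels are invented for this example.

```php
<?php
/** Reduce a raw referrer to a lowercase host suitable for logging (hypothetical helper). */
function refererHostForLog(?string $referer): string
{
    if ($referer === null || $referer === '') {
        return '(none)'; // header missing or empty, which is common and not in itself suspicious
    }
    $host = parse_url($referer, PHP_URL_HOST);
    return is_string($host) && $host !== '' ? strtolower($host) : '(unparseable)';
}

/** Count clicks per referrer host, most frequent first. */
function tallyRefererHosts(array $referers): array
{
    $counts = [];
    foreach ($referers as $referer) {
        $host = refererHostForLog($referer);
        $counts[$host] = ($counts[$host] ?? 0) + 1;
    }
    arsort($counts);
    return $counts;
}
```

Nothing here rejects a click; it only gives the log something consistent to group by.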
The HTTP_REFERER data isn't really to do with whether a click is generally valid or not. I included it only because it was a potentially limiting factor, but it's only an optional setting, which won't always be used. I may also use it for logging purposes, as you indicated above.