I'm setting up a banner advertising system on my site. In order to track page impressions properly, I need to implement a counter in PHP which increments with every valid page impression, but does nothing if the page is being spidered by robots. How do I determine using PHP whether the page has been requested by a robot or a genuine visitor? Cheers, Mat
I'd like to know this too. Sometimes on my forums I get all of the robots counted as active members, etc.
Not sure how, but you could use the user_agent string to determine whether it's a spider or not. I know it can be done.
I've been digging around a lot, and it seems there's no quick and easy way to identify a robot. If you need to identify robots in real time on your page then as far as I can see there are two ways: 1. Maintain your own pageview logs in a database. Log every page load in a database table, along with the user-agent. You then monitor the various user-agents (not too hard, there aren't a great many) and maintain a table of "blacklisted" user-agents which you have identified as robots. Then when a page loads you can check the user-agent against your blacklist and decide whether or not to increment your page counter. This is my preferred method. 2. Use the get_browser() function in PHP, which returns a large object with a property (crawler) identifying whether the user-agent is a robot/crawler. However, this depends on maintaining an up-to-date browscap.ini file, which you may not have access to if it's a shared server. Anyone have any better ideas? Cheers, Mat
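A minimal sketch of the two options above, not a definitive implementation. The function names is_blacklisted_agent() and is_crawler_via_browscap() and the sample blacklist entries are made up for illustration; in practice the blacklist would be loaded from the database table described in option 1. get_browser() is the real PHP function, and it only works when the browscap setting in php.ini points at a browscap.ini file.

```php
<?php
// Option 1: check the user-agent against a blacklist of robot substrings.
// The list is passed in; a real one would come from the "blacklisted" table
// built up from your own pageview logs.
function is_blacklisted_agent(string $userAgent, array $blacklist): bool
{
    if ($userAgent === '') {
        // Assumption: a request with no user-agent is treated as a robot.
        return true;
    }
    foreach ($blacklist as $needle) {
        // Case-insensitive substring match.
        if ($needle !== '' && stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Option 2: ask get_browser(). Returns null when no answer is available,
// e.g. when browscap is not configured on a shared server.
function is_crawler_via_browscap(string $userAgent): ?bool
{
    if (!ini_get('browscap')) {
        return null;
    }
    $info = get_browser($userAgent, true); // true => return an array
    if (!is_array($info)) {
        return null;
    }
    return !empty($info['crawler']);
}

$blacklist = ['Googlebot', 'Slurp', 'msnbot']; // sample entries (assumed)
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$countThisHit = !is_blacklisted_agent($agent, $blacklist);
```

$countThisHit can then gate whatever code increments the impression counter. Substring matching is used so that one blacklist entry covers every version of the same robot.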
I'm no expert at PHP, in fact I know hardly anything, so the code below probably means nothing, but could you do this? <?php $user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : ''; if (stripos($user_agent, 'Googlebot') !== false) { /* robot: add whatever is needed to not count the hit here */ } ?>
Yes this is the general idea. Unfortunately there are hundreds of different user-agents for robots, which is why we need some sort of lookup (either database as I suggested, or text file as used by the get_browser() function). I'm not trying to count hits for a specific robot, I'm trying to count hits for when the user-agent is NOT a robot - in order to get more accurate stats for banner impressions. Mat
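A rough sketch of the text-file lookup mentioned above, counting a hit only when the user-agent is NOT a robot. Everything named here is an assumption for illustration: the functions load_robot_patterns() and count_impression(), the one-pattern-per-line file format, and the flat counter file, which stands in for whatever storage the banner system really uses.

```php
<?php
// Load robot user-agent substrings from a text file, one per line.
// Blank lines and lines starting with '#' are skipped.
function load_robot_patterns(string $path): array
{
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    $patterns = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '' && $line[0] !== '#') {
            $patterns[] = $line;
        }
    }
    return $patterns;
}

// Increment the number stored in $counterFile unless the user-agent matches
// a robot pattern. Returns true when the impression was counted.
function count_impression(string $userAgent, array $patterns, string $counterFile): bool
{
    if ($userAgent === '') {
        return false; // assumption: no user-agent means not a real visitor
    }
    foreach ($patterns as $pattern) {
        if (stripos($userAgent, $pattern) !== false) {
            return false; // robot: leave the counter alone
        }
    }
    $handle = fopen($counterFile, 'c+'); // create if missing, do not truncate
    if ($handle === false) {
        return false;
    }
    flock($handle, LOCK_EX); // guard against concurrent requests
    $count = (int) stream_get_contents($handle);
    ftruncate($handle, 0);
    rewind($handle);
    fwrite($handle, (string) ($count + 1));
    fflush($handle);
    flock($handle, LOCK_UN);
    fclose($handle);
    return true;
}
```

On each page load you would call count_impression() with $_SERVER['HTTP_USER_AGENT'], so the count stays real-time rather than waiting for an overnight stats run.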
How about going through your server logs? If you download a copy of AWStats (a stats program), it will have an up-to-date list of all search engine robots.
I already use Urchin - which does fairly good stats. I'm not familiar with awstats, but most stats programs (including Urchin) are not real-time (they update overnight). I'm talking about tracking real-time stats here. In order to properly manage my ad banner system I need real-time stats, and to be able to determine browser/robot in real time. Cheers, Mat
What I meant was to search the AWStats source code for its list of search engine user_agent strings. Then you have a pre-built list (up to date if you use the latest version) of which user_agent is which.
You know, you can always look for the operating system in the user-agent too. There are really only three you need to know: it's not 100%, but it's mostly either Windows, Mac or Linux. Quick and dirty: I use this to show Flash navigation for real people, and plain HTML for bots/odd browsers. <?php $client = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : ''; if (strstr($client, 'Windows') || strstr($client, 'Macintosh')) { trackit(); } ?> I'm not sure what Linux would show up as, but if the user-agent contains Windows or Macintosh, it fires.