PHP Code to determine Bot

Discussion in 'PHP' started by softstor, Jan 31, 2007.

  1. #1
    Is there PHP code available that can determine if the visitor is a user or a bot?
     
    softstor, Jan 31, 2007
  2. CodyRo

    CodyRo Peon

    #2
    Like a spider? You can check the user agent to see if it identifies itself as one, and block it. Aside from that, I don't think so.

    If what you want is to stop it from posting spam and such, just add a CAPTCHA.
     
    CodyRo, Jan 31, 2007
  3. Louis11

    Louis11 Active Member

    #3
    Why not put some code at the top of your page that checks the user agent? Most bots identify themselves with a string such as "googlebot" or "spider". If the check finds one of those (or any of the others), treat the visitor as a bot and let it index the page. Just a thought though :)
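
    A minimal sketch of that check, assuming a hypothetical helper named looks_like_bot() (not from this thread) that takes the user agent as a parameter so it can be tried without a real request. It only looks for the two substrings mentioned above, so treat it as an illustration rather than a complete detector.

```php
<?php
// Hypothetical helper: true when the user agent contains "googlebot" or
// "spider" (case-insensitive). These two substrings are only examples.
function looks_like_bot($userAgent)
{
    // stripos() returns false when nothing is found; compare with !==
    // because a match at position 0 would otherwise look "falsy".
    return stripos($userAgent, 'googlebot') !== false
        || stripos($userAgent, 'spider') !== false;
}

// At the top of a page. The header may be missing, so default to ''.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (looks_like_bot($ua)) {
    // Treat the visitor as a bot.
}
```

    Keep in mind that the user agent is supplied by the client, so this only catches bots that identify themselves honestly.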
     
    Louis11, Jan 31, 2007
  4. SoftCloud

    SoftCloud Well-Known Member

    #4
    If you search Google, you can find lists of known bot "User-Agent" values. Then you can simply use either an IF / ELSE IF chain or a SWITCH / CASE statement in PHP to determine whether a visitor is a bot or not. :)
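
    A rough sketch of the SWITCH / CASE variant, assuming a hypothetical function named bot_label() and three illustrative substrings and labels (swap in whatever your list contains). Since user agents have to be matched by substring rather than by exact value, the sketch uses switch (true), where each case is a condition.

```php
<?php
// Hypothetical helper: maps a user agent to a bot label, or false for none.
// The three substrings below are illustrative, not a complete list.
function bot_label($userAgent)
{
    $ua = strtolower($userAgent);

    // switch (true) runs the first case whose expression evaluates to true.
    switch (true) {
        case strpos($ua, 'googlebot') !== false:
            return 'Google Bot';
        case strpos($ua, 'slurp') !== false:
            return 'Yahoo! Slurp spider';
        case strpos($ua, 'msnbot') !== false:
            return 'MSN Bot';
        default:
            return false;
    }
}
```

    Every additional bot needs another case, so the code grows with the list.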
     
    SoftCloud, Jan 31, 2007
  5. Psychotomus1

    Psychotomus1 Banned

    #5
    That's a horrible way. Make a file with an array of bots and run a foreach loop over the values to check whether the visitor is a bot or not. Your 10,000 lines of code will be more like 3, besides your array list =)
     
    Psychotomus1, Feb 1, 2007
  6. nico_swd

    nico_swd Prominent Member

    #6
    Here's a function I use.

    
    
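    // Returns the matching spider's name if the user agent looks like a known bot, or false otherwise.
    function is_bot()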
    {
    	$spiders = array(
    		// Most common
    		'googlebot'               => 'Google Bot',
    		'slurp'                   => 'Yahoo! Slurp spider',
    		'msnbot'                  => 'MSN Bot',
    		// Others
    		'appie'                   => 'Appie spider',
    		'architext'               => 'Architext spider',
    		'asterias'                => 'Asterias spider',
    		'atomz'                   => 'Atomz spider',
    		'augurfind'               => 'Augurfind spider',
    		'bannana_bot'             => 'Bannana Bot',
    		'booch'                   => 'Booch spider',
    		'crawl'                   => 'Crawl spider',
    		'diamondbot'              => 'Diamond Bot',
    		'docomo'                  => 'Docomo spider',
    		'exabot'                  => 'Exalead Bot',
    		'frooglebot'              => 'Froogle Bot',
    		'gaisbot'                 => 'Gais Bot',
    		'gigabot'                 => 'Giga Bot',
    		'goforit'                 => 'GoForIt Bot',
    		'grub'                    => 'Grub spider',
    		'gulliver'                => 'Gulliver spider',
    		'ia_archiver'             => 'Alexa spider',
    		'iconsurf'                => 'Iconsurf spider',
    		'iltrovatore'             => 'Iltrovatore spider',
    		'indexer'                 => 'Indexer spider',
    		'infoseek'                => 'InfoSeek spider',
    		'jetbot'                  => 'JetBot',
    		'kit_fireball'            => 'Kit Fireball spider',
    		'lachesis'                => 'Lachesis spider',
    		'larbin'                  => 'Larbin web crawler',
    		'linkwalker'              => 'SEVENtwentyfour crawler',
    		'mantraagent'             => 'LookSmart spider',
    		'mercator'                => 'Mercator spider',
    		'moget'                   => 'Moget Bot',
    		'muscatferret'            => 'Muscat Ferret spider',
    		'myweb'                   => 'MyWeb spider',
    		'nameprotect'             => 'NameProtect spider',
    		'naverbot'                => 'Naver Bot',
    		'ncsa beta'               => 'NCSA Beta spider',
    		'netresearchserver'       => 'Look.com spider',
    		'npbot'                   => 'NameProtect spider',
    		'nutch'                   => 'Nutch spider',   
    		'osis-project'            => 'Osis-Project spider',
    		'polybot'                 => 'Polybot crawler',
    		'pompos'                  => 'Pompos crawler',
    		'poppelsdorf'             => 'Poppelsdorf bot',
    		'psbot'                   => 'Picsearch bot',
    		'scooter'                 => 'AltaVista bot',
    		'scrubby'                 => 'Scrub the web spider',
    		'seeker'                  => 'MyNewFavoriteThing RSS Feed seeker',
    		'sidewinder'              => 'InfoSeek spider',
    		'sohu'                    => 'Sohu-search spider',
    		'spider'                  => 'Unknown spider',
    		'spyder'                  => 'Unknown spider',
    		'steeler'                 => 'Steeler spider',
    		'szukacz'                 => 'Szukacz Bot',
    		't-h-u-n-d-e-r-s-t-o-n-e' => 'Thunderstone search spider',
    		'teoma'                   => 'Teoma Bot',
    		'turnitinbot'             => 'Turnitin web crawler',
    		'tutorgig'                => 'TutorGig Bot',
    		'ultraseek'               => 'Ultraseek spider',
    		'vagabondo'               => 'Vagabondo spider',
    		'voilabot'                => 'Voila.fr Bot',
    		'voyager'                 => 'Voyager',
    		'w3c_validator'           => 'W3C Validator',
    		'websitepulse'            => 'Websitepulse spider',
    		'worldlight'              => 'Worldlight spider',
    		'worm'                    => 'Worm Bot',
    		'zao'                     => 'Zao spider',
    		'zippp'                   => 'Zippp spider',
    		'zyborg'                  => 'LookSmart spider',
    		'obot'                    => 'OBot spider',
    		'bot'                     => 'Unknown Bot'
    	);
    	
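    	// The header can be missing (for example on the CLI), so fall back to an empty string.
    	$useragent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
    
    	foreach ($spiders as $spider => $name)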
    	{
    		if (strpos($useragent, $spider) !== false)
    		{
    			return $name;
    		}
    	}
    	
    	return false;
    }	
    
    
    PHP:

    
    if (is_bot())
    {
       // Is bot
    }
    else
    {
       // Isn't bot
    }
    
    
    PHP:
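
    One detail worth noting about the loop in is_bot(): the array is scanned in order and the first hit wins, so generic keys such as 'bot' only work as fallbacks when they come after the specific ones. The sketch below shows the effect with a hypothetical helper, match_spider(), which is not part of the function above; it takes the user agent and the list as arguments so it can be run without a real request.

```php
<?php
// Hypothetical helper: the same first-match-wins loop, with the user agent
// and the keyword list passed in as arguments.
function match_spider($userAgent, array $spiders)
{
    $userAgent = strtolower($userAgent);

    foreach ($spiders as $needle => $name) {
        if (strpos($userAgent, $needle) !== false) {
            return $name; // the first matching key wins
        }
    }

    return false;
}

// Two tiny example lists that differ only in the order of their keys.
$specificFirst = array('googlebot' => 'Google Bot', 'bot' => 'Unknown Bot');
$genericFirst  = array('bot' => 'Unknown Bot', 'googlebot' => 'Google Bot');

$ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

var_dump(match_spider($ua, $specificFirst)); // string(10) "Google Bot"
var_dump(match_spider($ua, $genericFirst));  // string(11) "Unknown Bot"
```

    If you add entries of your own, putting them above the generic keys keeps the more specific name.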
     
    nico_swd, Feb 2, 2007
  7. Psychotomus1

    Psychotomus1 Banned

    #7
    Now nico did it the way I was talking about =)
     
    Psychotomus1, Feb 2, 2007