my host's server. Yes, this month, in 20 days, I've had nearly 22,000 page requests from the server IP address that my website is hosted on. The reply from the company was: I replied that it could be a bot hosted on another site on the same server. I'm awaiting a reply. What can I do, and how can I investigate this further? Thanks, Ian
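One way to investigate further is to count requests per client IP in the raw access log, then list what that IP actually asked for. A minimal Python sketch, assuming the log is in Apache's common or combined format; the function names are hypothetical:

```python
import re
from collections import Counter

# Common/combined log lines start with: client-ip ident user [timestamp] "request"
LOG_PREFIX = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)"')


def requests_per_ip(lines):
    """Count requests per client IP in Apache access-log lines."""
    counts = Counter()
    for line in lines:
        m = LOG_PREFIX.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def requests_from(lines, ip):
    """Return the request lines (e.g. 'GET /path HTTP/1.0') sent by one IP."""
    result = []
    for line in lines:
        m = LOG_PREFIX.match(line)
        if m and m.group(1) == ip:
            result.append(m.group(3))
    return result
```

For example, `requests_per_ip(open("access_log")).most_common(10)` would list the ten busiest addresses; `access_log` is a placeholder for wherever your host stores the raw log.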
Spoof, I agree. But isn't there a loop somewhere in your scripts, e.g. where you use PEAR, cURL, a cron job running Lynx, etc.?
I only use some simple PHP include statements. I've never seen this before, and it doesn't show up in any logs from previous months. The reply from the host: "That IP is the same IP address that your sites are hosted on, so it could simply be one of your pages or scripts calling those files (e.g. in a PHP include statement, etc.)"
I'm able to trace the bot a little further through my logs, and every visit seems to end up at my email forms, submitting certain text that is consistent across all submissions. I've added some code that redirects it to 403.shtml, but I'd rather redirect it to another site. Which site, though? I also don't want to cause problems for others, so how can I do this? Thanks, Ian
Another consistent thing I've found is that the user agent is blank. So if I banned anyone using my site without a user agent, would that be viable? The log looks like this:

IP here - - [20/May/2006:19:35:15 -0600] "GET /folder/lf2/image.jpg HTTP/1.0" 200 8240 "-" "-"

Would this risk banning real visitors? How would I do this in the .htaccess file? Thanks, Ian
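Before banning anything, it may help to measure how many requests have a blank user agent and whether they also lack a referer. A rough Python sketch that parses the combined log format shown above; the helper names are hypothetical:

```python
import re

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return the fields of one combined-format line as a dict, or None."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def is_blank(value):
    """Apache writes "-" when a header was not sent at all."""
    return value in ("", "-")


def blank_agent_requests(lines):
    """Collect parsed entries whose User-Agent is blank."""
    hits = []
    for line in lines:
        entry = parse_combined(line)
        if entry and is_blank(entry["agent"]):
            hits.append(entry)
    return hits
```

Comparing `len(blank_agent_requests(lines))` with the total number of lines gives a rough idea of how many visitors a blank-agent ban would touch.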
While I prepare a better solution for you, use this one:

# Needed once per .htaccess if it is not already there
RewriteEngine On
# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
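Consecutive `RewriteCond` lines without `[OR]` are ANDed, so this rule only fires when both headers are blank. A small Python sketch mirroring the two conditions (the function name is hypothetical) shows that a visitor who types the URL directly, sending a user agent but no referer, is not blocked:

```python
import re

# Same pattern as both RewriteCond lines: an empty string or a single "-".
BLANK = re.compile(r'^-?$')


def forbidden(referer, user_agent):
    """True if the rule would answer 403.

    A header that was not sent at all is passed as "" (mod_rewrite
    also sees a missing header as an empty string).
    """
    return bool(BLANK.match(referer)) and bool(BLANK.match(user_agent))
```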
Now, for a good reason, I won't explain everything written here, but I'm sure you'll understand it all if you open your eyes a little wider ;-) This should help:

RewriteEngine On

# Hotlink block for jpg/jpeg/gif/png/bmp
# Allow requests with no Referer at all (direct visits, privacy software)
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://YOURDOMAIN\.COM/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(.*)\.YOURDOMAIN\.COM/.*$ [NC]
# Don't redirect the placeholder image itself, otherwise the redirect loops
RewriteCond %{REQUEST_URI} !^/images/no-photo\.gif$
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ http://www.YOURDOMAIN.COM/images/no-photo.gif [R,NC]

# Forbid requests for exploits & annoyances
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Email exploits
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC,OR]
# MSOffice exploits
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
# Nimda
RewriteCond %{REQUEST_URI} ^/(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC,OR]
# Unknown/mixed
RewriteCond %{REQUEST_URI} ^/(cltreq\.asp|owssrv\.dll) [NC,OR]
RewriteCond %{REQUEST_URI} ^/missing\.html [NC,OR]
RewriteCond %{REQUEST_URI} ^/(cgi\-bin/|cgi\-local/)FormMail\.(cgi|php|pl) [NC,OR]
RewriteCond %{REQUEST_URI} ^/(cgi\-bin/|cgi\-local/)FormMail [NC,OR]
RewriteCond %{REQUEST_URI} ^/FormMail\.(cgi|php|pl) [NC,OR]
RewriteCond %{REQUEST_URI} ^/FormMail [NC,OR]
RewriteCond %{REQUEST_URI} ^/sumthin [NC,OR]
RewriteCond %{REQUEST_URI} ^/default\.htm [NC]
RewriteRule .* - [F]

# Various
# RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
# RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
# RewriteCond %{REQUEST_URI} /sensepost\.exe [NC]
# RewriteRule .* - [F]

# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

# FavIcon.ico
# RedirectMatch permanent .*/favicon\.ico$ http://www.YOURDOMAIN.COM/favicon.ico

Good luck, Ruslan
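To test URIs from your own logs against the exploit block, here is a Python sketch of how those `[OR]`-chained conditions decide, using a subset of the patterns. `re.IGNORECASE` stands in for `[NC]`, and `re.search` mirrors mod_rewrite's unanchored matching; it is an offline checking aid, not a replacement for the rules:

```python
import re

# Mirrors: RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
ALLOWED_METHOD = re.compile(r'^(GET|HEAD|POST)', re.IGNORECASE)

# A subset of the REQUEST_URI conditions; any single match is enough.
EXPLOIT_URIS = [re.compile(p, re.IGNORECASE) for p in (
    r'^/default\.(ida|idq)',                                     # CodeRed
    r'^/.*\.printer$',
    r'(mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$',  # email exploits
    r'^/(MSOffice|_vti)',                                        # MSOffice exploits
    r'^/(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe)',    # Nimda
    r'^/(cgi-bin/|cgi-local/)?FormMail',                         # the four FormMail conditions
)]


def is_forbidden(method, uri):
    """True if the exploit block would answer 403 for this request."""
    if not ALLOWED_METHOD.search(method):
        return True
    return any(p.search(uri) for p in EXPLOIT_URIS)
```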
Again, the code below should work as a 'Panadol' for your headache: it does smart log-file splitting for spam bots, worms, and anything else you don't want to log. If you spend an hour merging all the code I've just posted into one httpd.conf or .htaccess, I swear that even I couldn't hack you. (Note that LogFormat and CustomLog are only allowed in the main server or virtual-host configuration, so the logging part has to go in httpd.conf, not .htaccess.) Good luck again.

# Splittable logs: we write several log files, one per area of interest
## Start with things we do not want to log in our files...
SetEnvIf Request_URI "^/MSADC/(.*)$" dontlog
SetEnvIf Request_URI "^/msadc/(.*)$" dontlog
SetEnvIf Request_URI "^/scripts/(.*)$" dontlog
SetEnvIf Request_URI "^/c/(.*)$" dontlog
SetEnvIf Request_URI "^/d/(.*)$" dontlog
SetEnvIf Request_URI "^/_vti_bin/(.*)$" dontlog
SetEnvIf Request_URI "^/_mem_bin/(.*)$" dontlog
SetEnvIf Request_URI "^/default.ida(.*)$" dontlog
SetEnvIf Request_URI "^/NULL.printer(.*)$" dontlog
SetEnvIf Request_URI "^/nsiislog.dll(.*)$" dontlog
SetEnvIf Request_URI "^/Admin.dll(.*)$" dontlog
SetEnvIf Request_URI "^/root.exe(.*)$" dontlog
SetEnvIf Request_URI "^/cmd.exe(.*)$" dontlog
SetEnvIf Request_URI "^/favicon.ico(.*)$" dontlog
SetEnvIf Request_URI "^/(.*).gif" dontlog
SetEnvIf Request_URI "^/(.*).jpg" dontlog
SetEnvIf Request_URI "^/(.*).png" dontlog
SetEnvIfNoCase Referer "http(s?)://(www\.)?(.*)\.(localhost)/(.*)" dontlog
SetEnvIfNoCase Referer "^XXXX:\.(.*)" dontlog

## mod_logio logging, if available
<IfModule mod_logio.c>
LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
CustomLog "/PATH/TO/SAVE/LOGS/apache_access_logIO.log" combinedio env=!dontlog
</IfModule>

## mod_deflate logging, if available
<IfModule mod_deflate.c>
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio
LogFormat "%{Host}i - %r - %{outstream}n/%{instream}n - (%{ratio}n%%)" deflate
CustomLog "/PATH/TO/SAVE/LOGS/apache_deflate.log" deflate env=!dontlog
</IfModule>

## Traditional logging
<IfModule mod_log_config.c>
# Define different log formats and file destinations
LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
#CustomLog "/PATH/TO/SAVE/LOGS/apache_access.log" combined env=!dontlog
</IfModule>

## Some more logging and reporting, for the fanatic...
<IfModule mod_setenvif.c>
## Log all requests for robots.txt
SetEnvIf Request_URI "^/robots\.txt$" robots
CustomLog "/PATH/TO/SAVE/LOGS/apache_robots.log" combined env=robots

## Further filtering: requests already flagged dontlog are kept out of the robots log
SetEnvIf dontlog 1 !robots

## Standard workarounds for clients with broken HTTP implementations
BrowserMatch "Mozilla/2" nokeepalive
BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
BrowserMatch "RealPlayer 4\.0" force-response-1.0
BrowserMatch "Java/1\.0" force-response-1.0
BrowserMatch "JDK/1\.0" force-response-1.0
BrowserMatch "Microsoft Data Access Internet Publishing Provider" redirect-carefully
BrowserMatch "^WebDrive" redirect-carefully
BrowserMatch "^WebDAVFS/1.[012]" redirect-carefully
BrowserMatch "^gnome-vfs" redirect-carefully

## User agents flagged as spam bots
BrowserMatch "^fastlwspider" spambot=true
BrowserMatch "^findEmail" spambot=true
BrowserMatch "^SurfWalker" spambot=true
BrowserMatch "^Telesoft" spambot=true
BrowserMatch "^Zeus.*Webster Pro" spambot=true
BrowserMatch "^[DFPS]Surf\d\d[a-z]" spambot=true
BrowserMatch "^[DFPS]Browse \d\.\d[a-z]" spambot=true
BrowserMatch "^EmailSiphon" spambot=true
BrowserMatch "^EmailWolf" spambot=true
BrowserMatch "^ExtractorPro" spambot=true
BrowserMatch "^CherryPicker" spambot=true
BrowserMatch "^NICErsPRO" spambot=true
BrowserMatch "^EmailCollector" spambot=true
BrowserMatch "^Mail" spambot=true
SetEnvIfNoCase spambot "true" spambots
CustomLog "/PATH/TO/SAVE/LOGS/apache_spambots.log" combined env=spambots

<IfModule mod_deflate.c>
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
</IfModule>
</IfModule>

Comments on the code are welcome.
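To check which of the log files in this setup a given request would end up in, here is a rough Python sketch using a subset of the `dontlog` and spambot patterns (the trailing `(.*)$` is dropped since it matches anything). It assumes mod_logio and mod_deflate are both loaded, so both `env=!dontlog` logs are active, and it ignores the Referer-based `dontlog` rules; `log_targets` is a hypothetical name:

```python
import re

# Subset of the SetEnvIf Request_URI ... dontlog patterns (case-sensitive, like SetEnvIf).
DONTLOG_URIS = [re.compile(p) for p in (
    r'^/MSADC/', r'^/msadc/', r'^/scripts/', r'^/_vti_bin/',
    r'^/default.ida', r'^/cmd.exe', r'^/root.exe', r'^/favicon.ico',
    r'^/(.*).gif', r'^/(.*).jpg', r'^/(.*).png',
)]

# Subset of the BrowserMatch ... spambot=true patterns (BrowserMatch is case-sensitive).
SPAMBOT_AGENTS = [re.compile(p) for p in (
    r'^EmailSiphon', r'^EmailWolf', r'^EmailCollector',
    r'^ExtractorPro', r'^CherryPicker', r'^Mail',
)]


def log_targets(uri, user_agent):
    """Return the names of the log files this request would be written to."""
    dontlog = any(p.search(uri) for p in DONTLOG_URIS)
    # robots is set for /robots.txt, then removed again by "SetEnvIf dontlog 1 !robots"
    robots = bool(re.search(r'^/robots\.txt$', uri)) and not dontlog
    spambot = any(p.search(user_agent) for p in SPAMBOT_AGENTS)

    targets = []
    if not dontlog:
        targets += ["apache_access_logIO.log", "apache_deflate.log"]
    if robots:
        targets.append("apache_robots.log")
    if spambot:
        targets.append("apache_spambots.log")
    return targets
```

Note that the spambots log does not check `dontlog`, so a harvester requesting an image still lands there.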
Thanks Ruslan, I feel as if I'm going round in circles. I've spent the last few days checking logs against logs, and against the logs from my other sites running the same software. It now looks like some code on this one site is causing the blank-agent requests. I have no idea why, as the only difference is that this site is located in a folder and the other two are in the root. I'm still getting visits from the bots and haven't tried the code yet, as I may introduce some CAPTCHA code for the email forms. Ian