Almost a year ago I realized that 4 precise URLs had been loaded tens of thousands of times - always by the same IP once, then by another IP, and so on. At that time the fake traffic always came from one quite unusual agent, "Indy Library", hence I blocked all that traffic successfully for many months using .htaccess rules against the agent "Indy Library" (a minimal sketch of such a rule is near the end of this post).

Then, about half a year ago, I observed that the very same fake traffic also used another agent that was impossible to block because it is far too common: IE. I collected data from my previous "Indy Library" visitors and extracted the IPs - a friend created a bash script to do the IP extraction and sort it. A little extra research showed that ALL the verified fake traffic came from Chinese IPs, and a comparison with my access_log files also showed that I had no genuine traffic from the same IPs. Hence my current step to successfully block such fake traffic is to use iptables together with the IPs extracted from the error_log and access_log files (see the command sketch at the end of this post).

Why would you want to detect and terminate such fake traffic on your site / server?

Besides the fact that fake traffic wastes server resources and thus slows down your genuine, valuable traffic, there may be other reasons to consider. For example, Clicksor and similar ad networks require a certain percentage of traffic to be of US or US/CA/UK origin. Now imagine you get 5-10% fake traffic from non-US/CA/UK countries: that fake traffic may make the difference between being accepted into a particular ad network or being rejected! Hence fake traffic may contribute to loss of income in addition to a waste of resources and money.

Meanwhile, after hundreds of days of carefully monitoring fake traffic and collecting data, the best option I have in this case is to use iptables and either block an individual IP if that IP is re-used time and again, OR block an entire C, B or even A net if a particular subnet reoccurs numerous times and NO genuine traffic (or only a tiny percentage of it) originates from that subnet.

My current method is time consuming but efficient: every few weeks or months I delete my iptables rules (after saving all blocked IPs and subnets) and look at the error log again to see if the same IPs are still active. So far YES - many of the IPs or IP blocks are reused again and again after a while.

How can you detect at first whether you have any considerable fake traffic on your server?

Check your access statistics closely: see if a few "regular" pages create exceptionally high traffic, then start researching the details. I started to recognize the fake traffic using 2 daily tools:

1. angolizer - a new version of webalizer - any other detailed stats tool may do as well - take TIME to study and understand the data collected
2. whoisonline - a tool I used to monitor the actual traffic in near real time (a few seconds behind), displayed/grouped by IP

On the latter, simultaneous visits to 3-4 always-the-same URLs within a second were an instant trigger showing this traffic is fake, and resulted in more in-depth research to accumulate as much data and experience as needed.

To begin with, search a collection of many months of your access_log files for keywords like "Indy Library" (my primary fake-source agent in the earlier months of 2006) or some of the IPs commonly used, like:

  32123  84.113.192.124  -- China
  24536  194.249.56.4    -- Slovenia
   1864  211.96.23.90    -- China
   1286  221.7.86.146    -- China
   1096  218.69.155.6    -- China

The first number on the left side shows how many times a page request occurred from that IP over the past 2+ months. 100+ OTHER IPs from the China sector have been used and recorded so far. Only a very few non-Chinese IPs have been recorded until now.
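To illustrate the IP extraction and iptables steps described above, here is a rough sketch of the kind of commands involved - NOT my friend's actual script. It assumes a standard Apache access_log where the client IP is the first field; file names like suspect_ips.txt and the exact subnet boundaries are only examples and must be adapted to your own server (the IPs are taken from my list above):

  # count requests per IP for the suspicious agent (client IP is the first field of access_log)
  grep "Indy Library" access_log | awk '{print $1}' | sort | uniq -c | sort -rn > suspect_ips.txt

  # block a single IP that keeps coming back
  iptables -A INPUT -s 84.113.192.124 -j DROP

  # block a whole C net (/24) or even a B net (/16) when no genuine traffic comes from it
  iptables -A INPUT -s 211.96.23.0/24 -j DROP
  iptables -A INPUT -s 218.69.0.0/16 -j DROP

  # save the current rules before flushing them for the next review cycle
  iptables-save > /root/blocked-fake-traffic.rules
  iptables -F INPUT

The first command produces exactly the kind of count / IP pairs shown in the list above, so you can see at a glance which IPs and subnets keep coming back.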
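And for completeness, the .htaccess agent block I used during the first months can be written roughly like this - a minimal sketch for Apache 2.x with mod_setenvif enabled, not a copy of my actual file:

  # mark any request whose User-Agent contains "Indy Library"
  SetEnvIfNoCase User-Agent "Indy Library" fake_traffic
  # deny only those marked requests, allow everyone else
  Order Allow,Deny
  Allow from all
  Deny from env=fake_traffic

This worked only as long as the fake traffic kept using that one unusual agent; once it switched to IE, blocking by agent was no longer an option and I had to move to IP-based blocking.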
Any sharing of experience OR better ways to stop such fake traffic is welcome.
A. Such fake traffic never comes from SE bots.

B. Those who create such fake traffic never abide by ANY rules whatsoever. The purpose of their fake traffic is to create fake data for various reasons - for example to cheat or deceive investors or potential buyers of a web site or business.

C. robots.txt is only for honest people with honest intentions. robots.txt has absolutely NO system power to enforce or deny anything at all! robots.txt is no security enforcement and no protection against fake traffic at all - robots.txt is a site owner's published wish list ... whether it is complied with or not remains entirely at the good will of the bot owners. People creating fake traffic certainly have neither good nor honest intentions and never visit robots.txt at all.

While a deny rule in .htaccess forcefully denies access to a client or IP, and iptables forcefully grounds any undesired traffic, robots.txt is simply the public statement of your Christmas wishes to SEARCH engine bots about a desired behavior.
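To make that contrast concrete (purely an illustration, not taken from any of my own files): a robots.txt entry like

  User-agent: *
  Disallow: /private/

is nothing but a published request that a well-behaved bot may honor or ignore, while an .htaccess rule such as

  Order Allow,Deny
  Allow from all
  Deny from 84.113.192.124

is enforced by the server itself and refuses the request no matter what the client intends.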
My home is in God (see my site). I live here in PH, or sometimes also in KH, always traveling to remote islands or provinces when in PH, then to different places - wherever the working conditions are best to do nature photography, write, or work on servers.