Can anyone point me to a list of bad bots to disallow in my robots.txt file, along with any other suggestions for cutting down on bandwidth thieves? My bandwidth is through the roof. I also have 4 IP addresses that seem to be acting as an entry point. Do I need to block them, and will that hurt legitimate traffic? <edited> Also, is it safe to assume that unknown bots are bad? How many bots is normal? Right now I have 21 bots crawling and 3 are unidentified.
Well, I don't now, although I did add a few. I was getting what I thought was an abnormal number of hits from 3 IPs in particular. It turns out those 3 IPs are hosting hundreds of other IPs, and they are all normal (Yahoo). But I did need the list; there were a few bots that I didn't know about. I'm still having some problems with traffic analysis, but I'll figure it out.
The bad bots don't care about robots.txt. You need to shut them out with .htaccess. Bad bots might be, for example, site-downloading software, useless bots that only eat your bandwidth, or bots that simply scan your site for security holes, etc.
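For anyone wondering what shutting bots out with .htaccess can look like, here is a rough Python sketch that builds a mod_rewrite block from a list of user-agent tokens. It assumes Apache with mod_rewrite enabled; the two tokens (HTTrack, WebCopier) and the function name are only illustrative picks of mine, not a vetted blocklist, so swap in whatever shows up in your own logs.

```python
import re


def htaccess_block_rules(agent_tokens):
    """Build mod_rewrite lines that answer 403 Forbidden to matching User-Agents.

    Assumption: tokens are plain words (letters, digits, '_' or '-'), so they
    can be joined into one alternation without any escaping.
    """
    if not agent_tokens:
        # An empty alternation "()" would match every visitor, so refuse it.
        raise ValueError("need at least one user-agent token")
    for token in agent_tokens:
        if not re.fullmatch(r"[A-Za-z0-9_-]+", token):
            raise ValueError(f"unsupported token: {token!r}")
    pattern = "|".join(agent_tokens)
    return "\n".join([
        "RewriteEngine On",
        # [NC] makes the User-Agent match case-insensitive.
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        # [F] returns 403 Forbidden, [L] stops further rewrite processing.
        "RewriteRule .* - [F,L]",
    ])


# Illustrative tokens only; replace with agents seen in your own stats.
EXAMPLE_TOKENS = ["HTTrack", "WebCopier"]
print(htaccess_block_rules(EXAMPLE_TOKENS))
```

Paste the printed lines into .htaccess and test on a copy of the site first. A bot that fakes its user agent will still get through, so this only catches the ones that identify themselves.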
Wing is correct, at least at the broader level of "bad bots", though checking the robot lists will likely knock out a few of the not-so-bad ones that actually bother to check robots.txt. Keep an eye on your stats for IPs to ban as well, such as individual blocks that are pulling MBs of content daily...
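If your stats package doesn't make the heavy hitters obvious, a small script over the raw access log can total bytes per IP. This is only a sketch: it assumes the Apache common/combined log format, and the sample lines and addresses below are made up (documentation-range IPs), not real traffic.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [time] "request" status bytes ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)')


def bytes_per_ip(lines):
    """Sum the response sizes in the log lines, keyed by client IP."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that don't fit the assumed format
        ip, size = match.groups()
        if size != "-":  # "-" means no body was sent (e.g. a 304)
            totals[ip] += int(size)
    return totals


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [27/May/2007:19:10:00 +0000] "GET /a.html HTTP/1.1" 200 5000',
    '192.0.2.10 - - [27/May/2007:19:10:05 +0000] "GET /b.zip HTTP/1.1" 200 2000000',
    '198.51.100.7 - - [27/May/2007:19:11:00 +0000] "GET / HTTP/1.1" 304 -',
]

# Print the biggest consumers first.
for ip, total in bytes_per_ip(SAMPLE).most_common():
    print(ip, total)
```

To run it on a real log, pass an open file object instead of SAMPLE. Look the top addresses up in WHOIS before banning anything, since a big crawler can sit behind just a handful of IPs.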
I do have 3 that are pulling a lot of bandwidth, but they are owned by Yahoo, and there are hundreds of sub-addresses on them from different Yahoo-related ISPs. I assume those are good?
I have made a tool that makes it very easy to block bots by either user agent or IP. It is a tool added to NiceStat that can track search engine bots visiting your site and ban bad bots. You can also check how frequently they index and which pages they index. With Website Rules, you can match on a visitor's or bot's IP, user agent, URL referrer, URL visited, or country. If a rule matches, you can ban, redirect, or show a message. Try the demo here
I have that! 216.109.121.44, 216.109.121.41, 216.109.121.42. These 3 are Yahoo. They pull a lot of bandwidth. Are they good or bad? I have never been able to get a concrete answer on that.
According to WHOIS:

OrgName: HotJobs.com, Ltd.
OrgID: HOTJOB-6
Address: 701 First Ave
City: Sunnyvale
StateProv: CA
PostalCode: 94089
Country: US

NetRange: 216.109.112.0 - 216.109.127.255
CIDR: 216.109.112.0/20
NetName: HOTJOBS
NetHandle: NET-216-109-112-0-1
Parent: NET-216-0-0-0-0
NetType: Direct Assignment
NameServer: NS1.YAHOO.COM
NameServer: NS2.YAHOO.COM
NameServer: NS3.YAHOO.COM
NameServer: NS4.YAHOO.COM
NameServer: NS5.YAHOO.COM
Comment: Yahoo!
RegDate: 2000-09-28
Updated: 2002-12-11

RTechHandle: JA256-ARIN
RTechName: Arnold, Jeffrey
RTechPhone: +1-212-699-5334
RTechEmail: jba@hotj.net

OrgAbuseHandle: NETWO857-ARIN
OrgAbuseName: Network Abuse
OrgAbusePhone: +1-408-349-3300
OrgAbuseEmail: network-abuse@cc.yahoo-inc.com

OrgTechHandle: NA258-ARIN
OrgTechName: Netblock Admin
OrgTechPhone: +1-408-349-3300
OrgTechEmail: netblockadmin@yahoo-inc.com

# ARIN WHOIS database, last updated 2007-05-27 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.

I have no idea why HotJobs needs to pull content at all... so *shrug* it's your call whether to keep it or ban it.
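As a quick sanity check, the three addresses posted earlier can be tested against the CIDR from that WHOIS record using Python's standard ipaddress module. The helper name in_block is just something I made up for the example; the network and the addresses come from this thread.

```python
import ipaddress

# CIDR taken from the WHOIS record (NetName HOTJOBS).
HOTJOBS_NET = ipaddress.ip_network("216.109.112.0/20")

# The three addresses reported earlier in the thread.
REPORTED_IPS = ["216.109.121.44", "216.109.121.41", "216.109.121.42"]


def in_block(ip, network=HOTJOBS_NET):
    """Return True if the address falls inside the given network block."""
    return ipaddress.ip_address(ip) in network


for ip in REPORTED_IPS:
    print(ip, in_block(ip))
```

Keep in mind that if you ever ban the block by CIDR rather than by single address, the whole /20 goes with it, not just these three IPs.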
Feel free to use my robot list. It has lots of bad robots, and I will keep adding more to it (I gave the address so you can always get the latest version): www. casitecenter .com/robots.txt
I found a list that includes many bad bots, plus some browser agents that are not bots but are considered hacking tools, including Firefox... or at least that is what the place I got the list from says. So I would advise double-checking its contents and mixing and matching with other lists to find the perfect disallow list.
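Mixing and matching lists by hand gets tedious, so here is a rough Python sketch of one way to do it: merge several lists, drop duplicates regardless of case, leave out any agent you want to keep (such as a real browser), and print robots.txt entries. The list contents and function names are placeholders of mine (ExampleScraperBot is invented), and this only helps with bots that actually read robots.txt.

```python
from urllib import robotparser


def merge_bot_lists(lists, keep_allowed=()):
    """Merge user-agent lists, de-duplicating case-insensitively.

    Names in keep_allowed (e.g. real browsers) are left out of the result.
    """
    allowed = {name.lower() for name in keep_allowed}
    seen = set()
    merged = []
    for names in lists:
        for name in names:
            key = name.strip().lower()
            if not key or key in allowed or key in seen:
                continue
            seen.add(key)
            merged.append(name.strip())
    return merged


def to_robots_txt(agents):
    """Emit one 'Disallow: /' record per user agent."""
    return "\n\n".join(f"User-agent: {a}\nDisallow: /" for a in agents) + "\n"


# Placeholder lists for illustration; ExampleScraperBot is a made-up name.
LIST_A = ["HTTrack", "WebCopier", "Firefox"]
LIST_B = ["httrack", "ExampleScraperBot"]

merged = merge_bot_lists([LIST_A, LIST_B], keep_allowed=["Firefox"])
robots_txt = to_robots_txt(merged)
print(robots_txt)

# Check the result the way a well-behaved crawler would read it.
parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("HTTrack", "/"), parser.can_fetch("Firefox", "/"))
```

The check at the end uses the standard library's robots.txt parser, so it shows how a polite crawler would interpret the file, not how a bad bot will behave.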
Hi, if you want to block bad bots quickly (in less than 2 minutes), just install the free Stop Bad Bots plugin (if your site runs WordPress). To install it, go to the WordPress repository and look for the Stop Bad Bots plugin. No robots.txt or .htaccess file required. Cheers, Bill