List of Bad Bots to disallow?

Discussion in 'robots.txt' started by hmansfield, May 20, 2007.

  1. #1
    Can anyone point me to a list of bad bots to disallow in my robots.txt file, along with any other suggestions for cutting down on bandwidth thieves?
    My bandwidth is through the roof.
    Also, I have 4 IP addresses that seem to be acting as an entry point. Do I need to block them, and will that hurt legitimate traffic?
    <edited>
    Also, is it safe to assume that unknown bots are bad?
    How many bots is normal?
    Right now I have 21 bots crawling, and 3 are unidentified.
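
    A minimal sketch of what per-bot Disallow rules look like and how a well-behaved crawler would read them, using Python's standard `urllib.robotparser`. The bot names (`ExampleBadBot`, `AnotherScraper`, `SomeOtherBot`) and the `/private/` path are placeholders invented for illustration, not a vetted list. Compliance with robots.txt is voluntary, so this only shows what a cooperating bot would conclude.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The bot names are placeholders,
# not a real list of bad bots.
ROBOTS_TXT = """\
User-agent: ExampleBadBot
Disallow: /

User-agent: AnotherScraper
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A bot that is named in the file is asked to stay out of the whole site.
print(parser.can_fetch("ExampleBadBot", "/index.html"))
# Any other bot falls under the "*" rules.
print(parser.can_fetch("SomeOtherBot", "/index.html"))
print(parser.can_fetch("SomeOtherBot", "/private/page.html"))
```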
     
    hmansfield, May 20, 2007 IP
  2. cianuro

    cianuro Peon

    #2
    cianuro, May 20, 2007 IP
  3. hmansfield

    hmansfield Guest

    #3
    hmansfield, May 20, 2007 IP
  4. TatiAnA

    TatiAnA Active Member

    #4
    Can you tell why we need to block these robots?
     
    TatiAnA, May 23, 2007 IP
  5. hmansfield

    hmansfield Guest

    #5
    Well, I don't need to now, although I did add a few.
    I was seeing what I thought was an abnormal number of hits from 3 IPs in particular.
    It turns out those 3 IPs are hosting hundreds of other IPs, and they are all normal.
    (Yahoo)
    But I did need the list; there were a few bots on it that I didn't know about.
    I'm still having some problems with traffic analysis, but I'll figure it out.
     
    hmansfield, May 23, 2007 IP
  6. wing

    wing Active Member

    #6
    The bad bots don't care about robots.txt. You need to shut them out with .htaccess.

    Bad bots might be, for example, site-downloading software, useless bots that only eat your bandwidth, or bots that simply scan your site for security holes.
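
    The idea of refusing requests on the server side can be sketched as a User-Agent check. This is plain Python for illustration, not an .htaccess rule set, and the patterns in `BLOCKED_AGENT_PATTERNS` are hypothetical placeholders. A bot can send any User-Agent string it likes, so a check like this only stops bots that identify themselves.

```python
import re

# Hypothetical substrings to refuse. These are placeholders, not real bot names.
BLOCKED_AGENT_PATTERNS = ["examplebadbot", "sitedownloader", "vulnscanner"]

# One case-insensitive regex that matches if any pattern occurs in the string.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(p) for p in BLOCKED_AGENT_PATTERNS),
    re.IGNORECASE,
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any blocked pattern."""
    if not user_agent:
        # Policy choice for this sketch: an empty User-Agent is not blocked.
        return False
    return _BLOCKED_RE.search(user_agent) is not None


def status_for(user_agent: str) -> int:
    """Return the HTTP status a server might send: 403 if blocked, else 200."""
    return 403 if is_blocked(user_agent) else 200


print(status_for("Mozilla/5.0 (compatible; ExampleBadBot/1.0)"))
print(status_for("Mozilla/5.0 (Windows NT 10.0; rv:120.0) Firefox/120.0"))
```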
     
    wing, May 24, 2007 IP
  7. Qryztufre

    Qryztufre Prominent Member

    #7
    Wing is correct, at least for "bad bots" in the broader sense, though working through the robot lists will likely weed out a few of the not-so-bad ones that actually bother to check robots.txt.

    Keep an eye on your stats for IPs to ban as well, such as individual blocks that are pulling megabytes of content daily.
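
    One way to spot heavy hitters is to total the bytes served per IP from the access log. The sketch below assumes the log is in Common Log Format. The sample lines are made up, using documentation-reserved addresses, and are not taken from anyone's real logs.

```python
import re
from collections import Counter

# Common Log Format: ip ident user [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def bytes_per_ip(lines):
    """Sum the response sizes per client IP, skipping unparseable lines."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        size = match.group("size")
        # A "-" size means no body was sent (for example a 304 response).
        totals[match.group("ip")] += 0 if size == "-" else int(size)
    return totals


# Hypothetical sample lines for illustration only.
SAMPLE = [
    '203.0.113.5 - - [24/May/2007:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 5120',
    '203.0.113.5 - - [24/May/2007:10:00:01 +0000] "GET /b.zip HTTP/1.1" 200 1048576',
    '198.51.100.7 - - [24/May/2007:10:00:02 +0000] "GET / HTTP/1.1" 304 -',
]

totals = bytes_per_ip(SAMPLE)
# Largest consumers first.
for ip, total in totals.most_common():
    print(ip, total)
```

    In practice you would pass an open log file instead of `SAMPLE` and look at the top few entries before deciding what to ban.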
     
    Qryztufre, May 24, 2007 IP
  8. hmansfield

    hmansfield Guest

    #8
    I do have 3 that are pulling a lot of bandwidth, but they are owned by Yahoo, and there are hundreds of sub-addresses on them from different Yahoo-related ISPs.
    I assume those are good?
     
    hmansfield, May 24, 2007 IP
  9. tinkerbox

    tinkerbox Peon

    #9
    I have made a tool that makes it very easy to block bots by either user agent or IP. It is a tool added to NiceStat that can track search engine bots visiting your site and ban bad bots. You can also check how frequently they index and which pages they index.

    With Website Rules, you can match on a visitor's or bot's IP, User Agent, URL Referer, URL Visit, and Country. If a rule matches, you can ban, redirect, or show a message.

    Try demo here
     
    tinkerbox, May 25, 2007 IP
  10. hmansfield

    hmansfield Guest

    #10
    I have that!
    216.109.121.44
    216.109.121.41
    216.109.121.42

    These 3 are Yahoo. They pull a lot of bandwidth.
    Are they good or bad?
    I have never been able to get a concrete answer on that.
     
    hmansfield, May 28, 2007 IP
  11. Qryztufre

    Qryztufre Prominent Member

    #11
    According to WHOIS:
    OrgName:    HotJobs.com, Ltd. 
    OrgID:      HOTJOB-6
    Address:    701 First Ave
    City:       Sunnyvale
    StateProv:  CA
    PostalCode: 94089
    Country:    US
    
    NetRange:   216.109.112.0 - 216.109.127.255 
    CIDR:       216.109.112.0/20 
    NetName:    HOTJOBS
    NetHandle:  NET-216-109-112-0-1
    Parent:     NET-216-0-0-0-0
    NetType:    Direct Assignment
    NameServer: NS1.YAHOO.COM
    NameServer: NS2.YAHOO.COM
    NameServer: NS3.YAHOO.COM
    NameServer: NS4.YAHOO.COM
    NameServer: NS5.YAHOO.COM
    Comment:    Yahoo!
    RegDate:    2000-09-28
    Updated:    2002-12-11
    
    RTechHandle: JA256-ARIN
    RTechName:   Arnold, Jeffrey 
    RTechPhone:  +1-212-699-5334
    RTechEmail:  jba@hotj.net 
    
    OrgAbuseHandle: NETWO857-ARIN
    OrgAbuseName:   Network Abuse 
    OrgAbusePhone:  +1-408-349-3300
    OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com
    
    OrgTechHandle: NA258-ARIN
    OrgTechName:   Netblock Admin 
    OrgTechPhone:  +1-408-349-3300
    OrgTechEmail:  netblockadmin@yahoo-inc.com
    
    # ARIN WHOIS database, last updated 2007-05-27 19:10
    # Enter ? for additional hints on searching ARIN's WHOIS database.
    I have no idea why HotJobs needs to pull content at all... so *shrug* it's your call whether to keep it or ban it.
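
    A quick way to confirm that the three addresses from the earlier post fall inside the netblock in the WHOIS record is Python's standard `ipaddress` module. The CIDR and the addresses below come from this thread. The helper name `in_block` is just an illustrative choice.

```python
import ipaddress

# CIDR from the WHOIS record above.
NETBLOCK = ipaddress.ip_network("216.109.112.0/20")

# The three addresses reported earlier in the thread.
ADDRESSES = ["216.109.121.44", "216.109.121.41", "216.109.121.42"]


def in_block(address: str, network=NETBLOCK) -> bool:
    """Return True if the address belongs to the given network."""
    return ipaddress.ip_address(address) in network


results = {address: in_block(address) for address in ADDRESSES}

# First and last address of the block, to compare with the NetRange line.
print(NETBLOCK[0], "-", NETBLOCK[-1])
for address, inside in results.items():
    print(address, inside)
```

    Membership in the block only shows who the range is registered to. It says nothing about whether the traffic is worth keeping.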
     
    Qryztufre, May 28, 2007 IP
  12. casitecenter

    casitecenter Peon

    #12
    Feel free to use my robot list. It has lots of bad robots, and I will add even more to it :D (I gave the address so you can get the latest up-to-date list.)
    www. casitecenter .com/robots.txt
     
    casitecenter, May 15, 2008 IP
  13. manish.chauhan

    manish.chauhan Well-Known Member

    #13
    manish.chauhan, May 16, 2008 IP
  14. Trusted Writer

    Trusted Writer Banned

    #14
    I found a list that includes many bad bots, plus some browser agents that are not bots but are considered hacking tools, including Firefox... or at least that is what it says where I got this list from. So I would advise double-checking its contents and mixing and matching with other lists to find the perfect disallow list ;)

     
    Trusted Writer, May 17, 2008 IP
  15. Sergio Minozzi

    Sergio Minozzi Greenhorn

    #15
    Hi,
    If you want to block bad bots quickly (in less than 2 minutes), just install the free Stop Bad Bots plugin (if your site runs WordPress).
    To install it, go to the WordPress repository and look for the Stop Bad Bots plugin.
    • No robots.txt or .htaccess file required
    Cheers,
    Bill
     
    Sergio Minozzi, Feb 22, 2017 IP