List of Bad Bots to disallow?

Discussion in 'robots.txt' started by hmansfield, May 20, 2007.

  1. #1
    Can anyone point me in the direction of a list of bad bots to disallow in my robots.txt file, as well as any other suggestions to cut down on bandwidth thieves?
    My bandwidth is through the roof.
    Also, I have 4 IP addresses that seem to be acting as an entry point. Do I need to block them, and will that hurt legitimate traffic?
    <edited>
    Also, is it safe to assume that unknown bots are bad?
    How many bots is normal?
    Right now I have 21 bots crawling and 3 are unidentified.
    hmansfield, May 20, 2007 IP
  2. cianuro

    #2
    cianuro, May 20, 2007 IP
  3. hmansfield

    #3
    hmansfield, May 20, 2007 IP
  4. TatiAnA

    #4
    Can you tell why we need to block these robots?
    TatiAnA, May 23, 2007 IP
  5. hmansfield

    #5
    Well, I don't now, although I did add a few.
    I was getting what I thought was an abnormal number of hits from 3 IPs in particular.
    It turns out those 3 IPs are hosting hundreds of other IPs, and they are all normal.
    (Yahoo)
    But I did need the list, there were a few that I didn't know about.
    Still having some problems with traffic analysis, but, I'll figure it out.
    hmansfield, May 23, 2007 IP
  6. wing

    #6
    The bad bots don't care about robots.txt. You need to shut them out with .htaccess.

    Bad bots might be, for example, site-downloading software, useless bots that only eat your bandwidth, or bots that simply scan your site for security holes, etc.
    wing, May 24, 2007 IP
    Toldo likes this.
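
To make the point about robots.txt concrete, here is a minimal Python sketch. It builds a robots.txt that disallows a couple of bots and then checks, with the standard library's `urllib.robotparser`, how a crawler that honors the file would interpret it. The bot names are hypothetical placeholders, not a vetted blocklist, and the helper names are made up for this example. A bot that ignores robots.txt is not affected by any of this, which is why server-side blocking is still needed for the truly bad ones.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical bot names, for illustration only; this is not a vetted blocklist.
UNWANTED_BOTS = ["ExampleSiteCopier", "ExampleScraperBot"]


def build_robots_txt(bot_names):
    """Return robots.txt text that disallows the whole site for each named bot."""
    blocks = [f"User-agent: {name}\nDisallow: /\n" for name in bot_names]
    # All other crawlers stay unrestricted (an empty Disallow allows everything).
    blocks.append("User-agent: *\nDisallow:\n")
    return "\n".join(blocks)


def would_be_allowed(robots_txt, user_agent, url="/"):
    """Report what a crawler that obeys robots.txt would conclude for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(UNWANTED_BOTS)

if __name__ == "__main__":
    print(robots_txt)
    print("ExampleSiteCopier allowed:", would_be_allowed(robots_txt, "ExampleSiteCopier"))
    print("Googlebot allowed:", would_be_allowed(robots_txt, "Googlebot"))
```

The check only models a well-behaved crawler. It says nothing about what a bot that never reads robots.txt will do.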
  7. Qryztufre

    #7
    Wing is correct, at least for the worst of the "bad bots", though working from the robot lists will likely knock out a few of the not-so-bad ones that actually bother to check robots.txt.

    Keep an eye on your stats for IPs to ban as well, such as individual blocks that are pulling megabytes of content daily...
    Qryztufre, May 24, 2007 IP
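
One way to watch the stats for heavy IPs is to total the bytes served per client address from the web server's access log. The sketch below is only an illustration: it assumes the log uses the Apache Common/Combined Log Format, the sample lines and addresses are invented (documentation-range IPs), and the function names and threshold are hypothetical.

```python
import re
from collections import Counter

# Assumes Apache Common/Combined Log Format:
# ip ident user [timestamp] "request" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<size>\d+|-)'
)


def bytes_per_ip(lines):
    """Sum the response sizes in the log, grouped by client IP."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like the assumed format
        size = match.group("size")
        # A "-" size means no body was sent (for example a 304 response).
        totals[match.group("ip")] += 0 if size == "-" else int(size)
    return totals


def heavy_hitters(lines, threshold_bytes):
    """Return (ip, total_bytes) pairs at or above the threshold, largest first."""
    return [
        (ip, total)
        for ip, total in bytes_per_ip(lines).most_common()
        if total >= threshold_bytes
    ]


# Invented sample data using documentation-only address ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [24/May/2007:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 500000',
    '192.0.2.10 - - [24/May/2007:10:00:05 +0000] "GET /b.html HTTP/1.1" 200 700000',
    '198.51.100.7 - - [24/May/2007:10:01:00 +0000] "GET / HTTP/1.1" 304 -',
    '198.51.100.7 - - [24/May/2007:10:02:00 +0000] "GET /c.html HTTP/1.1" 200 2048',
]

if __name__ == "__main__":
    for ip, total in heavy_hitters(SAMPLE_LOG, threshold_bytes=1_000_000):
        print(f"{ip} pulled {total} bytes")
```

A high total is only a reason to look closer. As the rest of this thread shows, a heavy address can turn out to belong to a legitimate search engine.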
  8. hmansfield

    #8
    I do have 3 that are pulling a lot of bandwidth, but they are owned by Yahoo, and there are hundreds of sub-addresses on them from different Yahoo-related ISPs.
    I assume those are good?
    hmansfield, May 24, 2007 IP
  9. tinkerbox

    #9
    I have made a tool that can block bots very easily, by either user agent or IP. It is a tool added to NiceStat that can track search engine bots visiting your site and ban bad bots. You can also check how frequently they index and which pages they index.

    With Website Rules, you can set rules on a visitor's or bot's IP, user agent, referrer URL, visited URL, and country. If a rule matches, you can ban, redirect, or show a message.

    Try demo here
    tinkerbox, May 25, 2007 IP
  10. hmansfield

    #10
    I have that!
    216.109.121.44
    216.109.121.41
    216.109.121.42

    These 3 are Yahoo. They pull a lot of bandwidth.
    Are they good or bad?
    I have never been able to get a concrete answer on that.
    hmansfield, May 28, 2007 IP
  11. Qryztufre

    #11
    According to WHOIS:

    OrgName:    HotJobs.com, Ltd.
    OrgID:      HOTJOB-6
    Address:    701 First Ave
    City:       Sunnyvale
    StateProv:  CA
    PostalCode: 94089
    Country:    US

    NetRange:   216.109.112.0 - 216.109.127.255
    CIDR:       216.109.112.0/20
    NetName:    HOTJOBS
    NetHandle:  NET-216-109-112-0-1
    Parent:     NET-216-0-0-0-0
    NetType:    Direct Assignment
    NameServer: NS1.YAHOO.COM
    NameServer: NS2.YAHOO.COM
    NameServer: NS3.YAHOO.COM
    NameServer: NS4.YAHOO.COM
    NameServer: NS5.YAHOO.COM
    Comment:    Yahoo!
    RegDate:    2000-09-28
    Updated:    2002-12-11

    RTechHandle: JA256-ARIN
    RTechName:   Arnold, Jeffrey
    RTechPhone:  +1-212-699-5334
    RTechEmail:  jba@hotj.net

    OrgAbuseHandle: NETWO857-ARIN
    OrgAbuseName:   Network Abuse
    OrgAbusePhone:  +1-408-349-3300
    OrgAbuseEmail:  network-abuse@cc.yahoo-inc.com

    OrgTechHandle: NA258-ARIN
    OrgTechName:   Netblock Admin
    OrgTechPhone:  +1-408-349-3300
    OrgTechEmail:  netblockadmin@yahoo-inc.com

    # ARIN WHOIS database, last updated 2007-05-27 19:10
    # Enter ? for additional hints on searching ARIN's WHOIS database.

    I have no idea why hotjobs needs to pull content at all... so *shrug* that's your call if you wish to keep it or ban it.
    Qryztufre, May 28, 2007 IP
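
The WHOIS record lists the block as 216.109.112.0/20. A quick way to confirm that the three addresses from the earlier post fall inside that block is Python's standard `ipaddress` module. This is a small sketch; the variable and function names are made up for the example.

```python
import ipaddress

# CIDR block taken from the WHOIS record quoted in this thread.
WHOIS_BLOCK = ipaddress.ip_network("216.109.112.0/20")

# The three addresses reported earlier in the thread.
REPORTED_ADDRESSES = ["216.109.121.44", "216.109.121.41", "216.109.121.42"]


def addresses_in_block(addresses, network):
    """Map each address string to True if it lies inside the network."""
    return {addr: ipaddress.ip_address(addr) in network for addr in addresses}


if __name__ == "__main__":
    # First and last addresses of the block, for comparison with NetRange.
    print("Range:", WHOIS_BLOCK[0], "-", WHOIS_BLOCK[-1])
    for addr, inside in addresses_in_block(REPORTED_ADDRESSES, WHOIS_BLOCK).items():
        print(addr, "is inside the block" if inside else "is outside the block")
```

Membership in the block only shows who the addresses are registered to. Whether to allow or ban that traffic is still a separate decision.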
  12. casitecenter

    #12
    Feel free to use my robot list. It has lots of bad robots, and I will add even more to it :D (I gave the address so you can always get the latest updated list.)
    www. casitecenter .com/robots.txt
    casitecenter, May 15, 2008 IP
  13. manish.chauhan

    #13
    manish.chauhan, May 16, 2008 IP
  14. Trusted Writer

    #14
    I found a list that includes many bad bots, plus some browser user agents that are not bots but are considered hacking tools, including Firefox... or at least that is what it says where I got the list from. So I would advise double-checking its content and mixing and matching with other lists to find the perfect disallow list ;)

    Trusted Writer, May 17, 2008 IP