Is This A Bad Bot?

Discussion in 'robots.txt' started by R0ck$tAr, Apr 12, 2006.

  1. R0ck$tAr
    Found this in my visitor log. Is it a bad bot?

    Host: 218.166.58.59
    Agent: MVAClient
     
    R0ck$tAr, Apr 12, 2006
  2. exam

    exam Peon

    Messages: 2,434 | Likes Received: 120 | Best Answers: 0 | Trophy Points: 0
    exam, Apr 12, 2006
  3. classifieds

    classifieds Sopchoppy Flash

    Messages: 825 | Likes Received: 51 | Best Answers: 0 | Trophy Points: 150
    Any bot that crawls your pages without sending you traffic in return for the use of your content and server resources is a bad bot.

    Block the UA and the IP range(s).
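    A minimal sketch of that advice for the bot in post #1, assuming Apache 2.2 with AllowOverride FileInfo Limit so these directives are honored in .htaccess. The user agent and address come from the log above; the 218.166.0.0/16 range is a hypothetical example, and banning a whole /16 may also lock out legitimate visitors on that network.

    ```apache
    # Tag requests whose User-Agent begins with "MVAClient" (case-insensitive)
    SetEnvIfNoCase User-Agent "^MVAClient" bad_bot

    # Apache 2.2: Allow rules are evaluated first, then Deny rules;
    # with Order Allow,Deny any matching Deny rejects the request
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    # The single address from the log (redundant if you keep the range below)
    Deny from 218.166.58.59
    # Hypothetical range covering 218.166.0.0 - 218.166.255.255
    Deny from 218.166.0.0/16
    ```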
     
    classifieds, Apr 12, 2006
  4. exam

    Except maybe archive.org.
     
    exam, Apr 12, 2006
  5. classifieds

    I use archive.org to look at other sites but I block their bot from most of mine.

    Last year it went on a rampage on one of my sites and indexed about 5k pages in 24 hours, so it graduated with honors and was placed on my bad bot list.
     
    classifieds, Apr 12, 2006
  6. minstrel

    minstrel Illustrious Member

    Messages: 15,082 | Likes Received: 1,243 | Best Answers: 0 | Trophy Points: 480
    I keep trying to tell people this: If it's truly a bad bot, it isn't going to pay any attention to your little robots.txt file - you have to ban it in .htaccess.
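    The difference in practice: robots.txt is only a request that well-behaved crawlers choose to honor, while an .htaccess rule is enforced by the server itself. A minimal sketch using mod_rewrite (an alternative to SetEnvIf-based blocking), assuming mod_rewrite is loaded and AllowOverride FileInfo is set; MVAClient is the agent from the original post.

    ```apache
    RewriteEngine On
    # Match the User-Agent prefix, ignoring case ([NC])
    RewriteCond %{HTTP_USER_AGENT} ^MVAClient [NC]
    # Leave the URL unchanged (-) and answer 403 Forbidden ([F])
    RewriteRule .* - [F]
    ```

    Because it has no <Limit> wrapper, this rule applies to every request method, not just GET and POST.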
     
    minstrel, Apr 12, 2006
  7. exam

    You could always spank it.
     
    exam, Apr 12, 2006
    minstrel likes this.
  8. minstrel

    ...or you could spank it, yes.
     
    minstrel, Apr 12, 2006
    exam likes this.
  9. classifieds

    Who said anything about robots.txt?

    .htaccess is fine once you've identified the bot or IP, but the only way to cut them off in real time is via session monitoring, spider traps, and CAPTCHAs.

    And I vote for spanking the miscreants with a 2x4 ;)


    Spanking the miscreant owners of all the scrapers & harvesters
     
    classifieds, Apr 13, 2006
  10. minstrel

    This thread is in the robots.txt forum:

     
    minstrel, Apr 13, 2006
  11. classifieds

    My mistake. I linked to the post from the keyword tracker tool and didn't notice which forum it was in. :eek:

    -jay
     
    classifieds, Apr 13, 2006
  12. just-4-teens

    just-4-teens Peon

    Messages: 3,967 | Likes Received: 168 | Best Answers: 0 | Trophy Points: 0
    To block via .htaccess, simply ban the user agent like so:

    
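    # Tag any request whose User-Agent starts with one of these known
    # scrapers / e-mail harvesters (case-insensitive match)
    SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
    SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
    SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
    SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
    SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
    SetEnvIfNoCase User-Agent "^Teleport" bad_bot
    SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot
    SetEnvIfNoCase User-Agent "^SickleBot" bad_bot

    # Apache 2.2 access control: allow everyone, then deny tagged requests.
    # Note: <Limit> restricts only the listed methods (GET also covers HEAD);
    # remove the <Limit> wrapper to apply the ban to every method.
    <Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>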
    
     
    just-4-teens, Apr 13, 2006
  13. exam

    And if they're truly bad bots, they'll masquerade as some other UA, creating the need for IP banning in your .htaccess.
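    A sketch of the same kind of IP ban in Apache 2.4 syntax, where Order/Allow/Deny (the 2.2 style) gives way to Require from mod_authz_core and mod_authz_host. It assumes AllowOverride AuthConfig for use in .htaccess; the addresses are placeholders from the documentation-only 192.0.2.0/24 and 198.51.100.0/24 blocks. Since a disguised bot's User-Agent can't be trusted, the ip rules do the real work here.

    ```apache
    <RequireAll>
        # Start from "everyone may access"...
        Require all granted
        # ...then exclude one address and one CIDR range (placeholders)
        Require not ip 192.0.2.15
        Require not ip 198.51.100.0/24
    </RequireAll>
    ```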
     
    exam, Apr 13, 2006
  14. classifieds

    And the really, really bad bots will rotate their UAs and IP addresses, support JavaScript, and browse your site like a user.

    I block 100 or so IP ranges a day and the volume keeps growing.

    It's an epidemic caused by AdSense and MFA (made-for-AdSense) search-engine spam sites.

    This month the worst offending countries for my sites are Peru, China, Romania, and the Netherlands. Last month the list was different, and I'm sure next month's will be too.
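    For bulk range blocking like this, Apache 2.2's Deny directive accepts several address forms, and several entries can share one line. A sketch with placeholder ranges from the RFC 5737 documentation blocks (not real offenders), assuming AllowOverride Limit:

    ```apache
    Order Allow,Deny
    Allow from all
    # Partial address: every IP beginning with 203.0.113.
    Deny from 203.0.113
    # A CIDR range and a network/netmask pair, two entries on one line
    Deny from 198.51.100.0/24 192.0.2.0/255.255.255.0
    ```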
     
    classifieds, Apr 13, 2006
  15. just-4-teens

    Yes, but I had 40 SickleBots on my site yesterday (I accidentally allowed them access temporarily), and each one had a completely different IP.
     
    just-4-teens, Apr 13, 2006
  16. exam

    As a side note, bots can be a bother, but I find they account for a very small percentage of my total bandwidth and visitors, so I don't get too uptight about it.
     
    exam, Apr 13, 2006