Should I put more or fewer bots in my .htaccess file?

Discussion in 'Security' started by TheSyndicate, Dec 28, 2011.

  1. #1
    Is there any bot or spider I should take out of this list? I put this list in my .htaccess file:

    SetEnvIfNoCase User-Agent "^BackDoorBot" bad_bot
    SetEnvIfNoCase user-agent "^BlackWidow" bad_bot 
    SetEnvIfNoCase User-Agent "^BotALot" bad_bot
    SetEnvIfNoCase User-Agent "^Cegbfeieh" bad_bot
    SetEnvIfNoCase user-agent "^ChinaClaw" bad_bot 
    SetEnvIfNoCase User-Agent "^CopyRightCheck" bad_bot
    SetEnvIfNoCase user-agent "^Custo" bad_bot 
    SetEnvIfNoCase user-agent "^DISCo" bad_bot 
    SetEnvIfNoCase user-agent "^Download\ Demon" bad_bot 
    SetEnvIfNoCase user-agent "^eCatch" bad_bot 
    SetEnvIfNoCase user-agent "^EirGrabber" bad_bot 
    SetEnvIfNoCase user-agent "^EmailSiphon" bad_bot 
    SetEnvIfNoCase user-agent "^EmailWolf" bad_bot 
    SetEnvIfNoCase user-agent "^Express\ WebPictures" bad_bot 
    SetEnvIfNoCase user-agent "^ExtractorPro" bad_bot 
    SetEnvIfNoCase user-agent "^EyeNetIE" bad_bot 
    SetEnvIfNoCase user-agent "^FlashGet" bad_bot 
    SetEnvIfNoCase user-agent "^GetRight" bad_bot 
    SetEnvIfNoCase user-agent "^GetWeb!" bad_bot 
    SetEnvIfNoCase user-agent "^Go!Zilla" bad_bot 
    SetEnvIfNoCase user-agent "^Go-Ahead-Got-It" bad_bot 
    SetEnvIfNoCase user-agent "^GrabNet" bad_bot 
    SetEnvIfNoCase user-agent "^Grafula" bad_bot 
    SetEnvIfNoCase user-agent "^HMView" bad_bot 
    SetEnvIfNoCase user-agent "HTTrack" bad_bot 
    SetEnvIfNoCase user-agent "^Image\ Stripper" bad_bot 
    SetEnvIfNoCase user-agent "Indy\ Library" bad_bot 
    SetEnvIfNoCase user-agent "^InterGET" bad_bot 
    SetEnvIfNoCase user-agent "^Internet\ Ninja" bad_bot 
    SetEnvIfNoCase user-agent "^JetCar" bad_bot 
    SetEnvIfNoCase user-agent "^JOC\ Web\ Spider" bad_bot 
    SetEnvIfNoCase user-agent "^larbin" bad_bot 
    SetEnvIfNoCase user-agent "^LeechFTP" bad_bot 
    SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
    SetEnvIfNoCase user-agent "^Mass\ Downloader" bad_bot 
    SetEnvIfNoCase user-agent "^MIDown\ tool" bad_bot 
    SetEnvIfNoCase user-agent "^Mister\ PiX" bad_bot 
    SetEnvIfNoCase user-agent "^Navroad" bad_bot 
    SetEnvIfNoCase user-agent "^NearSite" bad_bot 
    SetEnvIfNoCase user-agent "^NetAnts" bad_bot 
    SetEnvIfNoCase user-agent "^NetSpider" bad_bot 
    SetEnvIfNoCase user-agent "^Net\ Vampire" bad_bot 
    SetEnvIfNoCase user-agent "^NetZIP" bad_bot 
    SetEnvIfNoCase user-agent "^Octopus" bad_bot 
    SetEnvIfNoCase user-agent "^Offline\ Explorer" bad_bot 
    SetEnvIfNoCase user-agent "^Offline\ Navigator" bad_bot 
    SetEnvIfNoCase User-Agent "^Openfind" bad_bot
    SetEnvIfNoCase user-agent "^PageGrabber" bad_bot 
    SetEnvIfNoCase user-agent "^Papa\ Foto" bad_bot 
    SetEnvIfNoCase user-agent "^pavuk" bad_bot 
    SetEnvIfNoCase user-agent "^pcBrowser" bad_bot 
    SetEnvIfNoCase user-agent "^RealDownload" bad_bot 
    SetEnvIfNoCase user-agent "^ReGet" bad_bot 
    SetEnvIfNoCase user-agent "^SiteSnagger" bad_bot 
    SetEnvIfNoCase user-agent "^SmartDownload" bad_bot 
    SetEnvIfNoCase User-Agent "^SpankBot" bad_bot
    SetEnvIfNoCase user-agent "^SuperBot" bad_bot 
    SetEnvIfNoCase user-agent "^SuperHTTP" bad_bot 
    SetEnvIfNoCase user-agent "^Surfbot" bad_bot 
    SetEnvIfNoCase user-agent "^tAkeOut" bad_bot 
    SetEnvIfNoCase user-agent "^Teleport\ Pro" bad_bot 
    SetEnvIfNoCase User-Agent "^Titan" bad_bot
    SetEnvIfNoCase user-agent "^VoidEYE" bad_bot 
    SetEnvIfNoCase user-agent "^Web\ Image\ Collector" bad_bot 
    SetEnvIfNoCase user-agent "^Web\ Sucker" bad_bot 
    SetEnvIfNoCase user-agent "^WebAuto" bad_bot 
    SetEnvIfNoCase User-Agent "^WebBandit" bad_bot
    SetEnvIfNoCase user-agent "^WebCopier" bad_bot 
    SetEnvIfNoCase user-agent "^WebFetch" bad_bot 
    SetEnvIfNoCase user-agent "^WebGo\ IS" bad_bot 
    SetEnvIfNoCase user-agent "^WebLeacher" bad_bot 
    SetEnvIfNoCase user-agent "^WebReaper" bad_bot 
    SetEnvIfNoCase user-agent "^WebSauger" bad_bot 
    SetEnvIfNoCase user-agent "^Website\ eXtractor" bad_bot 
    SetEnvIfNoCase user-agent "^Website\ Quester" bad_bot 
    SetEnvIfNoCase User-Agent "^Webster Pro" bad_bot
    SetEnvIfNoCase user-agent "^WebStripper" bad_bot 
    SetEnvIfNoCase user-agent "^WebWhacker" bad_bot 
    SetEnvIfNoCase user-agent "^WebZIP" bad_bot 
    SetEnvIfNoCase user-agent "^Wget" bad_bot 
    SetEnvIfNoCase user-agent "^Widow" bad_bot 
    SetEnvIfNoCase user-agent "^WWWOFFLE" bad_bot 
    SetEnvIfNoCase user-agent "^Xaldon\ WebSpider" bad_bot 
    SetEnvIfNoCase user-agent "^Zeus" bad_bot 
    <FilesMatch "(.*)">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </FilesMatch>
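
    A note for Apache 2.4 or later: the Order/Allow/Deny directives above are Apache 2.2 syntax and only work on 2.4 when mod_access_compat is loaded. Below is a minimal sketch of the same idea in 2.4 syntax, assuming mod_setenvif is enabled and the host allows AuthConfig overrides in .htaccess. Only two patterns from the list are shown; the others would carry over unchanged.

    ```apache
    # Flag unwanted user agents (same SetEnvIfNoCase lines as above)
    SetEnvIfNoCase User-Agent "^BlackWidow" bad_bot
    SetEnvIfNoCase User-Agent "HTTrack" bad_bot

    # Apache 2.4 access control: allow everyone except flagged requests
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
    ```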
     
    TheSyndicate, Dec 28, 2011 IP
  2. kokopelli

    kokopelli Peon

    Messages:
    2,436
    Likes Received:
    29
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Here's a list I use:
    ^$
    8484 Boston Project
    AA
    Advanced Email Extractor*
    agdm79@mail.ru
    ahrefs
    AhrefsBot
    AhrefsBot/1.0
    aipbot
    Alexibot
    Amiga-AWeb/3.4
    Anarchie
    Anonymizer
    Art-Online
    ASPSeek
    asterias
    attach
    Attributor
    autoemailspider
    autoemailspider_bot
    backdoor
    BackDoorbot
    BackDoorBot
    BaiDuSpider
    Baiduspider
    Baiduspider-image
    Baiduspider-video
    Bandit
    BatchFTP
    BecomeBot
    Bigfoot
    Black Hole
    Black.Hole
    BlackWidow
    BlowFish
    Bork-edition
    bot.*
    BotALot
    botALot
    Bot mailto:craftbot@yahoo.com
    Brutus/AET
    BuiltBotTough
    BuiltbotTough
    Bullseye
    BunnySlippers
    Butch__2.1.1
    Cegbfeieh
    cgichk
    CheeseBot
    Cheesebot
    CherryPicker
    CherryPicker*
    CherryPicker/1.0_bot
    CherryPickerElite/1.0_bot
    CherryPickerSE/1.0_bot
    ChinaClaw
    Cityreview
    combine
    compatible ; MSIE
    concealed defense
    CopyGuard
    CopyRightCheck
    core-project/1.0
    cosmos
    crawl
    Crescent
    Crescent Internet ToolPak
    crescent internet toolpak
    Crescent Internet ToolPak_bot
    curl/
    Custo
    DataCha0s
    DataCha0s/2.0
    Deepnet Explorer
    desktopsmiley
    DigExt
    Digimarc WebReader
    DIIbot
    DISCo
    DittoSpyder
    DOC
    DoCoMo
    Dotbot
    Download\
    Download Demon
    Download Ninja
    Download Ninja 2.0
    DownloadsDemon
    DTS Agent
    DynaWeb
    eCatch
    ecollector
    EirGrabber
    EmailCollector
    EmailCollector/1.0_bot
    EmailSiphon
    EmailSiphon_bot
    EmailWolf
    EmailWolf 1.00_bot
    envolk
    EroCrawler
    Exabot
    Express\
    ExpresssWebPictures
    Express WebPictures
    ExtractorPro
    Extractorpro
    ExtractorPro_bot
    EyeNetIE
    .*fantomBrowser
    .*fantomCrew Browser
    fast
    Faxobot
    feedfinder
    Fetch
    Fetch API Request
    fiddler
    FlashGet
    flipboardbrowser
    FooBar/42
    Foobot
    Franklin Locator
    FrontPage
    GameBoy, Powered by Nintendo
    gamingharbor
    GetRight
    GetWeb!
    Gigabot/...
    Gigabot.*
    Go-Ahead-Got-It
    Go!Zilla
    GrabNet
    Grafula
    grub-client
    grub crawler
    Harvest
    heritrix
    hl_ftien_spider
    hloader
    HMView
    HTMLParser
    .*HTTP_GET_VARS
    http_get_vars
    httplib
    HTTrack
    humanlinks
    ia_archiver
    iblog
    ichiro
    Image\
    ImagesStripper
    ImagesSucker
    Image Stripper
    Image Sucker
    Indy\
    indy library
    Indy Library
    IndysLibrary
    InfoNaviRobot
    InfonaviRobot
    InterGET
    Internet\
    INTERNET EXPLOITER SUX
    Internet-exprorer
    Internet Ninja
    Internet Ninja x.0
    Jakarta
    Jakarta Commons
    Java
    Java/
    JBH Agent 2.0
    Jennybot
    JennyBot
    JetCar
    JOC\
    JOC Web Spider
    juicyaccess
    k1b compatible; rss 6.0; windows sot 5.1 security kol
    k2spider
    Kenjin Spider
    Kenjin.Spider
    Keyword.Density
    K-Meleon/0.8
    larbin
    Larbin
    LeechFTP
    LexiBot
    Lexibot
    libcurl
    libWeb/clsHTTP
    libwww
    libwww-perl
    LinkextractorPro
    linko
    LinkScan/8.1a.Unix
    Linkwalker
    LinkWalker
    lwp
    lwp-request
    LWP::Simple
    lwp-trivial
    Majestic.*
    Mass\
    Mass Downloader
    Mata.Hari
    Microsoft Data Access
    Microsoft Internet Explorer/5.0$
    ^Microsoft URL
    Microsoft.URL
    Microsoft URL Control
    Microsoft.URL.Control
    MIDown\
    MIDown tool
    MIIxpc
    Missigua
    Mister\
    Mister PiX
    Mister.PiX
    MJ12bot
    moget
    Morzilla
    Mosiac 1.*
    Mozilla/2
    Mozilla/3.Mozilla/2.01
    Mozilla/3.Mozilla/2.01$
    Mozilla/4.0 (compatible; MSIE 4.0; Windows NT; ....../1.0 )$
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Maxthon)$
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1$
    Mozilla/(4|5).0$
    ^Mozilla/5.0$
    Mozilla.*Nessus
    Mozilla.*NEWT
    MRSPUTNIK
    MSIECrawler
    MS Web Services Client Protocol
    nameprotect
    NASA Search
    NaverBot
    Navroad
    NearSite
    Net\
    NetAnts
    netforex
    NetMechanic
    NetSpider
    Net Vampire
    NetZIP
    NeuralBot/0.2
    NEWT ActiveX; Win32
    NG 1.x (Exalead)
    NICErsPRO
    NICErsPRO_bot
    .*Nikto
    Nokia-WAPToolkit.* googlebot.*googlebot
    NPbot
    NPBot
    Nutch
    Octopus
    Offline Explorer
    Offline.Explorer
    Offline Navigator
    Openbot
    Openfind
    Opera/6.01 (Windows ME; U) [en]
    Opera/9.0 (Windows NT 5.1; U; en)
    PageGrabber
    Pagerabber
    panscient
    Papa\
    Papa Foto
    pavuk
    pcBrowser
    PECL::HTTP
    picscout
    plaNETWORK
    pleasecrawl/1.
    PMAFind
    POE-Component-Client
    poe-component-client
    POE-Component-Client-HTTP
    POE::Component::Client::HTTP/
    Port Huron Labs
    Program Shareware
    Program Shareware 1
    Program Shareware 1.0.0
    ProPowerbot/2.14
    ProPowerBot/2.14
    ProWebWalker
    psbot
    psbot/0.1
    PycURL
    PycURL/7.15.5$
    QihooBot
    QuepasaCreep
    QueryN.Metasearch
    RealDownload
    ReGet
    RepoMonkey
    RMA
    Rufus Web Miner
    .*SAFEXPLORER TL
    safexplorer tl
    Scooter
    searchbot admin@google.com
    searchestate
    security scan
    ^Shockwave Flash
    sitecheck.internetseer.com
    SiteSnagger
    Slurp
    SlySearch
    SmartDownload
    SMBot
    Snapbot
    Snoopy
    Sogou
    Sogou.*
    sohu.*
    Sosospider
    Spankbot
    SpankBot
    spanner
    Sphider
    spider
    S.T.A.L.K.E.R.
    stress test
    SuperBot
    Superbot
    SuperHTTP
    Surfbot
    SurveyBot
    suzuran
    Szukacz/1.4
    tAkeOut
    Teleport
    TeleportPro
    teleport pro
    Teleport Pro
    Telesoft
    Telesoft*
    TestBED.6.3
    .*T H A T ' S  G O T T A  H U R T*
    The.Intraformant
    TheNomad
    .*THIS IS AN EXPLOIT*
    TightTwatbot
    TightTwatBot
    TinEye
    Titan
    TJvMultiHttpGrabber Component
    TMCrawler
    toCrawl/UrlDispatcher
    TrackBack/
    True_Robot
    turingos
    TurnitinBot
    Turnitinbot/1.5
    TurnitinBot/1.5
    twengabot
    TwengaBot
    Twiceler
    Twitturly
    UbiCrawler
    URLy.Warning
    User-Agent
    User-Agent: Mozilla/4.0
    vadixbot
    VB Project
    VCI
    Viewzi
    voideye
    VoidEYE
    voyager/1.0
    WebAuto
    WebBandit
    webbandit
    WebBandit/2.1_bot
    WebBandit/3.50_bot
    webbandit/4.00.0_bot
    WebCapture
    WebCopier
    Web Downloader
    WebEMailExtrac.*
    WebEMailExtrac*
    WebEMailExtractor
    WebEMailExtractor/1.0B_bot
    WebEnhancer
    WebFetch
    WebGo\
    WebGo IS
    Web Image Collector
    Web.Image.Collector
    WebLeacher
    WebmasterWorldForumbot
    WebmasterWorldForumBot
    WEBMOLE
    WebReaper
    .*WebRoot
    WebSauger
    Website eXtractor
    Website Quester
    Website.Quester
    Webster
    Webster.Pro
    Webstripper
    WebStripper
    Web Sucker
    WebVulnScan
    WebWhacker
    WebZIP
    WebZip
    West Wind Internet Protocols
    Wget
    wget
    ^Wget
    Wget/1.8.2
    whatweb/
    Widow
    windows-update-agent
    Windows-Update-Agent
    WISEbot
    WordPress/2.0.2
    Wordpress Hash Grabber
    WWW-Collector-E
    WWW::Mechanize
    WWWOFFLE
    ^www.weblogs.com
    Xaldon\
    Xaldon WebSpider
    Xenu.*
    Xenu.*Link.*Sleuth.*
    xmlrpc exploit*
    XX
    Yandex
    Yandex.*
    YandexBlogs
    YandexBot
    yandexbot
    YandexMedia
    YebolBot
    Yeti
    Yodao.*
    Youdao.*
    YoudaoBot
    Zao
    Zealbot
    Zeus
    Zeus.*Webster
    Zeus .*Webster Pro*
    ZyBORG
    ZyBorg
     
    kokopelli, Dec 29, 2011 IP
  3. TheSyndicate

    TheSyndicate Prominent Member

    Messages:
    5,410
    Likes Received:
    289
    Best Answers:
    0
    Trophy Points:
    365
    #3
    OK, is there a way to call another file, or do you put all of these in your .htaccess file? I am sure you update it from time to time.
     
    TheSyndicate, Dec 29, 2011 IP
  4. kokopelli

    #4
    I block them server-wide via mod_security using the method outlined here: http://www.puntapirata.com/ModSec-Rules.php
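
    A rough sketch of what a mod_security rule for this can look like. It is not necessarily the method from the linked page. It assumes ModSecurity 2.x; the file name bad_bots.txt and the rule id are made up for illustration.

    ```apache
    # bad_bots.txt (hypothetical): one phrase per line, e.g. HTTrack, WebZIP, EmailSiphon
    # @pmFromFile does case-insensitive phrase matching, not regex matching,
    # so entries such as "^Wget" or "Gigabot.*" would have to be written as plain text.
    SecRule REQUEST_HEADERS:User-Agent "@pmFromFile bad_bots.txt" \
        "id:1000001,phase:1,deny,status:403,log,msg:'Blocked user agent'"
    ```

    This keeps the list in its own file, separate from the rule. Because the match is a plain substring match, very short entries such as AA or Java would also hit any user agent that happens to contain those letters.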

    You could also add the directives to the httpd.conf file to block them server-wide, but I had problems with that.
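
    For the httpd.conf route, a minimal sketch, assuming Apache 2.2-style access control as in the first post. The Include directive is not allowed inside .htaccess, so this only works with access to the server config. The path conf/bad_bots.conf is a hypothetical file holding the SetEnvIfNoCase lines.

    ```apache
    # httpd.conf (server config only)
    # Pull in the SetEnvIfNoCase ... bad_bot lines from a separate, hypothetical file
    Include conf/bad_bots.conf

    # Apply the deny rule to every URL on the server
    <Location "/">
        Order Allow,Deny
        Allow from all
        Deny from env=bad_bot
    </Location>
    ```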

    I'm not fanatical about this list, and only add something if I notice a really active bad bot. Mod_security + CSF take care of banning them permanently, should they (or other bots) misbehave.

    There are several websites that list the latest bad bots, in case you want to check them from time to time and update your list.
     
    kokopelli, Dec 29, 2011 IP
  5. TheSyndicate

    #5
    Why do you have Opera there?
     
    TheSyndicate, Dec 30, 2011 IP
  6. BigTim3

    BigTim3 Guest

    Messages:
    266
    Likes Received:
    1
    Best Answers:
    2
    Trophy Points:
    0
    #6
    Does this slow websites down?
     
    BigTim3, Dec 30, 2011 IP
  7. TheSyndicate

    #7
    Since I put them in my .htaccess file, probably a bit, but on the other hand loads of bots slow your website down too. If you have a dedicated server, you can put them in the firewall, I think.
     
    TheSyndicate, Dec 30, 2011 IP
  8. Orangu

    Orangu Active Member

    Messages:
    571
    Likes Received:
    21
    Best Answers:
    0
    Trophy Points:
    60
    #8
    Excessively large .htaccess files hurt server performance, as the .htaccess file has to be processed for every file request.
    As mentioned a few posts above, adding deny rules to your firewall is a better solution performance-wise, plus you don't need to create new .htaccess rules for every new domain you want to host.
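
    Where .htaccess is the only option, one way to keep the file shorter is to merge patterns that share a prefix into a single expression. A sketch using entries from the list in the first post. Whether it makes a measurable difference will depend on the server.

    ```apache
    # One directive in place of thirteen single-pattern lines from the first post
    # (WebAuto, WebBandit, WebCopier, WebFetch, WebLeacher, WebReaper, WebSauger,
    #  WebStripper, WebWhacker, WebZIP, Wget, Widow, WWWOFFLE)
    SetEnvIfNoCase User-Agent "^(Web(Auto|Bandit|Copier|Fetch|Leacher|Reaper|Sauger|Stripper|Whacker|ZIP)|Wget|Widow|WWWOFFLE)" bad_bot
    ```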
     
    Orangu, Dec 31, 2011 IP
  9. TheSyndicate

    #9
    I am on a shared server ...
     
    TheSyndicate, Dec 31, 2011 IP