List of bad bots?

Discussion in 'Site & Server Administration' started by visioninfotech, May 20, 2008.

  1. #1
    Does anyone know of a list of bad bots?

    Many crawlers are overloading our website.

    I tried installing anticrawl, but it blocks many useful bots, such as Googlebot's image and mobile crawlers.

    Does anyone have an up-to-date list of bots that can be put in .htaccess?

    Thanks
    Gurpreet
     
    visioninfotech, May 20, 2008 IP
  2. relixx

    #2
    relixx, May 20, 2008 IP
  3. alikuru

    #3
    Here is a small list I've built up from regularly going through my access logs:
    ^$
    libwww-perl
    charlotte
    Metalogger
    irlbot
    lmcrawler
    java
    libwww
    lwp::simple
    larbin
    mothra
    netscan
    snapbot
    sna-0
    Microsoft URL Control
    Missigua Locator
    PEAR HTTP_Request class
    Wells Search II
    psycheclone
    Python-urllib
    WEP Search
    FDW
    Banning bots by their user-agent strings is generally of limited use, since most bots can set the string to anything (including legitimate ones such as IE or Mozilla). Still, it can be useful to block at least the first two user-agents on my list, because in most cases script kiddies either leave their bot's default user-agent string intact or just delete it ;)
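
As a rough illustration of how such a list might be applied, here is a minimal Python sketch that treats a few of the entries as case-insensitive regular expressions, so that `^$` catches an empty User-Agent. The names `BAD_AGENT_PATTERNS` and `is_bad_agent` are made up for this example, and the pattern subset is arbitrary; how the check would be wired into a real server is left open.

```python
import re

# A subset of the list above, treated as regular expressions (an assumption).
# "^$" matches an empty User-Agent string.
BAD_AGENT_PATTERNS = [
    "^$",
    "libwww-perl",
    "larbin",
    "Python-urllib",
    "Microsoft URL Control",
]

# One combined, case-insensitive pattern; each entry is wrapped in its own group.
_BAD_AGENT_RE = re.compile(
    "|".join(f"(?:{pattern})" for pattern in BAD_AGENT_PATTERNS),
    re.IGNORECASE,
)


def is_bad_agent(user_agent):
    """Return True if the User-Agent is missing, empty, or matches the list."""
    return _BAD_AGENT_RE.search(user_agent or "") is not None
```

For instance, `is_bad_agent("libwww-perl/5.805")` and `is_bad_agent("")` would both be flagged, while a typical Googlebot string would pass. As the post says, a bot that fakes a browser string gets through a check like this.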
     
    alikuru, May 20, 2008 IP
  4. Trusted Writer

    #4
    I have this list which, mixed and matched with the one above, can improve your robots.txt:
    
    User-agent: Googlebot-Image
    Disallow: / 
    
    User-agent: BotRightHere 
    User-agent: larbin 
    User-agent: b2w/0.1 
    User-agent: Copernic 
    User-agent: psbot 
    User-agent: Python-urllib 
    User-agent: NetMechanic 
    User-agent: URL_Spider_Pro 
    User-agent: CherryPicker 
    User-agent: EmailCollector 
    User-agent: EmailSiphon 
    User-agent: WebBandit 
    User-agent: EmailWolf 
    User-agent: ExtractorPro 
    User-agent: CopyRightCheck 
    User-agent: Crescent 
    User-agent: SiteSnagger 
    User-agent: ProWebWalker 
    User-agent: CheeseBot 
    User-agent: LNSpiderguy 
    User-agent: Alexibot 
    User-agent: Teleport 
    User-agent: TeleportPro 
    User-agent: MIIxpc 
    User-agent: Telesoft 
    User-agent: Website Quester 
    User-agent: WebZip 
    User-agent: moget/2.1 
    User-agent: WebZip/4.0 
    User-agent: WebStripper 
    User-agent: WebSauger 
    User-agent: WebCopier 
    User-agent: NetAnts 
    User-agent: Mister PiX 
    User-agent: WebAuto 
    User-agent: TheNomad 
    User-agent: WWW-Collector-E 
    User-agent: RMA 
    User-agent: libWeb/clsHTTP 
    User-agent: asterias 
    User-agent: httplib 
    User-agent: turingos 
    User-agent: spanner 
    User-agent: InfoNaviRobot 
    User-agent: Harvest/1.5 
    User-agent: Bullseye/1.0 
    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95) 
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0 
    User-agent: CherryPickerSE/1.0 
    User-agent: CherryPickerElite/1.0 
    User-agent: WebBandit/3.50 
    User-agent: NICErsPRO 
    User-agent: DittoSpyder 
    User-agent: Foobot 
    User-agent: SpankBot 
    User-agent: BotALot 
    User-agent: lwp-trivial/1.34 
    User-agent: lwp-trivial 
    User-agent: BunnySlippers 
    User-agent: URLy Warning 
    User-agent: Wget/1.6 
    User-agent: Wget/1.5.3 
    User-agent: Wget 
    User-agent: LinkWalker 
    User-agent: cosmos 
    User-agent: moget 
    User-agent: hloader 
    User-agent: humanlinks 
    User-agent: LinkextractorPro 
    User-agent: Offline Explorer 
    User-agent: Mata Hari 
    User-agent: LexiBot 
    User-agent: Web Image Collector 
    User-agent: The Intraformant 
    User-agent: True_Robot/1.0 
    User-agent: True_Robot 
    User-agent: BlowFish/1.0 
    User-agent: JennyBot 
    User-agent: MIIxpc/4.2 
    User-agent: BuiltBotTough 
    User-agent: ProPowerBot/2.14 
    User-agent: BackDoorBot/1.0 
    User-agent: toCrawl/UrlDispatcher 
    User-agent: suzuran 
    User-agent: TightTwatBot 
    User-agent: VCI WebViewer VCI WebViewer Win32 
    User-agent: VCI 
    User-agent: Szukacz/1.4 
    User-agent: Openfind data gatherer 
    User-agent: Openfind 
    User-agent: Xenu's Link Sleuth 1.1c 
    User-agent: Xenu's 
    User-agent: Zeus 
    User-agent: RepoMonkey Bait & Tackle/v1.01 
    User-agent: RepoMonkey 
    User-agent: Openbot 
    User-agent: URL Control 
    User-agent: Zeus Link Scout 
    User-agent: Zeus 32297 Webster Pro V2.9 Win32 
    User-agent: Webster Pro 
    User-agent: EroCrawler 
    User-agent: LinkScan/8.1a Unix 
    User-agent: Keyword Density/0.9 
    User-agent: Kenjin Spider 
    User-agent: Iron33/1.0.2 
    User-agent: Bookmark search tool 
    User-agent: GetRight/4.2 
    User-agent: FairAd Client 
    User-agent: Gaisbot 
    User-agent: Aqua_Products 
    User-agent: Radiation Retriever 1.1 
    User-agent: Flaming AttackBot 
    User-agent: Curl 
    User-agent: Web Reaper
    User-agent: Firefox
    User-agent: Opera
    User-agent: Netscape
    User-agent: WebVulnCrawl
    User-agent: WebVulnScan
    Disallow: /
    
    User-agent: *
    Disallow: 
    
    
    A few of them are web browsers but, according to the site I got this list from, such software can contribute to hacking because of its plugins and therefore requires blocking. I'm not sure if that's okay, though.
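
To see how a robots.txt with this structure is interpreted, here is a small sketch using Python's standard `urllib.robotparser`. The excerpt is shortened to three agents, and the helper name `can_crawl` and the default path are assumptions made for the example. It shows that consecutive `User-agent` lines form one group sharing the `Disallow: /` that follows, while the `User-agent: *` group with an empty `Disallow:` leaves everything else allowed.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt above: several User-agent lines
# form one group that shares the Disallow rule following them.
ROBOTS_TXT = """\
User-agent: larbin
User-agent: Wget
User-agent: WebZip
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(user_agent, path="/index.html"):
    """Return True if the excerpt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

Here `can_crawl("larbin")` and `can_crawl("Wget/1.6")` come back as disallowed, while `can_crawl("Googlebot")` is allowed. Keep in mind that robots.txt is advisory: it only affects clients that choose to read and honor it, so entries naming browsers such as Firefox or Opera would not stop anyone browsing the site.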
     
    Trusted Writer, May 20, 2008 IP
  5. manish.chauhan

    #5
    You can also find a list of bad robots here... :)
     
    manish.chauhan, May 20, 2008 IP
  6. relixx

    #6
    If you'd read the thread, you'd have noticed I already mentioned your list, lol :p
     
    relixx, May 21, 2008 IP
  7. manish.chauhan

    #7
    Thanks, friend, for mentioning my website... I didn't see your post earlier...
    Thanks again...
     
    manish.chauhan, May 21, 2008 IP
  8. visioninfotech

    #8
    Trusted,

    You have put Firefox and Opera in the list?

    User-agent: Firefox
    User-agent: Opera
     
    visioninfotech, May 21, 2008 IP
  9. qforquack

    #9
    Why are they considered bad?
     
    qforquack, May 21, 2008 IP
  10. alikuru

    #10
    Because they are usually used for harvesting email addresses from websites (obviously, to build email lists for spamming) or for posting spam comments/posts/edits on sites that have editable sections open to the public. Also, in some cases, they are used for hacking attempts.
     
    alikuru, May 24, 2008 IP