Only googlebot, slurp and msnbot should crawl my site.

Discussion in 'robots.txt' started by birdsq, Sep 10, 2007.

  1. #1
    Please help!!
    I need only Google, Yahoo and MSN to crawl my site. How can I do that?
     
    birdsq, Sep 10, 2007 IP
  2. evera

    evera Peon

    #2
    You can try this robots.txt, but many of the spiders you'd want to block ignore robots.txt anyway, so it may not help much. It's also not complete; some spiders may be missing. All of the User-agent lines below belong to one record that ends with a single Disallow: /, so every listed bot is asked to stay out, while any bot that isn't listed (including Googlebot, Slurp and msnbot) remains unrestricted:

    
    
    User-Agent: Teoma
    User-Agent: Ask Jeeves
    User-Agent: Jeeves
    User-agent: Seekbot/1.0 
    User-agent: seekbot
    User-agent: EchO!/2.0
    User-agent: echo!
    User-agent: convera
    User-agent: Convera Internet Spider V6.x
    User-agent: ConveraCrawler/0.2
    User-agent: ConveraCrawler/0.9d
    User-agent: ConveraMultiMediaCrawler/0.1
    User-Agent: Mozilla/2.0 (compatible; Ask Jeeves)  
    User-agent: aipbot
    User-agent: Aqua_Products 
    User-agent: asterias 
    User-agent: b2w/0.1 
    User-agent: BackDoorBot/1.0 
    User-agent: becomebot
    User-agent: BlowFish/1.0 
    User-agent: Bookmark search tool 
    User-agent: BotALot 
    User-agent: BotRightHere 
    User-agent: BuiltBotTough 
    User-agent: Bullseye/1.0 
    User-agent: BunnySlippers 
    User-agent: CheeseBot 
    User-agent: CherryPicker 
    User-agent: CherryPickerElite/1.0 
    User-agent: CherryPickerSE/1.0 
    User-agent: Copernic 
    User-agent: CopyRightCheck 
    User-agent: cosmos 
    User-agent: Crescent 
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0 
    User-agent: Curl 
    User-agent: DittoSpyder 
    User-agent: EmailCollector 
    User-agent: EmailSiphon 
    User-agent: EmailWolf 
    User-agent: EroCrawler 
    User-agent: ExtractorPro 
    User-agent: FairAd Client 
    User-agent: Fasterfox
    User-agent: Flaming AttackBot 
    User-agent: Foobot 
    User-agent: Gaisbot 
    User-agent: GetRight/4.2 
    User-agent: Harvest/1.5 
    User-agent: hloader 
    User-agent: httplib 
    User-agent: HTTrack 3.0 
    User-agent: humanlinks 
    User-agent: IconSurf
    User-agent: InfoNaviRobot 
    User-agent: Iron33/1.0.2 
    User-agent: JennyBot 
    User-agent: Kenjin Spider 
    User-agent: Keyword Density/0.9 
    User-agent: larbin 
    User-agent: LexiBot 
    User-agent: libWeb/clsHTTP 
    User-agent: LinkextractorPro 
    User-agent: LinkScan/8.1a Unix 
    User-agent: LinkWalker 
    User-agent: LNSpiderguy 
    User-agent: lwp-trivial 
    User-agent: lwp-trivial/1.34 
    User-agent: Mata Hari 
    User-agent: Microsoft URL Control 
    User-agent: Microsoft URL Control - 5.01.4511 
    User-agent: Microsoft URL Control - 6.00.8169 
    User-agent: MIIxpc 
    User-agent: MIIxpc/4.2 
    User-agent: Mister PiX 
    User-agent: moget 
    User-agent: moget/2.1 
    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95) 
    User-agent: MSIECrawler 
    User-agent: NetAnts 
    User-agent: NetMechanic 
    User-agent: NICErsPRO 
    User-agent: Offline Explorer 
    User-agent: Openbot 
    User-agent: Openfind 
    User-agent: Openfind data gatherer 
    User-agent: Oracle Ultra Search 
    User-agent: PerMan 
    User-agent: ProPowerBot/2.14 
    User-agent: ProWebWalker 
    User-agent: psbot 
    User-agent: Python-urllib 
    User-agent: QueryN Metasearch 
    User-agent: Radiation Retriever 1.1 
    User-agent: RepoMonkey 
    User-agent: RepoMonkey Bait & Tackle/v1.01 
    User-agent: RMA 
    User-agent: searchpreview 
    User-agent: SiteSnagger 
    User-agent: Seekbot 
    User-agent: SpankBot 
    User-agent: spanner 
    User-agent: SurveyBot
    User-agent: suzuran 
    User-agent: Szukacz/1.4 
    User-agent: Teleport 
    User-agent: TeleportPro 
    User-agent: Telesoft 
    User-agent: The Intraformant 
    User-agent: TheNomad 
    User-agent: TightTwatBot 
    User-agent: toCrawl/UrlDispatcher 
    User-agent: True_Robot 
    User-agent: True_Robot/1.0 
    User-agent: turingos 
    User-agent: TurnitinBot 
    User-agent: TurnitinBot/1.5 
    User-agent: URL Control 
    User-agent: URL_Spider_Pro 
    User-agent: URLy Warning 
    User-agent: VCI 
    User-agent: VCI WebViewer VCI WebViewer Win32 
    User-agent: Web Image Collector 
    User-agent: WebAuto 
    User-agent: WebBandit 
    User-agent: WebBandit/3.50 
    User-agent: WebCapture 2.0 
    User-agent: WebCopier 
    User-agent: WebCopier v.2.2 
    User-agent: WebCopier v3.2a 
    User-agent: WebEnhancer 
    User-agent: Web Reaper
    User-agent: WebSauger 
    User-agent: Website Quester 
    User-agent: Webster Pro 
    User-agent: WebStripper 
    User-agent: WebZip 
    User-agent: WebZip/4.0 
    User-agent: WebZIP/4.21 
    User-agent: WebZIP/5.0 
    User-agent: WebVulnCrawl
    User-agent: WebVulnScan
    User-agent: Wget 
    User-agent: wget 
    User-agent: Wget/1.5.3 
    User-agent: Wget/1.6 
    User-agent: WWW-Collector-E 
    User-agent: Xenu's 
    User-agent: Xenu's Link Sleuth 1.1c 
    User-agent: Zeus 
    User-agent: Zeus 32297 Webster Pro V2.9 Win32 
    User-agent: Zeus Link Scout 
    Disallow: /
    
     
     
    evera, Sep 14, 2007 IP
  3. BlogSalesman

    BlogSalesman Well-Known Member

    #3
    Is there a reason you went with disallow, and didn't just do the 3 bots he wants and allow?
     
    BlogSalesman, Sep 18, 2007 IP
  4. evera

    evera Peon

    #4
    The reason is that the original robots exclusion standard never defined Allow; robots.txt was designed to exclude things, not to grant access.
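    That said, you can still get the whitelist effect the original poster wants using only Disallow: an empty Disallow value means nothing is blocked for that record. A rough sketch (assuming the usual user-agent tokens Googlebot, Slurp and msnbot, and that the crawlers actually honour robots.txt) would be:

    # The three wanted crawlers: an empty Disallow means nothing is off limits
    User-agent: Googlebot
    Disallow:

    User-agent: Slurp
    Disallow:

    User-agent: msnbot
    Disallow:

    # Every other robot that reads robots.txt is asked to stay out completely
    User-agent: *
    Disallow: /

    Bots that ignore robots.txt will of course ignore this too, just like the blocklist above.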
     
    evera, Sep 18, 2007 IP
  5. MichelRobinson2

    MichelRobinson2 Guest

    #5
    This code looks very good, but the one thing I don't know is whether it actually works properly or not.
     
    MichelRobinson2, Sep 27, 2007 IP
  6. harrysmith

    harrysmith Well-Known Member

    #6
    You can block other spiders using the robots.txt file.
    Also, if you want to restrict certain pages, you can add <meta name="robots" content="noindex,nofollow"> to them so that those pages will not get crawled/indexed (if that is what you are looking for).
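    For example (just a sketch; the page title is made up), a page you want kept out of the index would carry the tag in its <head>:

    <head>
      <title>Example private page</title>
      <!-- Tell compliant crawlers not to index this page or follow its links -->
      <meta name="robots" content="noindex,nofollow">
    </head>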
     
    harrysmith, Sep 28, 2007 IP