Only Googlebot, Slurp and msnbot should crawl my site.

Discussion in 'robots.txt' started by birdsq, Sep 10, 2007.

  1. #1
    Please help!
    I need only Google, Yahoo and MSN to crawl my site.
    How can I do that?
     
    birdsq, Sep 10, 2007
  2. evera

    #2
    You can try this robots.txt, but many spiders ignore robots.txt anyway, so it may not help much. It is also not complete; some spiders are probably missing:

    
    
    User-Agent: Teoma
    User-Agent: Ask Jeeves
    User-Agent: Jeeves
    User-agent: Seekbot/1.0 
    User-agent: seekbot
    User-agent: EchO!/2.0
    User-agent: echo!
    User-agent: convera
    User-agent: Convera Internet Spider V6.x
    User-agent: ConveraCrawler/0.2
    User-agent: ConveraCrawler/0.9d
    User-agent: ConveraMultiMediaCrawler/0.1
    User-Agent: Mozilla/2.0 (compatible; Ask Jeeves)  
    User-agent: aipbot
    User-agent: Aqua_Products 
    User-agent: asterias 
    User-agent: b2w/0.1 
    User-agent: BackDoorBot/1.0 
    User-agent: becomebot
    User-agent: BlowFish/1.0 
    User-agent: Bookmark search tool 
    User-agent: BotALot 
    User-agent: BotRightHere 
    User-agent: BuiltBotTough 
    User-agent: Bullseye/1.0 
    User-agent: BunnySlippers 
    User-agent: CheeseBot 
    User-agent: CherryPicker 
    User-agent: CherryPickerElite/1.0 
    User-agent: CherryPickerSE/1.0 
    User-agent: Copernic 
    User-agent: CopyRightCheck 
    User-agent: cosmos 
    User-agent: Crescent 
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0 
    User-agent: Curl 
    User-agent: DittoSpyder 
    User-agent: EmailCollector 
    User-agent: EmailSiphon 
    User-agent: EmailWolf 
    User-agent: EroCrawler 
    User-agent: ExtractorPro 
    User-agent: FairAd Client 
    User-agent: Fasterfox
    User-agent: Flaming AttackBot 
    User-agent: Foobot 
    User-agent: Gaisbot 
    User-agent: GetRight/4.2 
    User-agent: Harvest/1.5 
    User-agent: hloader 
    User-agent: httplib 
    User-agent: HTTrack 3.0 
    User-agent: humanlinks 
    User-agent: IconSurf
    User-agent: InfoNaviRobot 
    User-agent: Iron33/1.0.2 
    User-agent: JennyBot 
    User-agent: Kenjin Spider 
    User-agent: Keyword Density/0.9 
    User-agent: larbin 
    User-agent: LexiBot 
    User-agent: libWeb/clsHTTP 
    User-agent: LinkextractorPro 
    User-agent: LinkScan/8.1a Unix 
    User-agent: LinkWalker 
    User-agent: LNSpiderguy 
    User-agent: lwp-trivial 
    User-agent: lwp-trivial/1.34 
    User-agent: Mata Hari 
    User-agent: Microsoft URL Control 
    User-agent: Microsoft URL Control - 5.01.4511 
    User-agent: Microsoft URL Control - 6.00.8169 
    User-agent: MIIxpc 
    User-agent: MIIxpc/4.2 
    User-agent: Mister PiX 
    User-agent: moget 
    User-agent: moget/2.1 
    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95) 
    User-agent: MSIECrawler 
    User-agent: NetAnts 
    User-agent: NetMechanic 
    User-agent: NICErsPRO 
    User-agent: Offline Explorer 
    User-agent: Openbot 
    User-agent: Openfind 
    User-agent: Openfind data gatherer 
    User-agent: Oracle Ultra Search 
    User-agent: PerMan 
    User-agent: ProPowerBot/2.14 
    User-agent: ProWebWalker 
    User-agent: psbot 
    User-agent: Python-urllib 
    User-agent: QueryN Metasearch 
    User-agent: Radiation Retriever 1.1 
    User-agent: RepoMonkey 
    User-agent: RepoMonkey Bait & Tackle/v1.01 
    User-agent: RMA 
    User-agent: searchpreview 
    User-agent: SiteSnagger
    User-agent: SpankBot 
    User-agent: spanner 
    User-agent: SurveyBot
    User-agent: suzuran 
    User-agent: Szukacz/1.4 
    User-agent: Teleport 
    User-agent: TeleportPro 
    User-agent: Telesoft 
    User-agent: The Intraformant 
    User-agent: TheNomad 
    User-agent: TightTwatBot 
    User-agent: toCrawl/UrlDispatcher 
    User-agent: True_Robot 
    User-agent: True_Robot/1.0 
    User-agent: turingos 
    User-agent: TurnitinBot 
    User-agent: TurnitinBot/1.5 
    User-agent: URL Control 
    User-agent: URL_Spider_Pro 
    User-agent: URLy Warning 
    User-agent: VCI 
    User-agent: VCI WebViewer VCI WebViewer Win32 
    User-agent: Web Image Collector 
    User-agent: WebAuto 
    User-agent: WebBandit 
    User-agent: WebBandit/3.50 
    User-agent: WebCapture 2.0 
    User-agent: WebCopier 
    User-agent: WebCopier v.2.2 
    User-agent: WebCopier v3.2a 
    User-agent: WebEnhancer 
    User-agent: Web Reaper
    User-agent: WebSauger 
    User-agent: Website Quester 
    User-agent: Webster Pro 
    User-agent: WebStripper 
    User-agent: WebZip
    User-agent: WebZip/4.0 
    User-agent: WebZIP/4.21 
    User-agent: WebZIP/5.0 
    User-agent: WebVulnCrawl
    User-agent: WebVulnScan
    User-agent: Wget 
    User-agent: wget 
    User-agent: Wget/1.5.3 
    User-agent: Wget/1.6 
    User-agent: WWW-Collector-E 
    User-agent: Xenu's 
    User-agent: Xenu's Link Sleuth 1.1c 
    User-agent: Zeus 
    User-agent: Zeus 32297 Webster Pro V2.9 Win32 
    User-agent: Zeus Link Scout 
    Disallow: /
    
     
     
    evera, Sep 14, 2007
  3. BlogSalesman

    #3
    Is there a reason you went with Disallow, instead of just listing the three bots he wants and allowing them?
     
    BlogSalesman, Sep 18, 2007
  4. evera

    #4
    The reason is that the original robots exclusion standard never defined Allow, because robots.txt exists to exclude things.
     
    evera, Sep 18, 2007
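
For reference, the three-bot allow-list BlogSalesman asks about can be written without any Allow line, because an empty Disallow value means nothing is blocked for that group. What follows is only a minimal Python sketch under stated assumptions: it holds such a rule set as a string and checks it with the standard library's `urllib.robotparser`. The rule set itself, the `example.com` URL and the helper name `allowed` are illustrative and were not posted in this thread; Googlebot, Slurp and msnbot are the crawler names from the thread title. The check shows what a compliant parser would decide and says nothing about spiders that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list robots.txt (an assumption, not from the thread):
# the three named crawlers get an empty Disallow, which blocks nothing,
# and every other robot is shut out of the whole site.
ALLOW_LIST_RULES = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(ALLOW_LIST_RULES.splitlines())


def allowed(agent, url="http://example.com/page.html"):
    """Return True if a compliant robot named `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Slurp", "msnbot", "Wget", "HTTrack"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

The sketch passes short crawler names such as `Googlebot` rather than a full browser-style User-Agent header, because `robotparser` matches only on the text before the first slash.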
  5. MichelRobinson2

    #5
    This code looks very good, but I would like to know one thing: is it working properly or not?
     
    MichelRobinson2, Sep 27, 2007
  6. harrysmith

    #6
    You can block other spiders using a robots.txt file.
    Also, if you want to restrict certain pages, you can use <meta name="robots" content="noindex,nofollow"> so that those pages will not get indexed and the links on them will not be followed (if that is what you are looking for).
     
    harrysmith, Sep 28, 2007
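
To confirm that a page really carries the robots meta tag harrysmith describes, something like the following Python sketch could be used. It relies only on the standard library's `html.parser`; the class name `RobotsMetaParser`, the function `robots_directives` and the sample HTML are hypothetical names chosen for illustration, and the code assumes the tag appears as plain markup in the page source.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Sample page (an assumption for illustration only).
    page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```

A crawler can only see this tag if it is allowed to fetch the page, so a page blocked in robots.txt never gets its meta tag read.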