
Only googlebot, slurp and msnbot should crawl my site.

Discussion in 'robots.txt' started by birdsq, Sep 10, 2007.

  1. #1
    Please help!!
    I need only Google, Yahoo and MSN to crawl my site.
    How can I do that?
     
    birdsq, Sep 10, 2007 IP
  2. evera

    evera Peon

    Messages:
    284
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #2
    You can try this robots.txt, but be aware that many spiders simply ignore robots.txt, so it may not help much. The list is not complete either; some spiders may be missing:

    Code (Text):
    User-agent: Teoma
    User-agent: Ask Jeeves
    User-agent: Jeeves
    User-agent: Seekbot/1.0
    User-agent: seekbot
    User-agent: EchO!/2.0
    User-agent: echo!
    User-agent: convera
    User-agent: Convera Internet Spider V6.x
    User-agent: ConveraCrawler/0.2
    User-agent: ConveraCrawler/0.9d
    User-agent: ConveraMultiMediaCrawler/0.1
    User-agent: Mozilla/2.0 (compatible; Ask Jeeves)
    User-agent: aipbot
    User-agent: Aqua_Products
    User-agent: asterias
    User-agent: b2w/0.1
    User-agent: BackDoorBot/1.0
    User-agent: becomebot
    User-agent: BlowFish/1.0
    User-agent: Bookmark search tool
    User-agent: BotALot
    User-agent: BotRightHere
    User-agent: BuiltBotTough
    User-agent: Bullseye/1.0
    User-agent: BunnySlippers
    User-agent: CheeseBot
    User-agent: CherryPicker
    User-agent: CherryPickerElite/1.0
    User-agent: CherryPickerSE/1.0
    User-agent: Copernic
    User-agent: CopyRightCheck
    User-agent: cosmos
    User-agent: Crescent
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    User-agent: Curl
    User-agent: DittoSpyder
    User-agent: EmailCollector
    User-agent: EmailSiphon
    User-agent: EmailWolf
    User-agent: EroCrawler
    User-agent: ExtractorPro
    User-agent: FairAd Client
    User-agent: Fasterfox
    User-agent: Flaming AttackBot
    User-agent: Foobot
    User-agent: Gaisbot
    User-agent: GetRight/4.2
    User-agent: Harvest/1.5
    User-agent: hloader
    User-agent: httplib
    User-agent: HTTrack 3.0
    User-agent: humanlinks
    User-agent: IconSurf
    User-agent: InfoNaviRobot
    User-agent: Iron33/1.0.2
    User-agent: JennyBot
    User-agent: Kenjin Spider
    User-agent: Keyword Density/0.9
    User-agent: larbin
    User-agent: LexiBot
    User-agent: libWeb/clsHTTP
    User-agent: LinkextractorPro
    User-agent: LinkScan/8.1a Unix
    User-agent: LinkWalker
    User-agent: LNSpiderguy
    User-agent: lwp-trivial
    User-agent: lwp-trivial/1.34
    User-agent: Mata Hari
    User-agent: Microsoft URL Control
    User-agent: Microsoft URL Control - 5.01.4511
    User-agent: Microsoft URL Control - 6.00.8169
    User-agent: MIIxpc
    User-agent: MIIxpc/4.2
    User-agent: Mister PiX
    User-agent: moget
    User-agent: moget/2.1
    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    User-agent: MSIECrawler
    User-agent: NetAnts
    User-agent: NetMechanic
    User-agent: NICErsPRO
    User-agent: Offline Explorer
    User-agent: Openbot
    User-agent: Openfind
    User-agent: Openfind data gatherer
    User-agent: Oracle Ultra Search
    User-agent: PerMan
    User-agent: ProPowerBot/2.14
    User-agent: ProWebWalker
    User-agent: psbot
    User-agent: Python-urllib
    User-agent: QueryN Metasearch
    User-agent: Radiation Retriever 1.1
    User-agent: RepoMonkey
    User-agent: RepoMonkey Bait & Tackle/v1.01
    User-agent: RMA
    User-agent: searchpreview
    User-agent: SiteSnagger
    User-agent: Seekbot
    User-agent: SpankBot
    User-agent: spanner
    User-agent: SurveyBot
    User-agent: suzuran
    User-agent: Szukacz/1.4
    User-agent: Teleport
    User-agent: TeleportPro
    User-agent: Telesoft
    User-agent: The Intraformant
    User-agent: TheNomad
    User-agent: TightTwatBot
    User-agent: toCrawl/UrlDispatcher
    User-agent: True_Robot
    User-agent: True_Robot/1.0
    User-agent: turingos
    User-agent: TurnitinBot
    User-agent: TurnitinBot/1.5
    User-agent: URL Control
    User-agent: URL_Spider_Pro
    User-agent: URLy Warning
    User-agent: VCI
    User-agent: VCI WebViewer VCI WebViewer Win32
    User-agent: Web Image Collector
    User-agent: WebAuto
    User-agent: WebBandit
    User-agent: WebBandit/3.50
    User-agent: WebCapture 2.0
    User-agent: WebCopier
    User-agent: WebCopier v.2.2
    User-agent: WebCopier v3.2a
    User-agent: WebEnhancer
    User-agent: Web Reaper
    User-agent: WebSauger
    User-agent: Website Quester
    User-agent: Webster Pro
    User-agent: WebStripper
    User-agent: WebZip
    User-agent: WebZip/4.0
    User-agent: WebZIP/4.21
    User-agent: WebZIP/5.0
    User-agent: WebVulnCrawl
    User-agent: WebVulnScan
    User-agent: Wget
    User-agent: wget
    User-agent: Wget/1.5.3
    User-agent: Wget/1.6
    User-agent: WWW-Collector-E
    User-agent: Xenu's
    User-agent: Xenu's Link Sleuth 1.1c
    User-agent: Zeus
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    User-agent: Zeus Link Scout
    Disallow: /
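    A blocklist like this can also be inverted to answer the original question directly: give each wanted crawler an empty Disallow: line (which in the exclusion standard means "nothing is disallowed" for that agent) and block every other agent with the catch-all *. A minimal sketch, assuming the agent tokens from the thread title:

    Code (Text):
    User-agent: Googlebot
    Disallow:

    User-agent: Slurp
    Disallow:

    User-agent: msnbot
    Disallow:

    User-agent: *
    Disallow: /

    Order does not matter here: a crawler uses the most specific group that matches its name and falls back to * only if none does.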
     
    evera, Sep 14, 2007 IP
    birdsq likes this.
  3. BlogSalesman

    BlogSalesman Well-Known Member

    Messages:
    1,688
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    100
    #3
    Is there a reason you went with a disallow list instead of just allowing the three bots he wants?
     
    BlogSalesman, Sep 18, 2007 IP
  4. evera

    evera Peon

    Messages:
    284
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    0
    #4
    The reason is that the original robots exclusion standard never defined an Allow directive, because robots.txt exists to exclude things.
     
    evera, Sep 18, 2007 IP
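    Whether a particular robots.txt really admits one agent and blocks the rest can be checked offline. A minimal sketch using Python's standard-library urllib.robotparser (the example.com URL and the "BadBot" name are placeholders):

    ```python
    # Sketch: check an allowlist-style robots.txt with the stdlib parser.
    # The URL and "BadBot" agent name are placeholders for illustration.
    from urllib.robotparser import RobotFileParser

    robots_txt = """\
    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /
    """

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())

    # An empty Disallow means "nothing is disallowed" for that agent.
    print(rp.can_fetch("Googlebot", "http://example.com/page.html"))  # True
    print(rp.can_fetch("BadBot", "http://example.com/page.html"))     # False
    ```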
  5. MichelRobinson2

    MichelRobinson2 Guest

    Messages:
    118
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #5
    This code is very good, but I don't know whether it is working properly or not.
     
    MichelRobinson2, Sep 27, 2007 IP
  6. harrysmith

    harrysmith Active Member

    Messages:
    453
    Likes Received:
    7
    Best Answers:
    0
    Trophy Points:
    60
    #6
    You can block other spiders using the robots.txt file.
    Also, if you want to restrict certain pages, you can use <meta name="robots" content="noindex,nofollow"> so that those pages will not get crawled/indexed (if that is what you are looking for).
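    In page form, that tag goes in the document head; a minimal sketch (title and body are placeholders):

    Code (Text):
    <!doctype html>
    <html>
    <head>
      <!-- keep this page out of search indexes; don't follow its links -->
      <meta name="robots" content="noindex,nofollow">
      <title>Private page</title>
    </head>
    <body>...</body>
    </html>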
     
    harrysmith, Sep 28, 2007 IP