Where to put robots.txt

Discussion in 'robots.txt' started by nonflasher, Feb 9, 2006.

  1. #1
    hey all,

    I have a little problem. At the moment I only have FTP access on my server to the (sub)directory that is linked to my site.

    Example:

    www.example.com/template/ <- here are all my files, like index.htm ...

    but I have to put the robots.txt into

    www.example.com/ <---

    How can I do that? The server runs Confixx.
     
    nonflasher, Feb 9, 2006 IP
  2. jrd1mra

    jrd1mra Peon

    Messages:
    243
    Likes Received:
    14
    Best Answers:
    0
    Trophy Points:
    0
    #2
    You have to get FTP access to either the www/ folder or the root folder. You can do this through your site admin or your cPanel, or ask your host to do it for you.
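
    Once you have that access, the upload itself is simple. Here's a minimal Python sketch; the host, login, and web-root path are placeholders for your own details:

    from ftplib import FTP

    # Placeholder credentials; use the account that can reach the web root.
    ftp = FTP("ftp.example.com")
    ftp.login("username", "password")
    ftp.cwd("/")  # the web root, not the /template/ subdirectory
    with open("robots.txt", "rb") as f:
        ftp.storbinary("STOR robots.txt", f)  # upload the local robots.txt
    ftp.quit()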
     
    jrd1mra, Feb 9, 2006 IP
  3. JEET

    JEET Notable Member

    Messages:
    3,832
    Likes Received:
    502
    Best Answers:
    19
    Trophy Points:
    265
    #3
    I think this is how it works.

    Example.com already has a robots.txt.
    Now you created a folder in their root.
    If they don't "Disallow" your folder named "template" in that robots.txt, then search engines will not have problems indexing your site.

    An empty robots.txt is considered "allow all" until a Disallow is mentioned.

    MSN did not ask for a robots file when I created a folder in my root. It used the robots.txt of the main domain, in your case example.com.
    Check the error logs. If there is a failed request for robots.txt, then you need to put one there; otherwise they already found it.

    I am not very sure about what I said above; it's just what I saw in my logs.
    Let others comment.

    Regards
    jeet
     
    JEET, Feb 9, 2006 IP
  4. mussolinihitler

    mussolinihitler Peon

    Messages:
    258
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Hmm... so no robots.txt means index everything. JEET, does that mean we don't need a robots.txt if there is nothing to disallow on our website?
     
    mussolinihitler, May 7, 2006 IP
  5. concord

    concord Guest

    Messages:
    10
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #5
    See if the robots.txt is already there. If not, you typically don't need one unless you want to get sophisticated about whom to block and whom not to.
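
    A quick way to check is to just fetch it. A minimal Python sketch, with www.example.com standing in for your own domain:

    import urllib.error
    import urllib.request

    # Request the site's robots.txt; a 404 means there isn't one.
    url = "http://www.example.com/robots.txt"
    try:
        with urllib.request.urlopen(url) as response:
            print(response.read().decode())  # it exists; print its rules
    except urllib.error.HTTPError as err:
        print("No robots.txt found (HTTP %d)" % err.code)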
     
    concord, May 7, 2006 IP
  6. DP Most

    DP Most Well-Known Member

    Messages:
    478
    Likes Received:
    8
    Best Answers:
    0
    Trophy Points:
    108
    #6
    Can anyone tell me its benefits, please?
     
    DP Most, May 7, 2006 IP
  7. dizyn

    dizyn Active Member

    Messages:
    251
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    53
    #7
    Can anyone share a sample robots.txt file?
     
    dizyn, May 12, 2006 IP
  8. bentong

    bentong Banned

    Messages:
    3,543
    Likes Received:
    257
    Best Answers:
    0
    Trophy Points:
    0
    #8
    Sometimes you have to block some robots because they are just a waste of your bandwidth: email harvester bots, etc...

    @ dizyn
    http://www.seyq.com/robots.txt <- sample

    Just look at the robots.txt of some popular sites, copy it, and upload it to your server. ;)
     
    bentong, May 13, 2006 IP
  9. jrd1mra

    jrd1mra Peon

    Messages:
    243
    Likes Received:
    14
    Best Answers:
    0
    Trophy Points:
    0
    #9
    User-agent: *
    Disallow:

    User-agent: Turn It In
    Disallow: /

    User-agent: grub-client
    Disallow: /

    User-agent: grub
    Disallow: /

    User-agent: looksmart
    Disallow: /

    User-agent: WebZip
    Disallow: /

    User-agent: larbin
    Disallow: /

    User-agent: b2w/0.1
    Disallow: /

    User-agent: psbot
    Disallow: /

    User-agent: Python-urllib
    Disallow: /

    User-agent: NetMechanic
    Disallow: /

    User-agent: URL_Spider_Pro
    Disallow: /

    User-agent: CherryPicker
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: WebBandit
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: CopyRightCheck
    Disallow: /

    User-agent: Crescent
    Disallow: /

    User-agent: SiteSnagger
    Disallow: /

    User-agent: ProWebWalker
    Disallow: /

    User-agent: CheeseBot
    Disallow: /

    User-agent: LNSpiderguy
    Disallow: /


    User-agent: Teleport
    Disallow: /

    User-agent: TeleportPro
    Disallow: /

    User-agent: MIIxpc
    Disallow: /

    User-agent: Telesoft
    Disallow: /

    User-agent: Website Quester
    Disallow: /

    User-agent: moget/2.1
    Disallow: /

    User-agent: WebZip/4.0
    Disallow: /

    User-agent: WebStripper
    Disallow: /

    User-agent: WebSauger
    Disallow: /

    User-agent: WebCopier
    Disallow: /

    User-agent: NetAnts
    Disallow: /

    User-agent: Mister PiX
    Disallow: /

    User-agent: WebAuto
    Disallow: /

    User-agent: TheNomad
    Disallow: /

    User-agent: WWW-Collector-E
    Disallow: /

    User-agent: RMA
    Disallow: /

    User-agent: libWeb/clsHTTP
    Disallow: /

    User-agent: asterias
    Disallow: /

    User-agent: httplib
    Disallow: /

    User-agent: turingos
    Disallow: /

    User-agent: spanner
    Disallow: /

    User-agent: InfoNaviRobot
    Disallow: /

    User-agent: Harvest/1.5
    Disallow: /

    User-agent: Bullseye/1.0
    Disallow: /

    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /

    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /

    User-agent: CherryPickerSE/1.0
    Disallow: /

    User-agent: CherryPickerElite/1.0
    Disallow: /

    User-agent: WebBandit/3.50
    Disallow: /

    User-agent: NICErsPRO
    Disallow: /

    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /

    User-agent: DittoSpyder
    Disallow: /

    User-agent: Foobot
    Disallow: /

    User-agent: WebmasterWorldForumBot
    Disallow: /

    User-agent: SpankBot
    Disallow: /

    User-agent: BotALot
    Disallow: /

    User-agent: lwp-trivial/1.34
    Disallow: /

    User-agent: lwp-trivial
    Disallow: /

    User-agent: BunnySlippers
    Disallow: /

    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /

    User-agent: URLy Warning
    Disallow: /

    User-agent: Wget/1.6
    Disallow: /

    User-agent: Wget/1.5.3
    Disallow: /

    User-agent: Wget
    Disallow: /

    User-agent: LinkWalker
    Disallow: /

    User-agent: cosmos
    Disallow: /

    User-agent: moget
    Disallow: /

    User-agent: hloader
    Disallow: /

    User-agent: humanlinks
    Disallow: /

    User-agent: LinkextractorPro
    Disallow: /

    User-agent: Offline Explorer
    Disallow: /

    User-agent: Mata Hari
    Disallow: /

    User-agent: LexiBot
    Disallow: /

    User-agent: Web Image Collector
    Disallow: /

    User-agent: The Intraformant
    Disallow: /

    User-agent: True_Robot/1.0
    Disallow: /

    User-agent: True_Robot
    Disallow: /

    User-agent: BlowFish/1.0
    Disallow: /

    User-agent: JennyBot
    Disallow: /

    User-agent: MIIxpc/4.2
    Disallow: /

    User-agent: BuiltBotTough
    Disallow: /

    User-agent: ProPowerBot/2.14
    Disallow: /

    User-agent: BackDoorBot/1.0
    Disallow: /

    User-agent: toCrawl/UrlDispatcher
    Disallow: /

    User-agent: WebEnhancer
    Disallow: /

    User-agent: suzuran
    Disallow: /

    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /

    User-agent: VCI
    Disallow: /

    User-agent: Szukacz/1.4
    Disallow: /

    User-agent: QueryN Metasearch
    Disallow: /

    User-agent: Openfind data gatherer
    Disallow: /

    User-agent: Openfind
    Disallow: /

    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /

    User-agent: Xenu's
    Disallow: /

    User-agent: Zeus
    Disallow: /

    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /

    User-agent: RepoMonkey
    Disallow: /

    User-agent: Microsoft URL Control
    Disallow: /

    User-agent: Openbot
    Disallow: /

    User-agent: URL Control
    Disallow: /

    User-agent: Zeus Link Scout
    Disallow: /

    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /

    User-agent: Webster Pro
    Disallow: /

    User-agent: EroCrawler
    Disallow: /

    User-agent: LinkScan/8.1a Unix
    Disallow: /

    User-agent: Keyword Density/0.9
    Disallow: /

    User-agent: Kenjin Spider
    Disallow: /

    User-agent: Iron33/1.0.2
    Disallow: /

    User-agent: Bookmark search tool
    Disallow: /

    User-agent: GetRight/4.2
    Disallow: /

    User-agent: FairAd Client
    Disallow: /

    User-agent: Gaisbot
    Disallow: /

    User-agent: Aqua_Products
    Disallow: /

    User-agent: Radiation Retriever 1.1
    Disallow: /

    User-agent: Flaming AttackBot
    Disallow: /

    User-agent: Oracle Ultra Search
    Disallow: /

    User-agent: MSIECrawler
    Disallow: /

    User-agent: PerMan
    Disallow: /

    User-agent: searchpreview
    Disallow: /

    User-agent: aipbot
    Disallow: /

    User-agent: abot
    Disallow: /
     
    jrd1mra, May 13, 2006 IP
  10. sam1

    sam1 Active Member

    Messages:
    679
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    58
    #10
    Hmmm... very good info.
     
    sam1, Jun 3, 2006 IP
  11. MaxPowers

    MaxPowers Well-Known Member

    Messages:
    264
    Likes Received:
    5
    Best Answers:
    1
    Trophy Points:
    120
    #11
    robots.txt needs to go in your web root: mydomain.com/robots.txt

    It is an exclusion standard to kindly request that bots shouldn't visit an area or a page, but it does not 'block' bad bots. It merely suggests to 'honest' bots that they shouldn't spider an area.

    I have a script on my site specifically for email harvesters that links to page after page of fake email addresses generated at random. The robots.txt lists this page as Disallowed for the sake of 'honest' bots like Google, MSN, Yahoo, (and the AutoMapIt.com spider), but 'bad' bots that ignore the robots.txt will go hog wild harvesting thousands or even millions of random, gibberish email addresses.
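
    The robots.txt entry for that trap looks something like this (the path here is made up as an illustration; my real one is different):

    User-agent: *
    Disallow: /email-trap/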

    Robots.txt doesn't block bad bots, so it's best to use that file to help good bots stay out of traps and admin areas. As for bad bots, if you find a particular one visiting you, it's best to use .htaccess to block them by IP, user-agent, or other identifying info... .htaccess is absolute, robots.txt is just a suggestion.
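
    A minimal .htaccess sketch for Apache; the user-agent names and the IP address are placeholders, so swap in whatever bot actually shows up in your logs:

    # Return 403 Forbidden to two example bad bots, matched by User-Agent
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf
    RewriteRule .* - [F]

    # Block a single IP address outright
    Order Allow,Deny
    Allow from all
    Deny from 192.0.2.1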
     
    MaxPowers, Jun 3, 2006 IP
  12. dizyn

    dizyn Active Member

    Messages:
    251
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    53
    #12
    Thanks for sharing.
     
    dizyn, Jun 7, 2006 IP
  13. jason102178

    jason102178 Peon

    Messages:
    4
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Hey, maybe someone can help me. I installed the robots.txt on my website, but I have a question. My site has two extra parts: the main site is jayboi25, for example, and I also have jayboi24 and jayboi26 to make the website work. My problem was that Google was only crawling one part of my site, so I installed the robots.txt, then uploaded a sitemap and sent it to Google. What I'm not sure of is: do I have to put a sitemap on jayboi24 and jayboi26 as well, or just put 3 sitemaps on the main part of my site, jayboi25? I'm not sure how to do this, so if anyone can help, feel free to contact me. Thanks.
     
    jason102178, Jun 5, 2008 IP
  14. Hunnigs

    Hunnigs Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #14
    Thank you, that helped a lot!
     
    Hunnigs, Jun 6, 2008 IP