What pages should be in robots.txt?

Discussion in 'Google' started by vacationcluster, Mar 1, 2012.

  1. #1
    Hello members,


    I would like to know which pages should be listed in our site's robots.txt: the contact us page, the login page, the registration page, or other pages? Please share your suggestions.



    Regards,
    Suzanne
     
    vacationcluster, Mar 1, 2012 IP
  2. GMF (Well-Known Member)
    #2
    Do you even know the concept of the robots.txt file and its uses?
     
    GMF, Mar 1, 2012 IP
  3. matty_wllson (Peon)
    #3
    The robots.txt file is used to restrict or allow search engine crawling of a website. If you want search engine crawlers to visit your whole site, put the following in robots.txt:

    User-agent: *
    Disallow:

    And if you want to stop crawlers from visiting your login or contact us pages, use something like the following (assuming those pages live at /login/ and /contact-us/; adjust the paths to match your site):

    User-agent: *
    Disallow: /login/
    Disallow: /contact-us/

    I hope you understand the concept.

    Good luck
     
    matty_wllson, Mar 1, 2012 IP
  4. vacationcluster (Peon)
    #4
    Yes, it is mainly to tell search engines whether a page is allowed or disallowed.
     
    vacationcluster, Mar 1, 2012 IP
  5. imfusa (Active Member)
    #5
    Well, it depends on which pages you do not want indexed. Usually the contact page of a website should get indexed; the registration and login pages should not.
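    For example, a minimal sketch that leaves the contact page crawlable while blocking registration and login (assuming those pages live at /register/ and /login/; your paths may differ):

    User-agent: *
    Disallow: /register/
    Disallow: /login/
    Code (markup):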
     
    imfusa, Mar 1, 2012 IP
  6. ronniealbert28 (Peon)
    #6
    It all depends on the search engine's algorithm.
     
    ronniealbert28, Mar 4, 2012 IP
  7. C.Rebecca (Active Member)
    #7
    Pages which you think are not necessary for search engine crawling are supposed to be included in robots.txt, e.g. the printable version of a page, a page with duplicate content, or any other page which you don't want crawled.
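    For instance, if the printable versions lived under a hypothetical /print/ directory, a sketch like this would keep crawlers out of it:

    User-agent: *
    Disallow: /print/
    Code (markup):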
     
    C.Rebecca, Mar 5, 2012 IP
  8. christajoe (Member)
    #8
    The pages that are not meant to be shown to users through search results are kept in the robots.txt file. Pages that serve only the internal working processes of an organization are the ones mainly listed in robots.txt. :)
     
    christajoe, Mar 5, 2012 IP
  9. cheenki (Peon)
    #9
    It’s all about what you want to get indexed. It is entirely up to you what you would like to allow or disallow for search engines.
     
    cheenki, Mar 5, 2012 IP
  10. monagupta4u (Peon)
    #10
    The pages that you don't want crawled by search engines are the ones that should be blocked in robots.txt.

    Thanks,
     
    monagupta4u, Mar 5, 2012 IP
  11. ThePassiveIncomeBlog (Active Member)
    #11
    Is robots.txt recognized by all search engines, or by Google only?
     
    ThePassiveIncomeBlog, Mar 5, 2012 IP
  12. webgnomes (Peon)
    #12
    @vacationcluster You should only include files in your robots.txt that you do NOT want search engine crawlers to index. Typically, you only use robots.txt if there is a large section of your site that you don't want search engines to index (e.g., an entire directory). If you only want to keep specific pages out of the index, it's typically easier to use a robots meta tag for those specific pages.
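
    For reference, keeping a specific page out of the index with the robots meta tag looks like this (the tag goes in that page's <head> section):

    <meta name="robots" content="noindex">
    Code (markup):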

    @ThePassiveIncomeBlog Each record in a robots.txt file has a User-agent field, which specifies which search engines the record applies to. For example, User-agent: * specifies that all well-behaved crawlers should respect the corresponding record. If you only want to apply a record to Google, you would use User-agent: Googlebot.
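
    For example, a sketch of a record that keeps only Googlebot out of a hypothetical /private/ directory while leaving all other crawlers unrestricted (Googlebot follows the most specific matching group, so it ignores the * record here):

    User-agent: Googlebot
    Disallow: /private/

    User-agent: *
    Disallow:
    Code (markup):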

    For more information about the robots.txt file, read this: http://www.webgnomes.org/blog/robots-txt-file-guide-that-wont-put-you-to-sleep/
    For more information about the robots meta tag, read this: http://www.webgnomes.org/blog/robots-meta-tag-definitive-guide/
     
    webgnomes, Mar 6, 2012 IP
  13. DickGomes (Peon)
    #13
    It depends on you: which pages you want to get indexed and which you don't.
     
    DickGomes, Mar 6, 2012 IP
  14. ericksteve (Peon)
    #14
    Robots.txt should cover the relevant pages of your site that carry useful information about your company or product. All these pages should be static; no dynamic or gateway pages should be listed in robots.txt. You need not add the privacy policy, terms and conditions, and other such pages; however, it is up to you whether you add the contact us page or not. I recommend adding the contact us page to robots.txt.
     
    ericksteve, Mar 7, 2012 IP
  15. vacationcluster (Peon)
    #15
    Thanks for your useful and precious information about robots.txt.
     
    vacationcluster, Mar 7, 2012 IP
  16. garish (Member)
    #16
    The pages which you don't want search engines to crawl and index.
     
    garish, Mar 7, 2012 IP
  17. ImAdmirer (Greenhorn)
    #17
    The following is the robots.txt content I use on most of my WordPress sites:

    
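    # XML sitemap location; replace the domain with your own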
    Sitemap: http://www.website.com/sitemap.xml
    
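    # Google AdSense crawler (Mediapartners-Google): allowed everywhere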
    User-agent: Mediapartners-Google
    Disallow:
    
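    # All crawlers: keep out of WordPress admin, login, and other utility paths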
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /temp/
    Disallow: /any-other-folder-to-restrict/
    Disallow: /wp-login.php
    Disallow: /wp-admin/
    Disallow: /wp-comments-post.php
    Disallow: /wp-commentsrss2.php
    
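    # All crawlers: skip media and document files (wildcard rules are honored by major engines such as Google and Bing, but not by every crawler)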
    User-agent: *
    Disallow: /*.gif$
    Disallow: /*.jpg$
    Disallow: /*.jpeg$
    Disallow: /*.png$
    Disallow: /*.zip$
    Disallow: /*.doc$
    Disallow: /*.exe$
    Disallow: /*.pdf$
    
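    # Bots blocked from the whole site: scrapers, email harvesters, download tools, and other crawlers the owner does not want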
    User-agent: ia_archiver
    Disallow: /
    User-agent: atSpider 
    Disallow: /
    User-agent: b2w/0.1
    Disallow: /
    User-agent: BecomeBot
    Disallow: /
    User-agent: CheeseBot
    Disallow: /
    User-agent: CherryPicker
    Disallow: /
    User-agent: CopyRightCheck
    Disallow: /
    User-agent: Copernic
    Disallow: /
    User-agent: Crescent
    Disallow: /
    User-agent: DSurf
    Disallow: /
    User-agent: dumbot
    Disallow: /
    User-agent: EliteSys Entry 
    Disallow: /
    User-agent: EmailCollector
    Disallow: /
    User-agent: EmailSiphon
    Disallow: /
    User-agent: EmailWolf
    Disallow: /
    User-agent: Enterprise_Search/1.0
    Disallow: /
    User-agent: Enterprise_Search
    Disallow: /
    User-agent: es
    Disallow: /
    User-agent: ExtractorPro
    Disallow: /
    User-agent: Flaming AttackBot
    Disallow: /
    User-agent: FreeFind
    Disallow: /
    User-agent: grub
    Disallow: /
    User-agent: grub-client
    Disallow: /
    User-agent: Hatena Antenna
    Disallow: /
    User-agent: Jetbot
    Disallow: /
    User-agent: Jetbot/1.0
    Disallow: /
    User-agent: larbin
    Disallow: /
    User-agent: Mail Sweeper
    Disallow: /
    User-agent: munky
    Disallow: /
    User-agent: naver
    Disallow: /
    User-agent: NetMechanic
    Disallow: /
    User-agent: Nutch
    Disallow: /
    User-agent: OmniExplorer_Bot
    Disallow: /
    User-agent: Oracle Ultra Search
    Disallow: /
    User-agent: PerMan
    Disallow: /
    User-agent: ProWebWalker
    Disallow: /
    User-agent: psbot
    Disallow: /
    User-agent: Python-urllib
    Disallow: /
    User-agent: Radiation Retriever 1.1
    Disallow: /
    User-agent: Roverbot
    Disallow: /
    User-agent: searchpreview
    Disallow: /
    User-agent: SiteSnagger
    Disallow: /
    User-agent: sootle
    Disallow: /
    User-agent: Stanford
    Disallow: /
    User-agent: URL_Spider_Pro
    Disallow: /
    User-agent: WebBandit
    Disallow: /
    User-agent: WebEmailExtrac
    Disallow: / 
    User-agent: WebVac
    Disallow: /
    User-agent: WebZip
    Disallow: /
    User-agent: xGet
    Disallow: /
    User-agent: wGet
    Disallow: / 
    User-agent: WebWalk 
    Disallow: /
    User-agent: webvac
    Disallow: /
    User-agent: WebReaper 
    Disallow: /
    User-agent: WebMirror
    Disallow: /
    User-agent: WebFetcher 
    Disallow: /
    User-agent: WebCopy
    Disallow: /
    User-agent: webcopier 
    Disallow: /
    User-agent: WebCatcher
    Disallow: / 
    User-agent: w3mir
    Disallow: /
    User-agent: vobsub 
    Disallow: /
    User-agent: Templeton 
    Disallow: /
    User-agent: ssearcher100 
    Disallow: /
    User-agent: SpiderBot
    Disallow: /
    User-agent: Shai'Hulud 
    Disallow: /
    User-agent: PBWF
    Disallow: /
    User-agent: LightningDownload 
    Disallow: /
    User-agent: KDD Exploror
    Disallow: /
    User-agent: Jeeves
    Disallow: /
    User-agent: Internet Explore 
    Disallow: /
    User-agent: InfoSpiders
    Disallow: /
    User-agent: httrack
    Disallow: /
    User-agent: HavIndex 
    Disallow: /
    User-agent: GetUrl
    Disallow: /
    User-agent: GetBot
    Disallow: / 
    User-agent: ESIRover 
    Disallow: /
    User-agent: Download Wonder 
    Disallow: /
    User-agent: Collage
    Disallow: /
    User-agent: LNSpiderguy
    Disallow: /
    User-agent: Alexibot
    Disallow: /
    User-agent: Teleport
    Disallow: /
    User-agent: TeleportPro
    Disallow: /
    User-agent: Stanford Comp Sci
    Disallow: /
    User-agent: MIIxpc
    Disallow: /
    User-agent: Telesoft
    Disallow: /
    User-agent: Website Quester
    Disallow: /
    User-agent: moget/2.1
    Disallow: /
    User-agent: WebZip/4.0
    Disallow: /
    User-agent: WebStripper
    Disallow: /
    User-agent: WebSauger
    Disallow: /
    User-agent: WebCopier
    Disallow: /
    User-agent: NetAnts
    Disallow: /
    User-agent: Mister PiX
    Disallow: /
    User-agent: WebAuto
    Disallow: /
    User-agent: TheNomad
    Disallow: /
    User-agent: WWW-Collector-E
    Disallow: /
    User-agent: RMA
    Disallow: /
    User-agent: libWeb/clsHTTP
    Disallow: /
    User-agent: asterias
    Disallow: /
    User-agent: httplib
    Disallow: /
    User-agent: turingos
    Disallow: /
    User-agent: spanner
    Disallow: /
    User-agent: InfoNaviRobot
    Disallow: /
    User-agent: Harvest/1.5
    Disallow: /
    User-agent: Bullseye/1.0
    Disallow: /
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /
    User-agent: CherryPickerSE/1.0
    Disallow: /
    User-agent: CherryPickerElite/1.0
    Disallow: /
    User-agent: WebBandit/3.50
    Disallow: /
    User-agent: NICErsPRO
    Disallow: /
    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /
    User-agent: DittoSpyder
    Disallow: /
    User-agent: Foobot
    Disallow: /
    User-agent: SpankBot
    Disallow: /
    User-agent: BotALot
    Disallow: /
    User-agent: lwp-trivial/1.34
    Disallow: /
    User-agent: lwp-trivial
    Disallow: /
    User-agent: BunnySlippers
    Disallow: /
    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /
    User-agent: URLy Warning
    Disallow: /
    User-agent: Wget/1.6
    Disallow: /
    User-agent: Wget/1.5.3
    Disallow: /
    User-agent: Wget
    Disallow: /
    User-agent: LinkWalker
    Disallow: /
    User-agent: cosmos
    Disallow: /
    User-agent: moget
    Disallow: /
    User-agent: hloader
    Disallow: /
    User-agent: URL Control
    Disallow: /
    User-agent: Zeus Link Scout
    Disallow: /
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /
    User-agent: Webster Pro
    Disallow: /
    User-agent: EroCrawler
    Disallow: /
    User-agent: LinkScan/8.1a Unix
    Disallow: /
    User-agent: Keyword Density/0.9
    Disallow: /
    User-agent: Kenjin Spider
    Disallow: /
    User-agent: Iron33/1.0.2
    Disallow: /
    User-agent: Bookmark search tool
    Disallow: /
    User-agent: GetRight/4.2
    Disallow: /
    User-agent: FairAd Client
    Disallow: /
    User-agent: Gaisbot
    Disallow: /
    User-agent: humanlinks
    Disallow: /
    User-agent: LinkextractorPro
    Disallow: /
    User-agent: Offline Explorer
    Disallow: /
    User-agent: Mata Hari
    Disallow: /
    User-agent: LexiBot
    Disallow: /
    User-agent: Web Image Collector
    Disallow: /
    User-agent: The Intraformant
    Disallow: /
    User-agent: True_Robot/1.0
    Disallow: /
    User-agent: True_Robot
    Disallow: /
    User-agent: BlowFish/1.0
    Disallow: /
    User-agent: JennyBot
    Disallow: /
    User-agent: MIIxpc/4.2
    Disallow: /
    User-agent: BuiltBotTough
    Disallow: /
    User-agent: ProPowerBot/2.14
    Disallow: /
    User-agent: BackDoorBot/1.0
    Disallow: /
    User-agent: toCrawl/UrlDispatcher
    Disallow: /
    User-agent: WebEnhancer
    Disallow: /
    User-agent: suzuran
    Disallow: /
    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /
    User-agent: VCI
    Disallow: /
    User-agent: Szukacz/1.4 
    Disallow: /
    User-agent: QueryN Metasearch
    Disallow: /
    User-agent: Openfind 
    Disallow: /
    User-agent: Zeus
    Disallow: /
    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /
    User-agent: RepoMonkey
    Disallow: /
    User-agent: Microsoft URL Control
    Disallow: /
    User-agent: Openbot
    Disallow: /
    
    Code (markup):
     
    ImAdmirer, Mar 7, 2012 IP
  18. whitestar (Peon)
    #18
    Here's some useful info regarding robots.txt: seomoz.org/learn-seo/robotstxt
     
    whitestar, Mar 7, 2012 IP