Google Crawl and Robots.txt

Discussion in 'Google Analytics' started by my_way, Nov 18, 2011.

  1. #1
    Can you suggest what kinds of files/folders we should disallow in robots.txt, so that Google and other search engines cannot access or crawl our site's private content?
     
    my_way, Nov 18, 2011 IP
  2. tripbuilder

    tripbuilder Peon

    #2
    You can disallow those pages in robots.txt, like the admin panel, or any files that you don't want people to see because they are just for your own use.
     
    tripbuilder, Nov 18, 2011 IP
  3. Mike Hussey

    Mike Hussey Peon

    #3
    Any kind of file or folder can easily be disallowed to crawlers. Note the correct directive names are "User-agent" and "Disallow" (one word each), and a page path should not end in a slash:

    User-agent: *
    Disallow: /about-us.html
     
    Mike Hussey, Nov 22, 2011 IP
  4. born_star16

    born_star16 Member

    #4
    You can disallow login, payment, admin pages, etc. to the crawler.
     
    born_star16, Nov 22, 2011 IP
  5. khemraj

    khemraj Well-Known Member

    #5
    Your private pages and least important pages can be kept out of the reach of search engines.
     
    khemraj, Nov 22, 2011 IP
  6. iccube

    iccube Peon

    #6
    @Mike Hussey
    I agree with your point about hiding private things from search engines, like login and payment pages.
     
    iccube, Nov 22, 2011 IP
  7. khemraj

    khemraj Well-Known Member

    #7
    Payment is ok but why "Login"?
     
    khemraj, Nov 23, 2011 IP
  8. udaypal

    udaypal Peon

    #8
    It depends on the webmaster or owner of the website which files and folders they do not want crawled or disclosed to non-users. Mainly it is used for security and privacy purposes.
     
    udaypal, Nov 30, 2011 IP
  9. ryanlawrence171

    ryanlawrence171 Peon

    #9
    You can disallow those pages which you would not like to be crawled by search engines, like your website's landing page, etc.
     
    ryanlawrence171, Dec 5, 2011 IP
  10. fast_cashloan

    fast_cashloan Peon

    #10
    It depends on you which pages you want to show to Google and which you don't. Mainly we block payment pages with robots.txt for safety purposes. Thanks.
     
    fast_cashloan, Dec 5, 2011 IP
  11. brightstartour2011

    brightstartour2011 Peon

    #11
    shopping cart, admin, payment gateway
     
    brightstartour2011, Dec 5, 2011 IP
  12. andrusimonds11

    andrusimonds11 Greenhorn

    #12
    Yes, that is right.
     
    andrusimonds11, Dec 5, 2011 IP
  13. immu

    immu Active Member

    #13
    User-agent: * <-- every crawler is addressed
    Disallow: /about-us.html <-- this page will not be crawled
    Disallow: /login.html <-- this page will not be crawled
     
    immu, Dec 6, 2011 IP
  14. my_way

    my_way Peon

    #14
    Finally, I collected this from another site. Can you review it? Is it correct?

    User-agent: Alexibot
    Disallow: /

    User-agent: Aqua_Products
    Disallow: /

    User-agent: asterias
    Disallow: /

    User-agent: b2w/0.1
    Disallow: /

    User-agent: BackDoorBot/1.0
    Disallow: /

    User-agent: BlowFish/1.0
    Disallow: /

    User-agent: Bookmark search tool
    Disallow: /

    User-agent: BotALot
    Disallow: /

    User-agent: BotRightHere
    Disallow: /

    User-agent: BuiltBotTough
    Disallow: /

    User-agent: Bullseye/1.0
    Disallow: /

    User-agent: BunnySlippers
    Disallow: /

    User-agent: CheeseBot
    Disallow: /

    User-agent: CherryPicker
    Disallow: /

    User-agent: CherryPickerElite/1.0
    Disallow: /

    User-agent: CherryPickerSE/1.0
    Disallow: /

    User-agent: Copernic
    Disallow: /

    User-agent: CopyRightCheck
    Disallow: /

    User-agent: cosmos
    Disallow: /

    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /

    User-agent: Crescent
    Disallow: /

    User-agent: DittoSpyder
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: EroCrawler
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: FairAd Client
    Disallow: /

    User-agent: Flaming AttackBot
    Disallow: /

    User-agent: Foobot
    Disallow: /

    User-agent: Gaisbot
    Disallow: /

    User-agent: GetRight/4.2
    Disallow: /

    User-agent: Harvest/1.5
    Disallow: /

    User-agent: hloader
    Disallow: /

    User-agent: httplib
    Disallow: /

    User-agent: HTTrack 3.0
    Disallow: /

    User-agent: humanlinks
    Disallow: /

    User-agent: InfoNaviRobot
    Disallow: /

    User-agent: Iron33/1.0.2
    Disallow: /

    User-agent: JennyBot
    Disallow: /

    User-agent: Kenjin Spider
    Disallow: /

    User-agent: Keyword Density/0.9
    Disallow: /

    User-agent: larbin
    Disallow: /

    User-agent: LexiBot
    Disallow: /

    User-agent: libWeb/clsHTTP
    Disallow: /

    User-agent: LinkextractorPro
    Disallow: /

    User-agent: LinkScan/8.1a Unix
    Disallow: /

    User-agent: LinkWalker
    Disallow: /

    User-agent: LNSpiderguy
    Disallow: /

    User-agent: lwp-trivial/1.34
    Disallow: /

    User-agent: lwp-trivial
    Disallow: /

    User-agent: Mata Hari
    Disallow: /

    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /

    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /

    User-agent: Microsoft URL Control
    Disallow: /

    User-agent: MIIxpc/4.2
    Disallow: /

    User-agent: MIIxpc
    Disallow: /

    User-agent: Mister PiX
    Disallow: /

    User-agent: moget/2.1
    Disallow: /

    User-agent: moget
    Disallow: /

    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /

    User-agent: MSIECrawler
    Disallow: /

    User-agent: NetAnts
    Disallow: /

    User-agent: NICErsPRO
    Disallow: /

    User-agent: Offline Explorer
    Disallow: /

    User-agent: Openbot
    Disallow: /

    User-agent: Openfind data gatherer
    Disallow: /

    User-agent: Openfind
    Disallow: /

    User-agent: Oracle Ultra Search
    Disallow: /

    User-agent: PerMan
    Disallow: /

    User-agent: ProPowerBot/2.14
    Disallow: /

    User-agent: ProWebWalker
    Disallow: /

    User-agent: psbot
    Disallow: /

    User-agent: Python-urllib
    Disallow: /

    User-agent: QueryN Metasearch
    Disallow: /

    User-agent: Radiation Retriever 1.1
    Disallow: /

    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /

    User-agent: RepoMonkey
    Disallow: /

    User-agent: RMA
    Disallow: /

    User-agent: searchpreview
    Disallow: /

    User-agent: SiteSnagger
    Disallow: /

    User-agent: SpankBot
    Disallow: /

    User-agent: spanner
    Disallow: /

    User-agent: suzuran
    Disallow: /

    User-agent: Szukacz/1.4
    Disallow: /

    User-agent: Teleport
    Disallow: /

    User-agent: TeleportPro
    Disallow: /

    User-agent: Telesoft
    Disallow: /

    User-agent: The Intraformant
    Disallow: /

    User-agent: TheNomad
    Disallow: /

    User-agent: TightTwatBot
    Disallow: /

    User-agent: toCrawl/UrlDispatcher
    Disallow: /

    User-agent: True_Robot/1.0
    Disallow: /

    User-agent: True_Robot
    Disallow: /

    User-agent: turingos
    Disallow: /

    User-agent: TurnitinBot/1.5
    Disallow: /

    User-agent: TurnitinBot
    Disallow: /

    User-agent: URL Control
    Disallow: /

    User-agent: URL_Spider_Pro
    Disallow: /

    User-agent: URLy Warning
    Disallow: /

    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /

    User-agent: VCI
    Disallow: /

    User-agent: Web Image Collector
    Disallow: /

    User-agent: WebAuto
    Disallow: /

    User-agent: WebBandit/3.50
    Disallow: /

    User-agent: WebBandit
    Disallow: /

    User-agent: WebCapture 2.0
    Disallow: /

    User-agent: WebCopier v.2.2
    Disallow: /

    User-agent: WebCopier v3.2a
    Disallow: /

    User-agent: WebCopier
    Disallow: /

    User-agent: WebEnhancer
    Disallow: /

    User-agent: WebSauger
    Disallow: /

    User-agent: Website Quester
    Disallow: /

    User-agent: Webster Pro
    Disallow: /

    User-agent: WebStripper
    Disallow: /

    User-agent: WebZip/4.0
    Disallow: /

    User-agent: WebZIP/4.21
    Disallow: /

    User-agent: WebZIP/5.0
    Disallow: /

    User-agent: WebZip
    Disallow: /

    User-agent: Wget/1.5.3
    Disallow: /

    User-agent: Wget/1.6
    Disallow: /

    User-agent: Wget
    Disallow: /

    User-agent: wget
    Disallow: /

    User-agent: WWW-Collector-E
    Disallow: /

    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /

    User-agent: Xenu's
    Disallow: /

    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /

    User-agent: Zeus Link Scout
    Disallow: /

    User-agent: Zeus
    Disallow: /

    User-agent: Adsbot-Google
    Disallow:

    User-agent: Googlebot
    Disallow:

    User-agent: Mediapartners-Google
    Disallow:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Disallow: /wp-login.php
    Disallow: /wp-register.php
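    A quick way to sanity-check a file like this is Python's built-in urllib.robotparser; a minimal sketch (the bot names and paths below are just examples taken from the file above, not a real site):

```python
# Sanity-check robots.txt rules with Python's standard library.
# The excerpt mirrors the structure of the full file above: one bad
# bot blocked entirely, Googlebot allowed everything, and a catch-all
# section that blocks only a few paths.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Wget
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Wget", "/index.html"))          # False: blocked everywhere
print(rp.can_fetch("Googlebot", "/wp-admin/"))      # True: empty Disallow allows all
print(rp.can_fetch("SomeOtherBot", "/wp-admin/x"))  # False: matches the * section
print(rp.can_fetch("SomeOtherBot", "/index.html"))  # True: not a listed path
```

    Keep in mind that robots.txt is only a request: well-behaved crawlers honor it, but it is not access control, so genuinely private pages still need server-side protection (and listing them in robots.txt actually advertises their URLs).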
     
    my_way, Dec 12, 2011 IP
  15. rizecorp

    rizecorp Peon

    #15
    Hi,

    Currently, robots.txt is used to keep crawlers away from pages for automatic user registration, login, and authentication codes. It depends on your website niche. List out the pages where users' private data is used and block access to that part of the website using robots.txt.

    Thanks.
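    For example, a minimal robots.txt along those lines might look like this (the paths here are hypothetical; substitute the actual URLs your site uses for registration, login, and user accounts):

    User-agent: *
    Disallow: /login/
    Disallow: /register/
    Disallow: /account/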
     
    rizecorp, Dec 12, 2011 IP
  16. bijoy2012

    bijoy2012 Peon

    #16
    You have to disallow any private-data pages.
     
    bijoy2012, Dec 27, 2011 IP