Hello members, I would like to know which pages should be listed in the robots.txt of our site: the contact us page, the login page, the registration page, or other pages? Please give me your reply. Regards, Suzanne.
The robots.txt file is used to allow or restrict search engine crawlers on a website. If you want search engine crawlers to visit your whole site, put the following in your robots.txt file:

User-agent: *
Disallow:

And if you want to stop crawlers from visiting your login or contact us page, use a setting like the one below, with the path of the directory you want to block:

User-agent: *
Disallow: /login/

I hope you understand the concept. Good luck
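To make the difference between the two extremes concrete, here is a minimal sketch; neither group is from any particular site, and a real file would contain only one group per user agent, so treat these as two alternatives rather than one deployable file:

User-agent: *
Disallow:
# An empty Disallow value blocks nothing: all compliant crawlers may fetch every URL.

User-agent: *
Disallow: /
# A bare slash blocks the entire site from all compliant crawlers.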
Well, it depends on which pages you do not want indexed. Usually the contact page of a website should be indexed; the registration and login pages should not.
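A minimal robots.txt along those lines might look like the following; /login/ and /register/ are assumed paths here, so replace them with whatever your site actually uses:

User-agent: *
Disallow: /login/
Disallow: /register/
# No rule mentions the contact page, so it stays crawlable.

Keep in mind that robots.txt only stops compliant crawlers from fetching these URLs; a blocked URL can still show up in search results if other sites link to it.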
Pages that you think are unnecessary for search engine crawling are the ones to include in the robots.txt file, e.g. the printable version of a page, a page with duplicate content, or any other page you don't want crawled.
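As a sketch of that idea, assuming the printable versions live under a /print/ directory or are served with a ?print=1 parameter (both hypothetical paths for illustration), you could write:

User-agent: *
Disallow: /print/
# Wildcard patterns like the next line are extensions honored by major engines such as Google and Bing, not part of the original robots.txt standard:
Disallow: /*?print=1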
Pages that you do not want surfaced to search users are the ones listed in the robots.txt file. Pages that exist only for the internal working processes of an organization are the ones most often kept in robots.txt.
It’s entirely up to you what you want indexed; you decide what to allow or disallow for the search engines.
The pages you don't want crawled by the search engines are the ones to block in robots.txt. Thanks,
@vacationcluster You should only include files in your robots.txt that you do NOT want search engine crawlers to index. Typically, you only use robots.txt if there is a large section of your site that you don't want search engines to index (e.g., an entire directory). If you only want to keep specific pages out of the index, it's typically easier to use a robots meta tag on those specific pages.

@ThePassiveIncomeBlog Each record in a robots.txt file has a User-agent field, which specifies which search engines the record applies to. For example, User-agent: * specifies that all well-behaved crawlers should respect the corresponding record. If you only want to apply a record to Google, you would use User-agent: Googlebot

For more information about the robots.txt file, read this: http://www.webgnomes.org/blog/robots-txt-file-guide-that-wont-put-you-to-sleep/
For more information about the robots meta tag, read this: http://www.webgnomes.org/blog/robots-meta-tag-definitive-guide/
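To illustrate both points, here is a minimal sketch. The robots meta tag goes in the <head> of the individual page you want kept out of the index:

<meta name="robots" content="noindex">

And a robots.txt record aimed only at Google might look like this, where /private/ is a placeholder directory:

User-agent: Googlebot
Disallow: /private/

One caveat worth knowing: the meta tag only works if crawlers are allowed to fetch the page, because if robots.txt blocks the URL, the crawler never sees the noindex instruction.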
Robots.txt should cover the relevant pages of your site that carry useful information about your company or product. These should all be static pages; no dynamic or gateway pages should appear in robots.txt. You need not add the privacy policy, terms and conditions, or similar pages; however, it is up to you whether you add the contact us page or not. I recommend adding the contact us page to robots.txt.
The following is the robots.txt file content which I use on most of my WordPress sites. All of the bots listed in the final group are scrapers, downloaders, and other crawlers I don't want on the site; consecutive User-agent lines share the single Disallow: / rule that follows them.

Sitemap: http://www.website.com/sitemap.xml

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /any-other-folder-to-restrict/
Disallow: /wp-login.php
Disallow: /wp-admin/
Disallow: /wp-comments-post.php
Disallow: /wp-commentsrss2.php
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.png$
Disallow: /*.zip$
Disallow: /*.doc$
Disallow: /*.exe$
Disallow: /*.pdf$

User-agent: ia_archiver
User-agent: atSpider
User-agent: b2w/0.1
User-agent: BecomeBot
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CopyRightCheck
User-agent: Copernic
User-agent: Crescent
User-agent: DSurf
User-agent: dumbot
User-agent: EliteSys Entry
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Enterprise_Search/1.0
User-agent: Enterprise_Search
User-agent: es
User-agent: ExtractorPro
User-agent: Flaming AttackBot
User-agent: FreeFind
User-agent: grub
User-agent: grub-client
User-agent: Hatena Antenna
User-agent: Jetbot
User-agent: Jetbot/1.0
User-agent: larbin
User-agent: Mail Sweeper
User-agent: munky
User-agent: naver
User-agent: NetMechanic
User-agent: Nutch
User-agent: OmniExplorer_Bot
User-agent: Oracle Ultra Search
User-agent: PerMan
User-agent: ProWebWalker
User-agent: psbot
User-agent: Python-urllib
User-agent: Radiation Retriever 1.1
User-agent: Roverbot
User-agent: searchpreview
User-agent: SiteSnagger
User-agent: sootle
User-agent: Stanford
User-agent: URL_Spider_Pro
User-agent: WebBandit
User-agent: WebEmailExtrac
User-agent: WebVac
User-agent: WebZip
User-agent: xGet
User-agent: wGet
User-agent: WebWalk
User-agent: WebReaper
User-agent: WebMirror
User-agent: WebFetcher
User-agent: WebCopy
User-agent: webcopier
User-agent: WebCatcher
User-agent: w3mir
User-agent: vobsub
User-agent: Templeton
User-agent: ssearcher100
User-agent: SpiderBot
User-agent: Shai'Hulud
User-agent: PBWF
User-agent: LightningDownload
User-agent: KDD Exploror
User-agent: Jeeves
User-agent: Internet Explore
User-agent: InfoSpiders
User-agent: httrack
User-agent: HavIndex
User-agent: GetUrl
User-agent: GetBot
User-agent: ESIRover
User-agent: Download Wonder
User-agent: Collage
User-agent: LNSpiderguy
User-agent: Alexibot
User-agent: Teleport
User-agent: TeleportPro
User-agent: Stanford Comp Sci
User-agent: MIIxpc
User-agent: Telesoft
User-agent: Website Quester
User-agent: moget/2.1
User-agent: WebZip/4.0
User-agent: WebStripper
User-agent: WebSauger
User-agent: NetAnts
User-agent: Mister PiX
User-agent: WebAuto
User-agent: TheNomad
User-agent: WWW-Collector-E
User-agent: RMA
User-agent: libWeb/clsHTTP
User-agent: asterias
User-agent: httplib
User-agent: turingos
User-agent: spanner
User-agent: InfoNaviRobot
User-agent: Harvest/1.5
User-agent: Bullseye/1.0
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: CherryPickerSE/1.0
User-agent: CherryPickerElite/1.0
User-agent: WebBandit/3.50
User-agent: NICErsPRO
User-agent: Microsoft URL Control - 5.01.4511
User-agent: DittoSpyder
User-agent: Foobot
User-agent: SpankBot
User-agent: BotALot
User-agent: lwp-trivial/1.34
User-agent: lwp-trivial
User-agent: BunnySlippers
User-agent: Microsoft URL Control - 6.00.8169
User-agent: URLy Warning
User-agent: Wget/1.6
User-agent: Wget/1.5.3
User-agent: LinkWalker
User-agent: cosmos
User-agent: moget
User-agent: hloader
User-agent: URL Control
User-agent: Zeus Link Scout
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Webster Pro
User-agent: EroCrawler
User-agent: LinkScan/8.1a Unix
User-agent: Keyword Density/0.9
User-agent: Kenjin Spider
User-agent: Iron33/1.0.2
User-agent: Bookmark search tool
User-agent: GetRight/4.2
User-agent: FairAd Client
User-agent: Gaisbot
User-agent: humanlinks
User-agent: LinkextractorPro
User-agent: Offline Explorer
User-agent: Mata Hari
User-agent: LexiBot
User-agent: Web Image Collector
User-agent: The Intraformant
User-agent: True_Robot/1.0
User-agent: True_Robot
User-agent: BlowFish/1.0
User-agent: JennyBot
User-agent: MIIxpc/4.2
User-agent: BuiltBotTough
User-agent: ProPowerBot/2.14
User-agent: BackDoorBot/1.0
User-agent: toCrawl/UrlDispatcher
User-agent: WebEnhancer
User-agent: suzuran
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: VCI
User-agent: Szukacz/1.4
User-agent: QueryN Metasearch
User-agent: Openfind
User-agent: Zeus
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RepoMonkey
User-agent: Microsoft URL Control
User-agent: Openbot
Disallow: /