Hello members, I would like to know which pages should be listed in the robots.txt of our site: the contact us page, the login page, the registration page, or other pages? Please give me your reply. Regards, Suzanne.
The robots.txt file is used to allow or restrict search engine crawlers on a website. If you want search engine crawlers to visit your whole site, put the following in your robots.txt file:

User-agent: *
Disallow:

And if you want to stop crawlers from visiting your login or contact us page, use a setting like the one below, with the path of the directory you want to block:

User-agent: *
Disallow: /login/

I hope you understand the concept. Good luck
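To make the difference between the two extremes concrete, here is a minimal sketch; neither group is from any particular site, and a real file would contain only one group per user agent, so treat these as two alternatives rather than one deployable file:

User-agent: *
Disallow:
# An empty Disallow value blocks nothing: all compliant crawlers may fetch every URL.

User-agent: *
Disallow: /
# A bare slash blocks the entire site from all compliant crawlers.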
Well, it depends on which pages you do not want indexed. Usually the contact page of a website should be indexed; the registration and login pages should not.
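A minimal robots.txt along those lines might look like the following; /login/ and /register/ are assumed paths here, so replace them with whatever your site actually uses:

User-agent: *
Disallow: /login/
Disallow: /register/
# No rule mentions the contact page, so it stays crawlable.

Keep in mind that robots.txt only stops compliant crawlers from fetching these URLs; a blocked URL can still show up in search results if other sites link to it.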
Pages that you think are unnecessary for search engine crawling are the ones to include in the robots.txt file, e.g. the printable version of a page, a page with duplicate content, or any other page you don't want crawled.
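As a sketch of that idea, assuming the printable versions live under a /print/ directory or are served with a ?print=1 parameter (both hypothetical paths for illustration), you could write:

User-agent: *
Disallow: /print/
# Wildcard patterns like the next line are extensions honored by major engines such as Google and Bing, not part of the original robots.txt standard:
Disallow: /*?print=1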
Pages that you do not want surfaced to search users are the ones listed in the robots.txt file. Pages that exist only for the internal working processes of an organization are the ones most often kept in robots.txt.
It’s entirely up to you what you want indexed; you decide what to allow or disallow for the search engines.
The pages you don't want crawled by the search engines are the ones to block in robots.txt. Thanks,
@vacationcluster You should only include files in your robots.txt that you do NOT want search engine crawlers to index. Typically, you only use robots.txt if there is a large section of your site that you don't want search engines to index (e.g., an entire directory). If you only want to keep specific pages out of the index, it's typically easier to use a robots meta tag on those specific pages.

@ThePassiveIncomeBlog Each record in a robots.txt file has a User-agent field, which specifies which search engines the record applies to. For example, User-agent: * specifies that all well-behaved crawlers should respect the corresponding record. If you only want to apply a record to Google, you would use User-agent: Googlebot

For more information about the robots.txt file, read this: http://www.webgnomes.org/blog/robots-txt-file-guide-that-wont-put-you-to-sleep/
For more information about the robots meta tag, read this: http://www.webgnomes.org/blog/robots-meta-tag-definitive-guide/
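To illustrate both points, here is a minimal sketch. The robots meta tag goes in the <head> of the individual page you want kept out of the index:

<meta name="robots" content="noindex">

And a robots.txt record aimed only at Google might look like this, where /private/ is a placeholder directory:

User-agent: Googlebot
Disallow: /private/

One caveat worth knowing: the meta tag only works if crawlers are allowed to fetch the page, because if robots.txt blocks the URL, the crawler never sees the noindex instruction.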
Robots.txt should cover the relevant pages of your site that carry useful information about your company or product. These should all be static pages; no dynamic or gateway pages should appear in robots.txt. You need not add the privacy policy, terms and conditions, or similar pages; however, it is up to you whether you add the contact us page or not. I recommend adding the contact us page to robots.txt.
The following is the robots.txt file content which I use on most of my WordPress sites. All of the bots listed in the final group are scrapers, downloaders, and other crawlers I don't want on the site; consecutive User-agent lines share the single Disallow: / rule that follows them.

Sitemap: http://www.website.com/sitemap.xml

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /any-other-folder-to-restrict/
Disallow: /wp-login.php
Disallow: /wp-admin/
Disallow: /wp-comments-post.php
Disallow: /wp-commentsrss2.php
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.png$
Disallow: /*.zip$
Disallow: /*.doc$
Disallow: /*.exe$
Disallow: /*.pdf$

User-agent: ia_archiver
User-agent: atSpider
User-agent: b2w/0.1
User-agent: BecomeBot
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CopyRightCheck
User-agent: Copernic
User-agent: Crescent
User-agent: DSurf
User-agent: dumbot
User-agent: EliteSys Entry
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Enterprise_Search/1.0
User-agent: Enterprise_Search
User-agent: es
User-agent: ExtractorPro
User-agent: Flaming AttackBot
User-agent: FreeFind
User-agent: grub
User-agent: grub-client
User-agent: Hatena Antenna
User-agent: Jetbot
User-agent: Jetbot/1.0
User-agent: larbin
User-agent: Mail Sweeper
User-agent: munky
User-agent: naver
User-agent: NetMechanic
User-agent: Nutch
User-agent: OmniExplorer_Bot
User-agent: Oracle Ultra Search
User-agent: PerMan
User-agent: ProWebWalker
User-agent: psbot
User-agent: Python-urllib
User-agent: Radiation Retriever 1.1
User-agent: Roverbot
User-agent: searchpreview
User-agent: SiteSnagger
User-agent: sootle
User-agent: Stanford
User-agent: URL_Spider_Pro
User-agent: WebBandit
User-agent: WebEmailExtrac
User-agent: WebVac
User-agent: WebZip
User-agent: xGet
User-agent: wGet
User-agent: WebWalk
User-agent: WebReaper
User-agent: WebMirror
User-agent: WebFetcher
User-agent: WebCopy
User-agent: webcopier
User-agent: WebCatcher
User-agent: w3mir
User-agent: vobsub
User-agent: Templeton
User-agent: ssearcher100
User-agent: SpiderBot
User-agent: Shai'Hulud
User-agent: PBWF
User-agent: LightningDownload
User-agent: KDD Exploror
User-agent: Jeeves
User-agent: Internet Explore
User-agent: InfoSpiders
User-agent: httrack
User-agent: HavIndex
User-agent: GetUrl
User-agent: GetBot
User-agent: ESIRover
User-agent: Download Wonder
User-agent: Collage
User-agent: LNSpiderguy
User-agent: Alexibot
User-agent: Teleport
User-agent: TeleportPro
User-agent: Stanford Comp Sci
User-agent: MIIxpc
User-agent: Telesoft
User-agent: Website Quester
User-agent: moget/2.1
User-agent: WebZip/4.0
User-agent: WebStripper
User-agent: WebSauger
User-agent: NetAnts
User-agent: Mister PiX
User-agent: WebAuto
User-agent: TheNomad
User-agent: WWW-Collector-E
User-agent: RMA
User-agent: libWeb/clsHTTP
User-agent: asterias
User-agent: httplib
User-agent: turingos
User-agent: spanner
User-agent: InfoNaviRobot
User-agent: Harvest/1.5
User-agent: Bullseye/1.0
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: CherryPickerSE/1.0
User-agent: CherryPickerElite/1.0
User-agent: WebBandit/3.50
User-agent: NICErsPRO
User-agent: Microsoft URL Control - 5.01.4511
User-agent: DittoSpyder
User-agent: Foobot
User-agent: SpankBot
User-agent: BotALot
User-agent: lwp-trivial/1.34
User-agent: lwp-trivial
User-agent: BunnySlippers
User-agent: Microsoft URL Control - 6.00.8169
User-agent: URLy Warning
User-agent: Wget/1.6
User-agent: Wget/1.5.3
User-agent: LinkWalker
User-agent: cosmos
User-agent: moget
User-agent: hloader
User-agent: URL Control
User-agent: Zeus Link Scout
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Webster Pro
User-agent: EroCrawler
User-agent: LinkScan/8.1a Unix
User-agent: Keyword Density/0.9
User-agent: Kenjin Spider
User-agent: Iron33/1.0.2
User-agent: Bookmark search tool
User-agent: GetRight/4.2
User-agent: FairAd Client
User-agent: Gaisbot
User-agent: humanlinks
User-agent: LinkextractorPro
User-agent: Offline Explorer
User-agent: Mata Hari
User-agent: LexiBot
User-agent: Web Image Collector
User-agent: The Intraformant
User-agent: True_Robot/1.0
User-agent: True_Robot
User-agent: BlowFish/1.0
User-agent: JennyBot
User-agent: MIIxpc/4.2
User-agent: BuiltBotTough
User-agent: ProPowerBot/2.14
User-agent: BackDoorBot/1.0
User-agent: toCrawl/UrlDispatcher
User-agent: WebEnhancer
User-agent: suzuran
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: VCI
User-agent: Szukacz/1.4
User-agent: QueryN Metasearch
User-agent: Openfind
User-agent: Zeus
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RepoMonkey
User-agent: Microsoft URL Control
User-agent: Openbot
Disallow: /