Can you suggest which files and folders we should disallow in robots.txt, so that Google and other search engines cannot access or crawl our site's private content?
You can disallow pages in robots.txt such as the admin panel, or any files that you don't want people to see and that are for your use only.
Any kind of file or folder can easily be disallowed for crawlers:
User-agent: *
Disallow: /.html/
Disallow: /about us.html
@Mike Hussey I agree with your details about hiding private things from search engines, like login and payment pages.
It depends on the webmaster or owner of the website which files and folders they do not want crawled or disclosed to non-users. It is mainly used for security and privacy purposes.
You can disallow those pages that you would not like to be crawled by search engines, like your website landing page, etc.
It depends on you which pages you want to show to Google and which not. We mainly put robots.txt rules on payment pages for safety purposes. Thanks.
User-agent: *              # this group applies to every crawler
Disallow: /about us.html   # this page shall not be crawled
Disallow: /login.html      # this page shall not be crawled
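A quick way to check rules like these before uploading them is Python's standard urllib.robotparser module. The following is only a minimal sketch: the helper name is_allowed, the crawler name "Googlebot" as a test agent, and the path /index.html are assumptions made for illustration. It also compares a rule written with and without a trailing slash, since this parser treats Disallow values as path prefixes.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Rule written as a file path (no trailing slash).
RULES = """\
User-agent: *
Disallow: /login.html
"""

# Same rule with a trailing slash, which makes it look like a directory.
RULES_WITH_SLASH = """\
User-agent: *
Disallow: /login.html/
"""

if __name__ == "__main__":
    for label, rules in (("no slash", RULES), ("trailing slash", RULES_WITH_SLASH)):
        for path in ("/login.html", "/index.html"):
            print(label, path, is_allowed(rules, "Googlebot", path))
```

With this parser, the trailing-slash version does not block /login.html itself, because the page URL does not start with "/login.html/". Other crawlers may implement matching differently, so treat this as a local sanity check only.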
Finally, I collected this from another site. Can you review it? Is it correct?

User-agent: Alexibot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: asterias
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: BotRightHere
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: hloader
Disallow: /

User-agent: httplib
Disallow: /

User-agent: HTTrack 3.0
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: larbin
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: moget
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: Openfind data gatherer
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: RMA
Disallow: /

User-agent: searchpreview
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: spanner
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: TightTwatBot
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: turingos
Disallow: /

User-agent: TurnitinBot/1.5
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: WebCapture 2.0
Disallow: /

User-agent: WebCopier v.2.2
Disallow: /

User-agent: WebCopier v3.2a
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebZIP/4.21
Disallow: /

User-agent: WebZIP/5.0
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget
Disallow: /

User-agent: wget
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: Adsbot-Google
Disallow:

User-agent: Googlebot
Disallow:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
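One way to review a collected file like this is to load it into Python's standard urllib.robotparser and ask what a few crawlers may fetch. The sketch below uses only a short excerpt of the list (one blocked agent, Googlebot, and part of the catch-all group); the helper names build_parser and report, the agent name "ExampleBot", and the path /index.html are hypothetical and chosen just for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the collected file, one directive per line.
EXCERPT = """\
User-agent: Wget
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def report(parser: RobotFileParser, agents, paths) -> dict:
    """Map each (agent, path) pair to True (allowed) or False (blocked)."""
    return {(a, p): parser.can_fetch(a, p) for a in agents for p in paths}


if __name__ == "__main__":
    parser = build_parser(EXCERPT)
    # "ExampleBot" is a made-up agent that falls through to the "*" group.
    results = report(parser, ["Wget", "Googlebot", "ExampleBot"],
                     ["/", "/wp-admin/", "/index.html"])
    for (agent, path), allowed in results.items():
        print(f"{agent:10} {path:12} {'allowed' if allowed else 'blocked'}")
```

One thing worth checking in the output: because Googlebot has its own group with an empty Disallow, it does not fall through to the "*" group, so the /wp-admin/ and /wp-login.php rules do not apply to it. If those paths should be off limits to Googlebot too, they would need to be repeated in its group.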
Hi. Currently, robots.txt is used to keep bots away from user registration, login, and authentication-code pages. It depends on your website's niche. List the pages where users' private data is used and block access to that part of the website using robots.txt. Thanks.
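The "list the pages, then block them" step can be sketched in a few lines of Python. This is only an illustration: the function name build_robots_txt and the example paths (/login/, /register/, /checkout/) are assumptions, not paths from any real site. Keep in mind that robots.txt only asks well-behaved crawlers to stay away; pages holding private data still need a login or other access control on the server.

```python
def build_robots_txt(private_paths, agent="*"):
    """Build robots.txt text that disallows each listed path for `agent`."""
    lines = [f"User-agent: {agent}"]
    for path in private_paths:
        # Disallow values are URL paths, so make sure each starts with "/".
        if not path.startswith("/"):
            path = "/" + path
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical list of pages that handle users' private data.
    private = ["/login/", "/register/", "/checkout/"]
    print(build_robots_txt(private), end="")
```

The generated text would be saved as robots.txt in the site's root directory, since crawlers look for it at /robots.txt.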