Hello, I understand what the robots.txt file is for, but I'm a bit lost on how to add one. When I run my site through a validator I get this message: "We're sorry, this robots.txt does NOT validate. Warnings Detected: 391. Errors Detected: 415." First post here. Thanks for the help.
Hi mcsp -- and welcome to DP! You create a robots.txt just like any other file (such as your index.html) and upload it to the root directory of your website. You can then check it at http://www.yourwebsitename.com/robots.txt. If you just want to get rid of that error message, create an empty file (0 bytes) and upload it as robots.txt.
Hey, thanks for the help! So it's not in the index page HTML? It's a .txt file in the root. Much less brain damage than I thought. Thanks again!
A basic robots.txt file looks like this:

User-agent: *
Disallow:

meaning ALL spiders (*) are "disallowed" from nothing, i.e. everything is allowed and indexable.

More information:
Official robots.txt standards site
A robots.txt tutorial
A robots.txt syntax checker
A robots.txt validator
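If you later want to keep spiders out of particular areas, the same two directives do the job. A small sketch, with made-up directory names standing in for whatever you want to hide:

# keep every spider out of these two directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

# and shut one badly behaved bot out of the whole site
User-agent: EmailCollector
Disallow: /

Each record is one or more User-agent lines followed by its Disallow lines, with a blank line between records.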
Well, I have to say I'm impressed with the response. This looks like a great group here. Happy to have stumbled in; I've been lurking for a year. Thanks again!
These are good: http://www.internet-search-engines-faq.com/bad-robots.shtml http://www.internet-search-engines-faq.com/robots-txt.shtml http://tool.motoricerca.info/robots-checker.phtml http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
Here is an example that we used before: # Robots.txt file from http://www.website.com # # Bans from text, images and graphics = just add a note # User-agent: * User-agent: alexa.com User-agent: archive.org User-agent: ia_archiver User-agent: Alexibot User-agent: Aqua_Products User-agent: BackDoorBot User-agent: BackDoorBot/1.0 User-agent: Black.Hole User-agent: BlackWidow User-agent: BlowFish User-agent: BlowFish/1.0 User-agent: Bookmark search tool User-agent: Bot mailto:craftbot@yahoo.com User-agent: BotALot User-agent: BotRightHere User-agent: BuiltBotTough User-agent: Bullseye User-agent: Bullseye/1.0 User-agent: BunnySlippers User-agent: Cegbfeieh User-agent: CheeseBot User-agent: CherryPicker User-agent: CherryPickerElite/1.0 User-agent: CherryPickerSE/1.0 User-agent: ChinaClaw User-agent: Copernic User-agent: CopyRightCheck User-agent: Crescent User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0 User-agent: Custo User-agent: DISCo User-agent: DISCo Pump 3.0 User-agent: DISCo Pump 3.2 User-agent: DISCoFinder User-agent: DittoSpyder User-agent: Download Demon User-agent: Download Demon/3.2.0.8 User-agent: Download Demon/3.5.0.11 User-agent: EirGrabber User-agent: EmailCollector User-agent: EmailSiphon User-agent: EmailWolf User-agent: EroCrawler User-agent: Express WebPictures User-agent: Express WebPictures (www.express-soft.com) User-agent: ExtractorPro User-agent: EyeNetIE User-agent: FairAd Client User-agent: Flaming AttackBot User-agent: FlashGet User-agent: FlashGet WebWasher 3.2 User-agent: Foobot User-agent: FrontPage User-agent: FrontPage [NC,OR] User-agent: Gaisbot User-agent: GetRight User-agent: GetRight/2.11 User-agent: GetRight/3.1 User-agent: GetRight/3.2 User-agent: GetRight/3.3 User-agent: GetRight/3.3.3 User-agent: GetRight/3.3.4 User-agent: GetRight/4.0.0 User-agent: GetRight/4.1.0 User-agent: GetRight/4.1.1 User-agent: GetRight/4.1.2 User-agent: GetRight/4.2 User-agent: GetRight/4.2b (Portuguxeas) User-agent: GetRight/4.2c User-agent: GetRight/4.3 User-agent: GetRight/4.5 User-agent: GetRight/4.5a User-agent: GetRight/4.5b User-agent: GetRight/4.5b1 User-agent: GetRight/4.5b2 User-agent: GetRight/4.5b3 User-agent: GetRight/4.5b6 User-agent: GetRight/4.5b7 User-agent: GetRight/4.5c User-agent: GetRight/4.5d User-agent: GetRight/4.5e User-agent: GetRight/5.0beta1 User-agent: GetRight/5.0beta2 User-agent: GetWeb! 
User-agent: Go!Zilla User-agent: Go!Zilla (www.gozilla.com) User-agent: Go!Zilla 3.3 (www.gozilla.com) User-agent: Go!Zilla 3.5 (www.gozilla.com) User-agent: Go-Ahead-Got-It User-agent: Googlebot User-agent: Googlebot-Image User-agent: GrabNet User-agent: Grafula User-agent: HMView User-agent: HTTrack User-agent: HTTrack 3.0 User-agent: HTTrack [NC,OR] User-agent: Harvest User-agent: Harvest/1.5 User-agent: Image Stripper User-agent: Image Sucker User-agent: Indy Library User-agent: Indy Library [NC,OR] User-agent: InfoNaviRobot User-agent: InterGET User-agent: Internet Ninja User-agent: Internet Ninja 4.0 User-agent: Internet Ninja 5.0 User-agent: Internet Ninja 6.0 User-agent: Iron33/1.0.2 User-agent: JOC Web Spider User-agent: JennyBot User-agent: JetCar User-agent: Kenjin Spider User-agent: Kenjin.Spider User-agent: Keyword Density/0.9 User-agent: Keyword.Density User-agent: LNSpiderguy User-agent: LeechFTP User-agent: LexiBot User-agent: LinkScan/8.1a Unix User-agent: LinkScan/8.1a.Unix User-agent: LinkWalker User-agent: LinkextractorPro User-agent: MIDown tool User-agent: MIIxpc User-agent: MIIxpc/4.2 User-agent: MSIECrawler User-agent: Mass Downloader User-agent: Mass Downloader/2.2 User-agent: Mata Hari User-agent: Mata.Hari User-agent: Microsoft URL Control User-agent: Microsoft URL Control - 5.01.4511 User-agent: Microsoft URL Control - 6.00.8169 User-agent: Microsoft.URL User-agent: Mister PiX User-agent: Mister PiX version.dll User-agent: Mister Pix II 2.01 User-agent: Mister Pix II 2.02a User-agent: Mister.PiX User-agent: NICErsPRO User-agent: NPBot User-agent: NPbot User-agent: Navroad User-agent: NearSite User-agent: Net Vampire User-agent: Net Vampire/3.0 User-agent: NetAnts User-agent: NetAnts/1.10 User-agent: NetAnts/1.23 User-agent: NetAnts/1.24 User-agent: NetAnts/1.25 User-agent: NetMechanic User-agent: NetSpider User-agent: NetZIP User-agent: NetZip Downloader 1.0 Win32(Nov 12 1998) User-agent: NetZip-Downloader/1.0.62 (Win32; Dec 7 1998) User-agent: NetZippy+(http://www.innerprise.net/usp-spider.asp) User-agent: Octopus User-agent: Offline Explorer User-agent: Offline Explorer/1.2 User-agent: Offline Explorer/1.4 User-agent: Offline Explorer/1.6 User-agent: Offline Explorer/1.7 User-agent: Offline Explorer/1.9 User-agent: Offline Explorer/2.0 User-agent: Offline Explorer/2.1 User-agent: Offline Explorer/2.3 User-agent: Offline Explorer/2.4 User-agent: Offline Explorer/2.5 User-agent: Offline Navigator User-agent: Offline.Explorer User-agent: Openbot User-agent: Openfind User-agent: Openfind data gatherer User-agent: Oracle Ultra Search User-agent: PageGrabber User-agent: Papa Foto User-agent: PerMan User-agent: ProPowerBot/2.14 User-agent: ProWebWalker User-agent: Python-urllib User-agent: QueryN Metasearch User-agent: QueryN.Metasearch User-agent: RMA User-agent: Radiation Retriever 1.1 User-agent: ReGet User-agent: RealDownload User-agent: RealDownload/4.0.0.40 User-agent: RealDownload/4.0.0.41 User-agent: RealDownload/4.0.0.42 User-agent: RepoMonkey User-agent: RepoMonkey Bait & Tackle/v1.01 User-agent: SiteSnagger User-agent: SlySearch User-agent: SmartDownload User-agent: SmartDownload/1.2.76 (Win32; Apr 1 1999) User-agent: SmartDownload/1.2.77 (Win32; Aug 17 1999) User-agent: SmartDownload/1.2.77 (Win32; Feb 1 2000) User-agent: SmartDownload/1.2.77 (Win32; Jun 19 2001) User-agent: SpankBot User-agent: Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux User-agent: SuperBot User-agent: SuperBot/3.0 (Win32) User-agent: SuperBot/3.1 (Win32) User-agent: 
SuperHTTP User-agent: SuperHTTP/1.0 User-agent: Surfbot User-agent: Szukacz/1.4 User-agent: Teleport User-agent: Teleport Pro User-agent: Teleport Pro/1.29 User-agent: Teleport Pro/1.29.1590 User-agent: Teleport Pro/1.29.1634 User-agent: Teleport Pro/1.29.1718 User-agent: Teleport Pro/1.29.1820 User-agent: Teleport Pro/1.29.1847 User-agent: TeleportPro User-agent: Telesoft User-agent: The Intraformant User-agent: The.Intraformant User-agent: TheNomad User-agent: TightTwatBot User-agent: Titan User-agent: True_Robot User-agent: True_Robot/1.0 User-agent: TurnitinBot User-agent: TurnitinBot/1.5 User-agent: URL Control User-agent: URL_Spider_Pro User-agent: URLy Warning User-agent: URLy.Warning User-agent: VCI User-agent: VCI WebViewer VCI WebViewer Win32 User-agent: VoidEYE User-agent: WWW-Collector-E User-agent: WWWOFFLE User-agent: Web Image Collector User-agent: Web Sucker User-agent: Web.Image.Collector User-agent: WebAuto User-agent: WebAuto/3.40 (Win98; I) User-agent: WebBandit User-agent: WebBandit/3.50 User-agent: WebCapture 2.0 User-agent: WebCopier User-agent: WebCopier v.2.2 User-agent: WebCopier v2.5 User-agent: WebCopier v2.6 User-agent: WebCopier v2.7a User-agent: WebCopier v2.8 User-agent: WebCopier v3.0 User-agent: WebCopier v3.0.1 User-agent: WebCopier v3.2 User-agent: WebCopier v3.2a User-agent: WebEMailExtrac.* User-agent: WebEnhancer User-agent: WebFetch User-agent: WebGo IS User-agent: WebLeacher User-agent: WebReaper User-agent: WebReaper [info@webreaper.net] User-agent: WebReaper [webreaper@otway.com] User-agent: WebReaper v9.1 - www.otway.com/webreaper User-agent: WebReaper v9.7 - www.webreaper.net User-agent: WebReaper v9.8 - www.webreaper.net User-agent: WebReaper vWebReaper v7.3 - www,otway.com/webreaper User-agent: WebSauger User-agent: WebSauger 1.20b User-agent: WebSauger 1.20j User-agent: WebSauger 1.20k User-agent: WebStripper User-agent: WebStripper/2.03 User-agent: WebStripper/2.10 User-agent: WebStripper/2.12 User-agent: WebStripper/2.13 User-agent: WebStripper/2.15 User-agent: WebStripper/2.16 User-agent: WebStripper/2.19 User-agent: WebWhacker User-agent: WebZIP User-agent: WebZIP/2.75 (http://www.spidersoft.com) User-agent: WebZIP/3.65 (http://www.spidersoft.com) User-agent: WebZIP/3.80 (http://www.spidersoft.com) User-agent: WebZIP/4.0 (http://www.spidersoft.com) User-agent: WebZIP/4.1 (http://www.spidersoft.com) User-agent: WebZIP/4.21 User-agent: WebZIP/4.21 (http://www.spidersoft.com) User-agent: WebZIP/5.0 User-agent: WebZIP/5.0 (http://www.spidersoft.com) User-agent: WebZIP/5.0 PR1 (http://www.spidersoft.com) User-agent: WebZip User-agent: WebZip/4.0 User-agent: WebmasterWorldForumBot User-agent: Website Quester User-agent: Website Quester - www.asona.org User-agent: Website Quester - www.esalesbiz.com/extra/ User-agent: Website eXtractor User-agent: Website eXtractor (http://www.asona.org) User-agent: Website.Quester User-agent: Webster Pro User-agent: Webster.Pro User-agent: Wget User-agent: Wget/1.5.2 User-agent: Wget/1.5.3 User-agent: Wget/1.6 User-agent: Wget/1.7 User-agent: Wget/1.8 User-agent: Wget/1.8.1 User-agent: Wget/1.8.1+cvs User-agent: Wget/1.8.2 User-agent: Wget/1.9-beta User-agent: Widow User-agent: Xaldon WebSpider User-agent: Xaldon WebSpider 2.5.b3 User-agent: Xenu's User-agent: Xenu's Link Sleuth 1.1c
here is the rest of it: User-agent: Zeus User-agent: Zeus 11389 Webster Pro V2.9 Win32 User-agent: Zeus 11652 Webster Pro V2.9 Win32 User-agent: Zeus 18018 Webster Pro V2.9 Win32 User-agent: Zeus 26378 Webster Pro V2.9 Win32 User-agent: Zeus 30747 Webster Pro V2.9 Win32 User-agent: Zeus 32297 Webster Pro V2.9 Win32 User-agent: Zeus 39206 Webster Pro V2.9 Win32 User-agent: Zeus 41641 Webster Pro V2.9 Win32 User-agent: Zeus 44238 Webster Pro V2.9 Win32 User-agent: Zeus 51070 Webster Pro V2.9 Win32 User-agent: Zeus 51674 Webster Pro V2.9 Win32 User-agent: Zeus 51837 Webster Pro V2.9 Win32 User-agent: Zeus 63567 Webster Pro V2.9 Win32 User-agent: Zeus 6694 Webster Pro V2.9 Win32 User-agent: Zeus 71129 Webster Pro V2.9 Win32 User-agent: Zeus 82016 Webster Pro V2.9 Win32 User-agent: Zeus 82900 Webster Pro V2.9 Win32 User-agent: Zeus 84842 Webster Pro V2.9 Win32 User-agent: Zeus 90872 Webster Pro V2.9 Win32 User-agent: Zeus 94934 Webster Pro V2.9 Win32 User-agent: Zeus 95245 Webster Pro V2.9 Win32 User-agent: Zeus 95351 Webster Pro V2.9 Win32 User-agent: Zeus 97371 Webster Pro V2.9 Win32 User-agent: Zeus Link Scout User-agent: asterias User-agent: b2w/0.1 User-agent: cosmos User-agent: eCatch User-agent: eCatch/3.0 User-agent: hloader User-agent: httplib User-agent: humanlinks User-agent: larbin User-agent: larbin (samualt9@bigfoot.com) User-agent: larbin User-agent: larbin_2.6.2 (kabura@sushi.com) User-agent: larbin_2.6.2 (larbin2.6.2@unspecified.mail) User-agent: larbin_2.6.2 (listonATccDOTgatechDOTedu) User-agent: larbin_2.6.2 (vitalbox1@hotmail.com) User-agent: larbin_2.6.2 User-agent: larbin_2.6.2 User-agent: larbin_2.6.2 User-agent: larbin_2.6.2 listonATccDOTgatechDOTedu User-agent: larbin_2.6.2 User-agent: libWeb/clsHTTP User-agent: lwp-trivial User-agent: lwp-trivial/1.34 User-agent: moget User-agent: moget/2.1 User-agent: pavuk User-agent: pcBrowser User-agent: psbot User-agent: searchpreview User-agent: spanner User-agent: suzuran User-agent: tAkeOut User-agent: toCrawl/UrlDispatcher User-agent: turingos User-agent: webfetch/2.1.0 User-agent: wget Disallow: / Disallow: /.gif$ Disallow: /.jpg$ Disallow: /.jpeg$ Disallow: /.png$ Disallow: /addanyfileoranydirectoryhere
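One note on the tail of that file: in the original robots.txt standard a Disallow value is a plain path prefix, so a line like Disallow: /.gif$ only matches URLs that literally start with "/.gif$". The * and $ wildcards are extensions that only some crawlers (Googlebot, for instance) understand. A hedged sketch of what the image-blocking lines would look like in the Google-style wildcard syntax, if that was the intent:

# Google-style pattern matching: block URLs ending in these extensions
User-agent: Googlebot
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.png$

Crawlers that don't support wildcards treat those lines as literal prefixes that match nothing, so a plain Disallow on the image directory is the more portable way to keep pictures out of the index.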
Geeze, gatordun... that's WAY overkill, IMO. I've seen such lists before. Prior to uploading that robots.txt file, how many of those had actually visited your site? In particular, all those GetRight and FlashGet references are download accelerators -- why are you so worried about them? My advice is to keep your robots.txt file as simple as possible. If you do find a rogue bot eating bandwidth, ban it. But you just don't need these huge robots.txt files, IMO.
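To put that "keep it simple, ban the one that misbehaves" advice in concrete terms, a minimal sketch (WebStripper here is just an example name pulled from the list above):

# ban the one rogue bot that is eating bandwidth
User-agent: WebStripper
Disallow: /

# everyone else may index everything
User-agent: *
Disallow: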
It's a friend's list. He guards against everything and needs to. That is why we are looking into excluding them in .htaccess instead -- look under the Apache .htaccess area. The one thing is that .htaccess usually has to be disabled to load FrontPage webs. Still looking for a tweak for that.
Here is the list for the .htaccess file that we are working on: http://forums.digitalpoint.com/showthread.php?t=22487
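For anyone following that thread, the usual way to move user-agent bans from robots.txt into .htaccess is mod_rewrite. A short sketch, assuming mod_rewrite is enabled and using a couple of agents from the list above as stand-ins:

# refuse (403) any request whose User-Agent starts with one of these
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC]
RewriteRule .* - [F]

Unlike robots.txt, which only asks nicely, this refuses the request outright, so it also covers bots that never read robots.txt.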
OMFG! How many robots have you submitted to? Personally I would disallow any non-page directory for all bots, but let the Google AdSense crawler go anywhere it likes - more content, more ads!
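Roughly what that looks like in robots.txt -- the directory names are placeholders, and Mediapartners-Google is the user agent the AdSense crawler identifies itself with:

# let the AdSense crawler see everything
User-agent: Mediapartners-Google
Disallow:

# keep everyone else out of the non-page directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /scripts/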
No, that's incorrect. When the FrontPage Server Extensions (FPSE) are installed, FrontPage installs its own .htaccess file (and hides it). Use a third-party FTP program to unhide the file on your server if necessary and copy it back to your hard drive. Then edit it with Notepad and -- THIS IS IMPORTANT -- add any additional .htaccess lines you wish TO THE BOTTOM OF THE ORIGINAL .htaccess file. Then upload the appended/amended file back to your server.
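So the finished file is shaped roughly like this -- the top part is whatever FPSE generated, and the marker comment is just a convention, not something FrontPage requires:

# --- original FrontPage-generated directives (leave untouched) ---
# ...
# --- your own rules, appended below ---
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC]
RewriteRule .* - [F]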
Mistrel, we use FlashFTP for the .htaccess file. Right now we edit it live, but we get an error when we try to publish to the site; we are working on that. So we disable the full .htaccess file and put up the default .htaccess file to publish or tweak the site, then put the full .htaccess file back after we are done editing or publishing. It's a temporary solution, so it's better to leave the images and the site locked down until we figure out where the .htaccess problem is.
Re-read my post. Either you overwrote the original FP htaccess file, or you've messed up the htaccess file in some other way.
We know, and we are still looking. But it works for now and bans people, countries, and bandwidth thieves, and stops our images from being indexed or used on other sites. That is the main directive for now! Everything always needs a tweak.
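For reference, the "stop other sites using our images" part is normally a referer check in .htaccess rather than anything robots.txt can do. A minimal sketch, assuming mod_rewrite and with example.com standing in for the real domain:

# allow empty referers (direct requests, some proxies and firewalls),
# then refuse image requests that come from anywhere but our own pages
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]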