hey all, I have a little problem. At the moment I only have FTP access to the (sub)directory my site is linked to. Example: www.example.com/template/ <- this is where all my files like index.htm live ... but I have to put robots.txt into www.example.com/ <--- How can I do that? The server runs Confixx.
You have to get FTP access to either the www/ folder or the root folder. You can do this through your site admin or your cPanel, or ask your host to do it for you.
I think this is how it works. example.com already has a robots.txt, and you have now created a folder in their root. As long as they don't "Disallow" your "template" folder in that robots.txt, search engines will have no problem indexing your site. An empty robots.txt is considered "allow all" until a Disallow is mentioned. MSN did not ask for a robots.txt file when I created a folder in my root; it used the robots.txt of the main domain, in your case example.com. Check the error logs: if there are failed requests for robots.txt, then you need to put one up; otherwise they found it. I am not completely sure about what I said above, it's just what I saw in my logs. Let others comment. Regards, jeet
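To illustrate that point with a minimal sketch (using the /template/ folder name from the first post): example.com's owner would only be shutting your folder out if their www.example.com/robots.txt contained something like

User-agent: *
Disallow: /template/

If no such Disallow line mentions your folder, spiders are free to crawl it.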
hmm.... so no robots.txt means index everything. JEET, does that mean we don't need a robots.txt at all if there is nothing to disallow on our website?
Check whether a robots.txt is already there. If not, you typically don't need one unless you want to get sophisticated about who to block and who not to.
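For reference, the bare-bones "allow everything" file is just two lines (an empty Disallow value means nothing is blocked):

User-agent: *
Disallow: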
Sometimes you have to block some robots because they are only a waste of your bandwidth, email harvester bots, etc... @ dizyn http://www.seyq.com/robots.txt <- sample. Just look at the robots.txt files of some popular sites, copy one, and upload it to your server.
User-agent: *
User-agent: Turn It In
Disallow: /
User-agent: grub-client
Disallow: /
User-agent: grub
Disallow: /
User-agent: looksmart
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: larbin
Disallow: /
User-agent: b2w/0.1
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: NetMechanic
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: RMA
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: asterias
Disallow: /
User-agent: httplib
Disallow: /
User-agent: turingos
Disallow: /
User-agent: spanner
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: /1.6
Disallow: /
User-agent: /1.5.3
Disallow: /
User-agent:
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: moget
Disallow: /
User-agent: hloader
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VCI
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Openfind data gathere
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: Zeus Link Scout
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Iron33/1.0.2
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: GetRight/4.2
Disallow: /
User-agent: FairAd Client
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: aipbot
Disallow: /
User-agent: abot
Disallow: /
robots.txt needs to go in your web root: mydomain.com/robots.txt. It is an exclusion standard that politely requests that bots not visit an area or a page, but it does not 'block' bad bots. It merely suggests to 'honest' bots that they shouldn't spider an area. I have a script on my site specifically for email harvesters that links to page after page of fake email addresses generated at random. The robots.txt lists this page as Disallowed for the sake of 'honest' bots like Google, MSN, Yahoo (and the AutoMapIt.com spider), but 'bad' bots that ignore robots.txt will go hog wild harvesting thousands or even millions of random, gibberish email addresses. Since robots.txt doesn't block bad bots, it's best to use that file to help good bots stay out of traps and admin areas. As for bad bots, if you find a particular one visiting you, it's best to use .htaccess to block it by IP, user-agent, or other identifying info... .htaccess is absolute, robots.txt is just a suggestion.
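As a rough sketch of that .htaccess approach (assuming Apache with mod_setenvif and the older Order/Allow/Deny access control; the user-agent is taken from the list above and the IP address is just a placeholder):

# deny one known email harvester by its user-agent string
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
# deny a specific IP address you've caught misbehaving in your logs
Order Allow,Deny
Allow from all
Deny from 203.0.113.45
Deny from env=bad_bot

Unlike robots.txt, this is enforced by the server itself, so it works whether or not the bot chooses to be 'honest'.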
Hey, maybe someone can help me. I installed the robots.txt on my website, but I have a question. There are 2 parts to my website: my main site is, for example, jayboi25, and I have jayboi24 and jayboi26 to make the website work. My problem was that Google was only crawling one part of the site, so I installed the robots.txt, then uploaded a sitemap and submitted it to Google. What I'm not sure of is whether I also have to put a sitemap on jayboi24 and jayboi26, or just put all 3 sitemaps on the main part of my site, jayboi25. I'm not sure how to do this, so if anyone can help, feel free to contact me. Thanks.