Hi, Can someone advise on the robots.txt file? My site has a sitemap, both XML and HTML, which works fine with Google, Yahoo and MSN. I do not have any robots.txt file. However, some search engines repeatedly look for this file. I need help with a simple robots.txt file to direct all robots to the XML or HTML file. Thanks in advance. Jack
Hi Jack, I think you need to create a robots.txt file for your site. Every search engine first looks for the robots.txt file among your files. As for the XML and HTML sitemaps, both are important, but from a search engine's point of view you must have the XML sitemap, because search engines crawl it easily and will get your new pages indexed through it. Bye
There are now two purposes for a robots.txt file. The first (and main) one is to tell robots which parts of your site they should NOT view. The second purpose is a more recent addition to the robots.txt standard and is to let robots know where your sitemap file is. If the robots are finding your sitemap file already, then there isn't much need to add its location to your robots.txt file, but it won't hurt.
# /robots.txt file for http://www.wallpaperweb.org
User-agent: *
Sitemap: http://www.wallpaperweb.org/sitemap.xml
Disallow: /system_error.asp
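To see both purposes at work, here is a minimal Python sketch that feeds the example file above to the standard library's `urllib.robotparser` and asks what a crawler may fetch. The `index.html` URL is a made-up page used only for illustration, and `site_maps()` assumes Python 3.8 or later. Real crawlers may interpret the file somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from the post above, one directive per line.
ROBOTS_TXT = """\
# /robots.txt file for http://www.wallpaperweb.org
User-agent: *
Sitemap: http://www.wallpaperweb.org/sitemap.xml
Disallow: /system_error.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Purpose 1: tell robots which parts of the site they should not view.
error_page_ok = parser.can_fetch(
    "*", "http://www.wallpaperweb.org/system_error.asp")
# index.html is a hypothetical page, not taken from the thread.
home_page_ok = parser.can_fetch(
    "*", "http://www.wallpaperweb.org/index.html")

# Purpose 2: tell robots where the sitemap is.
sitemaps = parser.site_maps()

print("error page allowed:", error_page_ok)
print("home page allowed:", home_page_ok)
print("sitemaps:", sitemaps)
```

The parser reports the error page as disallowed, everything else as allowed, and returns the sitemap URL regardless of where the `Sitemap:` line sits in the file.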
Forget the robots.txt file. It is nothing important. Learn more about sitemaps, especially if your site has thousands of pages. Make more than one sitemap if needed. I guess I am a bit late to reply here.
It is always a good practice to have a robots.txt file. If you have nothing to enter in it, you can create a blank file. It will prevent the redundant 404 errors. Another file which you should have on the server to reduce 404 errors is favicon.ico.
If you need every page to be indexed, you can use the following in the robots.txt file:
User-agent: *
Disallow:
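An empty `Disallow:` is easy to confuse with `Disallow: /`, which does the opposite. The sketch below checks both, plus a completely blank file, using Python's `urllib.robotparser`. The helper name `can_crawl` and the `www.example.com` URL are placeholders for illustration only, and other crawlers may behave differently from this one parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path
BLANK_FILE = ""                             # no rules at all

# Placeholder URL, not a site mentioned in the thread.
URL = "http://www.example.com/any/page.html"

print("allow-all file:", can_crawl(ALLOW_ALL, URL))
print("block-all file:", can_crawl(BLOCK_ALL, URL))
print("blank file:", can_crawl(BLANK_FILE, URL))
```

With this parser, the allow-all file and the blank file both permit crawling, while the single extra slash blocks the whole site.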
robots.txt has no link with the sitemap file. Your sitemap should be sitemap.xml for Google; for Yahoo it's a text file.
I think sitemap.xml is more acceptable and is preferred by most SEs. robots.txt mainly protects your site from bad bots that consume your bandwidth, but if bandwidth is not an issue for your website then you do not need robots.txt.
In robots.txt you tell the search engine not to go to parts of your site, but we want Google to see all of our site, so I don't put robots.txt anywhere.
Just create the robots.txt file in the root directory of your web space and put the following in it to allow all robots to crawl your site:
User-agent: *
Disallow:
It is used to exclude pages from bots, such as search engines, for instance if you wanted a specific page not to be shown in the search engines. You can normally get answers to simple questions like this by Googling.
Jack, you do not need a robots.txt file. We use it to tell the SEs NOT to index a directory or file. It is also used to tell some SEs where to find the sitemap file. This is mine:
# Robots.txt file created by 1/20/08
# For domain: http://www.catanich.com
#
# All other robots will spider the domain
User-agent: *
Disallow: /_common/
Disallow: /_private/
Disallow: /_ScriptLibrary/
Disallow: /_*/
Sitemap: http://www.catanich.com/sitemap.xml.gz
It also should be noted that a blank line in the robots.txt file will create an error.
Does it? I have never heard anything about a blank line causing an error, but if it does it certainly could explain some of the strange behaviour that some crawlers exhibit. Presumably, when it causes an error the crawler will ignore the rest of the file below the blank line. I guess some crawlers may even throw the whole file out if they get an error.
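For what it's worth, the original robots.txt convention treats a blank line as the separator between records, not as an error as such. The sketch below is a hypothetical strict parser written only to illustrate that reading. It is not how any particular crawler is known to work, and the names `parse_records` and `orphaned` are invented here. Under that assumption, a stray blank line in the middle of a record leaves the rules below it attached to no user-agent.

```python
def parse_records(text):
    """Split robots.txt text into records; a blank line ends the current record.

    Returns (records, orphaned), where `orphaned` holds Disallow values that
    appeared after a blank line with no User-agent line in front of them.
    """
    records = []
    orphaned = []
    current = {"agents": [], "disallow": []}
    for raw in text.splitlines():
        if not raw.strip():
            # A truly blank line closes the record collected so far.
            if current["agents"]:
                records.append(current)
            current = {"agents": [], "disallow": []}
            continue
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only line: not a separator
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current["agents"].append(value)
        elif field == "disallow":
            if current["agents"]:
                current["disallow"].append(value)
            else:
                orphaned.append(value)  # rule with no owning record
    if current["agents"]:
        records.append(current)
    return records, orphaned


NO_BLANK = "User-agent: *\nDisallow: /_common/\nDisallow: /_private/\n"
WITH_BLANK = "User-agent: *\nDisallow: /_common/\n\nDisallow: /_private/\n"

print(parse_records(NO_BLANK))
print(parse_records(WITH_BLANK))
```

In the second case the `/_private/` rule ends up orphaned, which fits the guess that a strict crawler would ignore what follows the blank line. Lenient crawlers may simply skip blank lines instead.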