How do I allow Google and Bing bots to crawl my website's sitemap.xml file and index the pages listed in it? What code should I put in the robots.txt file to make this happen?
Nothing. Google and Bing bots will crawl your sitemap file anyway. All you need to do is submit it in Google/Bing Webmaster Tools. If you have a virtual robots.txt file, your sitemap URL is inserted into it automatically. Otherwise, just put this in your robots.txt:
Sitemap: http://whateveryoursiteis.com/sitemap.xml.gz
You can use this code to allow any bot to crawl your site:
User-agent: *
Disallow:
Sitemap: URL/sitemap.xml
If you want to allow Bing and Google only, give their bots a group of their own with an empty Disallow, and block every other bot in a separate group:
User-agent: Googlebot
User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
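One way to sanity-check per-bot rules before uploading them is Python's standard urllib.robotparser module. The sketch below assumes a rule set in which Googlebot and bingbot share a group with an empty Disallow and every other bot falls into a catch-all group that blocks everything. The example.com URL, the allowed_bots helper, and DuckDuckBot as the "other" crawler are placeholders for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: only Googlebot and bingbot may crawl.
# Note: no blank line between the User-agent lines and their Disallow,
# because a blank line ends the group.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed_bots(robots_txt, bots, url):
    """Return {bot_name: True/False} for whether each bot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


result = allowed_bots(
    ROBOTS_TXT,
    ["Googlebot", "bingbot", "DuckDuckBot"],
    "http://www.example.com/page.html",
)
print(result)
# {'Googlebot': True, 'bingbot': True, 'DuckDuckBot': False}
```

If all the User-agent lines, including the *, were stacked above a single "Disallow: /", they would form one group and every bot would be blocked, Google and Bing included.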
To allow all robots complete access, put the code below in your robots.txt file:
User-agent: *
Disallow:
Sitemap: http://www.yoursite.com/sitemap.xml
To exclude all robots from the entire server, put the code below in your robots.txt file instead:
User-agent: *
Disallow: /
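The only difference between those two files is the single slash after Disallow, so it is easy to get wrong. As a rough check, the sketch below feeds both variants to Python's standard urllib.robotparser; the can_crawl helper and the page URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, bot="Googlebot"):
    """Parse robots.txt text and report whether bot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path

URL = "http://www.yoursite.com/any/page.html"  # hypothetical page

allow_result = can_crawl(ALLOW_ALL, URL)
block_result = can_crawl(BLOCK_ALL, URL)
print(allow_result, block_result)
# True False
```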
And why do you have to put such things in the robots.txt? Isn't robots.txt used ONLY if you want to EXCLUDE something from being INDEXED?
I completely agree with your statement. We need to follow some steps if we want to allow a particular search engine to crawl our website.
If you don't put any commands in robots.txt, then search engines won't index your website?? Let me clear one thing up: we use robots.txt when we want to restrict a page or folder from search engines. If you don't put any commands in robots.txt, they will automatically crawl your complete website; there is no need to add any code for this.
Aryans, I think you didn't read my reply carefully. I was saying that if you want to allow a particular search engine to crawl your website, you need to mention that search engine's name in your robots.txt file. For example, in the robots file we use the * (star) to apply a rule to all search engines, but for a particular search engine you need to put that search engine's bot name in place of the * (star). I hope you understand my point of view, and thanks for considering my comment, dude.
My robots.txt only specifies which directories shouldn't be crawled and points to my Sitemap. That's all that is needed, to be honest. The majority of traffic today comes from Google. Google will index your site unless you explicitly tell Google to de-index it. Bing will be the second to crawl it; that can take a few weeks, but once that is done, all is well.
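A file like that can be checked locally with Python's standard urllib.robotparser (site_maps() needs Python 3.8 or newer). This is only a sketch: the /admin/ and /tmp/ directories, the example.com domain, and the test URLs are invented for the example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two directories, point to the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap lines are independent of the User-agent groups.
sitemaps = parser.site_maps()

# Pages outside the blocked directories stay crawlable.
public_ok = parser.can_fetch("bingbot", "http://www.example.com/articles/1.html")
admin_ok = parser.can_fetch("bingbot", "http://www.example.com/admin/login")

print(sitemaps)   # ['http://www.example.com/sitemap.xml']
print(public_ok)  # True
print(admin_ok)   # False
```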