I wrote the following robots.txt:

    User-agent: MJ12Bot
    Disallow: /

    User-agent: googlebot
    Allow: /

    User-agent: msnbot
    Allow: /

    User-agent: bingbot
    Allow: /

However, when I check it with a robots.txt checker, it tells me the file is incorrect.
    User-agent: *
    Disallow: /MJ12Bot

This tells every search engine spider not to crawl the /MJ12Bot folder, so none of them will be able to crawl /MJ12Bot on your site. You can also block a particular search engine spider from crawling particular folders on your site.
Yes, the Google, Bing and Yahoo bots can still crawl; by default they already crawl your site. George is right. You just need to add:

    User-agent: *
    Disallow: /MJ12Bot

You don't need to add Allow rules for Google, Yahoo, Bing, etc.
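For anyone who wants to check what this rule does, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the example.com URLs are made up for illustration, and real crawlers may interpret a file differently from this parser. Under those assumptions, the rule closes the /MJ12Bot path to every bot rather than blocking the bot named MJ12Bot:

```python
from urllib.robotparser import RobotFileParser

# The rule proposed above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /MJ12Bot
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and test a single URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not taken from the thread.
    urls = ("https://example.com/", "https://example.com/MJ12Bot/page.html")
    for agent in ("MJ12Bot", "googlebot"):
        for url in urls:
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

With this parser, both agents may fetch the home page and neither may fetch anything under /MJ12Bot, so the rule is only the right one if the goal is to hide a folder of that name.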
Thanks, all. I ask because when I put this in my robots.txt file and check it with the robots.txt checker at http://www.searchenginepromotionhelp.com/m/robots-text-tester, it reports some errors.
That validator looks broken. Provided you want to tell all spiders to ignore the /MJ12Bot folder on your site, the robots.txt file George proposed is correct. However, I think it's more likely that you want to block the MJ12Bot crawler from crawling your site. If that's the case, then your robots.txt file should be:

    User-agent: MJ12Bot
    Disallow: /

That's all it needs to be. You don't need to put Allow statements for Google etc., because a crawler with no matching rule is allowed by default.
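If you don't trust an online validator, you can test the file locally. The sketch below uses Python's standard urllib.robotparser to compare the original file from the first post with the two-line version. The helper crawl_permissions, the agent list and the example.com URL are assumptions made for illustration, and this parser is only one interpretation of robots.txt, so treat the result as a sanity check rather than a guarantee of how each crawler behaves:

```python
from urllib.robotparser import RobotFileParser

# The file from the first post, with explicit Allow groups.
ORIGINAL = """\
User-agent: MJ12Bot
Disallow: /

User-agent: googlebot
Allow: /

User-agent: msnbot
Allow: /

User-agent: bingbot
Allow: /
"""

# The shorter file suggested above.
MINIMAL = """\
User-agent: MJ12Bot
Disallow: /
"""

AGENTS = ("MJ12Bot", "googlebot", "msnbot", "bingbot")


def crawl_permissions(robots_txt: str, url: str) -> dict:
    """Hypothetical helper: map each agent in AGENTS to True/False for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AGENTS}


if __name__ == "__main__":
    # example.com is a placeholder host, not taken from the thread.
    page = "https://example.com/index.html"
    print("original:", crawl_permissions(ORIGINAL, page))
    print("minimal: ", crawl_permissions(MINIMAL, page))
```

Under this parser both files give the same answer: MJ12Bot is refused and the other three agents are allowed, which is consistent with the Allow groups being redundant.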
Thanks. Yes, I want to block MJ12Bot from crawling my site but allow Google, Bing, etc. I think you are correct!
If you have trouble, use Google Webmaster Tools; they make it easy to find your robots.txt. Then use Google to search for a robots.txt generator for .htaccess. Good luck.
What should I put in robots.txt or .htaccess so that Googlebot picks up just one version of my URLs, the one ending in .html? I see Googlebot fetching my URLs without .html at the end, which creates 404 errors. I checked the pages those URLs come from and they are all fine. I must also mention that in WordPress I set the permalinks to end with .html.
Googlebot only reports a 404 if you have renamed that particular page or deleted it completely. A robots.txt example is already given above, and you do not need to change .htaccess yourself; some plugins manage it on their own by writing their configuration to that file.