How to write a robots.txt that allows all robots except MJ12Bot?

Discussion in 'Search Engine Optimization' started by bass, Feb 26, 2011.

  1. #1
    I wrote the following code:

    User-agent: MJ12Bot
    Disallow: /

    User-agent: googlebot
    Allow: /

    User-agent: msnbot
    Allow: /

    User-agent: bingbot
    Allow: /

    However, when I test it with a robots.txt checker, it tells me the file is incorrect.
     
    bass, Feb 26, 2011 IP
  2. georgenbowser

    #2
    User-agent: *
    Disallow: /MJ12Bot

    With this rule, no search engine spider will crawl the MJ12Bot folder on your site.
    They will not be able to crawl /MJ12Bot on your site. You can also block a particular search engine spider from crawling particular folders on your site.
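
    A quick way to see what this rule does is to feed it to Python's standard urllib.robotparser. This is only a sketch: the helper name folder_rule_allows and the test paths are made up for illustration, and real crawlers may match rules slightly differently from Python's parser. Under Python's parser, the rule blocks the /MJ12Bot path for every bot, while a crawler named MJ12Bot can still fetch the rest of the site.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above: applies to every user agent,
# and disallows only paths that start with /MJ12Bot.
FOLDER_RULES = """\
User-agent: *
Disallow: /MJ12Bot
"""


def folder_rule_allows(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under FOLDER_RULES.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of downloading a live robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(FOLDER_RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Illustrative bot names and paths, not taken from a real site.
    for agent in ("MJ12Bot", "Googlebot"):
        for path in ("/index.html", "/MJ12Bot/page.html"):
            print(agent, path, folder_rule_allows(agent, path))
```

    Running it prints one line per bot and path, so you can check whether the rule restricts a path or a crawler.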
     
    georgenbowser, Feb 26, 2011 IP
  3. bass

    #3
    Thanks

    But will this code still let the Google, Bing and Yahoo bots crawl the site?
     
    bass, Feb 26, 2011 IP
  4. MikeBuyco

    #4
    Yes, the Google, Bing and Yahoo bots can still crawl; by default they already crawl your site. George is right. You just need to add:

    User-agent: *
    Disallow: /MJ12Bot

    You don't need to add Allow rules for Google, Yahoo, Bing, etc.
     
    MikeBuyco, Feb 26, 2011 IP
  5. bass

    #5
    bass, Feb 26, 2011 IP
  6. iproficientseo

    #6

    thanks for sharing ...
     
    iproficientseo, Feb 26, 2011 IP
  7. carleisenstein

    #7
    That validator looks broken. Provided you want to tell all spiders to ignore the /MJ12Bot folder on your site, the robots.txt file George proposed is correct.

    However, I think it's more likely that you want to block MJ12Bot from crawling your site. If that's the case, then your robots.txt file should be:

    User-agent: MJ12Bot
    Disallow: /

    That's all it needs to be - you don't need to put allow statements for Google etc.
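
    If you would rather not rely on an online checker, here is a minimal sketch that compares the original file from post #1 with the short version, again using Python's standard urllib.robotparser. The helper name verdicts and the sample path are assumptions for illustration, and Python's parser is only an approximation of how each crawler reads the file. With this parser, both files give the same answer for these four bots: MJ12Bot is blocked and the others are allowed.

```python
from urllib.robotparser import RobotFileParser

# The file from post #1, with explicit Allow groups.
ORIGINAL = """\
User-agent: MJ12Bot
Disallow: /

User-agent: googlebot
Allow: /

User-agent: msnbot
Allow: /

User-agent: bingbot
Allow: /
"""

# The short version: only the bot to be blocked is listed.
MINIMAL = """\
User-agent: MJ12Bot
Disallow: /
"""


def verdicts(rules: str, agents, path: str = "/some/page.html") -> dict:
    """Map each agent name to True/False: may it fetch `path`?

    Hypothetical helper; `path` is an arbitrary example URL path.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    bots = ["MJ12Bot", "Googlebot", "msnbot", "bingbot"]
    print("original:", verdicts(ORIGINAL, bots))
    print("minimal: ", verdicts(MINIMAL, bots))
```

    A bot with no matching User-agent group and no "User-agent: *" group is treated as allowed, which is why the Allow groups can be left out.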
     
    carleisenstein, Feb 26, 2011 IP
  8. bass

    #8
    Thanks

    Yes, I want to block MJ12Bot from crawling my site but allow Google, Bing, etc.

    I think you are correct!
     
    bass, Feb 26, 2011 IP
  9. davidpolanco

    #9
    If you have trouble, use Google Webmaster Tools; it makes it easy to find your robots.txt. Then search Google for a robots.txt generator for .htaccess.

    Good luck.
     
    davidpolanco, Feb 26, 2011 IP
  10. Nasiha

    #10
    What should I put in robots.txt or .htaccess so that Googlebot picks up just one version of my URLs, the one ending in .html?
    I see Googlebot picking up my URLs without .html at the end, which creates 404 errors. I checked the pages those URLs were picked up from, and they are all fine. I should also mention that in WordPress I set the permalinks to end with .html.
     
    Nasiha, Feb 13, 2016 IP
  11. Anubhav-soin

    #11
    Googlebot only gets a 404 if you have renamed that particular page or deleted it completely. A robots.txt example is already given above, and you do not need to change the .htaccess yourself; some plugins write their own configuration to that file.
     
    Anubhav-soin, Feb 18, 2016 IP