robots.txt for WordPress

Discussion in 'WordPress' started by InnovationZen, Feb 4, 2007.

  1. #1
    I am trying to create a robots.txt file customized for WordPress blogs.

    Here is what I have found across the net:

    User-agent: *
    Disallow: /wp-
    Disallow: /uploads/
    Disallow: /feed/
    Disallow: /comments/feed
    Disallow: /feed/$

    Does anyone familiar with these files know if that code is right, and whether I should include something else?

    Some people told me that Googlebot ignores "User-agent: *". Is that true?
     
    InnovationZen, Feb 4, 2007 IP
  2. nirghum

    #2
    Yes, the code you gave is correct, and it is probably the best policy for saving bandwidth, since it keeps robots from fetching unwanted files on your blog. It is not true, though, that Googlebot ignores *. If you still have doubts, I would suggest adding a second copy of the same rules below the first, with * replaced by Googlebot.
     
    nirghum, Feb 4, 2007 IP
  3. Jean-Luc

    #3
    Not true, but it is a bit complicated:

    1. Googlebot only ignores "User-agent: *" when there is a "User-agent: Googlebot" directive. This is compliant with the robots.txt standard.

    2. You should not use * or $ within the "Disallow:" directives that follow "User-agent: *". These directives should not include proprietary syntax. If you want to use Google proprietary syntax, then you need to use "User-agent: Googlebot".

    3. Anyway, "Disallow: /feed/$" is not necessary as it is included in "Disallow: /feed/".
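
    Point 1 can be tried out with a short script. This is only a minimal sketch using Python's standard urllib.robotparser, which is not Googlebot and only approximates how a real crawler picks its group. The example.com URLs, the robot name "OtherBot", and the rules themselves are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a generic group and a Googlebot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-

User-agent: Googlebot
Disallow: /feed/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot has its own group, so the "*" rules are not applied to it.
googlebot_wp = parser.can_fetch("Googlebot", "http://example.com/wp-admin/")
googlebot_feed = parser.can_fetch("Googlebot", "http://example.com/feed/")

# A robot without its own group falls back to "User-agent: *".
other_wp = parser.can_fetch("OtherBot", "http://example.com/wp-admin/")
other_feed = parser.can_fetch("OtherBot", "http://example.com/feed/")

print(googlebot_wp, googlebot_feed, other_wp, other_feed)
```

    With this parser, the Googlebot group replaces the "*" group rather than adding to it, so anything that should stay blocked for Googlebot has to be repeated in its own group.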

    Jean-Luc
     
    Jean-Luc, Feb 4, 2007 IP
  4. MarRome

    #4
    Try this,

    User-agent: *
    Disallow: */feed*
    Disallow: */wp-admin
    Disallow: */wp-content
    Disallow: */wp-includes
    Disallow: *wp-login.php

    Good Luck
     
    MarRome, Feb 4, 2007 IP
  5. InnovationZen

    #5
    thanks guys,

    I will try MarRome's robots.txt
     
    InnovationZen, Feb 5, 2007 IP
  6. Jean-Luc

    #6
    May I suggest that you first have a look at the robots.txt specification?

    Jean-Luc
     
    Jean-Luc, Feb 5, 2007 IP
  7. InnovationZen

    #7
    I checked the specification. Are you concerned that "Disallow: */wp-admin" might not work because of the * in front of it?
     
    InnovationZen, Feb 5, 2007 IP
  8. Jean-Luc

    #8
    Yep. It does not work with most robots, while "Disallow: /blog_directory/wp-admin/" works with all robots.
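
    The difference can be sketched in a few lines of Python. The two functions below are illustrative names, not part of any library: prefix_match follows the original robots.txt rule (the value is a literal path prefix), while wildcard_match is a simplified reading of the extended syntax some search engines accept. The path reuses the "/blog_directory/" placeholder from above.

```python
import re


def prefix_match(rule, path):
    """Original robots.txt matching: the rule is a literal path prefix."""
    return path.startswith(rule)


def wildcard_match(rule, path):
    """Extended matching: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything except '*', which becomes the regex '.*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    return re.match(pattern, path) is not None


path = "/blog_directory/wp-admin/index.php"

# A prefix-only robot reads '*' as an ordinary character, so nothing matches.
literal_with_star = prefix_match("*/wp-admin", path)

# A robot that understands wildcards does match the same rule.
extended_with_star = wildcard_match("*/wp-admin", path)

# The plain prefix form matches under the original rule.
plain_prefix = prefix_match("/blog_directory/wp-admin/", path)

print(literal_with_star, extended_with_star, plain_prefix)
```

    For a robot that only does prefix matching, a rule starting with * never matches a real path, so the directory stays crawlable without any error being reported.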

    Jean-Luc
     
    Jean-Luc, Feb 5, 2007 IP
  9. InnovationZen

    #9
    Alright,

    going back to my old file then.

    Here is what it looks like so far:

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-content/
    Disallow: /wp-includes/
    Disallow: /feed
    Disallow: /comments
     
    InnovationZen, Feb 5, 2007 IP
  10. Jean-Luc

    #10
    You could simplify the "wp-..." part, like this:

    User-agent: *
    Disallow: /wp-

    Regarding "/feed" and "/comments", there are two potential problems:
    - not all feeds are in the root directory of your blog
    - I do not know what feed readers do when they reach a blog whose robots.txt disallows access to the feeds.
    For these reasons, I do not disallow the feeds on my blogs.
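
    Both points can be checked with a small script. This is a rough sketch using Python's standard urllib.robotparser, which does plain prefix matching. The example.com host and the sample paths are assumptions for illustration, not taken from a real blog.

```python
from urllib.robotparser import RobotFileParser

# The simplified rules, plus a root-level feed rule for comparison.
RULES = """\
User-agent: *
Disallow: /wp-
Disallow: /feed
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def blocked(path):
    """True when the rules above disallow the given path for a generic robot."""
    return not parser.can_fetch("*", "http://example.com" + path)


# "/wp-" is a prefix of every WordPress system path listed here.
wp_paths = [
    "/wp-admin/",
    "/wp-content/themes/style.css",
    "/wp-includes/js/",
    "/wp-login.php",
]
wp_blocked = [blocked(p) for p in wp_paths]

# "/feed" only covers feeds at the root, not per-post or per-category feeds.
root_feed_blocked = blocked("/feed/")
nested_feed_blocked = blocked("/2007/02/some-post/feed/")

print(wp_blocked, root_feed_blocked, nested_feed_blocked)
```

    The nested feed stays reachable because the rule is matched from the start of the path, which is the first problem mentioned above.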

    Jean-Luc
     
    Jean-Luc, Feb 5, 2007 IP
  11. apachehtaccess

    #11
    Here is the robots.txt for my WordPress 2.1 blog.

    Check out the detailed article about using robots.txt on WordPress for SEO.

    
    User-agent: *
    # disallow files in /cgi-bin
    Disallow: /cgi-bin/
    Disallow: /comments/
    Disallow: /z/j/
    Disallow: /z/c/
    
    # disallow all files ending in .php, .js, .inc, .css, or .txt
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.txt$
    
    
    
    # disallow all files in /wp- directories
    Disallow: /wp-*/
    
    # disallow all files with ? in url
    Disallow: /*?
    
    Disallow: /stats*
    Disallow: /dh_
    
    Disallow: /about/legal-notice/
    Disallow: /about/copyright-policy/
    Disallow: /about/terms-and-conditions/
    Disallow: /about/feed/
    Disallow: /about/trackback/
    
    Disallow: /contact/
    Disallow: /tag
    Disallow: /docs*
    Disallow: /manual*
    Disallow: /category/uncategorized*
    
     
    apachehtaccess, Feb 7, 2007 IP
  12. InnovationZen

    #12
    hmm that is a good one, thanks
     
    InnovationZen, Feb 7, 2007 IP
  13. TheSyndicate

    #13
    Sorry for waking up an old thread, but since there is already a thread about what I want to ask, I think it is better to continue here than to open a new one.

    On this page for WordPress I found this text:

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /trackback
    Disallow: /feed
    Disallow: /comments
    Disallow: /category/*/*
    Disallow: */trackback
    Disallow: */feed
    Disallow: */comments
    Disallow: /*?*
    Disallow: /*?
    Allow: /wp-content/uploads

    # Google Image
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    # Internet Archiver Wayback Machine
    User-agent: ia_archiver
    Disallow: /

    # digg mirror
    User-agent: duggmirror
    Disallow: /

    Is this too much, or is it a good one?
     
    TheSyndicate, Oct 28, 2007 IP
  14. Minterest

    #14
    Why not...

    User-agent: *
    Disallow:

    and allow all pages to be indexed?
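
    For reference, an empty "Disallow:" value means nothing is blocked. A quick sketch with Python's standard urllib.robotparser shows this; the robot name "AnyBot", the example.com host, and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The allow-everything file: one group, one empty Disallow line.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Hypothetical paths, including ones the other files in this thread block.
paths = ["/", "/wp-admin/", "/feed/", "/index.php?p=1"]
allowed = [parser.can_fetch("AnyBot", "http://example.com" + p) for p in paths]

print(allowed)
```

    Note that this only controls crawling. It does not by itself deal with duplicate URLs such as feeds or query-string variants.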
     
    Minterest, Oct 31, 2007 IP
  15. @SHFAQ

    #15
    Please tell me how to stop search engines from crawling duplicate posts on my blog. Duplicates are not good for the blog, and Google will penalize it. Could someone give me a good example of a robots.txt for this? I am using the %postname% permalink structure on my blog.
     
    @SHFAQ, Nov 26, 2007 IP
  16. apachehtaccess

    #16
    apachehtaccess, Nov 26, 2007 IP