robots.txt usage and examples

Discussion in 'Search Engine Optimization' started by apachehtaccess, Feb 7, 2007.

  1. #1
    I am curious to see some example uses and implementations of robots.txt, specifically implementations and the reasons behind them for improving SEO. The robots.txt I am using is based on the example from "SEO with robots.txt":

    WordPress 2.1 robots.txt
    
    User-agent: * 
    
    # disallow all files in these directories
    Disallow: /cgi-bin/
    Disallow: /admin/
    Disallow: /comments/
    Disallow: /z/j/
    Disallow: /z/c/
    Disallow: /about/legal-notice/
    Disallow: /about/copyright-policy/
    Disallow: /about/terms-and-conditions/
    Disallow: /about/feed/
    Disallow: /about/trackback/
    Disallow: /contact/
    Disallow: /stats*
    Disallow: /tag
    Disallow: /category/uncategorized*
    
    # disallow all files ending with these extensions
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.txt$
    
    # disallow all files in /wp- directories
    Disallow: /wp-*/
    
    # disallow all files with a ? in the URL
    Disallow: /*?
    
    Basically, this helps get rid of duplicate content, low-quality content, CSS, JavaScript, PHP, etc., but still allows search engines to read the articles and find images, PDFs, and so on.
    Anyone else have improvements or other robots.txt examples?
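
    To check what a wildcard rule set like the one above actually blocks, a small matcher can help. This is only a sketch: it assumes Googlebot-style semantics ('*' matches any run of characters, a trailing '$' anchors the end of the URL, anything else is a prefix match), and the names rule_matches, is_blocked and the sample paths are made up for illustration.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern covers the given URL path.

    Assumed semantics (Googlebot-style wildcards): '*' matches any run of
    characters, a trailing '$' anchors the end of the URL, and everything
    else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors at the start, which gives prefix matching.
    return re.match(regex, path) is not None


# A few of the rules from the WordPress file above.
RULES = ["/*.php$", "/*.css$", "/wp-*/", "/*?", "/tag"]


def is_blocked(path: str) -> bool:
    """True if any rule in RULES covers the path."""
    return any(rule_matches(rule, path) for rule in RULES)


# Hypothetical sample paths, not taken from a real site.
for path in ["/2007/02/my-article/", "/index.php", "/?p=123",
             "/tag/seo/", "/files/guide.pdf"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

    Under these assumptions the article and the PDF stay crawlable while the .php, query-string and /tag URLs are blocked. Note that "/wp-*/" would also cover anything under /wp-content/, so images stored there would be blocked too.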
     
    apachehtaccess, Feb 7, 2007
  2. Diablos

    #2
    Thanks for that example. robots.txt is one of the SEO areas I struggle with, so I will look into using some of these.
     
    Diablos, Feb 7, 2007
  3. apachehtaccess

    #3
    I also use a custom robots.txt for phpBB:


    phpBB robots.txt
    User-agent: * 
    # disallow all files with a ? in the URL
    Disallow: /*?* 
      
    # disallow all files ending in specific extension 
    Disallow: /*.php$ 
    Disallow: /*.js$ 
    Disallow: /*.inc$ 
    Disallow: /*.css$ 
    Disallow: /*.txt$ 
    
    # disallow these dirs
    Disallow: /js/
    Disallow: /css/
    Disallow: /cgi-bin/
    Disallow: /db/
    Disallow: /admin/
    Disallow: /cache/
    Disallow: /includes/
    Disallow: /templates/
    
    # disallow these files and dirs
    Disallow: /V
    Disallow: /stats*
    Disallow: /post
    Disallow: /member
    Disallow: /mx_
    
    # disallow these urls
    Disallow: /rss.php
    Disallow: /viewtopic.php
    Disallow: /viewforum.php
    Disallow: /index.php?
    Disallow: /posting.php
    Disallow: /groupcp.php
    Disallow: /search.php
    Disallow: /login.php
    Disallow: /profile.php
    Disallow: /memberlist.php
    Disallow: /faq.php
    Disallow: /common.php
    Disallow: /index.php
    Disallow: /modcp.php
    Disallow: /privmsg.php
    Disallow: /viewonline.php
    
    # disallow urls starting with quote
    Disallow: /"
    This phpBB forum differs from the default install, though, because it already has special optimizations.
     
    apachehtaccess, Feb 7, 2007
  4. Cryogenius

    #4
    You should only use wildcards for bots that support them (e.g. Googlebot), by putting those rules under an appropriate User-agent line.

    It's probably not worth making it too complicated: the search engines will filter out duplicate and low-quality content anyway.

    Cryo.
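
    To illustrate the point about wildcard support, here is a rough sketch of how a bot that follows only the original prefix-matching convention might read such rules. The function name prefix_only_blocks and the sample paths are hypothetical; real crawlers differ in their details.

```python
def prefix_only_blocks(rule: str, path: str) -> bool:
    """Matching as a wildcard-unaware bot might do it: the rule is a plain
    prefix, so '*' and '$' are treated as ordinary characters."""
    return path.startswith(rule)


# Wildcard rules of the kind posted in this thread, tried against ordinary URLs.
wildcard_rules = ["/*.php$", "/*?", "/wp-*/"]
paths = ["/index.php", "/viewtopic.php?t=1", "/wp-admin/"]

# Collect every (rule, path) pair that a prefix-only bot would treat as blocked.
blocked = [(rule, path)
           for rule in wildcard_rules
           for path in paths
           if prefix_only_blocks(rule, path)]
print(blocked)  # [] : none of the wildcard rules has any effect for such a bot

# A plain prefix rule, by contrast, works without wildcard support.
print(prefix_only_blocks("/viewtopic.php", "/viewtopic.php?t=1"))  # True
```

    Under that assumption the wildcard lines are simply dead weight for such bots, which is why it makes sense to keep them in a group for a bot known to support them and to use plain prefixes under "User-agent: *".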
     
    Cryogenius, Feb 7, 2007
  5. apachehtaccess

    #5
    Normal phpBB robots.txt

    
    #
    # robots.txt for www.phpbbhacks.com
    #
    User-agent: * 
    Disallow: /forums/viewtopic.php 
    Disallow: /forums/viewforum.php 
    Disallow: /forums/index.php? 
    Disallow: /forums/posting.php 
    Disallow: /forums/groupcp.php 
    Disallow: /forums/search.php 
    Disallow: /forums/login.php 
    Disallow: /forums/privmsg.php 
    Disallow: /forums/post
    Disallow: /forums/profile.php 
    Disallow: /forums/memberlist.php 
    Disallow: /forums/faq.php 
    Disallow: /forums/archive
    
     
    apachehtaccess, Feb 7, 2007