robots.txt help

Discussion in 'robots.txt' started by ialwaysforget, Mar 13, 2008.

  1. #1
    Is this robots.txt correct for a WordPress blog?
    My Webmaster Tools account shows around 150 URLs restricted by robots.txt. How can I solve this problem?

    User-agent: *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/
    
    User-agent: Googlebot
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.php*
    Disallow: */trackback*
    Disallow: /*?*
    Disallow: /z/
    Disallow: /wp-*
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.txt$
    Allow: /wp-content/uploads/
    
    User-agent: Googlebot-Image
    Allow: /*

     
    ialwaysforget, Mar 13, 2008 IP
  2. nessie

    nessie Active Member

    Messages:
    284
    Likes Received:
    9
    Best Answers:
    0
    Trophy Points:
    80
    #2
    It looks like you block virtually everything apart from /wp-content/uploads/, which usually needs to be blocked.
     
    nessie, Mar 13, 2008 IP
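
To see which URLs the Googlebot group in post #1 actually restricts, here is a minimal Python sketch of wildcard matching. It assumes the behaviour Google documents for its crawler: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. The helper names `rule_to_regex` and `is_allowed` and the sample paths are made up for illustration, and real crawlers may differ in detail.

```python
import re

# Rules copied from the "User-agent: Googlebot" group in post #1.
GOOGLEBOT_RULES = [
    ("disallow", "/*.php$"), ("disallow", "/*.js$"), ("disallow", "/*.cgi$"),
    ("disallow", "/*.xhtml$"), ("disallow", "/*.php*"), ("disallow", "*/trackback*"),
    ("disallow", "/*?*"), ("disallow", "/z/"), ("disallow", "/wp-*"),
    ("disallow", "/*.inc$"), ("disallow", "/*.css$"), ("disallow", "/*.txt$"),
    ("allow", "/wp-content/uploads/"),
]


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where the pattern had "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules` (assumed precedence)."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow matches nothing
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Hypothetical paths on a typical WordPress blog.
for path in ["/2008/03/hello-world/", "/?p=123", "/index.php",
             "/wp-content/themes/style.css", "/wp-content/uploads/photo.jpg",
             "/2008/03/hello-world/trackback/"]:
    print(path, "allowed" if is_allowed(GOOGLEBOT_RULES, path) else "blocked")
```

Under these assumptions only the pretty-permalink post and the uploaded image come out as allowed. Anything with a query string (such as default `?p=` permalinks), any `.php` or `.css` file, and every trackback URL is reported as blocked, which would be consistent with a large count of restricted URLs.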
  3. ialwaysforget

    ialwaysforget Peon

    Messages:
    222
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Can you correct it so that it allows everything? I think that's the reason Google can't crawl my website.
     
    ialwaysforget, Mar 13, 2008 IP
  4. nessie

    nessie Active Member

    #4
    Yeah. How can it, when you tell the poor Googlebot to go away ;)

    Try using only the following. It blocks any spider that honours robots.txt from accessing what it shouldn't in a typical WordPress blog.

    
    User-agent: *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/
    
    Don't forget to resubmit your sitemap in Google Webmaster Tools after correcting the robots.txt.
     
    nessie, Mar 13, 2008 IP
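
A quick way to sanity-check the minimal file from post #4 before uploading it is Python's standard `urllib.robotparser`, sketched below. The `example.com` host and the sample paths are placeholders, not taken from the thread. This parser does plain prefix matching and knows nothing about Google's `*` and `$` wildcards, which is fine here because the minimal file uses none.

```python
from urllib.robotparser import RobotFileParser

# The minimal rules suggested in post #4.
MINIMAL_ROBOTS = """\
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
"""

SITE = "https://example.com"  # placeholder host, not from the thread

parser = RobotFileParser()
parser.parse(MINIMAL_ROBOTS.splitlines())

# Hypothetical URLs on a typical WordPress blog.
for path in ["/2008/03/hello-world/", "/wp-admin/", "/feed/",
             "/wp-login.php", "/wp-content/uploads/photo.jpg"]:
    verdict = "allowed" if parser.can_fetch("Googlebot", SITE + path) else "blocked"
    print(path, verdict)
```

Note that with this file the uploads folder is blocked too, since both `/wp-content/` and `/wp-` match it as prefixes. If images should be crawlable, that needs its own Allow line or a narrower Disallow.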
  5. ialwaysforget

    ialwaysforget Peon

    #5
    What if I remove the file from the directory?
     
    ialwaysforget, Mar 13, 2008 IP
  6. worldpresident

    worldpresident Banned

    Messages:
    163
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #6
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /trackback
    Disallow: /comments
    Disallow: /category/*/*
    Disallow: */trackback
    Disallow: */comments
    Disallow: /*?*
    Disallow: /*?
    Allow: /wp-content/uploads

    # Google Image
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    # Internet Archiver Wayback Machine
    User-agent: ia_archiver
    Disallow: /

    # digg mirror
    User-agent: duggmirror
    Disallow: /

    Sitemap: http://www.yoursite.com/sitemap.xml
     
    worldpresident, Mar 13, 2008 IP
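
The file in post #6 relies on per-crawler groups: a crawler that finds a group naming it follows that group and ignores the `User-agent: *` rules. The sketch below illustrates this with Python's standard `urllib.robotparser` on an abbreviated copy of that file. The wildcard lines are left out because this parser only does prefix matching, and the `example.com` host, the `SomeBot` name, and the sample paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated from post #6: three of its groups, without the wildcard rules.
GROUPED_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes

User-agent: Googlebot-Image
Disallow:

User-agent: ia_archiver
Disallow: /
"""

SITE = "https://example.com"  # placeholder host, not from the thread

groups = RobotFileParser()
groups.parse(GROUPED_ROBOTS.splitlines())

# Each crawler is judged by its own group, falling back to "*" if it has none.
checks = [
    ("SomeBot", "/wp-admin/"),                  # falls back to the * group
    ("SomeBot", "/2008/03/hello-world/"),
    ("Googlebot-Image", "/wp-admin/"),          # empty Disallow allows everything
    ("ia_archiver", "/2008/03/hello-world/"),   # Disallow: / blocks everything
]
for agent, path in checks:
    verdict = "allowed" if groups.can_fetch(agent, SITE + path) else "blocked"
    print(agent, path, verdict)
```

The empty `Disallow:` line is what opens the site to Googlebot-Image here, so the extra `Allow: /*` in post #6 looks redundant for that group.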
  7. nessie

    nessie Active Member

    #7
    Your wish. My suggestion is to keep the minimal one I gave you. Either way, resubmit the site via Webmaster Tools.
     
    nessie, Mar 13, 2008 IP
  8. manish.chauhan

    manish.chauhan Well-Known Member

    Messages:
    1,682
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    110
    #8
    Can I update my Blogger blog's robots.txt?
     
    manish.chauhan, Apr 6, 2008 IP