Robots.txt question/problem in Wordpress

Discussion in 'Search Engine Optimization' started by rgordon83, May 29, 2007.

  1. #1
    I logged into Google Webmaster Tools for my new site http://theguitarresource.com and got a notice that says "URL restricted by robots.txt" for the URL "http://theguitarresource.com", which is my home page. I assume this is bad. Here is what my robots.txt file looks like:
    User-agent: *
    # disallow all files in these directories
    Disallow: /cgi-bin/
    Disallow: /z/j/
    Disallow: /z/c/
    Disallow: /stats/
    Disallow: /dh_
    Disallow: /about/
    Disallow: /contact/
    Disallow: /tag/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /contact
    Disallow: /manual
    Disallow: /manual/*
    Disallow: /phpmanual/
    Disallow: /category/


    User-agent: Googlebot
    # disallow all files ending with these extensions
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$

    # disallow all files with ? in url
    Disallow: /*?*

    # disable duggmirror
    User-agent: duggmirror
    Disallow: /

    # allow google image bot to search all images
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # allow adsense bot on entire site
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*


    So I'm not sure what to change or why it says my homepage is restricted. Please help!
     
    rgordon83, May 29, 2007
  2. WebGeek182

    WebGeek182 Active Member

    #2
    First off, I'd recommend removing the line:
    Disallow: /*.php$

    Everything in WordPress is PHP so that could disallow most pages from being indexed.
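
    To see which paths a rule like that actually catches, here is a minimal Python sketch. It assumes Google's documented wildcard behaviour (`*` matches any run of characters, a trailing `$` anchors the end of the URL, anything else is a prefix match). The function names and sample paths are made up for illustration, and Allow rules are left out.

```python
import re

def pattern_to_regex(pattern):
    """Turn a Google-style robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as "any run of characters".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path, patterns):
    """True if any Disallow pattern matches the path (Allow rules are ignored here)."""
    return any(pattern_to_regex(p).search(path) for p in patterns)

# Hypothetical sample paths, checked against the two wildcard rules from the file.
for path in ["/", "/index.php", "/wp-login.php", "/lessons/chords/", "/?p=12"]:
    print(path, is_disallowed(path, ["/*.php$", "/*?*"]))
```

    In this sketch only paths that literally end in .php, or that contain a ?, come back as disallowed. A rewritten URL such as /lessons/chords/ is not matched by /*.php$.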
     
    WebGeek182, May 29, 2007
  3. sweetfunny

    sweetfunny Banned

    #3
    This line is a really bad one too:

    Disallow: /category/

    All the links on your right navigation menu for your subpages are in the /category/ tree.

    With the .php line, I see what they have done. It's not too bad, as the URLs are rewritten to just domain.com/subdir/page

    So they have added the .php exclusion to stop domain.com/subdir/page/index.php from being indexed.

    All up, it's a pretty sloppy robots file. I'd remove it all and just exclude crucial things like includes, admin, etc., then use .htaccess to 301-redirect the index.php URLs to the subfolder.

    That redirect isn't in place at the moment: if you go to any subpage and add index.php to the URL, the same page reloads with index.php still in the browser's address bar.
     
    sweetfunny, May 29, 2007
  4. oseymour

    oseymour Well-Known Member

    #4
    I agree with sweetfunny... It's a really sloppy robots file. Is that the one that came with WordPress?

    If it is, I am going to pay better attention to it.
     
    oseymour, May 29, 2007
  5. rgordon83

    rgordon83 Peon

    #5
    WP did not come with one; I got it as an example from the WP site. The thing is, I really don't know anything about robots.txt files, so I just used the one they provided. Does anyone know a better place to get one?

    Also, I have a feeling they disallowed /category because that way it won't have duplicate content from the category pages vs. the individual post pages... but I could be wrong.
     
    rgordon83, May 29, 2007
  6. rgordon83

    rgordon83 Peon

    #6
    Is this better?

    User-agent: *
    Disallow: /wp-content/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-
    Disallow: /feed/
    Disallow: /trackback/
    Disallow: /cgi-bin/


    User-agent: Googlebot
    # disallow all files ending with these extensions
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$

    # allow google image bot to search all images
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # allow adsense bot on entire site
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*
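
    One thing that may be worth double-checking in a layout like this: as far as I know, Google documents that a crawler obeys only the most specific User-agent group that matches it, so a Googlebot group would replace the * group rather than add to it. Below is a rough Python sketch of that selection rule, run on a cut-down copy of the file above. The parser and function names are hypothetical, and it only handles User-agent, Disallow and Allow lines.

```python
def parse_groups(robots_txt):
    """Split robots.txt text into {user-agent token: [Disallow paths]}."""
    groups = {}
    current_agents = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule was already seen, so this line starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow blocks nothing
                for agent in current_agents:
                    groups[agent].append(value)
        elif field == "allow":
            in_rules = True  # Allow paths are not tracked in this sketch
    return groups

def rules_for(crawler, groups):
    """Return the Disallow list of the most specific matching group, else the '*' group."""
    name = crawler.lower()
    matches = [a for a in groups if a and a != "*" and name.startswith(a.rstrip("*"))]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])

# Cut-down copy of the revised file, for illustration only.
REVISED = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

User-agent: Googlebot
Disallow: /*.php$
Disallow: /*.js$
"""

groups = parse_groups(REVISED)
print("Googlebot:", rules_for("Googlebot", groups))
print("SomeOtherBot:", rules_for("SomeOtherBot", groups))
```

    If that reading is right, Googlebot would see only the extension rules, and the /wp-admin/ and /wp-includes/ lines would need repeating inside the Googlebot group to apply to it.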
     
    rgordon83, May 29, 2007
  7. LuckyPimp

    LuckyPimp Peon

    #7
    Be careful with Disallow; you can screw yourself quick.
     
    LuckyPimp, Jul 6, 2008