Can someone check my robots.txt? My site is still not indexed.

Discussion in 'robots.txt' started by covetousrat, Mar 27, 2011.

  1. #1
    1. Can anyone help me check my robots.txt? It's been around 3 weeks since I launched my new website. The only page indexed is the main home page. None of the content pages have been indexed yet. I am using WordPress with Platinum SEO. This is my website.

    www.holyhamsters.com

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /trackback
    Disallow: /feed
    Disallow: /comments
    Disallow: /category/*/*
    Disallow: */trackback
    Disallow: */feed
    Disallow: */comments
    Disallow: /*?*
    Disallow: /*?
    Allow: /wp-content/uploads

    User-agent: Googlebot
    Disallow: /*.pdf$
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /*.php*
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.txt$
    Disallow: /*?*
    Disallow: /wp-*
    Disallow: */feed/
    Disallow: */trackback/
    Disallow: /cgi-bin/
    Disallow: /go/
    Allow: /wp-content/uploads/

    User-agent: Googlebot-Image
    Allow: /*

    2. Is my sitemap correct?
    www.holyhamsters.com/sitemap.xml

    3. In Google Webmaster Tools I have submitted both www.holyhamsters.com and holyhamsters.com. However, on the one without www I am getting an error, because my sitemap is located at www.holyhamsters.com/sitemap.xml. Is there a fix for this?

    Sorry, this is my first WordPress website.
     
    covetousrat, Mar 27, 2011
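A note for anyone checking a file like the one in post #1: Google documents that a crawler follows only the most specific User-agent group that names it, so Googlebot would read the "User-agent: Googlebot" group and skip the "*" group. The sketch below approximates Google-style matching against that group ("*" as a wildcard, a trailing "$" as an end anchor, the longest matching pattern wins, and Allow wins ties). It is a simplification, not Google's actual parser, and the sample paths are made up for illustration.

```python
import re

# The "User-agent: Googlebot" group from post #1, as (directive, pattern) pairs.
GOOGLEBOT_RULES = [
    ("Disallow", "/*.pdf$"), ("Disallow", "/*.php$"), ("Disallow", "/*.js$"),
    ("Disallow", "/*.cgi$"), ("Disallow", "/*.xhtml$"), ("Disallow", "/*.php*"),
    ("Disallow", "/*.inc$"), ("Disallow", "/*.css$"), ("Disallow", "/*.txt$"),
    ("Disallow", "/*?*"), ("Disallow", "/wp-*"), ("Disallow", "*/feed/"),
    ("Disallow", "*/trackback/"), ("Disallow", "/cgi-bin/"), ("Disallow", "/go/"),
    ("Allow", "/wp-content/uploads/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern decides; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical paths, chosen only to exercise the rules.
for sample in ["/hamster-care/", "/?p=123", "/index.php",
               "/wp-content/uploads/photo.jpg", "/hamster-care/feed/"]:
    print(sample, "allowed" if is_allowed(GOOGLEBOT_RULES, sample) else "blocked")
```

Under these assumptions a pretty-permalink URL such as /hamster-care/ is not blocked, while any URL containing "?" is. That would include WordPress's default "/?p=123" style permalinks, if the site used them; the thread does not say.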
  2. kevinnn

    kevinnn Peon

    #2
    OK, your robots.txt looks terrible (sorry). Why are you disallowing all the subfolders of your WordPress site? Of course Google will not index your WordPress site!

    The standard configuration that WordPress generates is good enough. You don't need to add much yourself (maybe an SEO pack plugin to add meta tags).

    2). You can validate your XML sitemap on xml-sitemaps.com
    I did this for you and it reported no errors, which means that your sitemap is working. As mentioned above: even if all the pages are in your sitemap, Google will still not show them in its search results if you disallow it from crawling them.

    3). In Google Webmaster Tools you can change your preferences for each domain. In your settings, you can select a preferred domain: with or without www. So you don't need to add 2 'sites' for the same domain. (As a side note: having both can be treated as duplicate content, which is another possible reason for your site not being indexed.)

    Many sites make this mistake: their website can be reached both at myown#website.com and at the 'www' variant.
    Google will see this as duplicate content, so be sure to redirect to your main domain.

    For example: digitalpoint.com redirects to the same domain with 'www'.

    Good luck!
     
    kevinnn, Mar 31, 2011
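The redirect advice in post #2 can be illustrated with a small sketch. It only computes where a request should be sent (typically with a 301) so that the site answers on one hostname; the function name canonical_redirect and its parameters are hypothetical, and a real WordPress site would normally do this in the web server or in WordPress settings rather than in Python.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, canonical_host):
    """Return the URL to redirect to, or None if no redirect is needed.

    Only the 'www' and bare variants of canonical_host are handled;
    any other host is left alone. Ports are dropped in this sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host == canonical_host:
        return None  # already on the preferred hostname
    if host in (bare, "www." + bare):
        return urlunsplit((parts.scheme, canonical_host,
                           parts.path or "/", parts.query, parts.fragment))
    return None  # unrelated host


print(canonical_redirect("http://holyhamsters.com/sitemap.xml", "www.holyhamsters.com"))
print(canonical_redirect("http://www.holyhamsters.com/", "www.holyhamsters.com"))
```

With a redirect like this in place, the sitemap and every page resolve to a single hostname, which is the situation post #2 recommends.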
  3. covetousrat

    covetousrat Greenhorn

    #3
    Kevinn, thanks for your in-depth reply.

    By the way, I got the robots.txt from one of Google's search results. It is what the author recommended on his blog, so I just copied and pasted it. Do you have a good robots.txt that I can use?
     
    covetousrat, Mar 31, 2011
  4. Alan Smith

    Alan Smith Active Member

    #4
    You cannot just use a copy-pasted robots.txt file. You have to write it yourself to suit your website. Google's documentation helps with this.
     
    Alan Smith, Apr 7, 2011
  5. kevinnn

    kevinnn Peon

    #5
    Actually, I never disallow a folder or file in my robots.txt.

    I have never had problems with that, but I also never store private information on a server.

    Oh yes, before I forget: it may be smart to disallow your pages that contain an email form in robots.txt, just to prevent spamming. If you disallow a spider from crawling that page, it won't be indexed. However, you can also arrange this with the robots meta tag.
     
    kevinnn, Apr 8, 2011
  6. xavianer

    xavianer Active Member

    #6
    Try using the default robots.txt. That's what I used, and Google indexed my website.
     
    xavianer, Apr 19, 2011
  7. clive2

    clive2 Peon

    #7
    I think you have made a very complicated robots.txt.
    I would suggest that you create a default robots.txt and submit it in Google Webmaster Tools. That works fine, and your site will get a lot of pages indexed.
     
    clive2, Apr 25, 2011
  8. visitech

    visitech Peon

    #8
    Here is a simple robots.txt:

    User-agent: *
    Allow: /
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-login.php

    User-agent: Mediapartners-Google
    Allow: /

    User-agent: Adsbot-Google
    Allow: /

    User-agent: Googlebot-Image
    Disallow: /wp-content

    User-agent: Googlebot-Mobile
    Allow: /

    # Sitemap takes a full URL, not a relative path (http assumed here)
    Sitemap: http://www.holyhamsters.com/sitemap.xml
     
    visitech, May 2, 2011
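One detail worth checking in any robots.txt like the one in post #8: the sitemaps.org protocol expects the Sitemap line to carry a full URL rather than a relative path. A minimal sketch of such a check follows; the function name relative_sitemap_lines is hypothetical, and it looks only at Sitemap lines, not at the rest of the file.

```python
from urllib.parse import urlsplit


def relative_sitemap_lines(robots_txt):
    """Return the Sitemap values that are not absolute http(s) URLs."""
    bad = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            value = value.strip()
            parts = urlsplit(value)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                bad.append(value)
    return bad


sample = "User-agent: *\nAllow: /\nSitemap: /sitemap.xml\n"
print(relative_sitemap_lines(sample))
```

An empty list means every Sitemap line found is an absolute URL.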
  9. manish.chauhan

    manish.chauhan Well-Known Member

    #9
    Your robots.txt seems fine to me except for one error: Disallow: /category/*/*. I understand that you do not want the category pages indexed, to avoid duplicate content, but this rule also stops crawlers from following the links to the posts on those pages, so your posts might take longer to get indexed. A better solution in this case is the Ultimate Noindex Nofollow Tool plugin for WordPress. It lets you add the meta tag <meta name="robots" content="noindex,follow" /> to your category pages, so the category pages are not indexed but the links to the posts on them are still followed. You can find the plugin at http://wordpress.org/extend/plugins/ultimate-noindex-nofollow-tool/
     
    manish.chauhan, May 3, 2011
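To confirm that a category page actually carries the meta tag described in post #9, the page's HTML can be scanned for it. The sketch below uses Python's standard html.parser; the class and function names are hypothetical, and the HTML string is a made-up stand-in for a fetched page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
found = robots_directives(page)
print("noindex" in found, "follow" in found)
```

Note that a crawler can only see this tag if robots.txt lets it fetch the page, which is why post #9 suggests the meta tag in place of the Disallow rule for category pages.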
  10. covetousrat

    covetousrat Greenhorn

    #10
    I have emptied my robots.txt and just left it blank. It seems to be doing fine for the moment.
     
    covetousrat, May 4, 2011
  11. manish.chauhan

    manish.chauhan Well-Known Member

    #11
    It would work fine, but it is not the solution: your category and tag pages, which contain the post content, will also start being indexed very soon. You should avoid that by using a robots.txt together with the Ultimate Noindex Nofollow Tool plugin.
     
    manish.chauhan, May 4, 2011
  12. us2006

    us2006 Greenhorn

    #12
    Thank you very much.
     
    us2006, May 5, 2011