1. Can anyone help me check my robots.txt? My new website has been up for about 3 weeks, and the only page indexed is the home page. None of the content pages have been indexed yet. I am using WordPress with Platinum SEO. This is my website: www.holyhamsters.com

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

User-agent: Googlebot
Disallow: /*.pdf$
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.php*
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
Disallow: /*?*
Disallow: /wp-*
Disallow: */feed/
Disallow: */trackback/
Disallow: /cgi-bin/
Disallow: /go/
Allow: /wp-content/uploads/

User-agent: Googlebot-Image
Allow: /*

2. Is my sitemap correct? www.holyhamsters.com/sitemap.xml

3. In Google Webmaster Tools I have submitted both www.holyhamsters.com and holyhamsters.com. However, for the one without www I am getting an error because my sitemap appears to be at www.holyhamsters.com/sitemap.xml. Is there any fix for this?

Sorry, this is my first WordPress website.
1) OK, your robots.txt looks terrible (sorry). Why are you disallowing all those subdirectories of your WordPress site? Of course Google will not index it! The standard configuration that WordPress creates is good enough. You don't need to add much yourself (maybe an SEO pack plugin to add meta tags).

2) You can validate your XML sitemap on xml-sitemaps.com. I did this for you and it reported no errors, which means your sitemap is working. As mentioned above: if all the pages are in your sitemap but you disallow Google from crawling them, Google still will not show them in its search results.

3) In Google Webmaster Tools you can set a preference for each domain. In your settings you can choose the preferred domain: with or without www. So you don't need to add 2 'sites' for the same domain. (As a side note: this can be treated as duplicate content, which is another reason your site might not be indexed.) Many sites make this mistake: the website can be reached both at myownwebsite.com and at the 'www' variant. Google will see this as duplicate content, so be sure to redirect to your main domain. For example, digitalpoint.com redirects to the same domain with 'www'. Good luck!
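To make the redirect advice in point 3 concrete, here is a minimal Python sketch of the decision a redirect rule has to make. It assumes the 'www' variant is the preferred host; the function name canonical_redirect and the constant PREFERRED_HOST are made up for illustration, and on a real WordPress site this is normally handled by the web server or by WordPress itself rather than by custom code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the site owner picked the www variant as the main domain.
PREFERRED_HOST = "www.holyhamsters.com"


def canonical_redirect(url, preferred_host=PREFERRED_HOST):
    """Return the URL to send a 301 redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # The bare (non-www) form of the preferred host.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host

    if host == preferred_host:
        return None  # already on the main domain
    if host in (bare, "www." + bare):
        # Same site, other variant: keep path and query, swap the host.
        return urlunsplit(
            (parts.scheme, preferred_host, parts.path, parts.query, parts.fragment)
        )
    return None  # a different site altogether, nothing to do


print(canonical_redirect("http://holyhamsters.com/sitemap.xml"))
```

With one variant always redirecting to the other, search engines only ever see a single copy of each page.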
Kevinn, thanks for your in-depth reply. By the way, I got the robots.txt from one of the Google search results. It is what the author recommended on his blog, so I just copied and pasted it. Do you have a good robots.txt that I can use?
You cannot just use a copy-pasted robots.txt file. You have to write it yourself to suit your website. Google offers help with this.
Actually, I never disallow a directory or file in robots.txt. I have never had problems with that, but I also never store private information on a server. Oh yes, before I forget: it may be smart to disallow the pages that contain an email form in robots.txt, just to prevent spam. If you disallow a spider from crawling a page, its content won't be indexed. However, you can also arrange this with the robots meta tag.
I think you have made a very complicated robots.txt. I would suggest making a default robots.txt and submitting it in Google Webmaster Tools. That works fine, and your site will get a lot of pages indexed.
Here is a simple robots.txt:

User-agent: *
Allow: /
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Image
Disallow: /wp-content

User-agent: Googlebot-Mobile
Allow: /

Sitemap: http://www.holyhamsters.com/sitemap.xml
Your robots.txt seems fine to me except for one error: Disallow: /category/*/*. I understand that you do not want the category pages indexed, to avoid duplicate content, but this rule also stops crawlers from following the links to the posts on those pages, so your posts might take longer to get indexed. A better solution in this case is to use the Ultimate Noindex Nofollow Tool plugin on your WordPress website. It lets you add the meta tag <meta name="robots" content="noindex,follow" /> to your category pages, so the category pages are not indexed but the posts linked from them are still followed. You can find the plugin at http://wordpress.org/extend/plugins/ultimate-noindex-nofollow-tool/
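To confirm the plugin really put the tag on a category page, you could read the directives out of the page source. Below is a small Python sketch using the standard html.parser module. The class name RobotsMetaParser, the function robots_directives and the SAMPLE markup are made up for illustration; in practice you would feed it the HTML of one of your own category pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # "noindex,follow" -> {"noindex", "follow"}
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, not fetched from the real site.
SAMPLE = (
    "<html><head>"
    '<meta name="robots" content="noindex,follow" />'
    "</head><body>Category: Hamster care</body></html>"
)

print(sorted(robots_directives(SAMPLE)))
```

Keep in mind that a crawler can only see this tag if it is allowed to fetch the page, so the category URLs must not also be disallowed in robots.txt.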
It would work fine, but it is not the whole solution: your category and tag pages, which hold the posts' content, would also start being indexed very soon. You should avoid that by using robots.txt together with the Ultimate Noindex Nofollow Tool plugin.