Error with robots.txt

Discussion in 'Google Sitemaps' started by Beattle, May 5, 2012.

  1. #1
    I am having a problem with Google Webmaster Tools when submitting a sitemap for one of my sites. I have 3 virtual sites and the master site running on a WordPress Multisite install. All of the sites except the master site are on virtual subdomains.

    I have submitted sitemaps for two of the subdomain sites successfully, but not for the master site, as I have not decided exactly what to do with it yet. The master site has no pages and no posts at the moment. It is not blocked at all.

    When I tried to submit the sitemap for the last virtual subdomain site that I created, Webmaster Tools reported that the site was completely blocked by robots.txt, and showed its robots.txt as:

    User-agent: *
    Disallow:
    I actually have a real robots.txt file in the root of the domain, where WordPress is installed. I have tested the robots.txt file with an online tester, and it checks out fine.

    Here is the text in my robots.txt file:

    User-agent: *
    Allow: /
    Disallow: /cgi-bin/
    Disallow: /wp-admin/
    Disallow: /wp-content/cache/
    Disallow: /wp-includes/
    Disallow: /?wptheme
    Disallow: /?p=

    Somehow, Google seems to be seeing a different robots.txt file than the one I created. From what I have read, the real robots.txt file should override any virtual robots.txt file anyway. I have WordPress set to allow bots to index all of the sites, and I am using the Yoast WordPress SEO plugin. I have used this plugin on several sites and had no problems with it.
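    For anyone who wants to sanity-check the file above by hand, here is a minimal sketch in Python. It assumes Google's documented precedence for plain prefix rules (the longest matching path wins, and Allow wins a tie) and ignores wildcards. The names RULES and is_allowed are made up for this example; they are not part of any library or of Webmaster Tools.

```python
# Hypothetical helper for illustration only.
# Assumption: plain prefix rules, longest matching path wins, Allow wins ties.
RULES = [
    ("allow", "/"),
    ("disallow", "/cgi-bin/"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-content/cache/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/?wptheme"),
    ("disallow", "/?p="),
]


def is_allowed(path, rules=RULES):
    """Return True if the path (including any query string) may be crawled."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, prefix in rules:
        # An empty rule path matches nothing here, so skip it.
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    for sample in ["/", "/sitemap.xml", "/wp-admin/options.php", "/?p=12"]:
        print(sample, is_allowed(sample))
```

    Under those assumptions the home page and a sitemap URL come out as allowed, so a "completely blocked" report would not be coming from this file.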

    The main site is http://galaxyindigo.com and the one I am having trouble with is http://traveldeals.galaxyindigo.com

    Any suggestions? :confused:
     
    Beattle, May 5, 2012
  2. Anil Strivastava

    #2
    It sounds (from other discussions) like you may need a dynamic robots.txt file that detects which domain the bot is requesting and changes the content accordingly. This means the server has to run all .txt files as (I presume) PHP.
    Or, you could conditionally rewrite the /robots.txt URL to a different file according to the subdomain:
    RewriteEngine on
    RewriteCond %{HTTP_HOST} ^subdomain\.website\.com$
    RewriteRule ^robots\.txt$ robots-subdomain.txt [L]
    Then add:
    User-agent: *
    Disallow: /
    to the robots-subdomain.txt file
    (untested)
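    The "dynamic robots.txt" idea can be sketched in a few lines. This is only an illustration in Python rather than PHP, and it is just as untested against a real server as the rewrite rules. The names ROBOTS_BY_HOST, DEFAULT_ROBOTS and robots_for_host are hypothetical, and the default body is an assumption (an empty Disallow, which blocks nothing).

```python
# Hypothetical sketch: choose a robots.txt body from the request's Host header.
# The host name is the placeholder used in the rewrite example above.
ROBOTS_BY_HOST = {
    "subdomain.website.com": "User-agent: *\nDisallow: /\n",
}

# Assumption: every other host gets a file that blocks nothing.
DEFAULT_ROBOTS = "User-agent: *\nDisallow:\n"


def robots_for_host(host_header):
    """Return the robots.txt text to serve for the given Host header value."""
    # Drop an optional port and normalise case before the lookup.
    host = host_header.split(":", 1)[0].strip().lower()
    return ROBOTS_BY_HOST.get(host, DEFAULT_ROBOTS)


if __name__ == "__main__":
    print(robots_for_host("subdomain.website.com"))
    print(robots_for_host("www.website.com"))
```

    Whatever handles requests for /robots.txt would call robots_for_host with the incoming Host header and send the result back as text/plain.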
     
    Anil Strivastava, May 8, 2012
  3. Beattle

    #3
    I think I have this set up properly. What seems to have happened is that I had this one site set to private with the WordPress privacy setting while I put the site together, and that is when Googlebot first visited it. I did not have the real robots.txt file in the root of the site at that time. Perhaps Google keeps that original Disallow: / result from setup around until it feels like going back, rather than refreshing it when we request a crawl. I have checked what testing sites see when given the URL, and the results are what I want, but Google is/was reporting something that is not there. From now on, if I want to set up a new site and keep it private during setup, I think I should manually add the setting to the robots.txt file. I could then remove that one line after setup.
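    To show how much that one line matters, here is a small sketch using Python's standard urllib.robotparser. The two file bodies and the root_allowed helper are assumptions made up for this example. The sketch only demonstrates how each file is interpreted; it says nothing about how long Google holds on to an old copy.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents: one blocks everything, the other blocks nothing.
PRIVATE_DURING_SETUP = "User-agent: *\nDisallow: /\n"
OPEN_AFTER_SETUP = "User-agent: *\nDisallow:\n"


def root_allowed(robots_text, url="http://traveldeals.galaxyindigo.com/"):
    """Parse robots.txt text offline and report whether the URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    print("during setup:", root_allowed(PRIVATE_DURING_SETUP))
    print("after setup:", root_allowed(OPEN_AFTER_SETUP))
```

    The only difference between the two files is the slash after Disallow:, which is the one line that would be removed after setup.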
     
    Beattle, May 9, 2012