Google doesn't give a * about robots.txt

Discussion in 'Search Engine Optimization' started by lopes, May 21, 2009.

  1. #1
    Seriously, I'm pissed off.

    I have a domain that's for tests only, and G is indexing more pages of it every day. The content of its robots.txt is:
    -----------------------
    User-Agent: *
    Disallow: /
    -----------------------
    Please somebody tell me how to completely remove my test domain from G, or it will start hurting my actual domain's rankings (duplicate content).
    Thanks....
     
    lopes, May 21, 2009 IP
  2. ~kev~

    ~kev~ Well-Known Member

    Messages:
    2,866
    Likes Received:
    194
    Best Answers:
    0
    Trophy Points:
    110
    #2
    Besides the robots.txt file, you can put noindex in the header of your pages. I don't remember the exact code, so you will have to look it up. But it's a tag in the page's head that tells search engines not to index that page.
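
    The tag being described here is the robots meta tag, placed inside each page's `<head>`. A minimal sketch:

    ```html
    <head>
      <!-- Tells compliant crawlers not to index this page or follow its links -->
      <meta name="robots" content="noindex, nofollow">
    </head>
    ```

    Unlike robots.txt, this works per page, so the crawler has to be allowed to fetch the page in order to see it.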
     
    ~kev~, May 21, 2009 IP
  3. lopes

    lopes Well-Known Member

    Messages:
    230
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    120
    #3
    Thanks for the reply! The problem is that I'd have to noindex all pages... Doesn't Google provide a form for webmasters to remove their domains? That'd be the best IMO.
     
    lopes, May 21, 2009 IP
  4. Microdot

    Microdot Well-Known Member

    Messages:
    72
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    120
    #4
    You could use a 301 redirect

    Or the Webmaster Tools URL removal tool

    Or password protect that directory

    That'll stop 'em.
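
    On an Apache server, the 301 option is a couple of lines of .htaccess (the destination domain is a placeholder, and mod_rewrite must be enabled):

    ```apache
    # .htaccess at the root of the test domain:
    # permanently redirect every request to the live site
    RewriteEngine On
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
    ```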
     
    Microdot, May 21, 2009 IP
  5. keym4k3r

    keym4k3r Peon

    Messages:
    250
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #5
    You should not rely on the robots.txt file only.
    I had some cases where, while Google was visiting the pages,
    the server didn't serve the robots file, or served it blank.
    So the spider, seeing no restrictions, indexed everything.
    If you want to restrict the whole site, then yes, you should put the meta tag in the header of every page.

    BTW, Webmaster Tools has an option to remove pages from the index.
     
    keym4k3r, May 21, 2009 IP
  6. ~kev~

    ~kev~ Well-Known Member

    Messages:
    2,866
    Likes Received:
    194
    Best Answers:
    0
    Trophy Points:
    110
    #6
    Even if you go into your Google Webmaster Tools and request that the pages be removed, it clearly states that unless you take measures to stop the indexing, the pages will get indexed again. So even if you ask Google to remove a page, it's still going to be indexed again.

    You could use .htaccess and .htpasswd to password protect the directory. Just do some Google searches for htpasswd and you should be able to find something. I'm using this same method to protect a backup script for my sites.

    Once you have the file in place, when someone goes to view the site, or any directory that is under the .htaccess file, they will be prompted for a username and password.
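
    The setup ~kev~ describes looks roughly like this on Apache (all paths and names below are placeholders):

    ```apache
    # .htaccess in the directory to protect
    AuthType Basic
    AuthName "Test site - authorized users only"
    # Keep the password file outside the web root
    AuthUserFile /home/youruser/.htpasswd
    Require valid-user
    ```

    The password file itself is created with the htpasswd utility, e.g. `htpasswd -c /home/youruser/.htpasswd yourname` (the -c flag creates the file on first use).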
     
    ~kev~, May 21, 2009 IP
  7. lopes

    lopes Well-Known Member

    Messages:
    230
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    120
    #7
    I'm using the .htaccess method now, big thanks to all you guys, you saved me a lot of time!!
     
    lopes, May 21, 2009 IP