Placed a robots.txt, but Google is still indexing my website :(

Discussion in 'Search Engine Optimization' started by rohit_tripathi60, Dec 2, 2009.

  1. #1
    I have placed a robots.txt file to prevent Google from crawling my website, but it is still crawling it and my website is showing up in Google search results. Have I done something wrong? My robots.txt file is:

    User-Agent: *
    Disallow: http://beta.healthcarejobssource.com/
     
    rohit_tripathi60, Dec 2, 2009 IP
  2. SunstarShop

    SunstarShop Peon

    #2
    Why did you deny only the home page instead of all pages? I think disallowing the whole directory is more effective!
     
    SunstarShop, Dec 2, 2009 IP
  3. bodmov

    bodmov Peon

    #3
    Also try blocking it with meta tags.
     
    bodmov, Dec 2, 2009 IP
  4. rohit_tripathi60

    rohit_tripathi60 Active Member

    #4
    How do I do that? I don't want a single page from my website to be listed on any search engine until I complete it.
     
    rohit_tripathi60, Dec 2, 2009 IP
  5. Tompxx

    Tompxx Peon

    #5
    Place the robots.txt file again, in the proper way.
     
    Tompxx, Dec 2, 2009 IP
  6. rohit_tripathi60

    rohit_tripathi60 Active Member

    #6
    What's the proper way, according to you?
     
    rohit_tripathi60, Dec 3, 2009 IP
  7. joelchrist

    joelchrist Banned

    #7
    Is your site still under construction?
     
    joelchrist, Dec 3, 2009 IP
  8. rohit_tripathi60

    rohit_tripathi60 Active Member

    #8
    Yes, and I don't want any of my web pages to get indexed by Google or any other search engine. Can anyone please help?
     
    rohit_tripathi60, Dec 3, 2009 IP
  9. mudassir786

    mudassir786 Peon

    #9
    Use Google Webmaster Tools and remove all the indexed pages through that...
     
    mudassir786, Dec 3, 2009 IP
  10. sherone

    sherone Well-Known Member

    #10
    Try denying access with a <meta> tag.
     
    sherone, Dec 4, 2009 IP
  11. shailendra

    shailendra Peon

    #11
    Since the pages have already been indexed, you will have to use the removal tools in your Google Webmaster account to get the URLs out of the index. In addition to the robots.txt file, you will have to apply the restrictions at the page level with meta tags.
     
    shailendra, Dec 4, 2009 IP
  12. sally gomes

    sally gomes Member

    #12
    If robots.txt is not working for you, then add a robots meta tag to each page with the noindex and nofollow values to keep Google from indexing your pages and following the links on them.
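
To confirm that a page really carries the robots meta tag suggested here, something like the following could be used. It is only a sketch built on Python's standard-library `html.parser`; the names `RobotsMetaParser` and `robots_directives` are made up for this illustration, and it says nothing about how Google itself reads the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values of any <meta name="robots" content="..."> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content value is a comma-separated list such as "noindex, nofollow".
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for illustration.
sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(sample)))
```

A page that returns an empty set here has no robots meta tag at all, so it would still be eligible for indexing as far as the tag is concerned.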
     
    sally gomes, Dec 4, 2009 IP
  13. Canonical

    Canonical Well-Known Member

    #13
    Your robots.txt file is as follows:

    User-Agent: *
    Disallow: http://beta.healthcarejobssource.com/

    This is an INVALID robots.txt. You cannot Disallow a protocol or a domain.

    Your robots.txt goes in the root of your web. It can ONLY disallow folders and files on that web. Robots couldn't care less what protocol is used or which domains and/or subdomains resolve to that root web folder.

    The correct way to disallow your entire beta site is to place, in the root folder of beta.healthcarejobssource.com, a robots.txt that looks like:

    User-agent: *
    Disallow: /

    This says, "Disallow crawling of anything from the root folder down."

    WARNING: IF beta.healthcarejobssource.com resolves to exactly the same folder as healthcarejobssource.com then the above Disallow: / will prevent them from crawling your production web as well.

    But if the DNS for healthcarejobssource.com resolves to a folder called ROOT on your webserver and beta.healthcarejobssource.com resolves to some subfolder like ROOT/BETA on your webserver (or to a totally different web server) then placing the above robots.txt in the ROOT/BETA folder (or on the separate server for beta.healthcarejobssource.com) will work.
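
One way to sanity-check a robots.txt before uploading it is Python's standard-library `urllib.robotparser`. The sketch below compares the file from the first post with a root-level `Disallow: /` rule. The helper `can_crawl` and the page URL are hypothetical, made up for this illustration, and Python's parser is only one implementation of the robots rules, so treat the result as a hint rather than as a statement of what Googlebot will do.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The file from the first post: the Disallow value is a full URL, not a path.
ORIGINAL = "User-Agent: *\nDisallow: http://beta.healthcarejobssource.com/\n"

# A root-level rule: disallow everything from "/" down.
FIXED = "User-agent: *\nDisallow: /\n"

# Hypothetical page URL, used only for illustration.
PAGE = "http://beta.healthcarejobssource.com/jobs/listing.html"

for label, text in (("original", ORIGINAL), ("fixed", FIXED)):
    print(label, "-> crawlable:", can_crawl(text, PAGE))
```

Rules are matched against the path part of the requested URL, which is why a Disallow value that starts with a protocol and domain is unlikely to match anything.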

    PS: Once you have the robots.txt in place, THEN you'll need to go to Google Webmaster Tools, go through the Google site verification process for the beta.healthcarejobssource.com site to prove you're the webmaster, and then request a URL removal for that entire site.

    NOTE: Google can STILL continue to show your beta pages in their index if other sites are linking to those pages even if you have the beta site blocked by robots.txt AND you get Google to remove the URLs from the index. If Google feels your page is relevant to a search they can STILL show the URL and construct a title to show in the SERPs based on the link text used to link to it without ever crawling those pages. These types of entries show in the SERPs with no snippet.

    Best solution IMO:

    An even better approach to prevent Google indexing your site would be to NOT disallow the crawl and simply render ALL beta.healthcarejobssource.com pages with a <meta name="robots" content="noindex"> element in the <head> of each page. The only way they can find the meta noindex element on the beta pages is if you ALLOW them to crawl. The meta noindex will also cause them to remove the pages from the index without having to submit a URL removal request.
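
If the beta pages are static HTML files, the meta noindex element could be added in bulk with a short script instead of by hand. This is a minimal sketch under that assumption; `add_noindex`, `add_noindex_to_tree`, and the folder argument are hypothetical names made up for this illustration. On a site generated from templates, adding the element once to the shared header template would be simpler.

```python
import re
from pathlib import Path

NOINDEX_META = '<meta name="robots" content="noindex">'


def add_noindex(html: str) -> str:
    """Return html with a robots noindex meta element placed right after <head>."""
    # Leave the page alone if it already has a robots meta element.
    if re.search(r'<meta\s+name=["\']robots["\']', html, flags=re.IGNORECASE):
        return html
    # Insert after the first opening <head ...> tag; \b keeps <header> from matching.
    return re.sub(
        r"(<head\b[^>]*>)",
        lambda m: m.group(1) + "\n" + NOINDEX_META,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


def add_noindex_to_tree(root: str) -> int:
    """Rewrite every .html file under `root` in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = add_noindex(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Calling `add_noindex_to_tree` on a copy of the site folder first is the safer way to try it, since the function overwrites files in place. Pages with no `<head>` element are left unchanged.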
     
    Last edited: Dec 4, 2009
    Canonical, Dec 4, 2009 IP
  14. orphelin

    orphelin Peon

    #14
    lol, even newbies should know this :)
     
    orphelin, Dec 7, 2009 IP
  15. rohit_tripathi60

    rohit_tripathi60 Active Member

    #15

    Thanks for the detailed overview. The website has more than 1000 pages, so would it be possible to place noindex tags on all of them?
     
    rohit_tripathi60, Dec 7, 2009 IP
  16. seoperson

    seoperson Peon

    #16
    IMHO, you can disallow the root directory with the Disallow directive, as you have already done, and then submit a removal request through Google Webmaster Central at http://www.google.com/webmasters/tools

    that will be much
    Hope it helps !
     
    seoperson, Dec 7, 2009 IP