How to deindex my pages from Google

Discussion in 'Search Engine Optimization' started by BeeArcade, Jul 14, 2009.

  1. #1
    BeeArcade, Jul 14, 2009 IP
  2. theapparatus

    theapparatus Peon

    #2
    theapparatus, Jul 14, 2009 IP
  3. choice

    choice Prominent Member

    #3
    You can submit a removal request for specific pages in Google Webmaster Tools.
     
    choice, Jul 14, 2009 IP
  4. Canonical

    Canonical Well-Known Member

    #4
    First thing you need to do is put something in place to keep the pages you plan to remove from being re-indexed. You have 2 easy choices:

    1) add a <meta name="robots" content="noindex"> element to the <head> of any page you don't want indexed
    2) add a Disallow: entry to robots.txt for each page you don't want indexed

    THEN and ONLY then should you request URL removal via Google's Webmaster Tools. Otherwise, they will remove it and then reindex it the next time they crawl your site.
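    Before filing the removal request, it can help to confirm that the page really serves the noindex tag. The following is a minimal sketch, assuming you already have the page's HTML as a string; the class name RobotsMetaParser and the helper has_noindex are hypothetical names chosen for illustration, and the sketch only looks at <meta name="robots"> tags.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # content="noindex, follow" -> {"noindex", "follow"}
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(page))
```

    Matching is case-insensitive on both the name and the content values, since pages in the wild write them either way.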

    NOTE: I prefer to use the <meta name="robots" content="noindex"> element over robots.txt for 2 reasons:

    1) If you screw it up, you only affect the 1 page that contained the <meta name="robots" content="noindex">, whereas screwing up your robots.txt can affect LOTS of pages on your site.
    2) Using robots.txt to block a page means that the page in question is not only NOT indexed but EVERY link on that page is effectively NOFOLLOWed, because the crawler never fetches the page to see them. <meta name="robots" content="noindex"> keeps the page from being indexed but allows the links on it to be followed.
     
    Canonical, Jul 14, 2009 IP
  5. theapparatus

    theapparatus Peon

    #5
    The robots.txt method may not work. If you click on the example links, the out.php URLs carry additional content after the file name. You would have to cover that with a wildcard, and folks have reported that Google doesn't always pay attention to those rules within robots.txt.

    That's why I suggested the meta tag and the noindex method.
     
    theapparatus, Jul 14, 2009 IP
  6. Canonical

    Canonical Well-Known Member

    #6
    There is ALWAYS an "assumed" wildcard at the end of a Disallow directive. You just should not use wildcards in the middle of a URL: Google supports them there on a limited basis, but other engines do not. So:

    Disallow: /directory1/

    is always logically equivalent to Disallow: /directory1/* (i.e. don't crawl any URL that starts with "/directory1/" followed by any string).

    If this directive is placed in the robots.txt in the root folder of http://www.example.com/ then it will prevent compliant crawlers from crawling any URL that starts with http://www.example.com/directory1/, but it would NOT block requests for

    http://www.example.com/directory1

    without a trailing '/'. If you wanted to block all of the above PLUS http://www.example.com/directory1 then you would need to use:

    Disallow: /directory1

    instead. However, beware because this is logically equivalent to Disallow: /directory1* which will also block every other URL whose path merely begins with the same characters, since all of them start with http://www.example.com/directory1. This might not be the desired effect.
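    The trailing-slash behaviour described above can be checked locally with Python's standard urllib.robotparser module, which does plain prefix matching on the path. This is only a sketch: the helper name blocked, the two rule strings and the page URLs under www.example.com are made up for illustration, and the module is not Google's own matcher.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules, url):
    """Return True if the given robots.txt text disallows the URL for any agent."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("*", url)


# Hypothetical robots.txt contents for http://www.example.com/
WITH_SLASH = "User-agent: *\nDisallow: /directory1/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /directory1\n"

if __name__ == "__main__":
    for url in (
        "http://www.example.com/directory1/page.html",
        "http://www.example.com/directory1",
        "http://www.example.com/directory10/page.html",
    ):
        print(url, blocked(WITH_SLASH, url), blocked(WITHOUT_SLASH, url))
```

    With the trailing slash only URLs inside the directory are blocked; without it, the bare /directory1 URL and look-alike paths such as /directory10/ are blocked as well.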

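    Similarly:

    Disallow: /out.php

    is logically equivalent to Disallow: /out.php* which would prevent robots from crawling http://www.example.com/out.php itself as well as any URL that starts with it, including every URL with a querystring.

    If you wanted to allow robots to crawl http://www.example.com/out.php but block them from crawling URLs of the form http://www.example.com/out.php followed by one or more querystring parameters then use:

    Disallow: /out.php?

    This is equivalent to Disallow: /out.php?* which would allow the robots to crawl http://www.example.com/out.php but prevent them from crawling any URL that starts with http://www.example.com/out.php?

    Similarly:

    Disallow: /out.php?id=11

    would be equivalent to Disallow: /out.php?id=11* and would block any URL that starts with http://www.example.com/out.php?id=11, but it will NOT block URLs whose querystring starts differently.

    Hopefully you're starting to see the pattern...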
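    The pattern can be written down as a small matcher. This is a simplified model of the rule described in this post (prefix match, with '*' standing for any run of characters), not Google's actual implementation; it ignores Allow lines and rule precedence. The function names and the sample querystrings are assumptions for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Turn a Disallow value into a regex: literal text, with '*' as 'anything'."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


def is_disallowed(pattern, path_and_query):
    """Prefix match: the pattern only has to match the start of the URL path."""
    return pattern_to_regex(pattern).match(path_and_query) is not None


if __name__ == "__main__":
    samples = ["/out.php", "/out.php?id=11", "/out.php?id=112", "/out.php?id=12"]
    for rule in ("/out.php", "/out.php?", "/out.php?*", "/out.php?id=11"):
        print(rule, [url for url in samples if is_disallowed(rule, url)])
```

    Because the match is a prefix match, a rule such as /out.php?id=11 in this model also covers /out.php?id=112, which is worth keeping in mind when ids share leading digits.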
     
    Canonical, Jul 14, 2009 IP
  7. bank

    bank Peon

    #7
    Do "not" use robots.txt; otherwise the page title will stay indexed (with no description in the SERPs) as long as other pages still link to the blocked page.

    Use the meta noindex.
     
    bank, Jul 14, 2009 IP
  8. LisaRole

    LisaRole Peon

    #8
    Try editing your meta tags in an SEO-friendly way, and try to follow organic SEO practices as well.
     
    LisaRole, Jul 14, 2009 IP
  9. ericajoieake

    ericajoieake Guest

    #9
    Good strategy by Canonical, but I think adding noindex to your page is a much better way to deindex it.
     
    ericajoieake, Jul 14, 2009 IP
  10. Canonical

    Canonical Well-Known Member

    #10
    I agree that <meta name="robots" content="noindex"> is the better option for MANY reasons... I even said so in my 1st post above. I was simply explaining to theapparatus how robots.txt Disallow: directives work and how they "could" be used to prevent the two pages in question from being reindexed, not to get them deindexed. :)
     
    Canonical, Jul 14, 2009 IP
  11. rena

    rena Peon

    #11
    Well said in the reply. robots.txt is the best way. Even the meta tag only tells Google not to crawl the page in the future, which means Google will stop crawling it but will not remove it from the index immediately after reading the tag or robots.txt. It will take some time to deindex, and for some sites it may take a long time.
     
    rena, Jul 14, 2009 IP
  12. BeeArcade

    BeeArcade Active Member

    #12
    Thanks for your suggestions. I have requested that Google deindex it through Google Webmaster Tools, because I found that to be the easiest method.

    Thanks
     
    BeeArcade, Jul 15, 2009 IP