How to STOP indexing web pages

Discussion in 'robots.txt' started by Arcos, Dec 23, 2005.

  1. #1
    I need to stop the search engines from indexing certain pages containing sensitive information that is not for the general public.

    Can someone please tell me how I can do this?

    Thanks
     
    Arcos, Dec 23, 2005 IP
  2. Smyrl

    Smyrl Tomato Republic Staff

    Messages:
    13,740
    Likes Received:
    1,702
    Best Answers:
    78
    Trophy Points:
    510
    #2
    I would put sensitive material in a password-protected directory.

    -----------
    mdvaldosta, I changed my advice regarding robots.txt since not all robots obey it.

    Shannon
     
    Smyrl, Dec 23, 2005 IP
  3. mdvaldosta

    mdvaldosta Peon

    Messages:
    4,079
    Likes Received:
    362
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Disallow the folders or pages in robots.txt like Shannon suggested, and I'd put nofollow condoms (rel="nofollow" attributes) on the links pointing to the sensitive information.
     
    mdvaldosta, Dec 23, 2005 IP
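
A minimal sketch of the robots.txt approach suggested in this post, written in Python with the standard-library `urllib.robotparser` module. The `/private/` directory, the file names, and the `is_allowed` helper are placeholders for illustration, not anything from the original poster's site. It only shows how a well-behaved crawler would read a `Disallow` rule; a crawler that ignores robots.txt is not affected by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "/private/" is a placeholder directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if a compliant crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # The disallowed directory is off limits to compliant crawlers...
    print(is_allowed(ROBOTS_TXT, "msnbot", "/private/report.html"))
    # ...while the rest of the site stays crawlable.
    print(is_allowed(ROBOTS_TXT, "msnbot", "/index.html"))
```

On the linking side, the nofollow hint is the `rel="nofollow"` attribute on an anchor, for example `<a href="/private/report.html" rel="nofollow">`.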
  4. Arcos

    Arcos Peon

    Messages:
    474
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    0
    #4
    Thanks for the prompt feedback.

    The pages are password protected but, if you carry out a search on MSN, for example, for a specific item that does not belong in the public domain, it comes straight through along with a number of other items in that directory!!

    No, you can't access the directory, but you can see a proportion of what is in there.

    As for robots.txt, that one is a bit beyond me, and as for condoms, well, I have a 6 month old boy at 42 so not sure what they are either!! ;)
     
    Arcos, Dec 23, 2005 IP
  5. Smyrl

    Smyrl Tomato Republic Staff

    #5
    Well, I would suggest you follow mdvaldosta's advice with a robots.txt directive (do a search for a robots.txt tutorial) as well as using nofollow links.

    I have had good luck keeping information private as long as there is absolutely no page online with a link to the folder or page. I suggest you immediately change the name of the folder or pages so existing links in search engines will not work. I have read that search engines will index password-protected areas based on links into the area.

    Good luck.

    Shannon
     
    Smyrl, Dec 23, 2005 IP
  6. Jean-Luc

    Jean-Luc Peon

    Messages:
    601
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Sorry, but robots.txt is NOT the way to protect confidential files. Never try to use it to protect sensitive information. :eek:

    If you do not want your directories to be visible (and listed by search engines), add an index.html file to each of these directories. The content of each index.html can be anything (for example, an invitation to go to the home page of the site).

    Jean-Luc
     
    Jean-Luc, Dec 23, 2005 IP
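
One way to see the point made in this post: robots.txt is itself a public file that anyone can request, so every `Disallow` line doubles as a list of the paths the site owner wants hidden. The Python sketch below pulls those paths out of a robots.txt body. The sample content and the `disallowed_paths` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical robots.txt as any visitor could download it.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/          # internal reports
Disallow: /accounts/export.html
Disallow:
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in a robots.txt body."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    # A curious visitor learns exactly where the sensitive material lives.
    for path in disallowed_paths(SAMPLE_ROBOTS_TXT):
        print(path)
```

This is why robots.txt can reduce crawling by compliant bots but cannot stand in for real access control such as password protection.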
  7. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #7
    There's nothing to stop you from using ALL of the above methods, by the way, especially if you REALLY do not want those files indexed.
     
    minstrel, Dec 23, 2005 IP
  8. sufyaaan

    sufyaaan Banned

    Messages:
    218
    Likes Received:
    16
    Best Answers:
    0
    Trophy Points:
    0
    #8
    You can forcefully deny access to those pages by editing your .htaccess file on your Apache webserver.
     
    sufyaaan, Dec 24, 2005 IP
  9. danijelpendjer

    danijelpendjer Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #9
    http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=93710&from=61050&rd=1

    In short:

    <html>
    <head>
    <title>...</title>
    <meta name="robots" content="noindex, nofollow">
    </head>
    ...
    </html>
     
    danijelpendjer, Jul 6, 2011 IP
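
To confirm that a page really carries the robots meta tag described in this post, something like the following Python sketch could be used. It relies only on the standard-library `html.parser` module; the `RobotsMetaParser` class, the `robots_directives` helper, and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page source containing a robots meta tag.
SAMPLE_PAGE = """\
<html>
<head>
<title>Internal report</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
<body>Not for the general public.</body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercased robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

Note that a crawler has to be able to fetch the page to see this tag, so it works as a request to compliant search engines rather than as an access barrier.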
  10. TheCube

    TheCube Peon

    Messages:
    26
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Password protect your directory, and for peace of mind block it with robots.txt and with a noindex meta tag in the HTML. If it has already been indexed, remove it with Google Webmaster Central.
     
    TheCube, Jul 9, 2011 IP
  11. manish.chauhan

    manish.chauhan Well-Known Member

    Messages:
    1,682
    Likes Received:
    35
    Best Answers:
    0
    Trophy Points:
    110
    #11
    Why don't you restrict those pages with .htaccess? Allow your own IPs so that you can access those pages, and restrict all other IPs. This would be quite a simple solution.
     
    manish.chauhan, Aug 1, 2011 IP
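
A rough sketch of the IP allowlist idea from this post, in Python. The `render_htaccess` helper and the addresses (taken from the reserved documentation ranges) are hypothetical. The generated text assumes Apache 2.4, where `Require ip` grants access to the listed addresses and everyone else is refused; older Apache 2.2 servers use `Order deny,allow`, `Deny from all` and `Allow from ...` instead.

```python
import ipaddress


def render_htaccess(allowed):
    """Build .htaccess text that admits only the given IPs or CIDR ranges."""
    if not allowed:
        raise ValueError("at least one address is required")
    for entry in allowed:
        # Raises ValueError on a malformed address or range.
        ipaddress.ip_network(entry)
    header = "# Only the listed addresses may access this directory.\n"
    return header + "Require ip " + " ".join(allowed) + "\n"


if __name__ == "__main__":
    # Placeholder addresses; replace with your own office or home IPs.
    print(render_htaccess(["203.0.113.7", "198.51.100.0/24"]), end="")
```

Validating the entries before writing the file matters because a syntax error in .htaccess typically makes Apache answer every request in that directory with a server error.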
  12. frye.nora

    frye.nora Peon

    Messages:
    48
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #12
    In fact, only an .htaccess restriction works to block all robots and all access. Robots still crawl any links even if you set a noindex meta tag or disallow them in robots.txt. They still sniff our sensitive data; they're collecting it.

    When they say they don't index it, it only means that they don't show it in the index, BUT they still have it. Watch your access logs once in a while to get my point.
     
    frye.nora, Aug 1, 2011 IP
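
The advice to watch the access logs can be sketched in Python as below. This assumes the server writes the common Apache "combined" log format; the sample lines, the `/private/` prefix, the `bot_hits` helper, and the list of user-agent markers are all invented for illustration and would need adjusting for a real site.

```python
import re

# Matches one line of the Apache "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical log excerpt.
SAMPLE_LOG = [
    '66.249.66.1 - - [01/Aug/2011:10:15:32 +0000] "GET /private/report.html HTTP/1.1" '
    '401 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [01/Aug/2011:10:16:02 +0000] "GET /private/report.html HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"',
    '65.55.3.1 - - [01/Aug/2011:10:17:45 +0000] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
]


def bot_hits(lines, prefix, markers=("bot", "crawl", "spider", "slurp")):
    """Return (path, status, agent) for crawler requests under a path prefix."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in an unexpected format
        agent = match.group("agent").lower()
        if match.group("path").startswith(prefix) and any(m in agent for m in markers):
            hits.append((match.group("path"), int(match.group("status")), match.group("agent")))
    return hits


if __name__ == "__main__":
    for path, status, agent in bot_hits(SAMPLE_LOG, "/private/"):
        print(status, path, agent)
```

A 401 or 403 status on such a hit suggests the access restriction held; a 200 would mean the crawler actually received the content.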
  13. manish.chauhan

    manish.chauhan Well-Known Member

    #13
    Exactly..!! That's the point.
     
    manish.chauhan, Aug 1, 2011 IP
  14. risteard

    risteard Peon

    Messages:
    9
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #14
    Create a robots.txt file using the proper syntax.
     
    risteard, Aug 24, 2011 IP
  15. jhon786

    jhon786 Peon

    Messages:
    62
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #15
    I think you need to add a robots.txt file to your site. This will tell Googlebot not to crawl the web pages you don't want indexed.
    Otherwise, wait for more answers here; maybe we will get some more information.
     
    jhon786, Aug 25, 2011 IP