Question on using robots.txt to block Googlebot

Discussion in 'robots.txt' started by m021478, May 20, 2008.

  1. #1
    In Google's help documentation about preventing its spider from indexing all, or a portion, of your site, it mentions:

    -------
    For example, if you're manually creating a robots.txt file, to block Googlebot from crawling all pages under a particular directory (for example, lemurs), you'd use the following robots.txt entry:

    User-agent: Googlebot
    Disallow: /lemurs/
    -------

    Does this mean, for example, that if my domain name was johndoe-dot-com, and I wanted to block all items contained within a particular directory on my site (which in this example, let's call "some_directory"), whose files I would normally access by visiting www.johndoe-dot-com/some_directory/somefile.doc, then I would create a robots.txt file configured as such:

    User-agent: Googlebot
    Disallow: /some_directory/

    Probably a stupid question, but I did want to double check just to be sure...

    Any suggestions would be greatly appreciated... Thanks!
     
    m021478, May 20, 2008 IP
  2. Eka

    Eka Peon

    #2
    I don't understand what exactly your question is. Please try again!
     
    Eka, May 21, 2008 IP
  3. make-it-yourself.me.uk

    make-it-yourself.me.uk Active Member

    #3
    Hi, not that I know much about them, but:

    Say you use a robots.txt with a Disallow line in it, someone dodgy comes along to your site, and the disallowed directory has an index file in it and is not password protected.

    You lose. The robots.txt file is publicly readable and only asks well-behaved crawlers to stay away; it does not stop anyone from opening the URL directly.

    So when using robots.txt, I would say don't keep an index.html/index.php or other default page there, as it will reveal everything to anyone who types the disallowed folder into the URL bar.

    Hope this helps.

    And you are correct in the way you are talking about it: the robots.txt goes in the root folder (public_html/robots.txt).

    That's what I know.
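
To illustrate the point about the root folder: crawlers look for robots.txt at the top level of the host, not inside the directory being blocked. Here is a minimal Python sketch of that lookup. The helper name `robots_url` and the `www.example.com` host are made up for illustration and stand in for the real domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location a crawler would check for this page."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical URL standing in for a file inside the blocked directory.
print(robots_url("http://www.example.com/some_directory/somefile.doc"))
```

A robots.txt placed inside /some_directory/ itself would not be consulted.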
     
    make-it-yourself.me.uk, May 21, 2008 IP
  4. astup1didiot

    astup1didiot Notable Member

    #4
    Yes, that is the correct format.

    
    User-agent: Googlebot
    Disallow: /some_directory/
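
One way to double check a rule like this before uploading it is Python's standard `urllib.robotparser` module. This is only a sketch: the `www.example.com` host and the `SomeOtherBot` name are placeholders, and Python's parser is not guaranteed to match Googlebot's own matching in every edge case.

```python
from urllib.robotparser import RobotFileParser

# The rule from this thread, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /some_directory/
"""


def can_fetch(user_agent: str, url: str) -> bool:
    """Check whether the rules above permit user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder host.
blocked_file = "http://www.example.com/some_directory/somefile.doc"
home_page = "http://www.example.com/index.html"

print(can_fetch("Googlebot", blocked_file))     # False: inside the disallowed directory
print(can_fetch("Googlebot", home_page))        # True: outside it
print(can_fetch("SomeOtherBot", blocked_file))  # True: the rule names only Googlebot
```

The last line shows that the entry applies only to the named crawler; other bots are unaffected unless a `User-agent: *` section is added.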
    
     
    astup1didiot, May 21, 2008 IP
  5. MeetHere

    MeetHere Prominent Member

    #5
    MeetHere, May 21, 2008 IP
  6. m021478

    m021478 Peon

    #6
    Thanks for your response... I have to admit though, I definitely feel more confused now than I did before (not your fault... rather, my domain/hosting newbie-ness has finally caught up to me)...

    Perhaps this is slightly off-topic, but it does pertain to the reason I asked the original question in the first place... How then would I go about creating a password-protected directory on my website, which I could access via any web browser anywhere in the world (after entering a password, of course) and which I could dump files into from my home FTP client?

    I am currently signed up with Yahoo! Small Business as my Domain Host...Any ideas?

    Thanks!
     
    m021478, May 22, 2008 IP
  7. enous

    enous Well-Known Member

    #7
    Why ban the bot? I use dofollow for all.
     
    enous, May 22, 2008 IP
  8. make-it-yourself.me.uk

    make-it-yourself.me.uk Active Member

    #8


    To password-protect the directory, there should be an option in your control panel.
    If not, use an .htaccess file. Google that; there is a lot of content on the subject.
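
As a rough idea of what the .htaccess route involves, the Python sketch below just prints the kind of file typically used for HTTP Basic authentication. It assumes the host runs Apache and honors .htaccess files, which nothing in this thread confirms for Yahoo! Small Business. The helper name `build_htaccess`, the realm text, and the password-file path are all hypothetical.

```python
def build_htaccess(password_file: str, realm: str = "Private files") -> str:
    """Return .htaccess text that asks Apache for a username and password."""
    lines = [
        "AuthType Basic",                  # use HTTP Basic authentication
        f'AuthName "{realm}"',             # text shown in the login prompt
        f"AuthUserFile {password_file}",   # file holding usernames and hashed passwords
        "Require valid-user",              # any user listed in that file may enter
    ]
    return "\n".join(lines) + "\n"


# Hypothetical path; ideally the password file lives outside the web root.
print(build_htaccess("/home/example/.htpasswd"))
```

The printed text would be saved as .htaccess inside the directory to protect. The password file itself is normally created with Apache's `htpasswd` tool or by the control panel.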
     
    make-it-yourself.me.uk, May 24, 2008 IP
  9. manish.chauhan

    manish.chauhan Well-Known Member

    #9
    If you allow all bots, then many spammy bots, whose primary job is to harvest email addresses from the website for spamming, could access your site, and these bots also eat up too much of your bandwidth... :)
     
    manish.chauhan, May 26, 2008 IP