.htaccess and robots.txt

Discussion in 'Apache' started by Darik, Apr 17, 2006.

  1. #1
    Hi,

    I'm trying to block robots.txt from being viewed, using .htaccess like this:

    <Files .htaccess>
    order allow,deny
    deny from all
    </Files>

    <Files robots.txt>
    order allow,deny
    deny from all
    </Files>


    Now, my question is: can robots still read robots.txt, and if not, does that mean they will index the whole site?

    Thanks.
     
    Darik, Apr 17, 2006
  2. fsmedia

    fsmedia Prominent Member

    Messages:
    5,163
    Likes Received:
    262
    Best Answers:
    0
    Trophy Points:
    390
    #2
    By default, no one can view .htaccess over the web; you need FTP or SSH access to the account. By default, everyone can view robots.txt. In my opinion, what you're trying to do is pointless: it works right out of the box without any of those rules in there.
     
    fsmedia, Apr 17, 2006
  3. Darik

    Darik Guest

    Messages:
    4
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Hi,

    Well, that's exactly what I don't want.

    Thanks for the reply.
     
    Darik, Apr 17, 2006
  4. fsmedia

    #4
    Why not just remove the file then? There's no point in having the file if no one can view it.
     
    fsmedia, Apr 17, 2006
  5. Darik

    #5
    I'm sorry if my question was poorly worded. English is not my mother tongue.


    What I'm trying to do is:

    1) allow robots to read robots.txt
    2) prevent anyone from going to http://mysite.com/robots.txt and seeing what I'm trying to hide from indexing.

    Is there a way to restrict access to this file?
     
    Darik, Apr 17, 2006
  6. fsmedia

    #6
    The only way to restrict that would be to allow or deny access by IP address. This gets sticky because robots are constantly changing and adding IP addresses. You are better off either having the file or not having it. You could easily use .htaccess to block everyone from seeing a specific /folder/, though. Note that serving the folder over https on its own would not stop search engines from indexing it; the folder has to be blocked or password-protected.

    What you're trying to do is pretty much impossible and not a good idea. While everything is 'possible', it's a lot harder than you think and not worth that much effort. Just make the robots.txt file and either make it viewable to all or viewable to no one. Sorry.
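
    For reference, a minimal sketch of the IP-based approach, written with the same Order/Allow/Deny directives used earlier in this thread. The address range below is only a placeholder from a documentation block, not a real crawler range, so treat it as an assumption:

    <Files robots.txt>
    Order deny,allow
    Deny from all
    # Placeholder range (assumption); replace with crawler ranges you have verified
    Allow from 192.0.2.0/24
    </Files>

    Every crawler whose range is missing from the list gets a 403 for robots.txt, which is why this is hard to maintain.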
     
    fsmedia, Apr 17, 2006
  7. Mystique

    Mystique Well-Known Member

    Messages:
    2,579
    Likes Received:
    94
    Best Answers:
    2
    Trophy Points:
    195
    #7
    robots.txt and other files can be excluded from the default directory listing if you add this at the very top of the .htaccess:

    IndexIgnore .htaccess robots.txt
    Options -Indexes

    Be careful about denying access to robots.txt: search engines look for this file first when they crawl your site, so if you block it you are either kicking them off or they will simply ignore your robots exclusions.

    As for the .htaccess file, use the setting that the owner of your web server should already have in the Apache configuration file:

    <Files ~ "^\.ht">
    Order allow,deny
    Deny from all
    </Files>

    This prevents both .htaccess and .htpasswd (when one exists in a password-protected directory) from being read.
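
    If the goal is to keep a folder private rather than just unlisted, password protection is one option. A rough sketch for an .htaccess inside that folder, assuming basic authentication is available on the server; the realm name and file path are placeholders:

    AuthType Basic
    AuthName "Private area"
    # Placeholder path (assumption); keep the password file outside the web root
    AuthUserFile /path/to/.htpasswd
    Require valid-user

    A folder protected this way does not need to be listed in robots.txt at all, since crawlers cannot fetch it without credentials.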
     
    Mystique, Apr 17, 2006
  8. Darik

    #8
    @ Mystique and fsmedia!

    Thanks for your help!
     
    Darik, Apr 17, 2006