robots.txt disallow

Discussion in 'robots.txt' started by girbaud, Jan 13, 2006.

  1. #1
    Hi.
    I just want to ask whether my code is correct.

    User-agent: *
    Disallow:


    User-agent: *
    Disallow: /public_html/configure/


    User-agent: *
    Disallow: /public_html/images/
     
    girbaud, Jan 13, 2006 IP
  2. Jean-Luc

    Jean-Luc Peon

    Messages:
    601
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Your three robots.txt files are correct.

    The first version does not disallow anything. The second and third versions disallow access to the mentioned directory.
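
    As a rough check of the three versions above, here is a minimal sketch using Python's standard urllib.robotparser module. It treats each version as a separate robots.txt file, as the reply does. The helper name `allowed` and the file names in the paths are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# The three versions from the first post, each treated as its own file.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_CONFIGURE = "User-agent: *\nDisallow: /public_html/configure/\n"
BLOCK_IMAGES = "User-agent: *\nDisallow: /public_html/images/\n"

# Hypothetical file names, used only for illustration.
print(allowed(ALLOW_ALL, "/public_html/configure/settings.php"))        # True
print(allowed(BLOCK_CONFIGURE, "/public_html/configure/settings.php"))  # False
print(allowed(BLOCK_IMAGES, "/public_html/images/logo.png"))            # False
print(allowed(BLOCK_IMAGES, "/index.html"))                             # True
```

    An empty Disallow: line disallows nothing, which is why the first version allows every path.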

    Jean-Luc
     
    Jean-Luc, Jan 13, 2006 IP
  3. girbaud

    girbaud Peon

    Messages:
    293
    Likes Received:
    5
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Is robots.txt important to my site?
    Do I need to have one?
     
    girbaud, Jan 13, 2006 IP
  4. ServerUnion

    ServerUnion Peon

    Messages:
    3,611
    Likes Received:
    296
    Best Answers:
    0
    Trophy Points:
    0
    #4
    You do not need the first entry. Good luck
     
    ServerUnion, Jan 13, 2006 IP
  5. Jean-Luc

    #5
    You do not need to have one. You only need a robots.txt file if you want to ask robots not to visit some parts of your site. If you don't have one, the robots will be quite happy :D, because it means they are allowed to visit your whole site.

    If there is no robots.txt on your site, each attempt by a robot to read it will return a 404 error (file not found). This is not a problem at all for the robots, but it will show up in your stats as a series of 404 errors. To avoid that, you can use an empty robots.txt file or a robots.txt file that does not disallow anything, like your first example.

    Some robots do not follow the instructions in robots.txt.

    Jean-Luc
     
    Jean-Luc, Jan 13, 2006 IP
  6. girbaud

    #6
    Okay, thanks for all the responses.
    Is it true that robots can get you banned from search engines?
     
    girbaud, Jan 13, 2006 IP
  7. Jean-Luc

    #7
    What do you mean?

    A robot is a computer program operated by a search engine, a research organization, a university, or an individual. To some extent, this program visits your web site the way a human user would.

    Search engines use robots to find out what your pages contain. So you had better allow them to visit your site if you want search engines to let the world know that your site exists.

    Some "bad robots" search for email addresses or known vulnerabilities in your web site. These robots do not respect the robots.txt standard anyway. For the time being, you should probably not worry about them.

    Hidden content, some types of "unfair" link exchanges, and similar tricks can get you banned from search engines. "Bad robots" cannot.

    Jean-Luc
     
    Jean-Luc, Jan 13, 2006 IP
  8. girbaud

    #8
    I've used a site that generates robots.txt files. When I was done filling in the fields, it said:

    " Now you just copy the text above, create a new .txt file called robots.txt, paste the above text in it and upload it to the root your server ! That's all there is, be careful with editing, because a mistake can ban your site to the search engines for a long time. Last but not least, we are never responsible for the results of our tool, we do our best to make quality tools, but after all it's up to you to use our generated robots.txt file."


    What does it mean by "That's all there is, be careful with editing, because a mistake can ban your site to the search engines for a long time"?
     
    girbaud, Jan 13, 2006 IP
  9. Jean-Luc

    #9
    User-agent: *
    Disallow: /

    means that you do not want any robot in any directory of your site.

    A search engine will not "ban" your site if it sees this, but it will respect your disallow instruction. At the same time, it will probably decide to postpone its next visit to your site to an undefined date.
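
    To see how a rule-following robot would read that file, here is a minimal sketch with Python's standard urllib.robotparser module. The robot name ExampleBot and the paths are made up for illustration. It only shows how the rule is interpreted and says nothing about how a real search engine schedules its visits.

```python
from urllib.robotparser import RobotFileParser

BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_EVERYTHING.splitlines())

# Hypothetical robot name and paths; every path starts with "/", so all are refused.
for path in ("/", "/index.html", "/images/logo.png"):
    print(path, parser.can_fetch("ExampleBot", path))  # False for each path
```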

    Jean-Luc
     
    Jean-Luc, Jan 13, 2006 IP
  10. girbaud

    #10
    thanks a lot Jean-Luc :)
     
    girbaud, Jan 13, 2006 IP
  11. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #11
    Do it like this:

    User-agent: *
    Disallow: /public_html/configure/
    Disallow: /public_html/images/

    but also check that "public_html" is a real directory on your site. Many sites these days use virtual hosting, so that a domain like www.somesite.com is physically situated in a directory like /~somesite/public_html/, but you wouldn't use that in the robots.txt file -- you'd just use the root directory and subdirectories for your domain. Thus, for a typical site, the robots.txt file would look like this for your example above:

    User-agent: *
    Disallow: /configure/
    Disallow: /images/
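
    The point about paths can be checked with a short sketch, again using Python's standard urllib.robotparser module. The rules are matched against the path part of the URL, not against the directory layout on the server. www.somesite.com is the placeholder domain from the post, and the file names are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

COMBINED = """\
User-agent: *
Disallow: /configure/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(COMBINED.splitlines())

# Placeholder domain from the post; the file names are hypothetical.
urls = [
    "http://www.somesite.com/configure/settings.php",       # blocked
    "http://www.somesite.com/images/logo.png",               # blocked
    "http://www.somesite.com/public_html/images/logo.png",   # not blocked: different URL path
    "http://www.somesite.com/index.html",                    # not blocked
]
for url in urls:
    print(url, parser.can_fetch("*", url))
```

    The third URL is allowed because its path starts with /public_html/, which none of the Disallow lines match.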
     
    minstrel, Jan 14, 2006 IP
  12. girbaud

    #12
    That's nice advice, minstrel! Thanks for that one.
    "public_html" is a real directory on my site, so I'll be using the shorter code. It's simpler than the one I used.
     
    girbaud, Jan 16, 2006 IP
  13. genny2006

    genny2006 Peon

    Messages:
    4
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Hey guys,

    I have a forum and it has been getting so much spam that I can barely keep up :(. I just wanted to know: do I really only need a robots.txt file to block it? That's it?
    This would save me so much time.
    I need an answer quickly, please.

    Thank you

    Genny2006
     
    genny2006, Mar 8, 2006 IP
  14. Jean-Luc

    #14
    Hi,

    robots.txt does not block anything. It "asks" robots not to visit some pages. Ill-intentioned robots either don't read it, or read it to discover targets for their attacks.
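
    A small sketch of why this is so: the check happens inside the robot, not on your server. The example uses Python's standard urllib.robotparser module. The function names, the robot name ExampleBot and the /forum/admin/ path are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a forum; the path is made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/admin/
"""

def polite_robot_may_fetch(path, agent="ExampleBot"):
    """A well-behaved robot consults robots.txt before requesting a page."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)

def rude_robot_may_fetch(path):
    """A badly behaved robot never consults robots.txt, so nothing stops it."""
    return True

print(polite_robot_may_fetch("/forum/admin/users.php"))  # False
print(rude_robot_may_fetch("/forum/admin/users.php"))    # True
```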

    You need other methods to fight spam. By the way, what kind of spam do you have in your forum?

    Jean-Luc
     
    Jean-Luc, Mar 8, 2006 IP
  15. genny2006

    #15
    I get all kinds of spam, from weird symbols to porn.

    How can I fight this?

    Genny
     
    genny2006, Mar 8, 2006 IP
  16. Jean-Luc

    #16
    Do these robots register in your forum and then post all kinds of stuff?

    Try to find out whether there are repetitive elements in these multiple registrations: the same IP range, the same words used, and so on. Your actions will then depend on what you are able to do as admin. Maybe you can block access from some of these IP ranges for a while, or force people to copy some text in order to register.
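
    One way to look for a shared IP range is to count addresses per network. The following is a minimal Python sketch under stated assumptions: you can export the poster IP addresses from your forum or server logs, and a /24 range is a reasonable grouping. The function name `busiest_ranges` and the sample addresses (taken from the reserved documentation ranges) are made up for illustration.

```python
from collections import Counter
from ipaddress import ip_network

def busiest_ranges(ips, prefix=24, top=3):
    """Count IPv4 addresses per /prefix network and return the most common ranges."""
    counts = Counter(
        str(ip_network(f"{ip}/{prefix}", strict=False)) for ip in ips
    )
    return counts.most_common(top)

# Made-up addresses from the reserved documentation ranges.
sample_ips = [
    "203.0.113.5", "203.0.113.77", "203.0.113.200",
    "198.51.100.14", "192.0.2.9", "198.51.100.90",
]
print(busiest_ranges(sample_ips))
# [('203.0.113.0/24', 3), ('198.51.100.0/24', 2), ('192.0.2.0/24', 1)]
```

    A range that shows up far more often than the others is a candidate for a temporary block.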

    Jean-Luc
     
    Jean-Luc, Mar 8, 2006 IP
  17. genny2006

    #17
    They don't register, they just go in and post.
    I now spend half my days just removing all the junk they post.

    I have two ideas, but I'm trying to weigh the pros and cons. The site is written in PHP. I was thinking of adding image validation when the user clicks post. Alternatively, someone said that I can use the microtime() function to measure how long a post takes to submit, since robots and human users take different amounts of time, and then let the post through or not depending on the elapsed time.
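
    The timing idea could be sketched roughly as follows. The site in question is PHP, so this Python version is only an illustration of the logic, with time.time() standing in for microtime(). It assumes the form carries a hidden field holding a signed timestamp. The names `issue_token` and `looks_human`, the secret key and the 3-second threshold are all assumptions, not a tested anti-spam setting.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical secret, kept on the server
MIN_SECONDS = 3.0          # assumed threshold; tune it for your forum

def _sign(stamp: str) -> str:
    """Sign the timestamp so a robot cannot forge an older one."""
    return hmac.new(SECRET_KEY, stamp.encode(), hashlib.sha256).hexdigest()

def issue_token(now=None) -> str:
    """Create a 'timestamp:signature' token to embed in a hidden form field."""
    issued_at = time.time() if now is None else now
    stamp = f"{issued_at:.3f}"
    return f"{stamp}:{_sign(stamp)}"

def looks_human(token: str, now=None) -> bool:
    """Accept the post only if the token is genuine and enough time has passed."""
    stamp, sep, signature = token.partition(":")
    if not sep or not hmac.compare_digest(signature.encode(), _sign(stamp).encode()):
        return False
    current = time.time() if now is None else now
    return current - float(stamp) >= MIN_SECONDS

token = issue_token(now=1000.0)
print(looks_human(token, now=1001.0))  # False: submitted after 1 second
print(looks_human(token, now=1005.0))  # True: submitted after 5 seconds
```

    A robot that waits before posting would pass this check, so it is probably best seen as one filter among several, alongside the image validation idea.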

    Genny
     
    genny2006, Mar 8, 2006 IP
  18. minstrel

    #18
    Is this a forum or a blog? And what software are you using?
     
    minstrel, Mar 8, 2006 IP
  19. genny2006

    #19
    It's a forum. This is the site: homeressources.com/forum/index.php. I don't know about any software.
     
    genny2006, Mar 8, 2006 IP
  20. minstrel

    #20
    minstrel, Mar 8, 2006 IP