Is robots.txt really necessary??

Discussion in 'robots.txt' started by paulinmargarita, Jun 23, 2007.

  1. #1
    I've just started playing with a different SEO programme, IBP. Its page report says the following:

    'Your web pages uses the meta robots tag to allow search engines to index your web page. Actually you can remove this tag as search engines will still index your web page if this tag is missing'

    Is this correct??
     
    paulinmargarita, Jun 23, 2007 IP
  2. DeViAnThans3

    DeViAnThans3 Peon

    Messages:
    785
    Likes Received:
    83
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Robots.txt is absolutely not necessary; however, it is recommended.
    If there is no robots.txt file, the search engines will index all of the pages they can find on your website.
     
    DeViAnThans3, Jun 24, 2007 IP
  3. iRAY

    iRAY Peon

    Messages:
    21
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #3
    I strongly recommend using a robots.txt file (even a completely blank one), because some web hosts do not correctly return HTTP 404 when robots.txt is not found, and in that case Google does not start indexing your site (really, I am sure; this information comes from Google staff).
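
One way to see what a host actually returns for a missing or present robots.txt is to request the file and look at the status code. This is a minimal sketch using only the Python standard library; the function names `robots_status` and `describe` are hypothetical, and the wording of the interpretations is an assumption for illustration, not Google's documented behaviour.

```python
import urllib.error
import urllib.request


def robots_status(base_url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for /robots.txt."""
    url = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still what we want.
        return err.code


def describe(status: int) -> str:
    """Rough reading of the status code (illustrative wording, an assumption)."""
    if 200 <= status < 300:
        return "found: crawlers will read the file"
    if status == 404:
        return "missing: crawlers normally assume no restrictions"
    return "unexpected: worth checking the host's error handling"
```

Calling `describe(robots_status("http://www.example.com"))` against your own domain would show whether the host answers with a clean 200 or 404, or with something else.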
     
    iRAY, Jun 24, 2007 IP
  4. trichnosis

    trichnosis Prominent Member

    Messages:
    13,785
    Likes Received:
    333
    Best Answers:
    0
    Trophy Points:
    300
    #4
    robots.txt is required. It's also listed in Google's webmaster quality guidelines.
     
    trichnosis, Jul 3, 2007 IP
  5. explorer

    explorer Well-Known Member

    Messages:
    463
    Likes Received:
    40
    Best Answers:
    0
    Trophy Points:
    110
    #5
    This isn't strictly a robots.txt issue. You're talking about a meta tag that goes within the <head></head> part of your pages.

    What IBP say is absolutely correct. You don't need this tag for search engines to index your page.
     
    explorer, Jul 4, 2007 IP
  6. Phaethon

    Phaethon Peon

    Messages:
    113
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #6

    What are you talking about, man? robots.txt is a joke. It's like saying to the bots, "I'd prefer you stay out, but if you want to come in, there's really nothing I can do to stop you". It's utterly pointless, and to my understanding it really doesn't do much of anything.
     
    Phaethon, Jul 6, 2007 IP
  7. chuckd1356

    chuckd1356 Active Member

    Messages:
    770
    Likes Received:
    31
    Best Answers:
    0
    Trophy Points:
    70
    #7
    Do you have anything to back that up?
     
    chuckd1356, Jul 7, 2007 IP
  8. Dan Schulz

    Dan Schulz Peon

    Messages:
    6,032
    Likes Received:
    436
    Best Answers:
    0
    Trophy Points:
    0
    #8
    The robots.txt protocol is not a joke. It's a useful tool to prevent certain bots from spidering certain pages/sections of sites while allowing others through. It's also a lot easier to modify a robots.txt file than it is to update dozens, hundreds or even thousands of Web pages to make a single change. Also, unlike the META tag, the robots.txt file can block SPECIFIC search engine spiders from crawling and indexing your site (didn't I say that already?).
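
To illustrate the point about blocking specific spiders, here is a small sketch that parses a made-up robots.txt with Python's standard `urllib.robotparser` and asks which agents may fetch which paths. The bot name `BadBot`, the `/private/` path, and the helper `allowed` are assumptions for the example only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut one named bot out entirely,
# and keep every other bot out of /private/ only.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)
```

With these rules, `allowed("BadBot", "/index.html")` is False, while `allowed("Googlebot", "/index.html")` is True and `allowed("Googlebot", "/private/page.html")` is False. Well-behaved bots follow these rules voluntarily; the file does not enforce anything by itself.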

    And as has already been mentioned, not having a robots.txt file can clutter your server logs with needless 404 errors (just as not having a favicon.ico file will generate 404 errors, since the user agent, in most cases a traditional Web browser, requests it along with the rest of the Web page).
     
    Dan Schulz, Jul 8, 2007 IP
  9. danieloffice

    danieloffice Peon

    Messages:
    472
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #9
    I am new to this, so need some help.

    I do not need to block any part of my site from being crawled by search engines.

    But I do need to handle the 404 issue, as you said: "not having a robots.txt file can clutter your server logs with needless 404 error returns".

    So, could you please post a simple robots.txt here so that I can just upload it to my website?

    regards
     
    danieloffice, Jul 9, 2007 IP
  10. explorer

    explorer Well-Known Member

    Messages:
    463
    Likes Received:
    40
    Best Answers:
    0
    Trophy Points:
    110
    #10
    Upload these two lines in a file named robots.txt:

    User-agent: *
    Disallow:


    This allows all bots in, everywhere.
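
If you want to double-check that the two lines above really let everything through, a short sketch with Python's standard `urllib.robotparser` can confirm it. The agent name `ExampleBot` and the helper `can_crawl` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the post: an empty Disallow value blocks nothing.
ALLOW_ALL = ["User-agent: *", "Disallow:"]

parser = RobotFileParser()
parser.parse(ALLOW_ALL)


def can_crawl(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the allow-all rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)
```

Both `can_crawl("/")` and `can_crawl("/any/page.html")` return True here, for any agent name.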
     
    explorer, Jul 10, 2007 IP
  11. Dan Schulz

    Dan Schulz Peon

    Messages:
    6,032
    Likes Received:
    436
    Best Answers:
    0
    Trophy Points:
    0
    #11
    And be sure to put it in your site's root Web folder (usually public_html; it could be www, web, or something else, depending on your server setup).
     
    Dan Schulz, Jul 10, 2007 IP
  12. MatthewN

    MatthewN Well-Known Member

    Messages:
    859
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    195
    #12
    I'd go with a robots.txt file, even if it were just the standard one that explorer mentioned above.
     
    MatthewN, Jul 10, 2007 IP
  13. Dan Schulz

    Dan Schulz Peon

    Messages:
    6,032
    Likes Received:
    436
    Best Answers:
    0
    Trophy Points:
    0
    #13
    If I recall correctly, you can replace Disallow: with Allow: /

    However, this isn't a very good idea. You're going to want to block SOME things, specifically paths to your stylesheets and scripts.
     
    Dan Schulz, Jul 10, 2007 IP
  14. leede

    leede Guest

    Messages:
    3,381
    Likes Received:
    128
    Best Answers:
    0
    Trophy Points:
    0
    #14
    It is necessary for normal action and important too.
     
    leede, Jul 10, 2007 IP
  15. antman

    antman Well-Known Member

    Messages:
    1,907
    Likes Received:
    106
    Best Answers:
    0
    Trophy Points:
    130
    #15
    Yes, there are things that I don't want bots to scan over that could be vulnerable to hackers :O
     
    antman, Jul 10, 2007 IP