Do I Need A Robots.txt?

Discussion in 'robots.txt' started by rayqsl, Oct 13, 2007.

  1. #1
    If I want the whole site to be visible to the bots and spiders, do I need a robots.txt? I think the answer is "no".

    Thanks in advance.
     
    rayqsl, Oct 13, 2007
  2. chickens

    chickens Peon

    #2
    You don't need a robots.txt file if you want a spider to access everything. Personally, I create an empty file just to get rid of the 404 errors.

    Odds are the SEO people are going to say differently, though. On some of my sites I use the Wikipedia robots.txt and it seems to work for me.
     
    chickens, Oct 13, 2007
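An empty robots.txt places no restrictions on crawling, so it behaves the same as having none, minus the 404. A minimal sketch using Python's standard-library `urllib.robotparser` can check this; the helper name `allows_everything`, the "Googlebot" agent string and the example.com URLs are placeholders chosen for illustration, and a given search engine's own parser may differ in details.

```python
from urllib.robotparser import RobotFileParser

def allows_everything(robots_txt, urls, agent="Googlebot"):
    """Return True if every URL in `urls` may be fetched by `agent`.

    `allows_everything` is a hypothetical helper for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines; an empty file yields no rules at all.
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(agent, url) for url in urls)

# Placeholder URLs on a hypothetical site.
urls = ["http://www.example.com/", "http://www.example.com/cms/login.php"]
print(allows_everything("", urls))  # True
```

Passing `"User-agent: *\nDisallow: /"` instead of the empty string makes the same call return False, which is the opposite extreme.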
  3. kentuckyslone

    kentuckyslone Notable Member

    #3
    Unless you need to block spider access to certain files or folders, a robots.txt is not needed. However, there are other ways to block access when you need to.

    Keep in mind that if you do use a robots.txt to disallow crawling, not all spiders will 'obey'.
     
    kentuckyslone, Oct 13, 2007
  4. rayqsl

    rayqsl Active Member

    #4
    Thanks for the advice. I've heard that blocking certain folders attracts some bots to those folders. Is this true? I certainly wouldn't expect the big, reputable organisations to do this, though.
     
    rayqsl, Oct 13, 2007
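One reason the worry is plausible: robots.txt is served publicly at the site root, so its Disallow lines can be read by any bot, including one that ignores them. The sketch below just pulls those paths out of a robots.txt body; the folder names in the sample are hypothetical, and the code shows only what is readable, not how any particular bot behaves.

```python
def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow path from a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

# Hypothetical robots.txt naming two private folders.
sample = """User-agent: *
Disallow: /cms-admin/   # hypothetical CMS folder
Disallow: /drafts/
"""
print(disallowed_paths(sample))  # ['/cms-admin/', '/drafts/']
```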
  5. kentuckyslone

    kentuckyslone Notable Member

    #5
    It is possible, and anything that is possible is likely.

    The main purpose for blocking files or folders with robots.txt is just to keep Google and other SEs from crawling, and therefore usually from indexing, those pages. You wouldn't want admin folders, for example, to be indexed.
     
    kentuckyslone, Oct 13, 2007
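Blocking an admin folder takes a `Disallow` line under a `User-agent` line. The sketch below feeds such a file to Python's `urllib.robotparser` to show which URLs a compliant crawler would skip; the `/admin/` folder, the example.com URLs and the agent name are assumptions for illustration. As noted earlier in the thread, this only asks crawlers to stay out; a bot that never checks robots.txt is not stopped, so a truly private area still needs real access control such as password protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every crawler to stay out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://www.example.com/index.html",
            "http://www.example.com/admin/settings.php"):
    # can_fetch() reports whether this agent is allowed to crawl the URL.
    print(url, parser.can_fetch("Googlebot", url))
```

This prints True for the index page and False for the page under `/admin/`.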
  6. rayqsl

    rayqsl Active Member

    #6
    That's what I thought. Thanks for your help k
     
    rayqsl, Oct 13, 2007
  7. countZZero

    countZZero Peon

    #7
    Uhhhh... yes. There is (almost always) something in your /public_html/ or /www/ folder that needs protecting from public (bot/spider) view.

    Karl

    http://fastercats.com
    http://market-match.info
     
    countZZero, Oct 25, 2007
  8. rayqsl

    rayqsl Active Member

    #8
    How do robots actually work their way through your site?

    My limited knowledge says that they start from a home page and then follow all of the links until everything has been reached.

    Now, I know that's simplistic: how does a bot know which page is your home page, when engines usually only want the root folder when I submit the URL?

    So do they look at every file in the root folder and follow all of the links? If they do, then I can see how they can find files that you don't want them to index (and publish).

    Is there a good (simple) article somewhere that gives this kind of info? The ones I've found launch straight into telling you how to create a robots.txt file and are pretty vague on how the bots work.

    I suppose that not all bots work in the same way as well. So I might be OK as far as the Google bot goes but the Yahoo one might really do something that I don't want.

    I'm obviously concerned that the bots keep out of my content management area.

    :)
     
    rayqsl, Oct 26, 2007
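The basic idea behind the "start at the home page and follow links" model can be sketched as a breadth-first crawl. A bot does not need to guess the home page: a request for the root URL is answered by the web server with whatever it is configured to serve as the default page. A bot also cannot list the files in a folder unless the server itself produces a directory listing; it finds pages through links. This is a minimal sketch under stated assumptions, not how Googlebot or any other real crawler is implemented: the site is a hypothetical in-memory dict instead of HTTP fetches, and "ExampleBot" is a made-up agent name.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical site held in memory instead of being fetched over HTTP.
SITE = {
    "http://www.example.com/": '<a href="/about.html">About</a> <a href="/cms/">CMS</a>',
    "http://www.example.com/about.html": '<a href="/">Home</a> <a href="/contact.html">Contact</a>',
    "http://www.example.com/contact.html": "<p>No links here</p>",
    "http://www.example.com/cms/": '<a href="/cms/edit.php">Edit</a>',
    "http://www.example.com/orphan.html": "<p>Nothing links to this page</p>",
}

class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag it sees."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, robots):
    """Breadth-first crawl: fetch a page, queue its unseen links, repeat."""
    seen, order = {start_url}, []
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if not robots.can_fetch("ExampleBot", url):
            continue  # a polite bot skips disallowed URLs
        html = SITE.get(url)
        if html is None:
            continue  # stands in for a 404
        order.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /cms/"])
print(crawl("http://www.example.com/", robots))
# ['http://www.example.com/', 'http://www.example.com/about.html', 'http://www.example.com/contact.html']
```

The `/cms/` folder is skipped because robots.txt disallows it, and `orphan.html` is never found because nothing links to it.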
  9. inworx

    inworx Peon

    #9
    Use a blank HTML file if you don't want that particular folder indexed.

    A few SEs bypass robots.txt.
     
    inworx, Oct 27, 2007
  10. webrepair

    webrepair Peon

    #10
    Just create a robots.txt file, even an empty one. I would advise putting one up: it will save the 404 errors, and creating an empty robots.txt is advised anyway by the W3C.
     
    webrepair, Oct 29, 2007
  11. rayqsl

    rayqsl Active Member

    Messages:
    91
    Likes Received:
    0
    Best Answers:
    1
    Trophy Points:
    53
    #11
    Thanks again for your words of wisdom
     
    rayqsl, Nov 2, 2007
  12. Kuldeep1952

    Kuldeep1952 Active Member

    #12
    If you are a webmaster who watches his server error logs, then it is a good idea to have the files robots.txt and favicon.ico on your server; otherwise the error log will be filled with 404 errors for these two files, and the actual errors will be drowned out.
     
    Kuldeep1952, Nov 2, 2007
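The drowning-out effect is easy to see by counting 404s per requested path. The post talks about the error log, whose format varies by server; this sketch assumes an access log in Common Log Format instead, and the log lines (using documentation IP addresses) are made up for illustration. The helper name `count_404s` is hypothetical.

```python
from collections import Counter

def count_404s(log_lines):
    """Count 404 responses per requested path in Common Log Format lines."""
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # not a CLF line; skip it
        request, after = parts[1].split(), parts[2].split()
        # request looks like ['GET', '/path', 'HTTP/1.1']; after starts with the status.
        if len(request) >= 2 and after and after[0] == "404":
            counts[request[1]] += 1
    return counts

# Hypothetical log excerpt: bots and browsers asking for files that don't exist.
LOG = [
    '192.0.2.1 - - [02/Nov/2007:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 404 209',
    '192.0.2.1 - - [02/Nov/2007:10:00:01 +0000] "GET /favicon.ico HTTP/1.1" 404 209',
    '198.51.100.7 - - [02/Nov/2007:10:05:00 +0000] "GET /robots.txt HTTP/1.1" 404 209',
    '198.51.100.7 - - [02/Nov/2007:10:05:02 +0000] "GET /favicon.ico HTTP/1.1" 404 209',
    '203.0.113.9 - - [02/Nov/2007:10:07:00 +0000] "GET /robots.txt HTTP/1.1" 404 209',
    '203.0.113.9 - - [02/Nov/2007:10:08:00 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.9 - - [02/Nov/2007:10:09:00 +0000] "GET /old-page.html HTTP/1.1" 404 209',
]
print(count_404s(LOG).most_common())
# [('/robots.txt', 3), ('/favicon.ico', 2), ('/old-page.html', 1)]
```

Only `/old-page.html` is a real broken link here; putting robots.txt and favicon.ico on the server removes the other two entries from the list.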