What is Robots.txt?

Discussion in 'Search Engine Optimization' started by CBuilder, Aug 25, 2007.

  1. #1
    Hello,

    I'd like to ask: what is robots.txt?

    Thanks
     
    CBuilder, Aug 25, 2007
  2. kentuckyslone

    kentuckyslone Notable Member

    #2
    The robots.txt file is a plain text file that resides in your site's root directory. Its primary purpose is to tell search engine spiders which files and directories they may not crawl.

    For example, if you have an admin subdirectory that you do not want the spiders to crawl, you can disallow it.

    You may also attempt to block a certain spider/bot from crawling any of your site. I say attempt because not all spiders will 'obey' the robots.txt file.

    The robots.txt is not required, but it is a very good idea to have it there, especially if you have subdirectories or files you do not want indexed.
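
    As a rough illustration of the rules described above, here is a minimal sketch that uses Python's standard urllib.robotparser module to check what a well-behaved crawler would be allowed to fetch. The bot name "BadBot", the /admin/ path and the yourwebsite.com domain are made-up placeholders, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: "BadBot" and "/admin/" are illustrative names.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "http://yourwebsite.com"  # placeholder domain

# A compliant crawler may fetch ordinary pages...
print(parser.can_fetch("Googlebot", base + "/index.html"))   # True
# ...but not anything under the disallowed admin directory.
print(parser.can_fetch("Googlebot", base + "/admin/login"))  # False
# The bot that was singled out is asked to stay away from everything.
print(parser.can_fetch("BadBot", base + "/index.html"))      # False
```

    Keep in mind this only models a crawler that chooses to follow the file. Nothing here stops a bot that ignores it.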
     
    kentuckyslone, Aug 25, 2007
  3. dcristo

    dcristo Illustrious Member

    #3
    It's used to block search engines from crawling certain parts of your site.
     
    dcristo, Aug 25, 2007
  4. Dan Schulz

    Dan Schulz Peon

    #4
    I wouldn't use it to block the admin directory. I'd put the admin area in a password-protected directory that uses the standard HTTP authentication prompt instead. Then again, I'm not fond of using /admin/ for my admin area anyway.

    For more information, visit www.robotstxt.org.
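
    One way to see the concern about listing an admin directory: robots.txt is a public file, so anyone can read it and see exactly which paths you asked crawlers to avoid. The sketch below is only an illustration of that point. The helper name disallowed_paths and the sample paths are hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path from robots.txt text."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents; any visitor could read the real thing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/   # hypothetical path
Disallow: /private-reports/
"""

print(disallowed_paths(ROBOTS_TXT))  # ['/admin/', '/private-reports/']
```

    In other words, a Disallow line hides nothing from a person. Password protection is what actually keeps visitors out.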
     
    Dan Schulz, Aug 25, 2007
  5. CBuilder

    CBuilder Well-Known Member

    #5
    Thanks a lot for your help.
     
    CBuilder, Aug 25, 2007
  6. trichnosis

    trichnosis Prominent Member

    #6
    trichnosis, Aug 26, 2007
  7. Sutocu

    Sutocu Active Member

    #7
    This is a very good explanation. I only want to add that another reason why you should have it is that you can direct search engine crawlers to your sitemap with it.
     
    Sutocu, Aug 26, 2007
  8. vasildb

    vasildb Well-Known Member

    #8
    So if you add the URL of some of your pages in the robots.txt, they will be crawled?
    And if you don't have anything in that file, it means that your pages will not be crawled?
     
    vasildb, Aug 26, 2007
  9. Sutocu

    Sutocu Active Member

    #9
    You don't link to your pages from robots.txt, but you can link to your sitemap. The format to use is:

    Sitemap: http://yourwebsite.com/sitemap.xml

    Note that the full URL is required. For more information on sitemaps with robots.txt, see this post.
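
    To make both points concrete, here is a small sketch using Python's standard urllib.robotparser module (site_maps() needs Python 3.8 or later). It reads the Sitemap line from a sample file, and it also shows that an empty robots.txt places no restrictions on crawling. The sample file contents and the yourwebsite.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Hypothetical file: nothing is disallowed, and a sitemap is advertised.
with_sitemap = parse_robots(
    "User-agent: *\n"
    "Disallow:\n"
    "\n"
    "Sitemap: http://yourwebsite.com/sitemap.xml\n"
)
# An empty robots.txt, i.e. a file with nothing in it.
empty = parse_robots("")

print(with_sitemap.site_maps())  # ['http://yourwebsite.com/sitemap.xml']
print(empty.can_fetch("Googlebot", "http://yourwebsite.com/any-page.html"))  # True
print(empty.site_maps())  # None
```

    So an empty file does not block anything. Crawlers simply treat the whole site as allowed.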
     
    Sutocu, Aug 26, 2007
  10. vasildb

    vasildb Well-Known Member

    #10
    Thanks for the info. I was a little bit confused with robots.txt.
     
    vasildb, Aug 26, 2007