What is robots.txt?

Discussion in 'robots.txt' started by icare, Feb 12, 2006.

  1. #1
    How do I get or create one for my site?

    Please advise.
     
    icare, Feb 12, 2006 IP
  2. Smyrl

    Smyrl Tomato Republic Staff

    #2
    Do a Google search for "robots.txt tutorial". Your robots.txt file can be created with any text editor. It spells out which files and directories robots may or may not crawl. There are many non-obedient robots out there, but Google, Yahoo, and MSN all obey your robots.txt directives.

    These two lines allow all robots to index every page:
    User-agent: *
    Disallow:

    These two lines keep all robots out:
    User-agent: *
    Disallow: /
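
    For anyone who wants to check the two rule sets above before uploading them, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper name allowed, the agent name ExampleBot, and the example.com URL are made up for illustration; they are not from this thread.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_lines, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(agent, url)

# The two rule sets quoted above (assumed to be the whole file in each case).
ALLOW_ALL = ["User-agent: *", "Disallow:"]
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

print(allowed(ALLOW_ALL, "http://example.com/page.html"))  # True
print(allowed(BLOCK_ALL, "http://example.com/page.html"))  # False
```

    This only shows how a well-behaved parser reads the rules; a robot that ignores robots.txt is not stopped by either file.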
     
    Smyrl, Feb 12, 2006 IP
  3. icare

    icare Peon

    #3

    Even if I Google it, a DP page will show up at the very top, so why not ask here? I tried searching for this on DP but couldn't find an answer that explains it... :D
     
    icare, Feb 12, 2006 IP
  4. Smyrl

    Smyrl Tomato Republic Staff

  5. Cristian Mezei

    Cristian Mezei Notable Member

    #5
    I have this one in my bookmarks, together with this one.

    It might do you good to read them :)
     
    Cristian Mezei, Feb 12, 2006 IP
  6. dashboard

    dashboard Peon

    #6
    You can also use <meta name=robots.txt content=index,nofollow>
     
    dashboard, Feb 12, 2006 IP
  7. seoaddict

    seoaddict Peon

    #7
    Create a plain-text file and save it as robots.txt.
    In it you can allow and disallow crawlers.
     
    seoaddict, Feb 13, 2006 IP
  8. mariush

    mariush Peon

    #8
    I added a robots.txt file just to get rid of the 404 Not Found errors logged whenever a robot requested the missing file. They annoyed me because I kept seeing them in AWStats.

    My robots.txt is actually:

    
    User-agent: *
    Disallow: /cgi-bin/
    
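
    If you would rather generate a file like this than type it by hand, a small Python sketch along these lines could work. The function name build_robots_txt is hypothetical, and it assumes a single rule group is all you need.

```python
def build_robots_txt(disallowed_paths, agent="*"):
    """Return robots.txt text with one group: a User-agent line plus Disallow lines."""
    lines = [f"User-agent: {agent}"]
    # With no paths given, emit an empty Disallow, which blocks nothing.
    lines += [f"Disallow: {path}" for path in disallowed_paths] or ["Disallow:"]
    return "\n".join(lines) + "\n"

# Reproduces the two-line file shown above.
print(build_robots_txt(["/cgi-bin/"]), end="")
```

    The returned text would then be saved as robots.txt in the root folder of the site.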
     
    mariush, Feb 13, 2006 IP
  9. JEET

    JEET Notable Member

    #9
    That's not robots.txt. It's a meta tag.
    And it isn't right either: you cannot specify a file name in a meta tag.
    <meta http-equiv="robots" content="index,follow" />
    is the right tag for the content and links on that particular page.

    Robots.txt is a simple text file that "GOOD" crawler bots read to see which folders or files they are allowed to index and which they are not.
    It is placed in the main host folder, directly inside "public_html".

    User-agent: *
    Disallow: /images

    will keep search engines out of your images folder.
    If you want everything to be available for indexing, create an empty "robots.txt" and put it in the "public_html" folder.
    A blank Notepad file named "robots.txt" ...

    If you don't have a "public_html" folder, then your host probably already has a robots.txt and you need not do anything. Your site is a folder inside "his public_html", which already has a robots.txt.

    But if you are getting a 404 Not Found error for robots.txt, ask your host whether he has that file. If not, ask him to put one there.
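
    On the question of where the file has to live: crawlers request robots.txt from the root of the host, so a copy sitting inside a subfolder is not consulted. A minimal Python sketch of working out which URL a robot would ask for, given any page on the site (the function name robots_txt_url and the example.com addresses are made up for illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt URL a crawler would request for the host of `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with port), replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://example.com/mysite/blog/post.html?id=3"))
# http://example.com/robots.txt
```

    Requesting that URL in a browser is a quick way to see whether the 404 in your logs comes from a missing file at the host root.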

    This is what I have noticed from my logs.
    Hope that's right.

    Regards
    Jeet
     
    JEET, Feb 13, 2006 IP
  10. lionstarr

    lionstarr Peon

    #10
    I know it as
    <meta name="robots" content="index, follow">
    First you can say index or noindex: allow search engines to index your site, or don't.
    Then you can say follow or nofollow; nofollow stops search engines from passing your PageRank along through the links :)
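
    To make the index/noindex and follow/nofollow values concrete, here is a rough Python sketch of how a robot might read that tag from a page, using the standard library's html.parser. The class and function names are hypothetical, and real crawlers certainly do more than this.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into individual lowercase directives.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def robots_directives(html):
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```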
    greetings,
    lionstarr
     
    lionstarr, Feb 21, 2006 IP
  11. minstrel

    minstrel Illustrious Member

    #11
    lionstarr, the meta tag you mention is not as good a solution as robots.txt for most websites:

    1. it has to be used on a page by page basis, i.e., for spiders that read and honor that meta tag, it only applies to the page that contains it

    2. it does not have the capability for excluding specific spiders or entire directories

    The only time one normally would use the meta tag is if you are on free hosting that won't allow you to place a robots.txt file in the root directory.
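
    As an illustration of point 2, this Python sketch feeds a robots.txt with two groups to the standard library's urllib.robotparser: one group shuts out a single spider, the other keeps everyone out of one directory. The names BadBot, GoodBot, /private/, and example.com are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for a specific spider, one for everyone else.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(RULES)  # parse from memory, no network access

print(parser.can_fetch("BadBot", "http://example.com/index.html"))       # False
print(parser.can_fetch("GoodBot", "http://example.com/index.html"))      # True
print(parser.can_fetch("GoodBot", "http://example.com/private/a.html"))  # False
```

    Neither kind of rule can be expressed with the per-page meta tag.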
     
    minstrel, Feb 22, 2006 IP
  12. lionstarr

    lionstarr Peon

    #12
    Of course it's not as good as a robots.txt!
    I only saw JEET posting about <meta http-equiv and thought I'd tell you that I know it as <meta name="robots">. Maybe I'm wrong and I'll learn something, or he's wrong and he'll learn something!
     
    lionstarr, Feb 23, 2006 IP