What is the robots.txt file?

Discussion in 'Search Engine Optimization' started by julietegecy, Mar 24, 2010.

  1. #1
    Can anyone please explain what a robots.txt file is and what its correct format looks like?
     
    julietegecy, Mar 24, 2010 IP
  2. PhilipSEO

    PhilipSEO Notable Member

    #2
    Robots.txt controls how Web bots/crawlers/spiders access and crawl your website. It uses what we in the trade call the Robots Exclusion Protocol. In short, before visiting one of your site's pages, the bot checks that page's path against the rules in your robots.txt. If it finds something like

    User-agent: *
    Disallow: /

    -- this means that robots are not allowed to crawl your pages. Of course, this does not always work. For example, viruses and other malware ignore your robots.txt file. But it works for legitimate Web bots such as Googlebot and other search crawlers.

    The instructions in the file will depend on what you are trying to accomplish.
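
    As a rough illustration of the check described above, here is a minimal Python sketch using the standard library's urllib.robotparser. The bot name and the www.yoursite.com address are placeholders, and the rules are passed in as a list of lines instead of being fetched from a real site.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post, supplied as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved bot asks this question before requesting a page.
# The URL below is a placeholder, not a real site.
page = "http://www.yoursite.com/some-page.html"
may_crawl = parser.can_fetch("Googlebot", page)
print(may_crawl)  # False: "Disallow: /" blocks every path
```

    A bot that ignores the file never runs a check like this, which is why robots.txt only works for crawlers that choose to honor it.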
     
    PhilipSEO, Mar 24, 2010 IP
  3. addie1

    addie1 Guest

    #3
    "Robots.txt" is a regular text file that, through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or the site at all. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that.
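
    To make the /images example concrete, here is a small Python sketch that checks such a rule with the standard library's urllib.robotparser. The site address, file names, and the second bot name are assumptions made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# A rule group aimed only at Googlebot, keeping it out of /images/.
rules = [
    "User-agent: Googlebot",
    "Disallow: /images/",
]

parser = RobotFileParser()
parser.parse(rules)

site = "http://www.yoursite.com"  # placeholder address

# Googlebot is kept out of /images/ but may crawl everything else.
image_ok = parser.can_fetch("Googlebot", site + "/images/logo.png")
page_ok = parser.can_fetch("Googlebot", site + "/index.html")

# A bot that is not named in the file is not affected by this group.
other_bot_ok = parser.can_fetch("SomeOtherBot", site + "/images/logo.png")

print(image_ok, page_ok, other_bot_ok)  # False True True
```

    Using "User-agent: *" instead of "User-agent: Googlebot" would apply the same rule to every robot that respects the file.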
     
    addie1, Mar 24, 2010 IP
  4. seo555

    seo555 Peon

    #4
    Use this to let all robots crawl everything (an empty Disallow means nothing is blocked):

    User-agent: *
    Disallow:
     
    seo555, Mar 24, 2010 IP
  5. james.parker

    james.parker Peon

    #5
    Basically, the robots.txt file is used to keep the crawlers or bots of search engines from crawling and indexing particular web pages.
     
    james.parker, Mar 24, 2010 IP
  6. sopheap

    sopheap Peon

    #6
    In short, robots.txt is used to exclude any pages that we don't want search engines to index.
     
    sopheap, Mar 24, 2010 IP
  7. bogs

    bogs Active Member

    #7
    bogs, Mar 24, 2010 IP
  8. datasol

    datasol Active Member

    #8
    The robots.txt file tells robots whether they are allowed to crawl your site's pages.
     
    datasol, Mar 24, 2010 IP
  9. smsinhindi

    smsinhindi Peon

    #9

    Type http://www.yoursite.com/robots.txt into your browser and you will see the contents of your robots.txt file.
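
    The same address can be worked out from any page URL, since the file always sits at the root of the host. A minimal Python sketch, where robots_url is a hypothetical helper name and the page address is a placeholder:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query/fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Placeholder page address used only for illustration.
print(robots_url("http://www.yoursite.com/blog/post.html?page=2"))
# http://www.yoursite.com/robots.txt
```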
     
    smsinhindi, Mar 25, 2010 IP
  10. Jeff Collision

    Jeff Collision Peon

    #10
    Sometimes content from your website gets copied to blog submission pages; you can find this out by checking. You can disallow the duplicate copies of your content using robots.txt. There is also no need for search engine bots to visit your cached pages, so you can disallow those pages using robots.txt as well. To make a rule apply to all web spiders (note that this particular example blocks the whole site):
    User-agent: *
    Disallow: /
     
    Jeff Collision, Mar 25, 2010 IP
  11. PhilipSEO

    PhilipSEO Notable Member

    #11
    Gibberish of the week, makes no sense at all.
     
    PhilipSEO, Mar 25, 2010 IP
  12. meenka

    meenka Peon

    #12
    It is a text file where you tell crawlers which pages they can crawl and which they can't.
     
    meenka, Mar 25, 2010 IP
  13. rashida

    rashida Active Member

    #13
    You can create a robots.txt file if you don't want some of your site's web pages to be indexed by search engines.
     
    rashida, Mar 25, 2010 IP
  14. ap09.com

    ap09.com Guest

    #14
    ap09.com, Mar 26, 2010 IP
  15. freshware

    freshware Peon

    #15
    Hello Friend,

    Yes, I agree with your view.
     
    freshware, Mar 26, 2010 IP
  16. psharma

    psharma Prominent Member

    #16
    A robots.txt file is a file that gives instructions to robots (meaning bots or crawlers) about which parts of your site they may request. You can use it to allow robots, deny them, or partially allow some of them. Robots that understand these instructions and choose to honor them will follow them.

    The file has a specific format and must exist at exactly this location: www.websitename.com/robots.txt.
    If you want to create one for your website, simply upload a file with this name to this location. For the contents, you may refer to one of the online robots.txt generator tools.
    Many CMS-based websites (blogs, forums, and so on, including WordPress, Blogger, and Joomla) provide a robots.txt file automatically, in some cases a virtual one, so you may not need to create it separately.
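
    For anyone curious what a generator tool does under the hood, here is a minimal Python sketch. The function name build_robots_txt and the example paths are made up for illustration; real tools offer more options.

```python
def build_robots_txt(groups: dict) -> str:
    """Build robots.txt text from {user_agent: [(directive, path), ...]}."""
    lines = []
    for user_agent, rules in groups.items():
        lines.append(f"User-agent: {user_agent}")
        for directive, path in rules:
            lines.append(f"{directive}: {path}")
        lines.append("")  # blank line separates groups
    return "\n".join(lines)


# Hypothetical rules: keep every robot out of two folders.
content = build_robots_txt({
    "*": [("Disallow", "/private/"), ("Disallow", "/tmp/")],
})
print(content)
```

    The resulting text would then be saved as robots.txt and uploaded to the root of the site.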
     
    psharma, Mar 26, 2010 IP
  17. tessflores

    tessflores Peon

    #17
    Web site owners use the /robots.txt file to give instructions about their site to web robots.
    There are two important considerations when using /robots.txt:
    1. Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
    2. The /robots.txt file is publicly available. Anyone can see what sections of your server you don't want robots to use.
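
    The second point is easy to demonstrate: anyone who downloads the file can list the paths it mentions. A minimal Python sketch, where disallowed_paths is a hypothetical helper and the sample paths are invented:

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path found in the file text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Invented sample file; a real one would be downloaded from the site.
sample = """\
User-agent: *
Disallow: /admin/
Disallow: /private-reports/
"""

exposed = disallowed_paths(sample)
print(exposed)  # ['/admin/', '/private-reports/']
```

    So robots.txt should not be relied on to hide sensitive areas; it only asks polite robots to stay away.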
    Read more on redalkemi dot com
     
    tessflores, Apr 5, 2010 IP
  18. neiljhonson

    neiljhonson Peon

    #18
    First you need to know what robots are.
    Robots are automated software programs that check web pages and their contents and index the most relevant information with respect to the keywords. Robots jump from one page to another by following anchor tags, and follow those paths to collect information.
    If you don't want robots to follow a page or folder, you need to use a robots.txt file, which instructs the crawler whether or not to follow that page.

    The file is a plain text file named robots.txt, which you can create in Notepad.
    The syntax is as follows:

    User-agent: *
    Allow: /
    Disallow: /Scripts/
    Disallow: /HotelDetails/
    Disallow: /flash/
    Disallow: /FlashFiles/

    For more details on robots.txt, refer to Google's robots instructions.
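
    To see how a crawler might weigh "Allow: /" against the Disallow lines in the example above, here is a minimal Python sketch. It assumes the longest-match precedence described in RFC 9309 (the most specific rule wins, and Allow wins a tie), ignores wildcards, and uses a hypothetical function name is_allowed. Not every crawler or parsing library resolves conflicts this way.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled under one User-agent group.

    rules is a list of (directive, prefix) pairs. The longest matching
    prefix wins; if an Allow and a Disallow tie, Allow wins.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and is_allow
            if longer or tie_for_allow:
                best_len = len(prefix)
                allowed = is_allow
    return allowed


# The rules from the example file above.
RULES = [
    ("Allow", "/"),
    ("Disallow", "/Scripts/"),
    ("Disallow", "/HotelDetails/"),
    ("Disallow", "/flash/"),
    ("Disallow", "/FlashFiles/"),
]

print(is_allowed(RULES, "/index.html"))      # True
print(is_allowed(RULES, "/Scripts/app.js"))  # False
print(is_allowed(RULES, "/scripts/app.js"))  # True: paths are case-sensitive
```

    The last line shows why the capitalization of the folder names in robots.txt has to match the real URLs.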
     
    neiljhonson, Apr 5, 2010 IP
  19. upshurcreative

    upshurcreative Guest

    #19
    You can create a robots.txt file to prevent search engine spiders from consuming excessive amounts of bandwidth on your server and also to prevent potential copyright infringements. A robots.txt file provides search engine spiders with information about which pages should be crawled and indexed and which should not. It is a text file that resides in the root directory of your Web server. If you do not provide a robots.txt file, search engine spiders assume that the entire site should be crawled and indexed.
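
    The "no rules means everything may be crawled" default can be checked with Python's standard urllib.robotparser. This is only a sketch: an empty rule list stands in for a missing or empty file, and the bot name and URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # no rules at all, as with an empty robots.txt

# With nothing disallowed, any path is treated as crawlable.
anything_ok = parser.can_fetch(
    "Googlebot", "http://www.yoursite.com/any/page.html"
)
print(anything_ok)  # True
```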
     
    upshurcreative, Apr 5, 2010 IP