robots.txt Analysis

Discussion in 'Search Engine Optimization' started by jones1982, Jun 4, 2012.

  1. #1
    One website is showing this type of robots.txt. Can someone please describe what is going on? Should the owner delete anything, such as the Sitemap line?

    User-agent: *
    Allow: /
    Disallow:

    Sitemap: http://www.example.com/sitemap.xml
     
    jones1982, Jun 4, 2012 IP
  2. Anil Strivastava

    Anil Strivastava Peon

    #2
    You are allowing all robots to access all of your pages.

    If you want to hide some pages, PHP files, or folders, add one Disallow line per path:

    Disallow: /[page].html
    Disallow: /[file].php
    Disallow: /[folder]/
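
    To see how Disallow lines like these behave, here is a minimal sketch using Python's standard urllib.robotparser module. The paths /private.html, /admin.php and /tmp/ are hypothetical placeholders, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one Disallow line per path, as described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private.html
Disallow: /admin.php
Disallow: /tmp/
"""


def is_allowed(robots_txt, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/private.html", "/admin.php", "/tmp/cache.txt"):
        print(path, is_allowed(ROBOTS_TXT, path))
```

    Only well-behaved crawlers apply rules this way; the check is advisory.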
     
    Anil Strivastava, Jun 4, 2012 IP
  3. valen123

    valen123 Greenhorn

    #3
    Robots.txt files (often erroneously called robot.txt, in singular) are created by webmasters to mark (disallow) files and directories of a web site that search engine spiders (and other types of robots) should not access.

    A robots.txt checker is a "validator" that analyzes the syntax of a robots.txt file to see whether its format is valid as established by the Robots Exclusion Standard, or whether it contains errors.
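
    As a rough idea of what such a syntax check involves, here is a small sketch. It is not the checker mentioned above, and the list of accepted fields is my own assumption (crawl-delay in particular is a common extension, not part of the original standard).

```python
# Fields this sketch accepts; the set is an assumption, not an official list.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def validate_robots_txt(text):
    """Return a list of (line_number, message) problems found in `text`."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            if not value:
                problems.append((number, "User-agent needs a value"))
            seen_user_agent = True
        elif field in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append((number, f"{field} before any User-agent"))
            if value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


if __name__ == "__main__":
    # A path listed on its own line, without a field name, is reported.
    print(validate_robots_txt("User-agent: *\nDisallow :\n/[page].html"))
```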
     
    valen123, Jun 4, 2012 IP
  4. seo-hosting.com

    seo-hosting.com Peon

    #4
    The ROBOTS referred to in the filename are web crawlers, spiders, and bots. The robots.txt file is primarily used to ENCOURAGE, not enforce, two things: WHICH robots have access, and to WHICH folders and files on your website. Sometimes you may not want all of your website's folders indexed by a search engine, and this file lets you disallow access to those specific files and directories. You may also not want your website crawled by specific bots, or by unknown or undesirable bots; you can ask for this too. It is important to understand that the robots.txt file cannot CONTROL which bots scan your files, as bots can simply choose to ignore it. The file is also freely visible to anyone who wishes to read it, so someone could determine your site structure from it and use that for nefarious activities.
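
    Here is a sketch of the "which robots, which folders" idea, again with Python's urllib.robotparser. The bot names BadBot and GoodBot and the folder names are made up for illustration. Note that this is the check a polite crawler runs on itself; nothing on the server enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for a specific bot, one for everyone else.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /logs/
"""


def polite_fetch_allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a crawler calling itself `agent` should fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/index.html"
    print("BadBot:", polite_fetch_allowed("BadBot", page))
    print("GoodBot:", polite_fetch_allowed("GoodBot", page))
```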
     
    seo-hosting.com, Jun 4, 2012 IP
  5. liamreed89

    liamreed89 Peon

    #5
    It is great when search engines frequently visit your site and index your content, but there are often cases where indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you would rather have the printing version excluded from crawling; otherwise you risk a duplicate content penalty. Also, if you happen to have sensitive data on your site that you do not want the world to see, you will prefer that search engines do not index those pages (although in this case the only sure way to keep sensitive data out of an index is to keep it offline on a separate machine). Additionally, if you want to save some bandwidth by excluding images, stylesheets, and JavaScript from indexing, you also need a way to tell spiders to keep away from these items.
     
    liamreed89, Jun 5, 2012 IP
  6. jewelraz

    jewelraz Active Member

    #6
    First of all, I would like to thank you for creating this thread.

    User-agent: * (All robots)
    Allow: / (Everything is allowed)
    Disallow: (Nothing is disallowed)

    These three lines tell all robots that they are allowed to access the whole site. There is nothing hidden or restricted for the robots. They can visit cgi-bin, trash files, pages, log files, everything.
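
    You can confirm this reading by feeding the file from post #1 to Python's urllib.robotparser. A small sketch follows; the bot name AnyBot and the test paths are made up, and site_maps() needs Python 3.8 or newer. The Sitemap line only tells crawlers where the sitemap is and does not restrict anything.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted in the first post of this thread.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""


def load_rules(robots_txt):
    """Parse robots.txt text and return the configured parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    # Hypothetical paths of the kind mentioned above.
    for path in ("/", "/cgi-bin/script.cgi", "/logs/access.log"):
        print(path, rules.can_fetch("AnyBot", path))
    print("Sitemaps:", rules.site_maps())
```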
     
    jewelraz, Jun 5, 2012 IP
  7. jones1982

    jones1982 Active Member

    #7
    jones1982, Jun 13, 2012 IP