What is robots.txt and how does it work?

Discussion in 'Search Engine Optimization' started by sachin.coolboy, Mar 8, 2009.

  1. #1
    What is robots.txt and how does it work?
     
    sachin.coolboy, Mar 8, 2009 IP
  2. suhaana@maxinspire.co.in

    suhaana@maxinspire.co.in Peon

    #2
    I have summed up all the robots.txt info.
    Read it here: http:// forums.digitalpoint (.com) (showthread.php?t=1259401)
     
  3. measure9inva

    measure9inva Peon

    #3
    Thank you.....
     
    measure9inva, Mar 8, 2009 IP
  4. sachin.coolboy

    sachin.coolboy Peon

    #4
    thank you....
     
    sachin.coolboy, Mar 8, 2009 IP
  5. rena

    rena Peon

    #5
    It's a way to tell Google how to index your site. It is mainly used to keep crawlers from crawling (and so indexing) certain folders or pages on the site. Google Webmaster Tools gives a clear picture, with examples.
     
    rena, Mar 8, 2009 IP
  6. prashantban

    prashantban Well-Known Member

    #6
    Thanks a lot for this...
     
    prashantban, Mar 8, 2009 IP
  7. HanhVu

    HanhVu Banned

    #7
    HanhVu, Mar 8, 2009 IP
  8. Canonical

    Canonical Well-Known Member

    #8
    google.com/support/webmasters/bin/answer.py?hl=en&answer=40360

    Basically, it's a file that lives in the root of your website that friendly crawlers use to determine which URLs on your site they should NOT crawl or index. By default, if it does not exist, they assume any page they can find on your site is available for indexing.

    NOTE: Bad crawlers will frequently ignore your robots.txt and index whatever they can find.
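
    As a rough illustration of the check a friendly crawler performs, here is a minimal sketch using Python's standard urllib.robotparser module. The sample rules, the bot names (FriendlyBot, BadBot) and the example.com URLs are made up for this example, and real search engines may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and bot names are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cart/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    robot_parser = RobotFileParser()
    robot_parser.parse(robots_txt.splitlines())
    return robot_parser


rules = build_parser(SAMPLE_ROBOTS_TXT)

# A well-behaved crawler asks before fetching each URL.
print(rules.can_fetch("FriendlyBot", "http://example.com/index.html"))           # True
print(rules.can_fetch("FriendlyBot", "http://example.com/private/report.html"))  # False
print(rules.can_fetch("BadBot", "http://example.com/index.html"))                # False
```

    Note that nothing here stops a crawler from skipping the check entirely; the file only works for bots that choose to honor it.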
     
    Canonical, Mar 8, 2009 IP
  9. vengatowen

    vengatowen Well-Known Member

    #9
    A robots.txt file restricts access to your site by search engine robots that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages. (All respectable robots will respect the directives in a robots.txt file, although some may interpret them differently. However, a robots.txt is not enforceable, and some spammers and other troublemakers may ignore it. For this reason, we recommend password protecting confidential information.)

    You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one).
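
    To make the "no rules means everything is allowed" point concrete, here is a small sketch with Python's standard urllib.robotparser module. The helper name is_allowed, the bot name AnyBot and the example.com URL are assumptions for illustration; this shows how this particular parser behaves, not necessarily how every search engine does.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    robot_parser = RobotFileParser()
    robot_parser.parse(robots_lines)
    return robot_parser.can_fetch(user_agent, url)


# Hypothetical URL used only for this comparison.
url = "http://example.com/drafts/post.html"

no_rules = []  # stands in for an empty robots.txt
block_one_dir = ["User-agent: *", "Disallow: /drafts/"]

print(is_allowed(no_rules, "AnyBot", url))       # True: nothing is restricted
print(is_allowed(block_one_dir, "AnyBot", url))  # False: /drafts/ is disallowed
```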

    While Google won't crawl or index the content of pages blocked by robots.txt, Google may still index the URLs if it finds them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.

    In order to use a robots.txt file, you'll need to have access to the root of your domain (if you're not sure, check with your web host). If you don't have access to the root of a domain, you can restrict access using the robots meta tag.

    I hope this helps you understand the robots.txt file and its use.
     
    vengatowen, Mar 8, 2009 IP
  10. mrandrei

    mrandrei Peon

    #10
    A robots.txt file is a set of instructions for the robots or spiders that visit a site and index its content. :)
     
    mrandrei, Mar 8, 2009 IP
  11. resaik_king

    resaik_king Active Member

  12. tung148

    tung148 Active Member

    #12
    tung148, Mar 8, 2009 IP
  13. SabQat

    SabQat Peon

    #13
    It's a simple .txt file, made with a Notepad-like text editor, that tells Google and other search engines which pages to crawl and which not to.
     
    SabQat, Mar 9, 2009 IP
  14. gred

    gred Member

    #14
    one more source
    http://en.wikipedia.org/wiki/Robots.txt
     
    gred, Mar 12, 2009 IP
  15. Lovely

    Lovely Well-Known Member

    #15
    Lovely, Mar 20, 2009 IP
  16. Seo_genius

    Seo_genius Member

    #16
    Robots.txt tells search engines which parts of your website not to index. It is typically used to keep crawlers out of places like your shopping cart and other areas you don't want showing up in search results. Note that it does not actually cut off public access, so truly private information, such as a website database, still needs real protection such as a password. I am sure you have received very good recommendations already; however, I hope this helps further.

    Br
    Seo_genius
     
    Seo_genius, Mar 20, 2009 IP
  17. mmerlinn

    mmerlinn Prominent Member

    #17
    I don't see any value in a separate file for instructing robots. All you need to do is use the robots META tag on every page that you want to restrict.

    Since you usually want over 99% of your pages indexed, setting the robots META tag to nofollow or noindex on your restricted pages should not be cumbersome at all.

    I have a large site (over 4000 pages) and there are only about 200 of those pages needing to be restricted. Adding a simple META tag to those pages cured all Google problems. However, neither a robots.txt nor a META tag will stop rogue bots.
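
    For anyone wondering what a crawler actually looks for on such a page, here is a minimal sketch that reads the robots META tag with Python's standard html.parser module. The class name RobotsMetaParser, the helper robots_directives and the sample page are invented for this example; a real crawler's handling is likely more involved.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots META directives found in an HTML string."""
    meta_parser = RobotsMetaParser()
    meta_parser.feed(html)
    return meta_parser.directives


# Hypothetical restricted page.
SAMPLE_PAGE = """<html><head>
<title>Members only</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

directives = robots_directives(SAMPLE_PAGE)
print("noindex" in directives)   # True: the page asks not to be indexed
print("nofollow" in directives)  # True: the page asks that its links not be followed
```

    Keep in mind that a crawler has to download the page to see this tag, whereas robots.txt is checked before the page is requested at all.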
     
    mmerlinn, Mar 20, 2009 IP