Allow indexed only main page

Discussion in 'Search Engine Optimization' started by tnd, Oct 18, 2007.

  1. #1
I would like to know: can robots.txt tell robots to index only the main page, but not the sub-pages or other pages on the site?
     
    tnd, Oct 18, 2007 IP
  2. monfis (Well-Known Member)

    #2
    Yes you can. What you want to exclude depends on your server: everything not explicitly disallowed is considered fair game to retrieve. Here are some examples:

    To exclude all robots from the entire server
    User-agent: *
    Disallow: /

    To allow all robots complete access
    User-agent: *
    Disallow:

    Or create an empty "/robots.txt" file.

    To exclude all robots from part of the server
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /private/

    To exclude a single robot
    User-agent: BadBot
    Disallow: /

    To allow a single robot
    User-agent: WebCrawler
    Disallow:

    User-agent: *
    Disallow: /

    To exclude all files except one
    This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file in the level above this directory:
    User-agent: *
    Disallow: /~joe/docs/

    Alternatively you can explicitly disallow each page you want to keep out:
    User-agent: *
    Disallow: /~joe/private.html
    Disallow: /~joe/foo.html
    Disallow: /~joe/bar.html
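
    To answer the original question directly — allowing only the main page — the original robots.txt standard has no clean way to do it, but Google and several other major crawlers support a non-standard "Allow" directive and a "$" end-of-URL anchor. A sketch (note that not every bot honors these extensions, so smaller crawlers may ignore it):
    User-agent: *
    Allow: /$
    Disallow: /

    For crawlers that support these extensions, "Allow: /$" matches only the root URL (e.g. http://example.com/), and the more specific Allow rule takes precedence over "Disallow: /", so everything else on the site is blocked.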
     
    monfis, Oct 18, 2007 IP