
How to restrict a URL with the robots.txt file

Discussion in 'robots.txt' started by OSSEO, Sep 7, 2011.

  1. #1
    I want to know how to restrict a URL with the robots.txt file; I want to restrict the URLs below. We did not create any sub-domain, but sina.com.cn is appearing in front of our URL. I have checked the domain & hosting panel and there is no such file or sub-domain. How can I restrict and fix this?

    Moreover, does anybody know why sina.com.cn is appearing in another website's URL, and how we can protect our website using robots.txt or some other way?




    https://tool.mykidslunchbox.com.au/forgot-password.aspx
    http://www.sina.com.cn.mykidslunchbox.com.au/forgot-password.aspx
    https://www.sina.com.cn.mykidslunchbox.com.au/how-it-works.aspx
    https://www.sina.com.cn.mykidslunchbox.com.au/contactus.aspx
    http://tool.mykidslunchbox.com.au/contactus.aspx
    http://tool.mykidslunchbox.com.au/benefits.aspx
    http://tool.mykidslunchbox.com.au/
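
    The sina.com.cn.* hostnames above typically resolve because of a wildcard DNS record, and robots.txt alone cannot stop them, since each hostname serves its own robots.txt. A hedged sketch of a server-side fix, assuming Apache with name-based virtual hosts (the hostnames come from this thread; the catch-all vhost and DocumentRoot are assumptions about the setup):

    ```apache
    # Sketch, assuming Apache name-based virtual hosts. Apache uses the FIRST
    # vhost for an address/port as the default when no ServerName matches the
    # request's Host header, so a catch-all placed first redirects spoofed
    # hosts like www.sina.com.cn.mykidslunchbox.com.au to the canonical site.
    <VirtualHost *:80>
        # Default/catch-all for any unrecognized Host header.
        ServerName catchall.invalid
        Redirect permanent / http://tool.mykidslunchbox.com.au/
    </VirtualHost>

    <VirtualHost *:80>
        ServerName tool.mykidslunchbox.com.au
        DocumentRoot /var/www/tool
    </VirtualHost>
    ```

    Removing the wildcard DNS record at the registrar or DNS host would address the root cause as well.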
     
    OSSEO, Sep 7, 2011 IP
  2. Icecube_media

    Icecube_media Peon

    Messages:
    656
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Hi,

    Add a line like Disallow: /index.php for each sub-page you want to block. I hope it helps
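
    Applied to the pages in the original post, that advice would look like the fragment below (a sketch; each Disallow path is relative to the site root):

    ```
    User-agent: *
    Disallow: /forgot-password.aspx
    Disallow: /contactus.aspx
    Disallow: /benefits.aspx
    ```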
     
    Icecube_media, Sep 7, 2011 IP
  3. OSSEO

    OSSEO Active Member

    Messages:
    1,430
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    53
    #3
    Suppose I want to stop the below URLs from being indexed. Is this OK?

    Disallow: /tool.mykidslunchbox.com.au/forgot-password.aspx
    Disallow: /sina.com.cn.mykidslunchbox.com.au/forgot-password.aspx

    Am I right?
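
    A hedged correction to the attempt above: Disallow patterns are matched against the URL path only, never the hostname, and each hostname serves its own robots.txt. So blocking that page would look like this, placed at http://tool.mykidslunchbox.com.au/robots.txt:

    ```
    User-agent: *
    Disallow: /forgot-password.aspx
    ```

    The sina.com.cn.* hostname would need its own robots.txt served at that host, which in this thread appears to be the same site reached through a wildcard DNS entry.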
     
    OSSEO, Sep 7, 2011 IP
  4. pickledegg

    pickledegg Greenhorn

    Messages:
    34
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    16
    #4
    pickledegg, Sep 8, 2011 IP
  5. OSSEO

    OSSEO Active Member

    Messages:
    1,430
    Likes Received:
    6
    Best Answers:
    0
    Trophy Points:
    53
    #5
    This is a really wonderful article and very helpful for me, but in it the author says that you can't use the "Allow" directive. Yet when I check Google.com/robots.txt, they are using it. Why? Here is the line from the article:

    " Don't use an "Allow" command in your robots.txt file. Only mention files and directories that you don't want to be indexed. All other files will be indexed automatically if they are linked on your site."
     
    OSSEO, Sep 8, 2011 IP
  6. Sonam Singh

    Sonam Singh Peon

    Messages:
    14
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Please add a robots.txt file to your site so that no search-engine crawling is done for the next few weeks, and then verify the site again so the error is no longer there.
     
    Sonam Singh, Sep 8, 2011 IP
  7. christopherscott

    christopherscott Peon

    Messages:
    32
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #7
    Use a robots.txt file and disallow the files you don't want to be indexed.
     
    christopherscott, Sep 15, 2011 IP
  8. jabz.biz

    jabz.biz Active Member

    Messages:
    384
    Likes Received:
    6
    Best Answers:
    1
    Trophy Points:
    70
    #8
    "allow" is a nonstandard extension of the protocol. Please use robots.txt only to disallow crawler access.

    User-agent: *
    Disallow: /

    equals

    User-agent: *
    allow: /

    whilst "allow" is not part of the robots exclusion standard (robots.txt)

    I have collected a full set of example implementations here: http://rield.com/cheat-sheets/robots-exclusion-standard-protocol
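
    The equivalence can be checked with Python's standard-library robots.txt parser; a small sketch:

    ```python
    # Sketch: verifying with Python's stdlib parser (urllib.robotparser) that
    # an empty Disallow allows everything, while "Disallow: /" blocks everything.
    import urllib.robotparser

    def can_fetch(robots_lines, url, agent="*"):
        """Return True if `agent` may fetch `url` under the given robots.txt lines."""
        rp = urllib.robotparser.RobotFileParser()
        rp.parse(robots_lines)
        return rp.can_fetch(agent, url)

    # Empty Disallow: nothing is blocked.
    print(can_fetch(["User-agent: *", "Disallow:"],
                    "http://example.com/contactus.aspx"))   # True

    # "Disallow: /" blocks the whole site.
    print(can_fetch(["User-agent: *", "Disallow: /"],
                    "http://example.com/contactus.aspx"))   # False
    ```
    
    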
     
    jabz.biz, Sep 26, 2011 IP
  9. manig4

    manig4 Member

    Messages:
    107
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    26
    #9
    Is it possible in wordpress?
     
    manig4, Sep 27, 2011 IP
  10. amherstsowell

    amherstsowell Peon

    Messages:
    261
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #10
    Disallow the internal pages.
     
    amherstsowell, Sep 29, 2011 IP
  11. seoguys04

    seoguys04 Greenhorn

    Messages:
    49
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    18
    #11
    jabz.biz explained it perfectly. But you do not need to put an Allow directive in the robots.txt file; it's not part of the exclusion standard. Robots.txt has never helped webmasters achieve good rankings. The file is used to restrict robots from crawling part or all of a website. Remember there are bad robots as well, which do not always follow the directives in robots.txt, so using robots.txt is NOT a security measure either. It is always better to password-protect the folders and directories you do not want crawled. Anyway, this is not what you asked about. Please keep in mind that robots.txt has got nothing to do with ranking in the SERPs.
    Cheers!!!
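
    The password-protection suggestion can be sketched for an Apache host (an assumption about the server; the AuthUserFile path is hypothetical):

    ```apache
    # Sketch: .htaccess basic-auth protection for a directory, since
    # robots.txt is advisory only and bad bots ignore it.
    # The .htpasswd path below is an assumption; create it with the
    # htpasswd utility.
    AuthType Basic
    AuthName "Restricted"
    AuthUserFile /home/user/.htpasswd
    Require valid-user
    ```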
     
    seoguys04, Oct 7, 2011 IP
  12. MyWebsiteNow

    MyWebsiteNow Peon

    Messages:
    71
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #12
    In order to use a robots.txt file, you'll need access to the root of your domain (if you're not sure, check with your web host). If you don't have access to the root of the domain, you can restrict access using the robots meta tag instead.
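
    The meta-tag alternative mentioned here goes in the head of each page you want kept out of the index; a standard sketch:

    ```html
    <!-- Place in the <head> of pages such as forgot-password.aspx -->
    <meta name="robots" content="noindex, nofollow">
    ```

    Note that, unlike a robots.txt block, the page must remain crawlable for the crawler to see this tag at all.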
     
    MyWebsiteNow, Oct 17, 2011 IP