Robots.txt

Discussion in 'robots.txt' started by solarpanelsdirect, Nov 13, 2009.

  1. #1
    Can you use robots.txt to disallow crawling of outbound links?

    I know "Disallow: /" would be acceptable,
    but could I do something like:


    User-agent: *
    Disallow: www.somewebsite.com

    to prevent spidering of outbound links?
    Or can this be done with mod_rewrite?
     
    solarpanelsdirect, Nov 13, 2009
  2. MaverickMoney

    #2
    I'm not sure if it's possible using robots.txt, but this site might help answer your question.

    http://www.robotstxt.org/
     
    MaverickMoney, Nov 14, 2009
  3. manish.chauhan

    #3
    Hi solarpanelsdirect, you cannot use robots.txt or .htaccess to tell crawlers not to follow outbound links. Rules in robots.txt only apply to paths on your own site. However, you can do it by adding the rel="nofollow" attribute to the outbound links.
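
    To see why the hostname rule from post #1 has no effect, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name allowed() and the host www.mysite.example are made up for illustration; the parser matches Disallow values against the path part of a URL, so a hostname never matches.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules, url, agent="*"):
    """Parse robots.txt text and report whether `agent` may fetch `url`.

    Hypothetical helper for this example, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# The rule proposed in post #1: a hostname where a path is expected.
hostname_rule = "User-agent: *\nDisallow: www.somewebsite.com"
# The rule that is known to work: block every path on your own site.
block_all_rule = "User-agent: *\nDisallow: /"

outbound = "http://www.somewebsite.com/page"
own_page = "http://www.mysite.example/products.html"  # assumed own site

# Rules are compared with the URL path ("/page"), which does not start
# with "www.somewebsite.com", so the outbound link is not blocked.
hostname_result = allowed(hostname_rule, outbound)

# "Disallow: /" is a prefix of every path, so the page is blocked.
block_all_result = allowed(block_all_rule, own_page)

print("hostname rule allows outbound link:", hostname_result)
print("block-all rule allows own page:", block_all_result)
```

    Other crawlers may parse robots.txt slightly differently, so treat this as an illustration of path-prefix matching rather than a statement about any particular search engine.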
     
    manish.chauhan, Nov 17, 2009
  4. varul

    #4
    Yes, we can also control robots with the HTML nofollow attribute. However, this technique will not reduce bandwidth usage, and only well-behaved robots honor conventions like this; others may ignore them.
    Blocking unwanted robots by any means will help you save your website's bandwidth.
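
    If you go the nofollow route, it can help to check which outbound links are still missing the attribute. Below is a rough sketch using Python's standard html.parser; the class name OutboundLinkAudit, the host www.mysite.example, and the sample markup are all invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class OutboundLinkAudit(HTMLParser):
    """Collect outbound <a> links that lack rel="nofollow".

    Hypothetical helper; `own_host` is the hostname treated as internal.
    """

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.unmarked = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        host = urlparse(href).netloc  # empty for relative links
        rel_tokens = (attr.get("rel") or "").lower().split()
        if host and host != self.own_host and "nofollow" not in rel_tokens:
            self.unmarked.append(href)


# Sample markup, made up for the example.
page = """
<a href="/about">About</a>
<a href="http://www.somewebsite.com/" rel="nofollow">Partner</a>
<a href="http://other.example/offer">Offer</a>
"""

audit = OutboundLinkAudit("www.mysite.example")
audit.feed(page)
print(audit.unmarked)
```

    This only reports the links; it does not change the page, and it does nothing about robots that ignore nofollow.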
     
    varul, Nov 30, 2009
  5. Spawned

    #5
    Do you even need to have a robots.txt file? What happens if you don't?
     
    Spawned, Dec 2, 2009