How To Stop A Page Being Crawled

Discussion in 'Search Engine Optimization' started by pnisbet, Dec 21, 2006.

  1. #1
    Hi!

    I am a bit new to SEO. What's the best way to stop a page being crawled? I have a page containing ads and affiliate links that I would rather not be crawled until my site has been listed.

    I know of two possible ways: adding rel="nofollow" to the <a href=> tag in my navigation, and using the robots.txt file. Which is better, and are there alternatives?

    Also, should I do this, or should I delete the page altogether until I am listed?

    Cheers,

    Peter
     
    pnisbet, Dec 21, 2006
  2. mad4
    #2
    You should use rel="nofollow" when linking to pages you don't want crawled, and you should also block the page in robots.txt.
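
    To check the nofollow half of this advice, here is a minimal sketch that lists every link in a piece of navigation markup and reports whether it carries rel="nofollow". The markup and the /offers.html path are made up for illustration; they are not from the original poster's site.

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Collect each <a href> and whether it carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = {}  # href -> True if the link is nofollowed

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links[href] = "nofollow" in rel_tokens


# Hypothetical navigation markup; /offers.html stands in for the ads page.
NAV_HTML = """
<ul>
  <li><a href="/index.html">Home</a></li>
  <li><a href="/offers.html" rel="nofollow">Offers</a></li>
</ul>
"""

audit = NofollowAudit()
audit.feed(NAV_HTML)
for href, nofollowed in audit.links.items():
    print(href, "nofollow" if nofollowed else "followed")
```

    Keep in mind that nofollow only marks your own links. It does nothing about links to the page from other sites, which is why the robots.txt block is suggested alongside it.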
     
    mad4, Dec 21, 2006
  3. shopping
    #3
    Delete your sitemap, if you use a sitemap and have submitted it to Google or Yahoo.
     
    shopping, Dec 21, 2006
  4. zokiii
    #4
    Just use robots.txt
     
    zokiii, Dec 21, 2006
  5. just-4-teens
    #5
    Instead of using nofollow on links (Yahoo and MSN may still follow those links), you can add this meta robots tag to the pages you don't want indexed.

    <meta name="robots" content="noindex,nofollow">
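
    As a rough way to confirm the tag is actually present on a page, the sketch below parses some HTML and collects the directives from any robots meta tag it finds inside <head>. The sample page is invented for illustration, and the class name is just a placeholder.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "robots":
                content = attr.get("content") or ""
                # Directives are comma-separated, e.g. "noindex,nofollow".
                for token in content.split(","):
                    if token.strip():
                        self.directives.add(token.strip().lower())

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


# Hypothetical page used only to exercise the parser.
PAGE = """<html>
<head>
  <title>Offers</title>
  <meta name="robots" content="noindex,nofollow">
</head>
<body><p>Ads and affiliate links go here.</p></body>
</html>"""

meta_parser = RobotsMetaParser()
meta_parser.feed(PAGE)
print(sorted(meta_parser.directives))
```

    One caveat worth knowing: a crawler has to fetch the page to read this tag, so if the same page is also disallowed in robots.txt, the tag may never be seen.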
     
    just-4-teens, Dec 21, 2006
  6. BlogLover
    #6
    There is absolutely no need for that! Since you know how to
    use robots.txt that's all you need. This is the first file that
    the search engines look for before they go anywhere else on
    your site. Just block the individual page(s).
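
    For anyone unsure what blocking an individual page looks like, here is a small sketch that feeds a two-line robots.txt to Python's standard urllib.robotparser and asks whether two URLs may be fetched. The /offers.html path and the example.com host are placeholders, not details from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; /offers.html stands in for the page to hide.
ROBOTS_TXT = """\
User-agent: *
Disallow: /offers.html
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

for path in ("/index.html", "/offers.html"):
    allowed = robots.can_fetch("*", "https://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

    The file has to be served from the root of the site (for example /robots.txt) for crawlers to find it, and it only asks well-behaved crawlers to stay away. It does not hide the page from visitors.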
     
    BlogLover, Dec 21, 2006
  7. Litho
    #7
    That last question hints towards hiding the file from everything. If you have affiliate links and ads, then the sites providing those links and ads will know of your page, regardless of the robots.txt file.

    Rather than delete them, place them in a password protected directory until you are ready. Nothing can get in except you.
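
    The post does not say how to set up the protection; most web servers have a built-in option for it, and that is what you would normally use. Purely to illustrate why a crawler cannot get in, here is a sketch of the check behind HTTP Basic authentication, one common way such directories are protected. The function name and the credentials are hypothetical.

```python
import base64
import hmac


def is_authorized(header, username, password):
    """Return True if an HTTP Basic Authorization header matches the credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes alike.
        return False
    supplied_user, sep, supplied_password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(supplied_user.encode(), username.encode())
    pass_ok = hmac.compare_digest(supplied_password.encode(), password.encode())
    return user_ok and pass_ok


# Hypothetical credentials for the site owner.
owner_header = "Basic " + base64.b64encode(b"peter:s3cret").decode("ascii")

print(is_authorized(owner_header, "peter", "s3cret"))  # owner sends credentials
print(is_authorized(None, "peter", "s3cret"))          # crawler sends none
```

    A crawler arrives without an Authorization header, so the server would answer with 401 and the page content is never delivered.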
     
    Litho, Dec 21, 2006
  8. jaguar-archie2006
    #8
    Where did you get this?


    Google can still crawl a site even without a sitemap; a sitemap is just one of Google's toys.
     
    jaguar-archie2006, Dec 21, 2006
  9. jaguar-archie2006
    #9
    The best thing to do is block it with robots.txt, as they say.
     
    jaguar-archie2006, Dec 21, 2006
  10. princess06
    #10
    robots.txt is the solution.
     
    princess06, Dec 21, 2006
  11. salmonbones
    #11
    Would this be do-able with .htaccess too? Or would that make it server wide and not just for an individual page?
     
    salmonbones, Dec 21, 2006
  12. ico
    #12
    It's easier this way: block the page in robots.txt and use nofollow.
     
    ico, Dec 21, 2006
  13. pnisbet
    #13
    Thanks everyone.

    I know now that robots.txt is the best. I knew about it, but wasn't sure how to apply it. just-4-teens showed me how.

    Cheers, and thanks.

    Pete
     
    pnisbet, Dec 21, 2006
  14. just-4-teens
    #14
    The code I posted goes within the <head></head> section of the page, not in the robots.txt file.
     
    just-4-teens, Dec 22, 2006