Need advice on how to block certain page using robots.txt

Discussion in 'robots.txt' started by mariush, Aug 12, 2006.

  1. #1
    On my game cheats site, www.tgdb.net, I have a "print" function for each game platform.

    For example, I have /pc/print.php , /psx/print.php

    I'd like to use robots.txt to prevent bots from accessing these pages.

    I think this would work, but I'm afraid to add it before asking for advice:

    Disallow: /*/print.php

    I got this from http://www.imdb.com/robots.txt where they use it to block certain folders, not files.

    So my question is: how would I write a line in robots.txt, if possible, to prevent bots from downloading print.php in any subfolder of the site, even when the URL carries query-string parameters (for example /pc/print.php?cheat=1000)?

    Thanks..
     
    mariush, Aug 12, 2006 IP
  2. explorer

    explorer Well-Known Member

    Messages:
    463
    Likes Received:
    40
    Best Answers:
    0
    Trophy Points:
    110
    #2
    I've just looked at your robots.txt and I see you've not done anything about this yet. I would list the pages individually:

    User-agent: *
    Disallow: /pc/print.php
    Disallow: /psx/print.php

    etc
     
    explorer, Sep 15, 2006 IP
  3. mariush

    mariush Peon

    Messages:
    562
    Likes Received:
    44
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Thanks. That would surely work.

    I've decided to do nothing about it for the moment, because Google seems to like those pages and has indexed all of them. I was initially afraid it might think I have duplicate content or something like that.

    Bandwidth is not a problem right now, so I'll let it "flow" for now.

    Your answer is much appreciated.
     
    mariush, Sep 16, 2006 IP
  4. silky

    silky Peon

    Messages:
    1
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #4
    As far as I am aware, the following

    Disallow: /pc/print.php

    would not block a robot from accessing the page with query-string parameters, such as

    /pc/print.php?cheat=1000

    Let me know if anyone knows differently.
     
    silky, Apr 17, 2007 IP
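    For later readers: the original robots.txt specification says a Disallow value is a simple prefix match against everything after the hostname, so a query string tacked onto a matching path does not escape the rule. A minimal sketch of that prefix rule (my own helper, not any crawler's actual code), using this thread's example URLs:

```python
from urllib.parse import urlsplit

def is_disallowed(disallow_value: str, url: str) -> bool:
    """Original robots.txt semantics: a Disallow value is a plain
    prefix match against the part of the URL after the host
    (path plus any query string)."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return target.startswith(disallow_value)

print(is_disallowed("/pc/print.php", "http://www.tgdb.net/pc/print.php?cheat=1000"))  # True
print(is_disallowed("/pc/print.php", "http://www.tgdb.net/pc/print.php"))             # True
print(is_disallowed("/pc/print.php", "http://www.tgdb.net/psx/print.php"))            # False
```

    So by the spec, `Disallow: /pc/print.php` should already cover `/pc/print.php?cheat=1000` for well-behaved crawlers, since the query-string URL starts with the disallowed prefix.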
  5. kirby009

    kirby009 Peon

    Messages:
    608
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #5
    Good info. I tried it and it works. I didn't know about this one.
     
    kirby009, Jun 12, 2007 IP
  6. trichnosis

    trichnosis Prominent Member

    Messages:
    13,785
    Likes Received:
    333
    Best Answers:
    0
    Trophy Points:
    300
    #6
    I think

    Disallow: /*print.php*

    will solve your problem. Google and Yahoo support the * wildcard.
     
    trichnosis, Jun 14, 2007 IP
  7. learn2success

    learn2success Peon

    Messages:
    18
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #7
    I have 7 errors in my Google Webmaster Tools account, and I have asked Google to remove those pages because they no longer exist, but Google says I should also block these pages in my robots.txt file.
    Would adding the following lines work for me?

    Disallow: /category/jobs/marketing-jobs/
    Disallow: /category/jobs/human-recourse-jobs/
    Disallow: /seo-ranking-factors/
    Disallow: /category/jobs/sales-jobs/
    Disallow: /category/jobs/admin-jobs/
    Disallow: /category/jobs/finance-jobs/
    Disallow: /category/motivation/

    Please advise.
     
    learn2success, May 17, 2012 IP
  8. learn2success

    learn2success Peon

    Messages:
    18
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #8
    learn2success, Jun 6, 2012 IP