Robots.txt disallow

Discussion in 'Search Engine Optimization' started by beeweb, Dec 29, 2006.

  1. #1
    Hi, I'm trying to block some pages from being spidered, but I'm afraid that I'll block too much.

    I do want to block this part of my site:
    /index.php?topic=blabla

    But I do not want to block this:
    /index.php?topic=blabla&page=2

    So, will this do the trick, or will it block too much?

    User-agent: *
    Disallow: /index.php?topic=blabla
     
    beeweb, Dec 29, 2006
  2. Raizous

    #2
    Just add an Allow: /index.php?topic=.... line and so on, dude...
     
    Raizous, Dec 29, 2006
  3. TheMadHat

    #3
    User-agent: *
    Disallow: /index.php?topic=*
    Allow: /index.php?topic=*&page=*
     
    TheMadHat, Dec 29, 2006
  4. beeweb

    #4
    Eh, I believe there is no "Allow" command in the standard robots.txt protocol. So if "Allow" doesn't work, I'll block too much (a few hundred pages too many...)

    Guess I'll have to find another way of solving this problem.
     
    beeweb, Dec 29, 2006
  5. TheMadHat

    #5
    TheMadHat, Dec 29, 2006
  6. Monty

    #6
    No, "Allow" isn't a standard robots.txt command.
    Some bots recognize and respect it, but only "Disallow" is part of the standard.
     
    Monty, Dec 29, 2006
  7. beeweb

    #7
    Hey, that's great. Indeed, lots of SEO blogs etc. say that "Allow" isn't allowed, but since Google uses it and Google is 95% of my traffic, I'll start adapting my robots.txt file right away.

    Happy New Year from the Netherlands!
     
    beeweb, Dec 29, 2006
  8. TheMadHat

    #8
    It's supported by Yahoo as well; not sure about MSN.
     
    TheMadHat, Dec 29, 2006
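
For anyone who wants to sanity-check rules like the ones in posts #1 and #3 before deploying them, here is a rough Python sketch. It assumes the crawler treats `*` as a wildcard, treats a rule without `$` as a prefix match, and picks the longest matching pattern (with Allow winning a tie), which is how Google documents its behaviour. Other bots may not work this way, and bots that ignore Allow will only see the Disallow line. The names `pattern_to_regex`, `is_allowed`, `PREFIX_ONLY` and `WITH_ALLOW` are made up for this example.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end.
    Without '$' the pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` (including query string) may be crawled.

    `rules` is a list of (directive, pattern) tuples for one user-agent
    group. Assumption: longest matching pattern wins, Allow wins a tie.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is not blocked
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# The single rule from post #1 (with the Disallow directive written out).
PREFIX_ONLY = [("Disallow", "/index.php?topic=blabla")]

# The Disallow/Allow pair from post #3.
WITH_ALLOW = [
    ("Disallow", "/index.php?topic=*"),
    ("Allow", "/index.php?topic=*&page=*"),
]

if __name__ == "__main__":
    for name, rules in (("PREFIX_ONLY", PREFIX_ONLY), ("WITH_ALLOW", WITH_ALLOW)):
        for path in ("/index.php?topic=blabla", "/index.php?topic=blabla&page=2"):
            print(name, path, "allowed" if is_allowed(rules, path) else "blocked")
```

Under those assumptions, the single Disallow line blocks both URLs, because `/index.php?topic=blabla` is a prefix of the paged URL. With the Allow line added, the paged URL is crawlable again because the Allow pattern is the longer match. It is still worth confirming the result with each search engine's own robots.txt testing tool.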