Robots.txt disallow: /.../subpage ?

Discussion in 'Search Engine Optimization' started by vzup, Mar 28, 2009.

  1. #1
    Hi,

    Disallowing main URLs is easy:

    Disallow: /myurl/

    But I need to disallow subpages which look like this:

    /myurl/page=1
    /myurl/page=2
    /myurl2/page=1
    /myurl2/page=2
    /something/page=1
    ... etc

    I just want to disallow all pages that have characters like these in their URLs:

    /page=

    But typing just this doesn't do the job.

    Is there some way like this:

    Disallow: /.../page=

    ???
     
    vzup, Mar 28, 2009 IP
  2. RockyMtnHi

    RockyMtnHi Active Member

    #2
    This will do it for crawlers that support wildcards (note that a leading # would turn the line into a comment, so leave it off):
    Disallow: /*page=

    You might also be interested in disallowing certain file extensions, like this:
    Disallow: /*.pdf$

    See a thread on it at the Search Engine Watch forum:
    http://forums.searchenginewatch.com/showthread.php?t=13457

    And note that only Google pays attention to some of these commands.
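
    Putting the two together, here is a sketch of a complete robots.txt using these wildcard rules (Googlebot-style matching assumed; the .pdf rule is just an example):

    User-agent: Googlebot
    # block any URL whose path contains "page=" anywhere
    Disallow: /*page=
    # block any URL ending in .pdf ($ anchors the match to the end of the URL)
    Disallow: /*.pdf$

    Here * matches any run of characters and $ anchors the end of the URL. Crawlers that don't implement these extensions will not honor them, so it's worth testing the rules in Google Webmaster Tools before relying on them.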
     
    RockyMtnHi, Mar 28, 2009 IP
  3. jitendraag

    jitendraag Notable Member

    #3
    @RockyMtnHi: Wildcards like these are not part of the robots.txt standard, but Google does support them as an extension.

    @OP: According to the standard, Disallow values are plain path prefixes, so there is no single pattern that covers such pages across every directory; you would need one line per directory.
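
    For example, a standard-compliant file for the URLs in the original post would list each directory prefix explicitly (prefix matching only, no wildcards):

    User-agent: *
    Disallow: /myurl/page=
    Disallow: /myurl2/page=
    Disallow: /something/page=

    Each value matches as a simple prefix, so /myurl/page=1, /myurl/page=2, and so on are all covered, but every new directory needs its own line.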
     
    jitendraag, Mar 28, 2009 IP