
robots.txt Exclusion On Dynamic URLs

Discussion in 'Apache' started by digitalpoint, Mar 16, 2004.

  1. #1
    I recently had the need to exclude dynamic URLs with the robots.txt file (the keyword suggestion tool was spawning hundreds of pages when someone would link directly to a results page). So I added this:

    User-agent: *
    Disallow: /tools/suggestion/?

    The interesting thing, though, is that only some spiders seem to understand the exclusion. Googlebot, for example, is smart enough to handle it properly. The new MSN Bot, on the other hand, is not.

    - Shawn
     
    digitalpoint, Mar 16, 2004 IP
  2. nlopes

    nlopes Guest

    #2
    You don't need the '?'.

    You only need this:
    User-agent: *
    Disallow: /tools/suggestion/

    I also use this trick on my site to exclude lots of dynamic pages.
     
    nlopes, Apr 3, 2004 IP
  3. digitalpoint

    digitalpoint Overlord of no one Staff

    #3
    Except I *do* want /tools/suggestion/ to be indexed. But *not* any page that starts with /tools/suggestion/?

    - Shawn
     
    digitalpoint, Apr 3, 2004 IP
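
The difference between the two rules discussed above can be sketched in a few lines of Python. This is only an illustration of the plain prefix matching described in the original robots.txt convention, not how any particular spider is implemented. The helper name `is_blocked` and the query string `keyword=seo` are made up for the example.

```python
def is_blocked(url_path, disallow_rules):
    """Return True if url_path starts with any non-empty Disallow value.

    Hypothetical helper: models simple prefix matching only,
    with no wildcard support and no Allow lines.
    """
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# The rule from post #1 (with the trailing '?') and the one from post #2.
rule_with_question_mark = ["/tools/suggestion/?"]
rule_without_question_mark = ["/tools/suggestion/"]

for path in ["/tools/suggestion/", "/tools/suggestion/?keyword=seo"]:
    print(
        path,
        "| with '?':", is_blocked(path, rule_with_question_mark),
        "| without '?':", is_blocked(path, rule_without_question_mark),
    )
```

Under this model the rule with the trailing '?' leaves /tools/suggestion/ itself crawlable and blocks only URLs that carry a query string, while the shorter rule blocks both. A spider that drops the '?' when reading the rule would behave like the second case.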
  4. nlopes

    #4
    That is not in the standard.
    AFAIK the standard only allows you to disallow files or directories, although Google accepts wildcards (*.cgi, for example).
     
    nlopes, Apr 3, 2004 IP
  5. digitalpoint

    #5
    I know it's not part of the official robots standard, but Google does adhere to it properly.

    Google uses it in their own robots file:

    http://www.google.com/robots.txt

    - Shawn
     
    digitalpoint, Apr 3, 2004 IP
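
For readers curious how the wildcard extension mentioned above might be evaluated, here is a rough Python sketch. It assumes the Google-style syntax where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The function names are hypothetical, and real crawlers may differ in details such as percent-encoding and rule precedence.

```python
import re


def pattern_to_regex(pattern):
    """Translate a wildcard-style robots.txt path pattern into a compiled regex.

    Assumed syntax: '*' matches any sequence of characters, a trailing '$'
    anchors the match at the end of the URL, everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, url_path):
    """Return True if url_path matches the pattern from its first character."""
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/*.cgi", "/scripts/run.cgi"))
print(matches("/*.cgi$", "/scripts/run.cgi?x=1"))
print(matches("/tools/suggestion/?", "/tools/suggestion/?keyword=seo"))
```

Note that '?' is treated as a literal character here, so a rule like /tools/suggestion/? still behaves as a plain prefix.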
  6. sarahk

    sarahk iTamer Staff

    #6
    Building on Shawn's question...

    I have a nuke site where the structure for the content is

    /modules.php?name=ContentType

    Using .htaccess and mod_rewrite, all sorts of good stuff gets done to this to make it look search-engine friendly.

    But if I want to exclude some types of content and not others, can I use my new URLs? I'm guessing that because the bots read robots.txt before fetching any content, they will obey the dummy name.

    Is this right?
     
    sarahk, Apr 27, 2004 IP
  7. Alahad

    Alahad Peon

    #7
    I need the directive to allow sitemap.xml in robots.txt.
     
    Alahad, Jul 31, 2009 IP
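
On the sitemap question: as far as I know, nothing has to be explicitly allowed, since URLs are crawlable unless a Disallow rule covers them. The widely supported way to point crawlers at a sitemap is a `Sitemap:` line with a full URL. The sketch below uses Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later) to read that line back. The domain www.example.com is a placeholder, not a real site from this thread.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: nothing is disallowed, and the Sitemap line
# advertises where the sitemap lives (placeholder URL).
robots_txt = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

sitemaps = parser.site_maps()  # list of sitemap URLs, or None if none were declared
print(sitemaps)
```

The `Sitemap:` line is independent of the User-agent groups, so it can go anywhere in the file.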