Any way to exclude include files from spiders?

Discussion in 'robots.txt' started by TheConnollyKid, Jan 27, 2006.

  1. TheConnollyKid
    I am working on a site and have various navigation sets being pulled in dynamically via PHP include statements. I tried using a robots.txt file to exclude the includes folder that contains all of the snippets that get pulled in, but the navigation still appears to be getting indexed by the spider. Any ideas?
     
    TheConnollyKid, Jan 27, 2006
  2. mcfox
    I don't see how you can exclude the navigation from the spider: PHP processes the include on the server, so the navigation is already part of the page's HTML when the spider 'views' the page. Why do you want to exclude the navigation anyway?
     
    mcfox, Jan 27, 2006
  3. TheConnollyKid
    The reason I'd like to exclude the navigation is this: say someone is searching for a term that refers to a specific product or service, and the main information on it appears on a single page. If that term is also a link in the navigation of 50 other pages that have nothing to do with the product or service, the engine returns 51 results, and the last 50 show the navigation text where the term appears.
     
    TheConnollyKid, Jan 27, 2006
  4. wrmineo
    Another thing to consider is that, unfortunately, not all spiders abide by the robots.txt file :(
     
    wrmineo, Jan 27, 2006
  5. digitalpoint
    There is no Allow: directive for robots.txt, only Disallow:, so no... you can't.
     
    digitalpoint, Jan 27, 2006
  6. minstrel
    Unless it's Googlebot. I've never tried this myself, but the Google robots.txt information page seems to suggest that Googlebot does recognize Allow:...

    ...unless I'm reading this wrong.
     
    minstrel, Jan 29, 2006
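To illustrate mcfox's point, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt rules, the `example.com` URLs, the `ExampleBot` user agent, and the `render_page` helper (a stand-in for PHP's `include`) are all hypothetical. It shows that `Disallow: /includes/` only stops a compliant spider from requesting the snippet files directly; the pages that include them stay fetchable, and their HTML already contains the navigation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the folder holding the include snippets.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
"""

# Hypothetical navigation snippet, as it might live in /includes/nav.php.
NAV_SNIPPET = '<ul><li><a href="/products/widget.php">Blue widget</a></li></ul>'


def render_page(body: str) -> str:
    # Stand-in for PHP's include: the server inlines the snippet into
    # every page before anything is sent to the visitor or spider.
    return "<html><body>" + NAV_SNIPPET + body + "</body></html>"


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A direct request for the snippet file is disallowed...
blocked = parser.can_fetch("ExampleBot", "http://example.com/includes/nav.php")
# ...but the page that includes it is not, and its HTML carries the nav text.
allowed = parser.can_fetch("ExampleBot", "http://example.com/about.php")
html = render_page("<p>About us</p>")

print(blocked)               # False
print(allowed)               # True
print("Blue widget" in html)  # True
```

In other words, robots.txt rules apply to whole URLs a spider requests, not to fragments assembled inside a page on the server.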
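On the Allow: question raised by digitalpoint and minstrel, here is a rough sketch of what an Allow line does in a parser that understands it, again using Python's `urllib.robotparser` with hypothetical rules and URLs (`/includes/sitemap-links.php` is an invented file name). Python's parser applies the first rule that matches, whereas Google documents that the most specific (longest) matching path wins, so the Allow line is listed first here to get the same result under both. Either way, Allow and Disallow still act on whole URLs, so neither can hide navigation that is already part of a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /includes/ except for one file inside it.
# urllib.robotparser uses the first matching rule, so Allow comes first.
ROBOTS_TXT = """\
User-agent: *
Allow: /includes/sitemap-links.php
Disallow: /includes/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed_file = parser.can_fetch(
    "ExampleBot", "http://example.com/includes/sitemap-links.php"
)
blocked_file = parser.can_fetch("ExampleBot", "http://example.com/includes/nav.php")

print(allowed_file)  # True
print(blocked_file)  # False
```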