Is it me or is MSN ignoring robots.txt files?

Discussion in 'Bing' started by ResaleBroker, Jul 15, 2006.

  1. #1
    When I use the "site:" command to query MSN for pages in my domain I see pages that are excluded in my robots.txt file.

    Is it just me or is MSN ignoring robots.txt files?
    ResaleBroker, Jul 15, 2006
  2. eXe

    #2
    I found this link:

    seroundtable.com/archives/004096.html
    eXe, Jul 15, 2006
  3. Jean-Luc

    #3
    Disallow: in robots.txt means "do not visit this page". It does not mean "do not index it". Several pages are indexed by Google, Yahoo, MSN and others without having been visited by a robot. This is in compliance with the Robots Exclusion Protocol.

    Jean-Luc
    Jean-Luc, Jul 15, 2006
  4. ResaleBroker

    #4
    I'm familiar with that aspect of the Robots Exclusion Protocol [The value of this field specifies a partial URL that is not to be visited...], however, I wasn't aware that a URL could be indexed without being visited/retrieved. Hmmm...
    ResaleBroker, Jul 15, 2006
  5. oninuva

    #5
    What's the difference? If they can't visit it, how do they index it?
    oninuva, Jul 15, 2006
  6. Jean-Luc

    #6
    A disallowed page can be indexed because of the information the robot collects from other, allowed pages that contain links pointing to the disallowed page. The address of a disallowed page can appear in the SERPs, but there will be no cached version of the page.

    For example, if page "/blue-horse.html" is disallowed, other pages might contain links like this:
    <a href="/blue-horse.html">Blue Horse</a>
    That's enough for a search engine to index "/blue-horse.html" and to show it in some SERPs.

    Jean-Luc
    Jean-Luc, Jul 15, 2006
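A minimal sketch of the mechanism Jean-Luc describes, using Python's standard `urllib.robotparser`. Everything else is hypothetical: the three-page `SITE`, the `ROBOTS_TXT` rules, the `crawl()` function and the `ExampleBot` user agent are made up for illustration, and this toy model is not how MSN, Google or Yahoo actually build their indexes. The crawler honors `Disallow:` by never fetching "/blue-horse.html", yet that URL still gets an index entry built only from the anchor text of links found on allowed pages, with no cached copy.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical site (path -> HTML); stands in for real HTTP fetches.
SITE = {
    "/": '<a href="/about.html">About</a> <a href="/blue-horse.html">Blue Horse</a>',
    "/about.html": '<p>About us</p> <a href="/blue-horse.html">Blue Horse</a>',
    "/blue-horse.html": "<p>Blue horse page</p>",
}

# Hypothetical robots.txt: every robot is told not to visit /blue-horse.html.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blue-horse.html
"""


class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def crawl(start="/", agent="ExampleBot"):
    """Toy crawler: returns path -> {"cached": HTML or None, "anchors": set}."""
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    index = {}
    queue, seen = deque([start]), {start}
    while queue:
        path = queue.popleft()
        entry = index.setdefault(path, {"cached": None, "anchors": set()})
        if not rp.can_fetch(agent, path):
            continue  # Disallowed: never fetched, so nothing gets cached.
        html = SITE.get(path, "")  # simulated fetch of an allowed page
        entry["cached"] = html
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href, text in parser.links:
            # Links on allowed pages are enough to create an index entry.
            target = index.setdefault(href, {"cached": None, "anchors": set()})
            target["anchors"].add(text)
            if href not in seen:
                seen.add(href)
                queue.append(href)
    return index


if __name__ == "__main__":
    for path, entry in sorted(crawl().items()):
        status = "cached" if entry["cached"] else "URL only, no cache"
        print(path, "|", status, "|", sorted(entry["anchors"]))
```

Running it should list "/blue-horse.html" as "URL only, no cache" with the anchor text "Blue Horse", which is the kind of result that shows up for a disallowed page in a "site:" query.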