When I use the "site:" command to query MSN for pages in my domain, I see pages that are disallowed in my robots.txt file. Is it just me, or is MSN ignoring robots.txt files?
Disallow: in robots.txt means "do not visit this page". It does not mean "do not index it". Pages are regularly indexed by Google, Yahoo, MSN and others without ever having been visited by a robot, and this is in compliance with the Robots Exclusion Protocol.
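For illustration, a minimal sketch of such a rule (the "/blue-horse.html" path is just an example) tells compliant robots not to fetch that page, but says nothing about whether its URL may be listed in an index:

    User-agent: *
    Disallow: /blue-horse.html

Jean-Luc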
I'm familiar with that aspect of the Robots Exclusion Protocol ["The value of this field specifies a partial URL that is not to be visited..."]; however, I wasn't aware that a URL could be indexed without being visited/retrieved. Hmmm...
A disallowed page can be indexed because of the information collected by the robot on other allowed pages that contain links pointing to the disallowed page. The address of a disallowed page can appear in the SERPs, but there will be no cached version of the page. For example, if page "/blue-horse.html" is disallowed, other pages might contain links like this:

    <a href="/blue-horse.html">Blue Horse</a>

That's enough for a search engine to index "/blue-horse.html" and to show it in some SERPs. Jean-Luc