301 Redirects and Spiders

Discussion in 'Apache' started by briandunning, Oct 14, 2005.

  1. #1
    Going through my server logs, I've discovered that msnbot is ALWAYS (and Yahoo Slurp is sometimes) getting a 200 response on pages where they should get a 301 from a rule in .htaccess like this:

    RewriteRule ^oldlink$ newlink [R=301,L]

    Googlebot gets the 301 right 100% of the time, Yahoo Slurp wrongly gets a 200 about 25% of the time, and msnbot wrongly gets a 200 100% of the time. Anyone know why, and more importantly, anyone know if a different format of redirect has to be used to cater to MSN and Yahoo?
     
    briandunning, Oct 14, 2005 IP
  2. digitalpoint

    #2
    Post a sample of your web logs. If you have no rewrite rules based on user agent, then the requests themselves must be slightly different.

    Are they by chance directories? The MSN and Yahoo spiders (and their indexes) are inconsistent about adding a trailing slash to the end of a URL. So if they are directories that aren't really there, you can't rely on Apache to do an automatic redirect (to add the trailing slash).

    http://www.seroundtable.com/archives/000143.html
     
    digitalpoint, Oct 14, 2005 IP
  3. briandunning

    #3
    Indeed. Here is the relevant line from .htaccess:

    RewriteRule ^(.*)\.(.*)\.html$ http://otherdomain.com/$1.$2.html [R=301,L]
    Code (markup):
    Here I've snipped lines from the actual log showing (1) Googlebot correctly getting the 301, which it does 100% of the time; (2) Yahoo Slurp getting a 301, which happens about 75% of the time; (3) Yahoo Slurp getting a 200, about 25% of the time; and (4) msnbot getting a 200, which happens 100% of the time. I added a blank line between entries for clarity.

    
    66.249.71.18 - - [14/Oct/2005:03:58:57 -0700] "GET /you-were-never-there.53058.html HTTP/1.0" 301 255 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
    
    66.196.91.14 - - [14/Oct/2005:03:58:55 -0700] "GET /gloryhallastoopid.385.html HTTP/1.0" 301 250 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
    
    68.142.251.159 - - [14/Oct/2005:03:32:41 -0700] "GET /tattooing-and-body-piercing.45902.html HTTP/1.0" 200 14688 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
    
    207.46.98.128 - - [14/Oct/2005:02:26:55 -0700] "GET /riverwheel.58098.html HTTP/1.0" 200 16190 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
    
    Code (markup):
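The percentages above can be checked across a whole log rather than by eye. Here is a minimal Python sketch, assuming the log uses the same combined format as the four sample lines; the `tally_statuses` helper and the `CRAWLERS` labels are made up for illustration and are not part of the original setup.

```python
import re
from collections import Counter, defaultdict

# Combined Log Format, as in the sample lines:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Substrings used to label crawlers, taken from the user agents quoted above.
CRAWLERS = ("Googlebot", "Yahoo! Slurp", "msnbot")


def tally_statuses(lines):
    """Return {crawler: Counter({status: count})} for recognized crawlers."""
    tallies = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        agent = match.group("agent")
        for name in CRAWLERS:
            if name in agent:
                tallies[name][int(match.group("status"))] += 1
                break
    return dict(tallies)
```

Passing an open log file (for example `tally_statuses(open("access.log"))`, where the file name is hypothetical) gives a per-crawler count of 301 versus 200 responses. Filtering the lines first to the URLs that should redirect keeps unrelated 200s out of the totals.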
     
    briandunning, Oct 14, 2005 IP
  4. digitalpoint

    #4
    Are there any other rewrite rules (either in .htaccess or in httpd.conf)? Also, that .htaccess file is in the root directory of the domain, right?

    Have you tried using something to spoof a certain user agent, to see whether the behavior depends on the user agent or on the spider's IP address?

    Something like:

    curl "http://www.yourdomain.com/tattooing-and-body-piercing.45902.html" -I -A "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
    Code (markup):
    The 301 is coming from the server side, so even if the spiders wanted to, they couldn't force your server to return an HTTP 200. So there is some logic on the server. You just need to find out what's triggering that logic (user agent or IP is my guess) and then figure out where it lives.
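The same check can be run for all three spiders at once. This is a rough Python sketch using only the standard library, which, like `curl -I`, sends a HEAD request and does not follow the redirect. The user-agent strings are copied from the log lines earlier in the thread; the function names are made up for illustration.

```python
import http.client

# User-agent strings as they appear in the posted log lines.
USER_AGENTS = {
    "Googlebot": "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Yahoo! Slurp": "Mozilla/5.0 (compatible; Yahoo! Slurp; "
                    "http://help.yahoo.com/help/us/ysearch/slurp)",
    "msnbot": "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
}


def head_status(host, path, user_agent, port=80, timeout=10):
    """Send one HEAD request; return (status, Location header or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def compare_agents(host, path, port=80):
    """Request the same path once per spoofed user agent."""
    return {
        name: head_status(host, path, agent, port=port)
        for name, agent in USER_AGENTS.items()
    }
```

Calling `compare_agents("www.yourdomain.com", "/tattooing-and-body-piercing.45902.html")` returns one `(status, Location)` pair per agent. If every agent gets a 301 from your own machine, the user agent alone is probably not the trigger, which points toward the IP address or the exact request instead.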
     
    digitalpoint, Oct 14, 2005 IP
  5. johnt

    #5
    Is it possible that Yahoo and MSN are occasionally requesting pages from domain.com rather than www.domain.com? Do you have RewriteRules set up to redirect from domain.com to www.domain.com?
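One way to test this is to request the same path on both hostnames and compare the responses. This is a small Python sketch, assuming both names resolve to the same server; `host_variants` and `status_by_host` are hypothetical helpers, not anything from the original configuration.

```python
import http.client


def host_variants(host):
    """Return the bare and www forms of a hostname, bare form first."""
    bare = host[4:] if host.startswith("www.") else host
    return [bare, "www." + bare]


def status_by_host(host, path, user_agent, port=80, timeout=10):
    """HEAD the same path on each variant; return {host: (status, Location)}."""
    results = {}
    for variant in host_variants(host):
        conn = http.client.HTTPConnection(variant, port, timeout=timeout)
        try:
            conn.request("HEAD", path, headers={"User-Agent": user_agent})
            response = conn.getresponse()
            results[variant] = (response.status, response.getheader("Location"))
        finally:
            conn.close()
    return results
```

If `status_by_host("domain.com", "/riverwheel.58098.html", "msnbot/1.0 (+http://search.msn.com/msnbot.htm)")` shows a 200 on one hostname and a 301 on the other, the difference lies in how each host is configured rather than in the spider.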
     
    johnt, Oct 15, 2005 IP