robots don't obey the whole robots.txt file

Discussion in 'Search Engine Optimization' started by serban, Mar 26, 2007.

  #1
    Hello everyone,

    I'm new here.

    Here's my problem:

    http://www.itpromo.net/robots.txt

    -----snip------
    User-agent: *
    [...]
    Disallow: /*pdf$
    Disallow: /*xls$
    Disallow: /*html$
    Disallow: /*zip$
    Disallow: /*RON
    Disallow: /*EUR
    Disallow: /*USD
    Disallow: /*NONE
    Disallow: /*ASC
    Disallow: /*DESC
    -----snip------

    These rules should block every URL that contains the string after the *, and, where a pattern ends in $, every URL that ends with that string. (As far as I know, * and $ are extensions to the original robots.txt standard, so crawler support varies.)
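
    As a sanity check, here's my reading of that extended syntax translated into regexes (* matching any run of characters, $ anchoring the end). This is just a sketch of how I understand the matching; the two test URLs are the ones msnbot fetched in the log below:

    -----sketch------
    import re

    # the wildcard Disallow patterns from my robots.txt
    rules = ["/*pdf$", "/*xls$", "/*html$", "/*zip$",
             "/*RON", "/*EUR", "/*USD", "/*NONE", "/*ASC", "/*DESC"]

    def rule_to_regex(rule):
        # escape everything, then restore * as ".*" and $ as an end-of-URL anchor
        pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
        return "^" + pattern

    for url in ["/memory/a_data/1/NONE/DESC/NONE", "/memory/a_data/1/xls"]:
        blocked = [r for r in rules if re.match(rule_to_regex(r), url)]
        print(url, "->", blocked or "allowed")
    -----sketch------

    Both URLs come out as disallowed under this reading, which matches what Googlebot and Slurp actually do.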

    Googlebot and Slurp recognize this, but Teoma and MSNbot don't:

    -----log snip-----
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" www.itpromo.net GET /memory/a_data/1/NONE/DESC/NONE HTTP/1.0 41345 200 0 [26/Mar/2007:14:03:14 +0300]
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" www.itpromo.net GET /memory/a_data/1/xls HTTP/1.0 13207 200 0 [26/Mar/2007:14:03:35 +0300]
    -----log snip-----
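
    To measure how much crawler traffic these URLs pull, here's a rough log-scanning sketch. The field layout and the access.log filename are guesses based on my log excerpt above, so adjust them to your own format:

    -----sketch------
    import re
    from collections import defaultdict

    # field positions guessed from the excerpt: "UA" host method path proto bytes ...
    LINE = re.compile(r'^"([^"]*)" \S+ \S+ (\S+) HTTP/\S+ (\d+) ')

    # same endings/substrings as the robots.txt wildcard rules
    DISALLOWED = re.compile(r'(pdf|xls|html|zip)$|RON|EUR|USD|NONE|ASC|DESC')

    hits = defaultdict(lambda: [0, 0])  # user-agent -> [requests, bytes]
    with open("access.log") as log:     # hypothetical filename
        for line in log:
            m = LINE.match(line)
            if m and DISALLOWED.search(m.group(2)):
                hits[m.group(1)][0] += 1
                hits[m.group(1)][1] += int(m.group(3))

    for ua, (count, size) in sorted(hits.items(), key=lambda kv: -kv[1][1]):
        print(count, "requests,", size, "bytes:", ua)
    -----sketch------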

    What are my options for blocking all the bots from these pages? They generate a lot of traffic, and I want these sections ignored. I've also added rel="nofollow" to all the internal links pointing to this kind of URL.

    I've also written up the problem in detail on my blog: http://www.ghita.ro/article/23/web_robots_and_dynamic_content_issues.html (scroll down to Problems).


    Thanks!
     
    serban, Mar 26, 2007