Spiders ignore robots.txt!

Discussion in 'robots.txt' started by zachbb, Jul 29, 2006.

  1. #1
    Hi

    I am developing a site at www.DomainSocial.com. I noticed that GoogleBot and Yahoo Slurp have been crawling the site but never leave the calendar (www.domainsocial.com/calendar.php). It is very annoying: they constantly crawl the "next week" and "previous week" links, and since the calendar has an infinite number of weeks, it doesn't look like they will stop soon. Meanwhile, the bots are so busy crawling the calendar that they ignore my threads (which are what I want them to crawl)!

    I have set up a robots.txt file on my site to specifically tell the bots not to crawl the calendar (www.domainsocial.com/robots.txt). However, they still do it! The robots file has been there for about a week, and all the bots seem to ignore it... what's wrong?

    Thanks a lot!

    Zach
     
    zachbb, Jul 29, 2006
  2. Jean-Luc

    #2
    Hi,

    The syntax of your robots.txt is not valid. Each URL path in a Disallow: line should start with a slash "/", like this:
    User-agent: *
    Disallow: /newthread.php
    Disallow: /newreply.php
    ...
    Also, do not use the Allow: directive. It is not part of the original robots.txt standard. Only a few robots understand it and, even among those, there are several interpretations of it.
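
    To see the effect of the leading slash, here is a minimal sketch using Python's standard urllib.robotparser. The two rule sets are assumptions made up for illustration (the thread does not quote the original file), and is_blocked is a hypothetical helper name. Real crawlers may treat a malformed line differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_lines, url, agent="Googlebot"):
    """Return True if the given robots.txt lines forbid `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network access
    return not parser.can_fetch(agent, url)


CALENDAR_URL = "http://www.domainsocial.com/calendar.php"

# Assumed shape of the broken file: the path has no leading slash.
without_slash = ["User-agent: *", "Disallow: calendar.php"]

# Corrected form: the path starts with "/".
with_slash = ["User-agent: *", "Disallow: /calendar.php"]

print("without slash, blocked:", is_blocked(without_slash, CALENDAR_URL))
print("with slash, blocked:", is_blocked(with_slash, CALENDAR_URL))
```

    This parser matches rules as prefixes of the URL path, which always begins with "/", so the rule without a slash never matches anything and the calendar stays crawlable.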

    Jean-Luc
     
    Jean-Luc, Jul 29, 2006
  3. zeljic

    #3
    hi,


    Disallow: /newreply.php
    or
    Disallow: ./newreply.php

    ?
     
    zeljic, Jul 30, 2006
  4. zachbb

    #4
    OK thanks guys, I fixed it!

    Sorry that it took a long time to reply, but I couldn't get it to work.
    First of all, as zeljic said, I was missing the /
    Secondly, I saw in Google Webmaster Tools that the robots file wasn't being recognized. The encoding was wrong. I redid it in Notepad, and it worked!
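
    For anyone hitting the same encoding problem, a rough way to inspect a robots.txt file is to look at its raw bytes. This is only a sketch: the thread does not say which encoding the broken file used, so the byte-order-mark checks below are an assumption about the likely cause, and describe_encoding is a hypothetical helper name.

```python
import codecs

# Byte-order marks that some editors write at the start of a file.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),
]


def describe_encoding(data: bytes) -> str:
    """Give a rough description of how the raw bytes of a robots.txt are encoded."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "non-ASCII bytes, no BOM"
    return "plain ASCII"


# Example inputs built in memory; a real check would read the file in binary mode.
plain = b"User-agent: *\nDisallow: /calendar.php\n"
utf16 = codecs.BOM_UTF16_LE + "User-agent: *\n".encode("utf-16-le")

print(describe_encoding(plain))
print(describe_encoding(utf16))
```

    A file that comes back as plain ASCII is the safest choice for robots.txt, which is what re-saving it from a plain text editor usually produces.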

    Thanks

    Zach
     
    zachbb, Aug 16, 2006