Trouble with Google News Crawling site

Discussion in 'robots.txt' started by Beefandy, Oct 27, 2009.

  1. #1
    Hi, I submitted www.jazzou.com to Google News to see if they will use my news feed as a source. It's all been built to the spec they need, and I already have another site which they use as a source.

    However, I got this message back from Google:

    'Thank you for your note. In order for us to evaluate your site articles
    we need to be able to access the content on your site. Currently, crawlers
    cannot fill out registration forms, nor do they support cookies. Given
    that, in order to successfully crawl your site, we need to be able to
    circumvent your registration page. When we tried to access
    http://jazzou.com/index.php?option=com_content we got the following
    message:
    "You are not authorised to view this resource. You need to login"

    The easiest way to do this is to configure your webservers to not serve
    the registration when our crawlers visit your pages (when the User-Agent
    is "Googlebot"). You can verify that the request is actually from our
    robot by making sure the IP address is within the range of 66.249.64.0/20.
    It is equally important that your robots.txt file allows access by
    Googlebot.'
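
    The check described in the reply (match the User-Agent, then confirm the IP is inside the quoted range) could be sketched roughly as below. This is only an illustration in Python, not the site's actual code: the function names `is_googlebot` and `should_show_login` are hypothetical, and the 66.249.64.0/20 range is taken straight from Google's message, so it may not cover every address Googlebot uses.

```python
import ipaddress

# Range quoted in Google's reply above; assumed here as-is and not
# guaranteed to be complete or current.
GOOGLEBOT_NETWORK = ipaddress.ip_network("66.249.64.0/20")


def is_googlebot(user_agent: str, remote_addr: str) -> bool:
    """Return True if the request claims to be Googlebot AND comes from the quoted range."""
    # The User-Agent alone is trivial to fake, so it is only the first filter.
    if "googlebot" not in user_agent.lower():
        return False
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed address: treat it as an ordinary visitor.
        return False
    return address in GOOGLEBOT_NETWORK


def should_show_login(user_agent: str, remote_addr: str, logged_in: bool) -> bool:
    """Hypothetical gate: show the login page unless logged in or a verified crawler."""
    return not logged_in and not is_googlebot(user_agent, remote_addr)
```

    Checking both the User-Agent and the IP matters because anyone can send a "Googlebot" User-Agent string; the IP test is what keeps ordinary visitors from skipping the login.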

    The thing is, the URL I sent for them to use was: http://jazzou.com/index.php?option=com_content&task=blogcategory&id=0&Itemid=40 and that can be accessed fine. However, they are coming back with http://jazzou.com/index.php?option=com_content which is not publicly accessible, but I'm not bothered about anyone viewing that page anyway.

    Can anyone give any help with this?

    Thanks
     
    Beefandy, Oct 27, 2009