double and triple forward slashes in Googlebot requests

Discussion in 'Site & Server Administration' started by NetMidWest, Oct 6, 2005.

  1. #1
    Hello,

    I am getting some odd crawls from Googlebot. It is requesting URLs like domain.tld//folder/file.html and domain.tld///folder/file.html.

    I think the cause is older files in which I used relative links (../file.html and ../folder/file.html) instead of absolute ones. :eek: I am fixing that as fast as possible.

    But in the meantime, I need to either stop Googlebot from crawling these extra-forward-slash URLs or 301 them back to the correct location.

    Here's why: http://www.google.com/search?q=allinurl:www.netmidwest.com

    If you go to the second page and click on the omitted results link, things look a little better, but it is obvious I am being shot down for it as we speak.

    My own attempts at mod_rewrite failed. Searches in other forums (there is some talk at WMW) did not turn up anything that fit my needs. At least one guy got left hanging. One poster disallowed the ///files.html in robots.txt, but I am not sure how Google would treat that, especially since I have not seen this behavior before.

    I don't want to go into httpd.conf and risk messing up other sites. Things were fine until recently, so it seems Google has changed something in the crawl.

    I am using a standard rewrite for non-www to www, and it seems as though I should be able to add something easy to it to stop this.
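
    To make the goal concrete, here is a rough Python sketch of the mapping I am after. It is not a mod_rewrite rule, only an illustration of the behavior: any run of slashes in the path collapses to one, and the cleaned URL becomes the 301 target. The function names and the www.example.com host are made up for the example.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Return url with every run of 2+ slashes in the path reduced to one."""
    parts = urlsplit(url)
    # Only the path is touched; the "//" after the scheme is not part of it.
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, clean_path, parts.query, parts.fragment)
    )


def redirect_target(url):
    """Return the URL to 301 to, or None if the request is already clean."""
    cleaned = collapse_slashes(url)
    return cleaned if cleaned != url else None
```

    With this, redirect_target("http://www.example.com///folder/file.html") gives "http://www.example.com/folder/file.html", and a request that already has single slashes gives None (no redirect). What I still need is the rewrite rule that does the same thing on the server.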

    Anyone have a solution?
     
    NetMidWest, Oct 6, 2005