How can I redirect these bogus URLs that Google is listing my pages under?

Discussion in 'Apache' started by Darden12, Apr 22, 2011.

  1. #1
    I have a bunch of lyrics files under a directory called lyrics, as follows:

    http://www.MYDOMAIN.com/lyrics/FILENAME.html

    For some reason, Google is indexing all these pages with the following URL format:

    http://www.MYDOMAIN.com/index.php/lyrics/FILENAME.html

    See? They're adding index.php (the main page of my Web site) into the URL.

    When Google's version of the URL is loaded, the lyrics page does NOT appear. Instead, a slightly mangled version of my home page (index.php) appears.

    Can someone please tell me what to put in my .htaccess file so that folks who click on these URLs are taken to the correct URL?

    I need an .htaccess line that will send all requests starting with

    http://www.MYDOMAIN.com/index.php/lyrics

    to

    http://www.MYDOMAIN.com/lyrics

    instead.

    Thanks!

    Bonus question: WHY is Google adding "index.php" into the middle of these otherwise seemingly straightforward URLs for these simple html pages in my subdirectory called "lyrics"?
     
    Darden12, Apr 22, 2011
  2. MartinPrestovic

    #2
    If Google is indexing those URLs, it's because it found links in that form somewhere on your site. Check your code for mistakes that would let the crawler reach your pages that way. It could be a broken sitemap generator or some other bug in the software.

    Either way, they are indexing it like that because that is how they are seeing it. First fix that problem, then worry about redirecting the broken pages.
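
    Once the source of the bad links is fixed, a redirect along these lines could be tried. This is only a sketch, assuming mod_alias is enabled, the .htaccess file sits in the document root, and the server allows .htaccess overrides for redirects (AllowOverride FileInfo). MYDOMAIN is the placeholder from the original post:

    ```apache
    # Permanently (301) redirect /index.php/lyrics/... to /lyrics/...
    # $1 carries over whatever followed "lyrics/" in the requested URL.
    RedirectMatch 301 ^/index\.php/lyrics/(.*)$ http://www.MYDOMAIN.com/lyrics/$1
    ```

    A 301 marks the move as permanent, so the index.php variants should eventually be replaced in the index by the /lyrics/ URLs.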
     
    MartinPrestovic, Apr 27, 2011
  3. Tritontrax

    #3
    As MartinPrestovic said, you need to search the HTML source of your rendered pages for the URLs that Googlebot is indexing. The sitemap is a prime culprit for this sort of thing, but it's really hard to say without seeing your site.

    If the site is large, consider using a mirroring program like wget to grab all rendered pages of your site, for easier searching locally rather than via the web browser.
     
    Tritontrax, Apr 28, 2011