Despite Mod Rewrite Google is Indexing URLs as Dynamic

Discussion in 'Site & Server Administration' started by LOLDavid, Apr 10, 2006.

  1. #1
    Hi. We're www.learnoutloud.com and we've used mod_rewrite on just about all our URLs. For a long time Google seemed to spider all our pages under their rewritten URLs, but just recently Google has taken to spidering our site as dynamic pages. We have Google Free Site Search on our site, so you can check it out by searching for "7 Habits" in our search box; you'll see all dynamic results (and this is happening with most other searches). The only way you can really get dynamic pages on our site is by blocking cookies, which makes the site display URLs with a PHP session ID added on:

    http://www.learnoutloud.com/productpage.php?cat=1&catid=&level=2&subcatid=129&id=15594&nav=B&PHPSESSID=764f891b9a8a3e01dfba6da28668bfe7

    Some of the dynamic URLs in Google's results have this session ID, but some are just our normal dynamic links. I'm wondering how and why Google is spidering our site this way when we have used mod_rewrite and would like Google to index our rewritten pages, not the dynamic ones. We're making a site map to submit to Google, so hopefully that will help. I'm just worried that not all of our site is being indexed because Google is spidering it as dynamic. Any ideas as to why this might be happening?
     
    LOLDavid, Apr 10, 2006
  2. Jean-Luc

    #2
    Hi,

    Google spiders all URLs that it knows of. If Google finds somewhere (on another site, in a forum, ...) a URL like /productpage.php?cat=1&catid=&level=2&...&PHPSESSID=764f891b9a8a3e01dfba6da28668bfe7, it will visit it, and from there it will discover other URLs with the same session ID.

    A site map will not help for this.

    A way to avoid the problem is to disallow access to these pages in a robots.txt file:

        User-agent: *
        Disallow: /productpage.php?

    This robots.txt disallows access to all URLs starting with /productpage.php?.
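
To see which URLs a rule like this would catch, here is a minimal Python sketch of plain prefix matching, which is how a Disallow line is conventionally read. It is only an illustration, not Googlebot's actual matcher: the helper name `is_disallowed` and the rewritten-style URL in the usage lines are made up for the example.

```python
from urllib.parse import urlsplit

# Rule copied from the robots.txt in this post.
DISALLOW_RULES = ["/productpage.php?"]


def is_disallowed(url, rules=DISALLOW_RULES):
    """Return True if the URL's path (plus query string) starts with a Disallow rule.

    Simplified on purpose: plain prefix matching only, no wildcards,
    no Allow lines, no per-user-agent groups.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(target.startswith(rule) for rule in rules if rule)


dynamic_url = "http://www.learnoutloud.com/productpage.php?cat=1&level=2&id=15594"
# ASSUMPTION: a hypothetical rewritten URL, used only for illustration.
rewritten_url = "http://www.learnoutloud.com/some-rewritten-page"

print(is_disallowed(dynamic_url))    # True
print(is_disallowed(rewritten_url))  # False
```

Because the rule ends in `?`, a request for /productpage.php with no query string would not be blocked under this matching.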

    Jean-Luc
     
    Jean-Luc, Apr 10, 2006
  3. rehash

    #3
    If Google was indexing the dynamic URLs fine, why did you change to static ones? I don't think it will have any effect on your SERPs; it's all about whether the pages get indexed or not.
     
    rehash, Apr 10, 2006
  4. andy_boyd

    #4
    And for any of you with a similar problem, where Googlebot likes to spider your dynamic WordPress URLs instead of your mod_rewritten URLs ...

        User-agent: *
        Disallow: /?p
     
    andy_boyd, Apr 10, 2006
  5. mad4

    #5
    Blocking Google from accessing the dynamic pages is not the solution.

    You should use .htaccess to redirect your dynamic URLs to the static URLs so Google ends up at the right place. Problem solved.
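
As a rough illustration of that redirect logic (in Python rather than .htaccess, just to show the decision being made), the sketch below answers a dynamic product URL with a 301 to a static one. The static URL pattern `/product/<id>` and the function name `redirect_for` are invented for the example; the site's real rewritten URLs are not shown in this thread.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_for(url):
    """Return (status, location) for a request URL.

    Dynamic product URLs get a 301 to a static URL; everything else is
    served as-is (200, no Location header).
    """
    parts = urlsplit(url)
    if parts.path != "/productpage.php":
        return 200, None
    product_id = parse_qs(parts.query).get("id", [""])[0]
    if not product_id.isdigit():
        return 200, None  # no usable id, so nothing to redirect to
    # ASSUMPTION: "/product/<id>" is a made-up static URL scheme.
    return 301, f"/product/{product_id}"


print(redirect_for("http://www.learnoutloud.com/productpage.php?cat=1&id=15594&nav=B"))
```

Any session ID in the query string is simply dropped by the redirect, so every variant of the dynamic URL lands on the same static address.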
     
    mad4, Apr 10, 2006
  6. andy_boyd

    #6
    A combination of both would be ideal ... robots.txt and 301 redirects should do the trick.
     
    andy_boyd, Apr 10, 2006
  7. mad4

    #7
    This is the wrong way to do it. If you block Googlebot from viewing the pages, it will never find out they have moved.

    The end result is a bunch of old, established pages removed from the index, and Google has to spider the site again to find the new pages.
     
    mad4, Apr 10, 2006
  8. LOLDavid

    #8
    Thanks for the replies. So we should not block Googlebot from these URLs the way Jean-Luc has suggested? We don't really want to redirect the dynamic URLs to the rewritten ones unless we could redirect just the spiders. Currently, for people who block cookies, the site switches over to dynamic URLs so that we can put their PHP session ID in the URL; that way they can maintain their session, shop, and do other session-dependent activity on our site. I received another suggestion for making Google and other search engines not spider the dynamic URLs:

    I would also recommend using a session killer for spiders, to prevent sessions being set for visiting spiders. You can do this by creating an array of all the known spider names (or parts of them) and running a check on the user agent. If the user agent is a match, let the spider bypass the session and, of course, show it your rewritten URLs, if that's how you want Google to index your web site.

    Do you think this would work?
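
The suggestion quoted above can be sketched roughly as follows. This is Python rather than PHP and only shows the user-agent check itself; the list of spider names is a short illustrative sample, and the function names `is_spider` and `should_start_session` are hypothetical. User agents can be spoofed, so this is a heuristic at best.

```python
# Substrings of crawler user agents; an illustrative sample, not a complete list.
KNOWN_SPIDERS = ("googlebot", "slurp", "msnbot")


def is_spider(user_agent):
    """Return True if the User-Agent header contains a known spider name."""
    ua = (user_agent or "").lower()
    return any(name in ua for name in KNOWN_SPIDERS)


def should_start_session(user_agent):
    """Skip session creation (and so session IDs in URLs) for spiders."""
    return not is_spider(user_agent)


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/1.5"

print(should_start_session(googlebot_ua))  # False
print(should_start_session(browser_ua))    # True
```

A spider that gets no session never sees a session ID appended to links, so it only discovers the rewritten URLs.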
     
    LOLDavid, Apr 20, 2006
  9. mad4

    #9
    Why do you need to pass the session ID in the URL anyway? PHP allows session management and tracking without putting the session ID in the URL, and it is a whole lot simpler and safer to just show the same content to your users and the search engines.

    If Google decides to spider your site anonymously, it might index two copies of each page.

    The best way to do this is to not use session IDs in the URLs and to redirect your old dynamic URLs to the new URLs using either .htaccess or PHP.
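
One way to picture the duplicate-copy problem: the same page is reachable under as many URLs as there are session IDs. The small Python sketch below, using only the standard library, shows how dropping the `PHPSESSID` parameter collapses them to a single URL. The function name `strip_session_id` is just for illustration; a redirect in .htaccess or PHP would apply the same idea on the server.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_session_id(url, param="PHPSESSID"):
    """Return the URL with the session ID query parameter removed."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as "catid=".
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url_with_session = (
    "http://www.learnoutloud.com/productpage.php?cat=1&catid=&level=2"
    "&subcatid=129&id=15594&nav=B&PHPSESSID=764f891b9a8a3e01dfba6da28668bfe7"
)
print(strip_session_id(url_with_session))
```

Two visits with different session IDs then map to the same address, which is what a search engine needs in order to treat them as one page.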
     
    mad4, Apr 21, 2006
  10. LOLDavid

    #10
    The only time we switch over to dynamic URLs and tack on the PHP session ID is if the user is browsing the site with cookies blocked. Otherwise they'll only get rewritten URLs with no PHP session ID in the URL as that is accessed through their session cookie.

    Is there a way to maintain the session of a user who has cookies blocked without putting it in the URL? Because we want users who block cookies to be able to shop our site and maintain their shopping cart session.
     
    LOLDavid, Apr 21, 2006
  11. mad4

    #11
    mad4, Apr 22, 2006
  12. Jean-Luc

    #12
    Please explain. :confused:

    The text of the page you refer to starts with:
    Jean-Luc
     
    Jean-Luc, Apr 22, 2006
  13. LOLDavid

    #13
    Yes, please explain. :eek: Could you point out the other options on that page, if there are any?
     
    LOLDavid, Apr 24, 2006
  14. mad4

    #14
    Err.....having looked into this I can't find a tutorial to show you so maybe I am totally wrong. :rolleyes:
     
    mad4, Apr 24, 2006