I have just figured out how to build my Amazon associate website (don't worry, it's a PHP question) and came across this problem. If I use PHP to generate all the product HTML pages needed, based on parsing data etc. for the user (using mod_rewrite; is this a good method of doing it?), I am assuming that when Google crawls my website it will run through all of these PHP functions and land at the HTML pages like a user would. But if Google indexes these pages, can they be accessed by following the mod_rewrite HTML address? For example, www.mynewawssite.com/get-the-product-the-user-requests.php, then mod_rewrite to rename the destination page: www.mynewawssite.com/Product-Title.html. When Google crawls this webpage ^^^ will they index it? Also, if it is indexed, will a Google user be able to access it by clicking the link to www.mynewawssite.com/Product-Title.html? If not, is there a PHP function needed to generate and create the webpage on my server? Also, if this is done, am I required to stop it from generating webpages if one already exists? Thanks in advance for any advice given!
Google indexes PHP pages just the same as HTML pages. PHP is executed on the server before any data is sent to a web browser (or to Google's crawler). Unless your URL looks like a huge query string (?this=nvfruiovfrnuivngre&that=jfunwevbo8t4wivtoui&theother=jnviurbvyuiotrhnvitoe), there's no reason to use mod_rewrite just to change the file extension. Personally, I would just leave the pages with .php and make sure they are named something relevant to the page itself, e.g. mysite.com/computer-hardware.php
What I do is use a html_cache folder and a RewriteRule/RewriteCond to see if the REQUEST_URI exists in that cache folder. If it exists I rewrite the request to that HTML file transparently. If it doesn't exist I rewrite the uri to my PHP page which generates an HTML file and saves it to that cache folder for the next person before returning the generated page. The browser/search-engine/etc never sees a *.php URL. Here's an example of it in action. I just started on the store links this morning, but I've been using the technique for the wallpaper pages for weeks now. // Edit -- Here's a debugging thread of mine about the RewriteCond when I first implemented it.
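For anyone who wants to see the PHP half of that idea, here is a minimal sketch. It assumes the rewrite rules already send cache misses to a PHP script; the function names, the cache directory layout, and the `$render` callback are all hypothetical and are not the poster's actual code. The point it illustrates is that a page missing from the cache is simply generated again on the next request, so an indexed .html link keeps working either way.

```php
<?php
// Illustrative sketch only: cache_path_for(), serve_cached() and the
// $render callback are made-up names, not the poster's real implementation.

/**
 * Map a request such as "/Product-Title.html" to a file inside the cache
 * directory, rejecting names that do not look like a simple cached page.
 */
function cache_path_for(string $cacheDir, string $requestUri): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $name = basename($path);
    // Accept only plain names like "Product-Title.html".
    if (!preg_match('/^[A-Za-z0-9_-]+\.html$/', $name)) {
        return null;
    }
    return rtrim($cacheDir, '/') . '/' . $name;
}

/**
 * Return the HTML for a request. On a cache miss the page is generated by
 * $render, saved for the next visitor, and then returned.
 */
function serve_cached(string $cacheDir, string $requestUri, callable $render): ?string
{
    $file = cache_path_for($cacheDir, $requestUri);
    if ($file === null) {
        return null; // the caller should answer with a 404
    }
    if (is_file($file)) {
        return file_get_contents($file); // cache hit
    }
    // Cache miss: build the page from the name without the ".html" suffix.
    $html = $render(basename($file, '.html'));
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0775, true);
    }
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}
```

In a real script, `$render` would be whatever builds the product page (for example from an API response and a template), and deleting a file from the cache folder is enough to force a fresh copy on the next hit.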
But say, for example, I never created a webpage (file) in PHP or HTML. Instead I call Amazon's API from PHP to fetch the data needed there and then, and apply this data to a set template when the user wants to see the item's page. So the user sees an HTML page generated from the results of the API call. Will these pages be indexed by Google? If so, does PHP create the file, and how do you prevent it from creating a new file each time? I know it's hard to understand what I'm talking about, but it's really difficult to explain.
Yes, they will be indexed by Google, but how many pages Google will index depends a lot on the number of links and the amount of trust your site has. Not even static pages will fix this. Since there is potentially an infinite number of pages on your site (due to the nature of completely dynamic sites), Google gets selective about how many it indexes. Make sure you have a relevant hierarchy of navigation, such as: index > categories > sub_categories > products. Also, make sure you aren't using session IDs in your URLs. As stated above, try to also use a relevant URL schema. Using the product hierarchy above, if you can accomplish something like this it will really help as well: mysite.com/category/sub_cat/product-name.php
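One way to get URLs in that shape is to derive each path segment from the category and product names. The sketch below is only an illustration under that assumption; `slugify()` and `product_url()` are made-up helper names, and it handles plain ASCII names only.

```php
<?php
// Illustrative helpers: slugify() and product_url() are hypothetical names.

/**
 * Turn a display name into a lowercase, hyphen-separated URL segment.
 * Handles ASCII only; every run of other characters becomes one hyphen.
 */
function slugify(string $name): string
{
    $slug = strtolower($name);
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

/**
 * Build a path in the category/sub_cat/product-name.php shape.
 */
function product_url(string $category, string $subCategory, string $productName): string
{
    return '/' . slugify($category)
         . '/' . slugify($subCategory)
         . '/' . slugify($productName) . '.php';
}
```

Because the path is built from the names themselves, there is no need for a session ID or a long query string in the link.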
People never understand my answer to this question. It's a shame too because it does exactly what they're fumbling around trying to explain and it does it well.
Google will index it just fine, because to them and to users there's really no difference, since mod_rewrite executes before the requested file is even loaded. An example of a site that uses mod_rewrite for this purpose is WordPress, and as far as I know, Google indexes WordPress pages just fine, even though they are not really physical pages.
This is exactly what I was talking about. My product pages will not be physical pages. At the minute I'm not worried about whether Google will index them, but if they do index the non-physical HTML page, how will that hyperlink lead to the product when there's no physical product page? Will I be required to generate a physical page for each product I want indexed?
So do you mean you check if this webpage exists in the html_cache folder before loading anything, then if it does, load it from the html_cache folder, or if not, then generate the webpage and store it in the html_cache folder? Does this html_cache folder hold all of the generated pages for a period of time? If so, what happens when a user finds your indexed html (not your PHP page to check if this page exists etc) webpage on google, clicks it to access it and it no longer exists in the html_cache?
Yes, you do. What happens is that mod_rewrite lets you use URLs in a format such as http://www.example.com/view/post/123 and sends them all to your index page, so you must make your index page the "parser". Your index page first has to detect whether the URL is in that specific format. If it isn't, do something else: maybe show a default page, output an error, or direct them to the main page so that all URLs from there onwards will be formatted correctly. To detect whether the URL is in the correct format, you can use the PHP $_SERVER variable. $_SERVER['REQUEST_URI'] gives you what comes after the host name, and you can then use explode() with the delimiter set to '/'. You end up with an array of "commands", which using my example above would be {"view", "post", "123"}. At this point, you just have your index page read the first array value and see that it's a "VIEW" command that's trying to view a POST, and that the post ID is 123. Then you just pull that info out of your database, and you're done.
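The parsing step described above can be sketched roughly like this. The function names and the returned strings are placeholders made up for illustration, and the database lookup is left as a comment; it assumes the rewrite rules already route every request to index.php.

```php
<?php
// Illustrative sketch: parse_route() and dispatch() are hypothetical names.

/**
 * Split a request URI like "/view/post/123?x=1" into its path segments.
 */
function parse_route(string $requestUri): array
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }
    // Drop empty segments caused by leading, trailing, or doubled slashes.
    $segments = array_filter(explode('/', $path), fn (string $s): bool => $s !== '');
    return array_values($segments);
}

/**
 * Decide what to do with the parsed "commands".
 */
function dispatch(array $segments): string
{
    if (count($segments) === 3
        && $segments[0] === 'view'
        && $segments[1] === 'post'
        && ctype_digit($segments[2])
    ) {
        $id = (int) $segments[2];
        // Here you would load post $id from the database and render it.
        return "show post $id";
    }
    // Anything that is not in the expected format falls back to a default.
    return 'show default page';
}

// In index.php you might call:
// echo dispatch(parse_route($_SERVER['REQUEST_URI'] ?? '/'));
```

Checking the ID with ctype_digit() before casting it keeps a URL like /view/post/abc from being treated as a valid post.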