Let's suppose we have a script that generates URLs like this: somesite.com/item.php?c=4&sc=15&id=47 Here "c" is the category, and let's presume category #4 is "widgets"; "sc" is the sub-category, and let's presume #15 is "accessories"; "id" is the item id, and let's presume #47 is a "widget cleaner". The school of thought these days seems to be to rewrite such a URL into something like this: somesite.com/item.php/widgets/accessories/widget_cleaner/ I'm seeing quite a few SEOs use this method, and their sites seem to be getting indexed well. One concern I have with this example is that the item is 5 levels deep, and Google often reduces PageRank by 1 for each level. In general, though, I would love to hear your comments about the above or similar examples.
Where do you get this from? PageRank is based on link popularity and the PR of the pages that link to you. Since most people get incoming links to their home page, the home page tends to have a higher PR and internal pages a lower one. Your real problem (this is from Google's webmaster guidelines):
Some rewriters keep everything at the top level, albeit with a longer URL, so your URL might end up looking like: somesite.com/item~c~4~sc~15~id~47.htm
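To make the flattened form concrete, here is a minimal Python sketch of how such a name could be built from the original query string and turned back into parameters. The function names `flatten` and `unflatten` are made up for illustration, and the sketch assumes that no parameter name or value contains "~" or "."; it is not how any particular rewriter works.

```python
from urllib.parse import parse_qsl


def flatten(script, query, ext=".htm"):
    """Turn 'item' + 'c=4&sc=15&id=47' into 'item~c~4~sc~15~id~47.htm'."""
    parts = [script]
    for key, value in parse_qsl(query):
        parts.extend([key, value])
    return "~".join(parts) + ext


def unflatten(name):
    """Recover the script name and its parameters from a flattened file name."""
    stem = name.rsplit(".", 1)[0]           # drop the extension
    script, *pairs = stem.split("~")        # first piece is the script name
    params = dict(zip(pairs[0::2], pairs[1::2]))  # remaining pieces alternate key, value
    return script, params


print(flatten("item", "c=4&sc=15&id=47"))
print(unflatten("item~c~4~sc~15~id~47.htm"))
```

On the server, a rewrite rule would do the `unflatten` step and hand the parameters to the real script.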
What shopping cart are you using? If it's osCommerce or Zen Cart, there are contributions that make the URLs more SE friendly.
Is there a difference between .php and .html in the way Googlebot sees them? Basically, is it better to have a .html URL than a .php one?
I was wondering, is it better to have .htm rather than the full .html extension? Does it make a difference?
I don't think it really makes any difference. Google does not differentiate between the .htm and .html file extensions, so you can use either. Just be consistent, and don't use .htm for some pages and .html for the rest.
.php or .html doesn't make a difference. It's the same for the crawlers. What comes to people's minds is that .php pages are dynamic, so they are not so good. The dynamic part that affects the crawlers is the variables in the URL, like ?id=whatever, and you can have .html pages with those too. Because of this, a lot of people make the PHP parser handle .html files too so they don't have to use .php, but that just adds load to the server.
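A quick way to see the point about variables is to look at the query string rather than the extension. This is only a rough Python sketch using the standard library; `query_params` and `is_dynamic` are names invented here, and "dynamic" just means "has at least one variable after the ?".

```python
from urllib.parse import parse_qsl, urlsplit


def query_params(url):
    """Return the (name, value) pairs found after the '?' in a URL."""
    return parse_qsl(urlsplit(url).query)


def is_dynamic(url):
    """True if the URL carries any query variables, whatever its extension."""
    return bool(query_params(url))


for url in (
    "somesite.com/item.php?c=4&sc=15&id=47",
    "somesite.com/page.html?id=whatever",
    "somesite.com/item.php",
):
    print(url, is_dynamic(url), len(query_params(url)))
```

The .html page with ?id=whatever counts as dynamic here, while the bare .php page does not.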
somesite.com/accessories/widget_cleaner/ Really, I would have it like that if I could. If there is a link from your homepage, the depth will make no difference to the search engines. For visual purposes, fewer directories look a little better.
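For what it's worth, the path part of such a URL can be generated from the category and item names with very little code. The Python sketch below is just one way to do it; `slugify` and `item_path` are hypothetical helpers, and the names passed in would come from your own category and item tables.

```python
import re


def slugify(name):
    """Lowercase a name and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def item_path(*names):
    """Build a path such as /accessories/widget_cleaner/ from display names."""
    return "/" + "/".join(slugify(n) for n in names) + "/"


print(item_path("Accessories", "Widget Cleaner"))
print(item_path("Widgets", "Accessories", "Widget Cleaner"))
```

Leaving out the top-level category, as in the first call, gives the shorter two-directory form.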
How about putting all the files in the root directory, something like 4000 of them? Will they fit in there, or what is the maximum number of files you can put in one directory on a Unix/Linux server?
index.php?page=etc should be fine. If you use dynamic URLs, keep the extra characters to a minimum. Google shouldn't have a problem if it's just one "?". They have indexed my entire site, http://www.pinkpt.com, even though I use dynamic URLs.