This is something that has been playing on my mind for a while: what determines the indexed-page capacity for your site on Google? I ask because I have a store with about 16,000 product pages, but I estimate that only 1/8th of them are actually indexed. I have an XML sitemap generated dynamically for each product category, so Google has access to all of the product page URLs on top of its normal spidering. The site is PR 6 and has many strong backlinks, so spider frequency shouldn't be a problem. I have also identified a lot of pages elsewhere on the site that should not have been indexed and are showing as URL-only in the SERPs. I have choked the spider's access to them, so they should go supplemental and disappear soon. Would this 'free up' indexed-page capacity? Any ideas?
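In case it helps, the 'choking' I mentioned is just a couple of robots.txt Disallow rules, and the sitemaps follow the standard sitemaps.org sitemap-index format pointing at one file per category. Roughly like this (the paths and domain below are placeholders, not my real URLs):

# block the thin pages that were showing as URL-only
User-agent: *
Disallow: /print/
Disallow: /search/

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one child sitemap per product category; each lists that category's product URLs -->
  <sitemap>
    <loc>http://www.example.com/sitemap-widgets.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-gadgets.xml</loc>
  </sitemap>
</sitemapindex>

So Google definitely has a route to every product URL; the question is why it only seems willing to index a fraction of them.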
I think it has to do with how "important" Google thinks your site is. Get more natural links (especially deep links), and I bet Google starts showing more.
Is having all your pages in the root a bad thing? I wrote my CMS with every page in the root, no directories... is this bad?
Make sure your title tags are different for each page; Google will then know the pages are different and will be more likely to index them.
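For example, rather than every product page sharing the same <title>My Store</title>, pull the product name into the template, something like <title>Product Name - Category - My Store</title> (the exact template syntax depends on your CMS, and those names are just placeholders).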
I think it's better to spend 99% of your time building good, original, content for your website and 1% of your time building links and promoting it. Most folks spend 99% of their time promoting their site and 1% building good content. As they say in the movies, "Build it and they will come". That is true in website promotion as well. Build a good site, update it on a regular basis and folks will find your site and link to it. This natural linking is the best way to go. Yes, it will take a long time, but most shortcuts don't work and might do more harm than good.
I think PR 6 and original content should give you a lot of "capacity". But are those 16,000 product pages original, or are they just Amazon/some other feed? If they are not original, it is a smart move from Google's point of view not to crawl and index them, since it would only waste resources.
Hi rehash,

The 16,000 product pages come from a feed, so in themselves they contain unoriginal content. The site is not just a store, however; it is a community with forums and blogs (see my sig), which is a lot of original content. Googlebot swallowed up 2.4GB of my bandwidth in August, which was amazing; in September it was down to 500MB. Consequently, indexed pages started to go supplemental and new pages were not being indexed. I realise now that it is probably not an indexed-page capacity problem I have, more a spider frequency problem. I still can't put my finger on why Googlebot's crawling dropped so dramatically from 2.4GB to 500MB. It must have been a penalty of some sort. Quite possibly it was a datafeed penalty, but this peeves me somewhat as the products are relevant to the subject matter (I am biased, however). Any ideas about the drop?