I read a post quite a while ago (can't find it again), where someone was discussing putting up a site, or multiple sites with content geared just towards getting indexed in Google. It sounded like perhaps some randomized content, completely unique pages for some of the tests, and more template based for others. Sort of an experiment with controls to see how the SEs would index... has anyone actually attempted that? I'd be curious to see the results if any... Thanks!
I think indexing is really a function of links to the site. Or in the case of a blog, you have the ability to ping. In general, content should have little bearing on a spider's ability to index the page.
Matt Cutts said that PR determines crawl depth (and therefore the number of indexed pages). I'm not quite sure, but... yes, PR most likely determines the number of indexed pages.
I have an opinion on this. First of all, I love WebPosition (some of you might say it is overpriced or no good... my results say different) and have great success using the page critic filtered to Google. Ironically, MSN absolutely adores my sites and I get ranked very fast with them once indexed. Google falls shortly behind, but MSN will rank you for exactly the terms you optimize for. Obviously you need to be building links properly for this to be totally true. As for indexing, it is (well, in my opinion) a function of way more than just links coming to the site. Getting spidered is a function of the links coming to it, but getting indexed is a combination of everything, and I firmly believe that internal linking, external linking, content, and latent semantics are all parts... LSI is proving to be more than just a theory.
Time is also a factor. You need time to get a site indexed. Links, too. But I haven't succeeded in getting a large site indexed in MSN; it seems like a mission impossible.
Most pages I make are geared towards getting indexed, except for those excluded in my robots.txt and those made when I'm lazy.
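For anyone unfamiliar, excluding pages from crawling via robots.txt is just a couple of directives per the Robots Exclusion Protocol; a minimal sketch (the paths here are made-up placeholders):

```
# Applies to all crawlers (Googlebot, Slurp, msnbot, etc.)
User-agent: *
# Keep these out of the index (hypothetical paths)
Disallow: /drafts/
Disallow: /lazy-page.html
```

Note that Disallow stops spidering of those URLs, but a disallowed URL can still show up in Google as a URL-only listing if other sites link to it.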
A single link to a lower-level page, from a page on another site that is indexed and has PageRank, will get your lower-level page indexed. A site map doesn't do any harm either. ("Lower level" = more than one click from the home page.) As mentioned, time and PR are factors in how deep your site is crawled by G. If your PR is below 4, I believe (and could be wrong) you will not easily get pages more than one click from the home page crawled and indexed. As PR increases, so does the depth of the crawl by G and the indexing.
Just submit an XML sitemap file to Google and Yahoo and your pages will get indexed regardless of PR. Yahoo seems more timely about it than Google.
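For reference, the XML sitemap format both engines accept follows the sitemaps.org protocol; a minimal file looks like this (the URL and dates are placeholder examples, and lastmod/changefreq/priority are optional hints, not guarantees):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- loc is the only required child element -->
    <loc>http://www.example.com/</loc>
    <lastmod>2006-08-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/deep/page.html</loc>
  </url>
</urlset>
```

Listing a deep page in the sitemap helps the crawler discover it, but in my experience it doesn't force indexing on its own.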