How does Google determine the number of pages on a site to index? Is there a relationship between a site's PR (PageRank) and the absolute number and/or percentage of its webpages that Google will index?
I don't believe that's the case. I don't think, though I'm not absolutely certain, that Google maintains an active index (meaning a cached image and link) of every page of a given site. It seems to me that the higher the PR, the more pages are indexed. Is that not the case?
From what I understand, there may be some correlation between PageRank and how deep Google indexes. I base this on my own experience: when I placed a link to my site on a high-PR website, the number of pages indexed and showing in a site: search was higher than for a site where I induced spidering by placing a link on a PR1 website. Also, anyone writing a crawling algorithm that avoids duplicate re-indexing would probably use some sort of stack with a maximum depth attached to it. Hence it would be a good idea to interlink your website as much as possible and reduce the depth needed to get to any given page.
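To make the depth point concrete, here is a minimal sketch of a crawler with a depth cap. It is only an illustration of the idea in the post above, not Google's actual algorithm, and the depth limit itself is the poster's guess. It uses a breadth-first queue rather than a stack so that each page gets its shortest click depth. The page names and the `max_depth` value are made up.

```python
from collections import deque

def click_depths(links, start):
    """Return the fewest clicks needed to reach each page from `start`."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # skip pages already seen (no duplicate crawling)
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def reachable_within(links, start, max_depth):
    """Pages a crawler would reach if it gave up beyond `max_depth` clicks."""
    return {p for p, d in click_depths(links, start).items() if d <= max_depth}

# Hypothetical site where each page links only to the next one.
chain = {"home": ["a"], "a": ["b"], "b": ["c"], "c": ["d"]}

# Same pages, but the home page also links to every other page.
interlinked = {"home": ["a", "b", "c", "d"], "a": ["b"], "b": ["c"], "c": ["d"]}

print(sorted(reachable_within(chain, "home", 2)))        # ['a', 'b', 'home']
print(sorted(reachable_within(interlinked, "home", 2)))  # ['a', 'b', 'c', 'd', 'home']
```

With the same hypothetical cap of 2 clicks, the chained site loses pages c and d, while the interlinked one is covered completely.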
Site maps are helpful because they allow Google to crawl all the pages linked from them. Google indexes pages as long as it sees a link going to them. I think the quantity of pages indexed is not related to the PageRank of an incoming link, although I'm not so sure. What I do understand is that the big G may index your site in a snap and then throw it out just as quickly because there are no links pointing to it. Google considers these incoming links as 'votes' for your site, so the more links the better, and the higher their quality the better.
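If by site map you mean an XML sitemap file rather than an HTML page of links (that is an assumption here, the post doesn't say which), a rough sketch of generating one in the sitemaps.org format could look like this. The example.com URLs are placeholders and `build_sitemap` is just a name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a minimal XML sitemap string listing one <url><loc> per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Placeholder URLs for a hypothetical site.
pages = [
    "https://example.com/",
    "https://example.com/products",
    "https://example.com/contact",
]

print(build_sitemap(pages))
```

Listing a page in a sitemap only tells the crawler that the page exists. Whether it gets indexed and stays indexed is a separate question, as noted above.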
If you have loads of pages, then each page needs its own distinct content, meta tags and title to make sure it gets indexed (unless you have a high-PR site, in which case the pages will probably get indexed anyway).
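One way to check a big site for that is to look for pages that share the same title and meta description. Below is a small sketch using only the Python standard library. The `pages` dict of path to HTML is invented sample data, and `HeadParser` and `find_duplicates` are names made up for this example. How Google actually treats such duplicates isn't something this code tells you.

```python
from collections import defaultdict
from html.parser import HTMLParser

class HeadParser(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "description":
                self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def find_duplicates(pages):
    """Group page paths that share the same (title, description) pair."""
    seen = defaultdict(list)
    for path, html in pages.items():
        parser = HeadParser()
        parser.feed(html)
        seen[(parser.title.strip(), parser.description.strip())].append(path)
    return [sorted(paths) for paths in seen.values() if len(paths) > 1]

# Invented sample pages: /a and /b reuse the same title and description.
pages = {
    "/a": "<head><title>Widgets</title><meta name='description' content='Buy widgets'></head>",
    "/b": "<head><title>Widgets</title><meta name='description' content='Buy widgets'></head>",
    "/c": "<head><title>Gadgets</title><meta name='description' content='Buy gadgets'></head>",
}

print(find_duplicates(pages))  # [['/a', '/b']]
```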