For a largish site (200+ pages), is there a best linking structure to get Google to index the complete site as quickly as possible? Should the main index page link to all other pages? I've seen mention of a limit of 100 links per page; does that affect the speed of indexing?
Everything else aside, the fastest way would be to link to every page from the index. There's no limit to the number of links Google will spider on a single page, as long as they're within the first 100KB of the HTML document (everything after 100KB is truncated: content, links, etc.).
Although Google does say not to have more than 100 links on a page, their own site map has about 140 links. The easiest way I have found is to put links to the main pages on the index, have a site map page, link every page to the site map, and link the site map to every page. Nice and easy two-hop link spider.
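To see why the index + site map layout described above is "two hop", here is a minimal sketch that measures click depth (how many links a spider must follow from the index) with a breadth-first search. The site is hypothetical: 200 pages named `page0`..`page199`, 10 of them treated as "main pages" linked from the index, and a page called `sitemap`; these names and numbers are illustrative assumptions, not anything from a real site.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search from `start`; returns {page: clicks needed from start}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def two_hop_site(n_pages, n_main):
    """Hypothetical site: the index links to the main pages and the site map,
    the site map links to every page, and every page links back to the site map."""
    pages = [f"page{i}" for i in range(n_pages)]
    links = {"index": pages[:n_main] + ["sitemap"], "sitemap": list(pages)}
    for p in pages:
        links[p] = ["sitemap"]
    return links


site = two_hop_site(n_pages=200, n_main=10)
depths = click_depths(site, "index")
print(max(depths.values()))  # 2: no page is more than two clicks from the index
print(len(depths))           # 202: index + sitemap + 200 pages, all reachable
```

Linking every page straight from the index would make every depth 1, which is the "fastest" layout mentioned earlier in the thread; the site map version trades one extra hop for a much shorter index page.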
Well, technically it'd be "nothing will be as fast as...", but that doesn't mean other methods can't achieve the same result just as fast. If you've got even a small number of decent links pointing to you, my guess is that you won't have any trouble getting pages two clicks from the index crawled, rather than only those one click away. The bottom line is that no matter how you structure it, you're usually going to have to wait a while. Google seems to be good at picking up new domains when there are links going to them, but it really doesn't seem to like deep crawling them until it's at that point in Google's own crawl cycle anyway.
By this, do you mean that I should expect Google to pick up the index page and then do nothing more until it's "that time of the month"?
Yep. Unless, of course, you've got other external links pointing to internal pages; then they'll be cached in a hurry. I'm sure there are exceptions to this, namely if you can get a TON of links pointing to your index in a hurry, or a few really high-PR ones... but on almost every site I've made, the index gets in quickly, and then I have to wait for the rest.
Just thought of this. On one of my other sites, which is a PR5, and has lots of pages, could I temporarily link some of the internal pages of that site to internal pages of the new site, then remove the links once the new pages get indexed? Do you think that would help speed things up?
I never even think about them; trying to do this for ONE search engine is stretching my brain too much as it is.
I would think so, yeah. Once a page is in the index (unless it's penalized/banned/whatever), it stays in, regardless of whether any links still point to it.
You could do it, but I'd only link to key pages. What disgust has said above is spot on: there's usually a wait between the initial index (first/fresh crawl) and the ongoing crawl (deep crawl). My approach is quite simple. I use a tree structure (logically, in terms of navigation), then identify key nodes within the structure. I then visit my collection of 'starter sites' and link back to each node. Although PR is not a big deal in terms of relevance, it's still a good indicator of how regularly a page is likely to be crawled and your links discovered, so find a few directories or sites with PR 5+ link pages and use them. There's a list here somewhere. Cheers, JL
I am not sure where I read it, but I believe it was in G's FAQs: either 80 or 100 links per page maximum? Am I thinking of another SE? I know I have seen one of those two numbers; can someone confirm?
I did point that out in my post above, new C, but no one reads me coz I am old, bald & stupid. This is the actual wording from Google: 'Keep the links on a given page to a reasonable number (fewer than 100).' Here is the link to the page it's on: http://www.google.com/webmasters/guidelines.html AND here is the link to the Google site map with 150+ links on it: http://www.google.com/sitemap.html Then again, if you're a PR 10, I guess you will get 150 links spidered.
Even if you're not a PR10 you can get that many spidered... I have dozens of PR3 pages that each link to 200+ pages, and those pages are cached even though they aren't linked to from anywhere else. 100KB is the only limit, really.
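If you want to check your own pages against the two numbers argued about in this thread, here is a minimal sketch using only Python's standard `html.parser`. It counts `<a href=...>` links in the whole page and in just the first 100KB. Both thresholds are assumptions taken from the posts above (the "fewer than 100" guideline wording and the claimed 100KB truncation, read here as 100 × 1024 bytes); nothing here confirms how Google actually behaves.

```python
from html.parser import HTMLParser

# Assumption: "fewer than 100" links, as quoted from Google's guidelines in this thread.
LINK_GUIDELINE = 100
# Assumption: the 100KB truncation point claimed in this thread, taken as 100 * 1024 bytes.
SIZE_LIMIT = 100 * 1024


class LinkCounter(HTMLParser):
    """Counts <a> tags that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1


def count_links(html):
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def check_page(html_bytes, encoding="utf-8"):
    """Compares link counts for the full page and for the first SIZE_LIMIT bytes."""
    full = count_links(html_bytes.decode(encoding, errors="replace"))
    # Cutting at a byte offset may split a character, so ignore decode errors here.
    kept = count_links(html_bytes[:SIZE_LIMIT].decode(encoding, errors="ignore"))
    return {
        "size_bytes": len(html_bytes),
        "links": full,
        "links_in_first_100kb": kept,
        "over_guideline": full >= LINK_GUIDELINE,
    }


# Hypothetical test page: 150 links, then >100KB of filler, then 20 more links.
head = "".join(f'<a href="/page{i}.html">Page {i}</a>\n' for i in range(150))
padding = "<p>" + "x" * SIZE_LIMIT + "</p>\n"
tail = "".join(f'<a href="/extra{i}.html">Extra {i}</a>\n' for i in range(20))
page = ("<html><body>\n" + head + padding + tail + "</body></html>").encode("utf-8")

report = check_page(page)
print(report["links"], report["links_in_first_100kb"])  # 170 150
```

Keeping all links near the top of the HTML (for example, in a site map page that has little other content) keeps `links_in_first_100kb` equal to `links`, which is the situation the last post describes.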