I submitted Google Sitemaps without hesitation for a few sites that are content-heavy... but I'm hesitating on others. My other sites have heavy content up front, with sitewide links to relevant store directories. When Google indexes these sites "naturally", it gobbles up the unique content first and then generally stashes a few of the affiliate store pages too, without spidering very deep or very far off topic in the storefront. So anyway... I started GSiteCrawler (a pretty nifty tool, BTW, though I haven't done much comparing) a couple of days ago (it's still running), and I could now easily generate a sitemap with 200,000 pages after weeding out the needless URLs like cart, empty cart, sign-in, etc. Submitting a sitemap of 200,000 pages for a year-old site with 450 pages indexed doesn't feel quite right. Maybe this is where the priority field makes sense? I want to say: Hey, Google spider... I have all these thousands of pages here if you are really hungry today, but honestly they aren't as relevant or unique to my site as these really tasty ones here. I don't want to get penalized for showing them piles of affiliate links all at once from a site that has appeared search-engine-friendly so far.
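For what it's worth, here is a rough Python sketch of the priority idea: drop the cart/sign-in style URLs and give the store pages a lower priority than the unique content. The excluded path fragments, the `/store/` prefix, the 0.8 and 0.2 values, and the example.com URLs are all assumptions for illustration, not anything GSiteCrawler produces. I'm also assuming the sitemaps.org 0.9 namespace here; use whichever schema your submission tool expects.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import escape

# Hypothetical path fragments to weed out (cart, empty cart, sign-in).
EXCLUDE = ("/cart", "/emptycart", "/signin")
# Hypothetical prefix under which the affiliate store pages live.
STORE_PREFIX = "/store/"


def keep(url):
    """Return True if the URL should appear in the sitemap at all."""
    path = urlparse(url).path.lower()
    return not any(fragment in path for fragment in EXCLUDE)


def priority(url):
    """Lower priority for store pages, higher for unique content (assumed values)."""
    return 0.2 if urlparse(url).path.startswith(STORE_PREFIX) else 0.8


def build_sitemap(urls):
    """Build a sitemap XML string with a <priority> for each kept URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        if not keep(url):
            continue
        # escape() turns & into &amp; so query strings stay valid XML.
        lines.append(
            "  <url><loc>%s</loc><priority>%.1f</priority></url>"
            % (escape(url), priority(url))
        )
    lines.append("</urlset>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        "http://www.example.com/articles/widgets.html",
        "http://www.example.com/store/item?id=1&c=2",
        "http://www.example.com/cart/view",
    ]
    print(build_sitemap(sample))
```

Keep in mind that priority is only a hint about how your own pages rank relative to each other, so this says "crawl these first", not "these are good pages". Also, a single sitemap file is capped at 50,000 URLs, so 200,000 pages would have to be split across several files anyway.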
I think it would be best to hold off for a while and see if anyone else gets penalised for large sitemaps... They can be the "guinea pigs", so to speak, while we sit back!
I think I WILL wait till I hear more about it. Running the crawler was a good exercise anyway; it made me more familiar with all the links. And in one instance I saw that all the bad links were coming from my index page! I had used the wrong base URL in the page generator! I guess it pays to keep poking around in there...
This is what it says in the FAQ. I don't know if it makes anyone less concerned, or how believable it is, but hey, I thought I'd post it since I was reading the FAQ anyway.
There is a huge difference between saying "we won't penalize you for having a sitemap" and saying "we won't penalize the garbage your sitemap points us to".