Does anyone have reliable information about how Google handles URLs that are not included in a sitemap? Does Google stop crawling them? Does it treat them differently? Or is there no change? I haven't been able to find anything reliable on whether a sitemap is required to be comprehensive. I want to generate a set of sitemaps covering all 300,000 pages of a site, but the dynamic URLs are a pain to generate, so I would like to leave them out, unless that means they will be disregarded or de-prioritized.
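On the "set of sitemaps" part: the sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so 300,000 pages would need at least six files plus a sitemap index that lists them. Below is a minimal sketch of that split. The `example.com` URLs and the `sitemap-N.xml` file names are made up for illustration and are not the poster's actual setup.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_index(base_url, n_files):
    """Return a sitemap index XML string pointing at n_files sitemap files.

    The sitemap-N.xml naming scheme is a hypothetical choice for this sketch.
    """
    entries = "\n".join(
        f"  <sitemap><loc>{base_url}/sitemap-{i}.xml</loc></sitemap>"
        for i in range(1, n_files + 1)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )


# Hypothetical URL list standing in for the site's 300,000 pages.
urls = [f"https://www.example.com/listing/{n}" for n in range(300_000)]
chunks = chunk_urls(urls)
index_xml = sitemap_index("https://www.example.com", len(chunks))
print(len(chunks))  # 6
```

Each chunk would then be written out as its own sitemap file, and only the index needs to be submitted.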
Nothing happens. One of my sitemaps is missing some pages. Google can still visit and find those pages, but it took some time before it found them.
For the dynamic URLs, you could have a look at URL rewriting. Then you would be able to add them to your sitemap. PS: 300,000 pages... wow!! What is the subject of your site?
Yeah... you're right. I did start work on this, and it's an engineering nightmare. I guess it's better to do it, though, than to leave URLs out. This is the third time I am re-engineering my database. I hope the sitemaps are worth it! The site is about real estate, but it is actually way smaller than those of my competitors. It's not just this sector. Nowadays you have, say, international flower delivery services generating a page, or even several pages, for every single street they deliver to. That's easily millions of pages. As the competition heats up, you'll need a million pages just to stay in the running. It's insane.
Yeah, it seems that URLs not included in the sitemap are treated as if you had not submitted any sitemap for them.
Well, after a lot of unhealthy food and unsociable behaviour, I managed to complete my sitemaps... Google just told me the status is OK and all URLs have been accepted. I left a few URLs out of the sitemap on purpose, and I'll compare how they do in the SERPs. I will post the results here in a few weeks' time.
Why are you re-engineering your database for the sake of dynamic URLs? Just use an ISAPI rewrite filter (if using IIS) or mod_rewrite (on Apache). You simply need to replace "?" with "/" or similar. Also make sure your URLs don't carry session IDs or variable/page names such as PHPSESSID; anything that makes a spider think the page may use sessions is a MASSIVE turn-off. Spiders obviously can't handle client-side sessions, so any suggestion that the page may use them is likely to result in the spider ignoring the page.
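To illustrate the "replace ? with /" idea, here is a rough sketch of generating the static-looking form of a dynamic URL for the sitemap while dropping session parameters. The function name `to_static_url`, the `SESSION_PARAMS` list, and the example URL are assumptions for this sketch. The web server would still need a matching rewrite rule (ISAPI filter or mod_rewrite) to map the path back to the real query string, which is not shown here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, quote

# Assumed list of parameter names that look like session IDs to a spider.
SESSION_PARAMS = {"phpsessid", "sessionid", "sid", "jsessionid"}


def to_static_url(url):
    """Turn ?key=value&... into /key/value/... and drop session parameters."""
    parts = urlsplit(url)
    segments = []
    for key, value in parse_qsl(parts.query):
        if key.lower() in SESSION_PARAMS:
            continue  # never expose session IDs in crawlable URLs
        segments.append(quote(key, safe=""))
        segments.append(quote(value, safe=""))
    path = parts.path.rstrip("/")
    if segments:
        path = path + "/" + "/".join(segments)
    # Rebuild the URL without a query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(to_static_url(
    "https://www.example.com/listing?city=london&id=42&PHPSESSID=abc123"
))
# https://www.example.com/listing/city/london/id/42
```

This only covers the URL-generation side, so no database changes are involved; the same mapping just has to be mirrored in the server's rewrite rule.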
Thank you for your advice! Most of what you say is gibberish to me, but I think you are talking about PHP. I don't use that... Anyway, my site is doing great right now. Spending one week setting up a system for getting everything into sitemaps was definitely worth it. As for the URLs missing from the sitemap, they are still in the SERPs and in the index, no change yet. Early days still.
A Google sitemap helps Google's crawler reach pages that are otherwise inaccessible, meaning stray pages that are not linked from any other page. It can also tell the crawler how often those pages may change. It is just information for the crawler. If you omit a page, Google will treat it like any other page. The only difference is that, if that page is not linked from any other page, Google has no means of finding it.
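As a small illustration of the "just information for the crawler" point, this sketch builds a sitemap that lists an otherwise unlinked page together with optional `lastmod` and `changefreq` values, which the sitemap protocol describes as hints rather than commands. The helper names `url_entry` and `build_sitemap`, the URLs, and the dates are all made up for the example.

```python
from xml.sax.saxutils import escape


def url_entry(loc, lastmod=None, changefreq=None):
    """Build one <url> element; lastmod and changefreq are optional hints."""
    lines = ["  <url>", f"    <loc>{escape(loc)}</loc>"]
    if lastmod:
        lines.append(f"    <lastmod>{lastmod}</lastmod>")
    if changefreq:
        lines.append(f"    <changefreq>{changefreq}</changefreq>")
    lines.append("  </url>")
    return "\n".join(lines)


def build_sitemap(pages):
    """Wrap the given page dicts in a <urlset> document."""
    body = "\n".join(url_entry(**page) for page in pages)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{body}\n"
        "</urlset>\n"
    )


# Hypothetical pages: the second one is not linked from anywhere on the site.
pages = [
    {"loc": "https://www.example.com/", "changefreq": "daily"},
    {
        "loc": "https://www.example.com/orphan-page?ref=a&b=1",
        "lastmod": "2006-01-15",
        "changefreq": "monthly",
    },
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Note that `escape` turns the `&` in the second URL into `&amp;`, which the XML format requires.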