Let's say I leave pages A and B out of my sitemap. These pages are linked from other pages on the same site, so a real visitor will find them, but anybody taking a casual look at my sitemap should not be able to see where they are (that is why they are omitted). In theory, the search engine spider should follow those links and index the pages even though they are not in the sitemap, because they are linked from pages that are in the sitemap. What actually happened is this: I created the site without a sitemap, and Google indexed it, including some German pages. I then created a sitemap that left out the German pages and submitted it to Google. Google indexed the URLs in the sitemap and dropped the German pages from the search results (a site:domain.com search shows that the German pages are no longer in the index since the sitemap submission). The German pages are linked from other pages, and all of those links are do-follow, so Google should still be reaching them by following the links.
Yes, Google will index those pages, so there are a couple of ways to prevent their inclusion. 1. Use <meta name="googlebot" content="noindex"> to block Google from indexing the page, or <meta name="robots" content="noindex"> to tell all bots not to index it. 2. Use robots.txt to disallow crawling of those pages. 3. Submit the pages for removal in Google Webmaster Tools > Remove URLs section.
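To see which of the first two options a given page is currently subject to, something like the following Python sketch might help. It uses only the standard library; the names RobotsMetaParser, is_noindex and is_crawl_allowed, and the /de/ path in the sample robots.txt, are made up for illustration and are not taken from the site in question.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            values = {d.strip().lower() for d in attr.get("content", "").split(",")}
            self.directives.setdefault(name, set()).update(v for v in values if v)


def is_noindex(html, bot="googlebot"):
    """True if the page carries noindex for all bots or for the given bot."""
    parser = RobotsMetaParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(bot, set())
    return "noindex" in generic or "noindex" in specific


def is_crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text lets user_agent fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


# Hypothetical robots.txt that keeps every bot out of a /de/ directory.
SAMPLE_ROBOTS_TXT = "User-agent: *\nDisallow: /de/\n"
```

For example, is_noindex('<meta name="robots" content="noindex">') is true for any bot, while is_crawl_allowed(SAMPLE_ROBOTS_TXT, "http://domain.com/de/seite.html") is false. One thing to keep in mind: a crawler can only see a noindex meta tag on a page it is allowed to fetch, so options 1 and 2 do not combine well on the same URL.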
That is unusual. Have you tried lowering the priority and change frequency of the URLs in your sitemap? Perhaps Google is giving preference to your sitemap URLs because they all have a high priority level.
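If you want to experiment with that, here is a rough Python sketch of generating a sitemap with explicit <changefreq> and <priority> values, which are the sitemap protocol's elements for update frequency and priority. The function name build_sitemap, the page paths and the chosen values are only assumptions for illustration; whether lower values change how Google treats the pages left out of the sitemap is something you would have to test.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (loc, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
        # The protocol allows priorities from 0.0 to 1.0; 0.5 is the default.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages with deliberately modest values.
sitemap_xml = build_sitemap([
    ("http://domain.com/", "monthly", 0.5),
    ("http://domain.com/about.html", "yearly", 0.3),
])
```

Write sitemap_xml to sitemap.xml (with an XML declaration in front if you want one) and resubmit it in Webmaster Tools to see whether the index changes.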