I have submitted my sitemap through Webmaster Tools ( https://whatimg.com/sitemap.xml - warning: the -images.xml sitemaps are VERY LARGE, I wouldn't open them in your browser ). Webmaster Tools says the sitemap is fine and properly reads all the URLs, but instead of indexing my pages, Google is indexing the sitemap files themselves. You can see my indexed sitemaps here: https://www.google.com/#hl=en&q=site:whatimg.com+xml&oq=site:whatimg.com+xml Why would Google index these XML files when it knows they are sitemaps? Is there something wrong with my structure? Is there a way I can hide them from the index without stopping them from being crawled - like removing their URLs through Webmaster Tools or blocking them in robots.txt?
I'd recommend you check again using the search below in Google. I can see that nearly 430 pages of your site have been crawled and indexed. Please use: site:https://whatimg.com
The reason I asked you to check is that statement of yours: I only wanted to make you aware that it isn't just the sitemap Google is indexing, but your pages as well. And I don't understand why you wouldn't want the sitemap indexed. If you didn't want that, why did you build it and submit it to Google?
This thread is 18 days old now, so in that time things have started to be indexed. I don't want my sitemaps indexed because they are really large and provide no real content to anyone besides a robot.
Download your sitemap and remove all the URLs that are not useful to you; add only the URLs you want crawled. If you don't want something indexed or crawled, you can restrict bots from crawling it via robots.txt.
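For what it's worth, if you did want to block crawling of the sitemap files entirely, a robots.txt rule like this would do it (the paths are illustrative, and note this blocks crawling but does not remove URLs that are already indexed):

```
User-agent: *
Disallow: /sitemap.xml
Disallow: /sitemap-images.xml
```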
Well, if you don't want it to be indexed or accessible to robots, what is the point of having it? Remove it. Blocking it would be as good as not having one anyway.
Big lol... If you understand that the XML sitemap is for robots, and you want to block robots from accessing it, why do you want to keep it? You don't seem to be clear yourself about what your actual requirements are...
I don't want to block them from accessing it; I want to block them from INDEXING it. Does that make sense?
Yes, now your question is clearer. You can serve an X-Robots-Tag: noindex HTTP header with your XML sitemaps to stop robots from indexing them while still letting them be crawled (an XML file can't carry a meta robots tag, so the header is the way to do it). For more help, see: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
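For example, assuming the site runs on Apache with mod_headers enabled (an assumption on my part), something like this in .htaccess would attach the header to the sitemap files while leaving them fetchable:

```apache
# Attach a noindex header to any sitemap*.xml file.
# Bots can still fetch and read the sitemaps; they just won't index them.
<FilesMatch "sitemap.*\.xml$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

On other servers (nginx, etc.) the equivalent is adding the same header in the relevant location block.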
This question was already answered; please read the thread before posting. There is no need to bump this any more.
I figured out the problem. You have made a mistake: you have added links to other sitemaps in your main sitemap.xml, which will make Google think the other 9 sitemaps are site links. This is not the correct way of doing it. Remove your main sitemap.xml, remove it from Google Webmaster Tools, and submit the other 9 sitemaps separately. I had the same issue once and fixed it this way. It is not a must to name your sitemap sitemap.xml. But if you have dynamically generated content, prioritize each link; this is the most difficult part. If you have further questions, do let me know.
No, this is NOT the correct way. Why do you say it's correct? If so, can you point me to a Google guide? Why do Google Webmaster Tools allow you to submit multiple sitemaps? Why did your sitemaps get indexed? And why did none of my sitemaps get indexed, while yours are in the index? The explanation you selected as the best answer is not from Google.
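For reference, pointing a parent file at child sitemaps is a documented part of the sitemaps.org protocol: the parent just has to use the sitemapindex element rather than urlset. A minimal sketch of generating one with Python's standard library (the URLs are placeholders):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(sitemap_urls):
    """Build a sitemaps.org sitemapindex document listing child sitemaps."""
    # Register the sitemap namespace as the default so tags serialize
    # without a prefix (e.g. <sitemapindex xmlns="...">).
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        loc.text = url
    return ET.tostring(root, encoding="unicode")

print(build_sitemap_index([
    "https://example.com/sitemap-1.xml",
    "https://example.com/sitemap-2.xml",
]))
```

Each child sitemap listed in the loc elements is then submitted implicitly through the index, which is why Webmaster Tools also accepts them individually.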