Many of my website pages are scanned images of historical documents, so they have large image files. I have nearly 3000 pages, and probably a full third of them are these large scanned-image pages. I do want them indexed, but indexing them for my sitemap took hours and hours. I have two questions here: 1 - Will Google stop crawling if it finds a sitemap and just use that? 2 - Will Google stop crawling because it takes too long to crawl all the pages it finds in the sitemap?
Create an image sitemap for your image files. Don't include them as normal URLs in a standard XML sitemap.
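For what it's worth, here is a minimal sketch of how such an image sitemap could be generated, assuming you can produce a mapping from each page URL to the image URLs on it. The function name `build_image_sitemap` and the example.com URLs are made up for illustration; the two namespaces are the standard sitemap one and Google's image extension.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace and Google's image sitemap extension namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Return image sitemap XML as a string.

    `pages` is a hypothetical dict: page URL -> list of image URLs on that page.
    """
    # Register prefixes so the output uses xmlns and xmlns:image.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # One <image:image> entry per image that appears on the page.
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs, not real pages.
    example = {
        "https://example.com/documents/letter-01.html": [
            "https://example.com/scans/letter-01.jpg",
        ],
    }
    print(build_image_sitemap(example))
```

Each `<url>` entry names the page, and the nested `<image:image>` entries name the scans shown on it, so the page and its image stay associated.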
Thank you for your reply. In fact, each of the pages mentioned above has HTML with a brief explanation of the scanned image it contains. (I realize now that my original post was misleading about that.) My problem is that I do want the pages with the scanned images indexed as pages, but I am concerned that, since each page contains a large image, Google may stop crawling after indexing a lot of pages. Or does Google use the sitemap exclusively? In that case I would prepare my own sitemap accordingly.
Google first downloads a page. Then, if Google's crawlers choose to, they download the images used in the page. So I don't think a single huge image on each page matters much for the crawling of your overall content. (However, I didn't write the code behind Google's crawler or its crawl prioritization, so it's hard to say with 100% certainty.)
Thank you very much. That is really helpful information. I will construct my sitemap accordingly. Again, thank you.
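In case it helps anyone else reading, this is roughly how I would build a plain sitemap of the HTML pages by hand. It is only a sketch: the function name `build_page_sitemap` and the example.com URLs are placeholders I made up, and it assumes you already have a list of your page URLs. A single sitemap file may hold up to 50,000 URLs, so about 3000 pages fit in one file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # protocol limit for a single sitemap file


def build_page_sitemap(page_urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    page_urls = list(page_urls)
    if len(page_urls) > MAX_URLS_PER_FILE:
        raise ValueError("too many URLs for one sitemap file; split the list")

    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in page_urls:
        # Only the HTML page is listed; the scan itself is not a separate entry.
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs, not real pages.
    print(build_page_sitemap([
        "https://example.com/documents/letter-01.html",
        "https://example.com/documents/letter-02.html",
    ]))
```

Since this only writes out a list of URLs, it never has to download the large scans, which should avoid the hours of waiting.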
Doing all this scanning takes too much time. I would rather take a screenshot of every page and just email them.
Jargodon, it may be that you misunderstand the issue. The scanned pages are part of my website, so email isn't an option. In this thread we are discussing how to prepare the sitemap for my pages.