On my eCommerce site I have pagination, so my 200 items are spread across 10 pages (page1.htm, page2.htm, ...). The problem is that users can easily change the sorting: page1.htm?sort=0 (page2.htm?sort=0, ...) shows the most expensive items first, and page1.htm?sort=1 (page2.htm?sort=1, ...) shows the least expensive items first. (Note the different sort parameter.) So the question is: what do I submit to Google's sitemap? Only page1.htm, page2.htm, ..., or all three versions of each page: page1.htm, page1.htm?sort=0, page1.htm?sort=1, ...?
You should use the XML sitemap generator below to crawl your site and create an XML sitemap; it will use the best method possible. Yes, it's free. http://www.vigos.com/products/gsitemap/
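Whatever tool you use, the resulting sitemap is a simple XML file. A minimal sketch of what the entries for the plain (non-sorted) pages would look like, with example.com standing in for your own domain:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/page1.htm</loc>
  </url>
  <url>
    <loc>http://www.example.com/page2.htm</loc>
  </url>
  <!-- ...one <url> entry per page, up to page10.htm -->
</urlset>
```

If you did decide to submit the sorted variants too, each one would just be another &lt;url&gt; entry with the query string included in &lt;loc&gt;.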
Are all 3 different? Or is "page1.htm" = "page1.htm?sort=0"? I do not think Google will penalize you too much for pages that are somewhat alike... That said, you may want to ask in a search engine optimization forum rather than a sitemaps one, since your question is also relevant when not using XML sitemaps. If you do decide to block some page variants from search engines, you can do so in e.g. robots.txt.
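For example, if you wanted to keep Googlebot away from the sorted variants while still letting it crawl the plain pages, a robots.txt along these lines would do it (the path pattern is an assumption based on the URLs described above; the * wildcard in Disallow rules is a Googlebot extension, not part of the original robots.txt standard):

```
User-agent: Googlebot
Disallow: /*?sort=
```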
Thanks ThomasShulz. "page1.htm" is not equal to "page1.htm?sort=0", since on page1.htm products are sorted randomly and the same product can end up on page3.htm; with a sorting specified, products are basically always on the same page. Right now Google spiders both page1.htm and page1.htm?sort=0, since people link to both versions at will... So if I submit my sitemap without the sorting URLs, will Google ignore page1.htm?sort=0, or is it still going to keep it in the index? Thanks
Google uses XML sitemaps as an extra way of discovering your pages. If it finds other variants on its own (e.g. the sorted ones), it will include them as well. In most cases this will not penalize you; you may even get multiple variants listed and indexed... Most of the time Google will just select what it thinks is "best". But if Google sees that you have lots (!) of similar pages, it may in general become less willing to index your website. It is possible to use robots.txt or a "noindex" meta tag as a solution to this, by letting search engines know which pages not to index. I would probably include all pages, but that is just what I would do. The main reason is that you say people are linking to both sorted and non-sorted URLs.
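To illustrate the "noindex" meta tag option: you would emit the tag in the &lt;head&gt; of the sorted variants only (page1.htm?sort=0, page1.htm?sort=1, etc.), leaving the plain pages without it. A sketch:

```html
<head>
  <!-- emitted only when the sort parameter is present in the URL -->
  <meta name="robots" content="noindex, follow" />
</head>
```

Using "noindex, follow" rather than "noindex, nofollow" tells search engines not to index the page itself while still following the product links on it, so the products remain discoverable through the sorted listings.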