Hi, I have a list of sitemap files that I generate programmatically. These sitemap files are huge, with thousands of URLs each, so it is very difficult to check every URL manually. I have therefore written a utility that parses a sitemap file and, using Apache Commons HttpInvoker, checks whether each URL is valid. Some invalid URLs return a 404 response, so I can find the problem. But in some cases an exception occurs and an error page is shown. Such a URL is not valid, yet it does not return 404; the response code is 200. So I have no way to tell whether the URL is valid or not. I am not sure, but I have heard that Webmaster Tools does the same kind of checking, so there must be some way to identify the valid URLs. Any help on this is appreciated. Thanks in advance. Leena
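To make the problem concrete, here is a minimal sketch of the kind of check described above, extended to catch error pages that come back with a 200 status. It uses the JDK's own java.net.http.HttpClient (Java 11+) rather than the Apache library, and the class name, the Result enum, and the ERROR_MARKERS strings are all hypothetical; the markers in particular are an assumption and would have to be replaced with text that really appears on the application's error page.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Locale;

final class SitemapUrlCheck {
    enum Result { VALID, HTTP_ERROR, SOFT_ERROR }

    // Hypothetical markers (lowercase): replace with text from your real error page.
    static final List<String> ERROR_MARKERS = List.of("an error has occurred", "exception");

    // Classifies a response by status code first, then by scanning the body.
    static Result classify(int statusCode, String body) {
        if (statusCode < 200 || statusCode >= 300) {
            return Result.HTTP_ERROR; // 404, 500, and also unfollowed redirects
        }
        String lower = body == null ? "" : body.toLowerCase(Locale.ROOT);
        for (String marker : ERROR_MARKERS) {
            if (lower.contains(marker)) {
                return Result.SOFT_ERROR; // status 200, but the body looks like an error page
            }
        }
        return Result.VALID;
    }

    // Fetches one URL and classifies it. The default client does not follow redirects.
    static Result check(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return classify(response.statusCode(), response.body());
    }
}
```

A call would look like SitemapUrlCheck.check(HttpClient.newHttpClient(), url). Matching on body text is only a heuristic; if the application can be changed, having the error page return a 5xx status would make the status code alone sufficient.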
I believe that if you submit the sitemap to Google via their Webmaster Tools, then after Google crawls it (mine was crawled very quickly, although it was quite small) it will tell you whether there were any errors (I believe it will tell you which URLs returned 404 or other invalid codes). Have you tried that?
Hi, I can do that. But my application's sitemaps contain a large number of URLs, more than 500,000, so I want to verify whether they are valid during development itself. How can I do this? How does Google do it? Leena
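For a list of that size, one option during development is to run the per-URL check on a thread pool instead of one URL at a time. The following is only a sketch under assumptions: BulkUrlCheck and findInvalid are hypothetical names, and the isValid predicate stands in for whatever single-URL check is already in use. It says nothing about how Google does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class BulkUrlCheck {
    // Runs isValid on every URL using a fixed pool and returns the URLs that failed,
    // in the same order as the input list.
    static List<String> findInvalid(List<String> urls, Predicate<String> isValid, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> isValid.test(url));
            }
            // invokeAll blocks until all tasks finish; futures keep the task order.
            List<Future<Boolean>> results = pool.invokeAll(tasks);
            List<String> invalid = new ArrayList<>();
            for (int i = 0; i < urls.size(); i++) {
                boolean ok;
                try {
                    ok = results.get(i).get();
                } catch (ExecutionException e) {
                    ok = false; // a check that threw is reported as invalid
                }
                if (!ok) {
                    invalid.add(urls.get(i));
                }
            }
            return invalid;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while checking URLs", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With 500,000 URLs it would probably make sense to feed the list in batches, for example one sitemap file at a time, and to keep the thread count small enough that the development server is not overloaded.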