I've just started making sitemaps for my sites, and while testing the numerous scripts and tools out there I noticed a weird occurrence. When the sitemap got generated I kept seeing bizarre URLs, such as http://www.domain.com/https://www.domain.com, and URLs that were incorrect (such as www.domain.com/blah.php instead of www.domain.com/blah/blah.php). I've checked the site with Xenu and it came back clean, so I'm rather stumped. Can anyone shed some light on this?
They might be badly interpreting the different ways of expressing relative and absolute URLs... are you getting consistent bizarre URLs across the scripts and tools?
Could we get a URL for your site? I'd like to run it through my sitemap generator. Call me curious.
Depending on whether the sitemap tool uses an HTML parser or regular expressions to do the job, your results may vary. Strange URLs can be a sign of code that won't validate at the W3C or one of the other validator tools. If you're truly having an impossible time getting it done, the last resort may be a sitemap service like AutoMapIt.com or others that use 'web bugs' as an alternative. Web bugs are images that call scripts in order to grab your URLs. It isn't the only way to do it, but for the really stubborn pages, it's sure to do the trick.
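For what it's worth, here's a rough sketch (in Python, with a made-up snippet of markup) of why the extraction method matters: a quick-and-dirty regex can silently skip links that a real HTML parser picks up, which is one way odd or missing URLs creep into a generated sitemap.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: one unquoted href and one quoted, absolute href.
sample = '<p><a href=/blah/blah.php class="x">page</a> <a href="https://www.domain.com/">home</a></p>'

# Naive regex: only matches double-quoted hrefs, so it misses the first link.
regex_links = re.findall(r'href="([^"]+)"', sample)

# An HTML parser handles quoted and unquoted attribute values alike.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

collector = LinkCollector()
collector.feed(sample)

print("regex:", regex_links)        # ['https://www.domain.com/']
print("parser:", collector.links)   # ['/blah/blah.php', 'https://www.domain.com/']
```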
Can you PM me the URL of your site? I'm intrigued; I'll see if I notice anything obvious in the code that might be causing it.
Thanks for the offer; however, I think I've figured it out. The site uses a <base href=""> tag in the head, meaning all internal links were coded without the www.domain.com bit, since that gets pulled from the base href tag.
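For anyone else who hits this, here's a minimal sketch (with made-up URLs, and assuming the generator resolves links with something like Python's urljoin) of how ignoring the <base href> or gluing strings together can produce exactly the URLs I was seeing:

```python
from urllib.parse import urljoin

page_url  = "http://www.domain.com/index.php"   # page the link was found on
base_href = "http://www.domain.com/blah/"       # value of <base href="...">
link      = "blah.php"                          # relative link in the markup

# Correct: resolve against the <base href>, as a browser would.
print(urljoin(base_href, link))    # http://www.domain.com/blah/blah.php

# A generator that ignores <base> falls back to the page URL and gets it wrong.
print(urljoin(page_url, link))     # http://www.domain.com/blah.php

# And naive string concatenation explains the doubled-up URLs: an absolute
# link glued onto the site root instead of being properly joined.
absolute_link = "https://www.domain.com/"
print(page_url.rsplit("/", 1)[0] + "/" + absolute_link)
# http://www.domain.com/https://www.domain.com/
print(urljoin(page_url, absolute_link))   # https://www.domain.com/ (correct)
```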