Hi, I have had GSiteCrawler running for a few days now and it still hasn't finished crawling my site. I have a Coppermine gallery with thousands upon thousands of links, which is making it impossible for me to create the full sitemaps I want. Are there any sitemap creators you would recommend that could crawl this many links within a few hours, or at least within 24 hours? Or is it hopeless?
What slows down crawlers may be a "backend" DB which gets "hogged". I have experienced this with some forums as well. It may pay off to be very careful with crawler filters (e.g. if two pages are very similar to each other, cut one of them). There is no way to tell whether e.g. my software would be faster without knowing the URL of the website. For such a large website (100,000+ pages), I would disable the following options in my program (a rough sketch of the internal-only crawl idea is below):
- Track all links and redirects from and to all pages
- Let website crawler collect external links
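For illustration only, here is a minimal sketch of that kind of stripped-down crawl, written against Python's standard library rather than any particular sitemap tool. The start URL and the SKIP_PATTERNS list are placeholders you would adjust to your own Coppermine URL scheme; the point is simply that it follows internal links only and filters out views that duplicate pages already crawled.

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag
from urllib.request import urlopen
from xml.etree import ElementTree as ET

START = "https://example.com/gallery/"   # hypothetical start URL
HOST = urlparse(START).netloc

# Hypothetical examples of URL fragments to filter out, e.g. gallery
# views that serve near-duplicates of pages crawled elsewhere.
SKIP_PATTERNS = ("slideshow", "ratepic.php", "sort=")

class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def wanted(url):
    """Keep only internal links that don't match a filter pattern."""
    parsed = urlparse(url)
    if parsed.netloc != HOST or parsed.scheme not in ("http", "https"):
        return False                      # skip external links entirely
    return not any(pat in url for pat in SKIP_PATTERNS)

def crawl(start, limit=50000):
    seen, queue, pages = set(), [start], []
    while queue and len(pages) < limit:
        url = urldefrag(queue.pop())[0]   # drop #fragments
        if url in seen or not wanted(url):
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except OSError:
            continue
        pages.append(url)
        parser = LinkParser()
        parser.feed(html)
        queue.extend(urljoin(url, href) for href in parser.links)
    return pages

def write_sitemap(pages, path="sitemap.xml"):
    """Write the crawled URLs as a standard XML sitemap."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for page in pages:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page
    ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    write_sitemap(crawl(START))

The less work the crawler does per page (no redirect tracking, no external-link collection, aggressive duplicate filters), the faster a 100,000+ page crawl finishes.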
Why is it so essential that it gets done in 24 hours? Beyond that: make sure the pages are unique enough to warrant listing, and that you also have enough incoming links for Google to want to crawl that deep. Having a sitemap won't instantly make Google index your whole site, no matter how many or how few pages you have; links have been a huge factor in search engines for ages and will continue to be for a long time.
While I agree that the effect of sitemaps is not always fully clear-cut, I tend to think they help quite a bit for bigger websites. Of course, it is always a good idea to have friendly URLs (e.g. no session IDs, and preferably everything through mod_rewrite) and incoming links... But to argue that Google XML sitemaps (which this forum is about) are no help for big websites... that kinda surprises me.
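On the session-ID point: if the gallery appends a session parameter to its links, the same page ends up in the sitemap under many URLs. A small normalization step like this sketch keeps that noise out; PHPSESSID is the common PHP default, while the other parameter names and the example URL are just placeholders.

from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Session-style parameters to strip; adjust the list to your own site.
SESSION_PARAMS = {"phpsessid", "sid", "s"}

def strip_session_id(url):
    """Return the URL with session-style query parameters removed,
    so the sitemap lists one stable address per page."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(query)))

print(strip_session_id(
    "http://example.com/displayimage.php?pid=42&PHPSESSID=abc123"))
# -> http://example.com/displayimage.php?pid=42

Doing the same rewrite server-side (mod_rewrite or disabling URL-based sessions) is better still, since then Google never sees the duplicate URLs in the first place.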