Ok, here is the deal. I have about 900 pages in my sitemap right now, but that's just the beginning. I'm estimating that my statistical search engine has about 400,000 URLs that it can display, and I would like to index all of them. How do I do this? Is there any non-server-side program I can use, like a script I can run from here to generate the information that I can then enter into the sitemap manually? That would be a great help... I don't want to install anything on my server to do this. I don't see why it can't be done, but I can't find anything to do it... Any software recommendations? Related websites? Advice? Thanks!
I have been using GSiteCrawler for generating sitemaps for my sites. You may want to check it out; it's a free and excellent tool for generating sitemaps for Yahoo, Google and Live: http://gsitecrawler.com/ Cheers Gs
Google allows 50K URLs per sitemap (I saw this about six months ago when I prepared a sitemap for my article directory with 60K+ URLs). Break it into 8 or 10 files, and if you have a list of URLs you can prepare the sitemaps using Excel too.
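If you do have the URLs sitting in a spreadsheet, something like this rough Python sketch could turn an exported list into a sitemap file. The file names (urls.txt, sitemap.xml) are just placeholders for the example:

```python
# Minimal sketch, assuming the URL list has been exported from Excel
# into a plain text file ("urls.txt"), one URL per line.
from xml.sax.saxutils import escape

with open("urls.txt", encoding="utf-8") as f:
    urls = [line.strip() for line in f if line.strip()]

with open("sitemap.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        # escape() keeps the XML valid if a URL contains &, < or >
        out.write(f"  <url><loc>{escape(url)}</loc></url>\n")
    out.write('</urlset>\n')
```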
If you configure A1 Sitemap Generator for large websites, then 400,000 URLs will probably not be a problem (if you are using a recent version, e.g. 1.7.4). But it can depend on factors outside my control, so you should make sure to test it first. One thing worth mentioning: you can automate the software to, for example, run for 10 hours at night, then stop, then resume the scan for another 10 hours the next night, and so on. This might be useful when scanning such a large website...
I would think that if your site is database-driven, it would be simpler to write a little code that generates and updates the sitemap in real time.
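As a rough sketch of that idea (the database file, the "articles" table, the "slug" column and the example.com domain are all made up here; swap in whatever your search engine actually stores):

```python
# Regenerate the sitemap straight from the database.
# Could be run from cron / a scheduled task whenever content changes.
import sqlite3
from xml.sax.saxutils import escape

BASE_URL = "http://www.example.com/article/"   # placeholder domain

conn = sqlite3.connect("site.db")              # placeholder database
rows = conn.execute("SELECT slug FROM articles ORDER BY id")

with open("sitemap.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for (slug,) in rows:
        out.write(f"  <url><loc>{escape(BASE_URL + slug)}</loc></url>\n")
    out.write('</urlset>\n')

conn.close()
```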
Site Mapper.info is offering a free tool to create unlimited Google sitemaps, with no limitation: http://sitemapper.info This may help you...
400,000 links? How much time do you think this will take? I use a standalone sitemap builder downloaded from www.sitemapbuilder.net. It is not very up to date, but I use it for testing and for performance reasons, since it is a lot faster than anything online, following 10 links at a time. I once tried to use it for a site that was "only" 50,000 links (Google's recommended limit for a sitemap file). It took more than 5 hours to generate, and the file was so big it was almost impossible to change individual priorities! I would really like to see an online tool capable of dealing with that amount of data.

Additionally, regarding the recommendations about sitemap file limitations: your sitemap will have to be divided into 8 XML files (50,000 records each, with a limit of 10 MB per file), and last but not least, a sitemap index will have to be built. I don't know of any online or standalone generator capable of cutting files into 50,000-record subfiles and generating the index automatically. If someone knows of one, please share; otherwise you will have to build it manually.

Do you really need the search engine to index all your files? It took Google a month to index 300 articles on my site!!! How long will it take for 400,000?
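For what it's worth, the splitting and index-building part can be scripted. Here is a sketch that cuts a big URL list into 50,000-record sitemap files and writes the sitemap index; the input file name and the example.com base URL are assumptions for the example:

```python
# Split a large URL list into <=50,000-URL sitemap files plus an index file.
from xml.sax.saxutils import escape

CHUNK = 50000
BASE = "http://www.example.com/"   # where the sitemap files will be hosted

with open("all_urls.txt", encoding="utf-8") as f:
    urls = [line.strip() for line in f if line.strip()]

sitemap_files = []
for i in range(0, len(urls), CHUNK):
    name = f"sitemap{i // CHUNK + 1}.xml"
    sitemap_files.append(name)
    with open(name, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls[i:i + CHUNK]:
            out.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        out.write('</urlset>\n')

# The sitemap index simply lists the individual sitemap files.
with open("sitemap_index.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for name in sitemap_files:
        out.write(f"  <sitemap><loc>{BASE}{name}</loc></sitemap>\n")
    out.write('</sitemapindex>\n')
```

With 400,000 URLs that comes out to the 8 sitemap files mentioned above, plus the index you submit to the search engines.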
Yes, this is it right here... Are there any SQL commands for this? Yes, it is a lot of info, and thank you for the replies. The site is pretty big... I need it to be completely indexed, though, so I can get the most traffic. Will this huge-site status get my site a better PageRank?
An interesting point of information is that Yahoo has indexed 1,700 pages and Google has indexed 1,050 pages. They were both submitted at the same time! It looks like Yahoo is a bit quicker than Google when it comes to crawling and indexing pages. So say this thing has 400k pages, does that make it worth more $$$$?
Well, since you ask: A1 Sitemap Generator does this. Or to put it another way: when it generates the XML sitemap files, it will automatically split the output into multiple XML sitemaps if necessary and generate the sitemap index file. (The "URL limit per sitemap file" default is set to 25,000 in A1SG, but it can be set to the sitemaps protocol maximum of 50,000 as well.) It will indeed take some time, but it's not so bad when you can automate software like A1 Sitemap Generator to run only at night each day, resuming the earlier website scan. But yes, 400,000 pages is a lot and will probably take a couple of days.
I tried to use GSiteCrawler and it didn't work... Good news, though: Google has crawled 2,500 pages of the 400k... it doubled in one week. I've also been getting a lot of single hits that register 0:00:00 and generally congregate around decent visitors viewing 4+ pages. Is this the Google crawler?
Yes, this is the right answer. I have used that software; it's free and excellent, and you only install it on your local computer.
Well, my vote goes to GSiteCrawler. Every other online sitemap generator has some crawling limit, and with 400,000 (4 lakh) web pages you cannot depend on an online sitemap generator unless you purchase a premium service. So I would recommend dividing your sitemap by category or section and creating multiple XML sitemaps. Then create an index XML sitemap with the details of all of them, and finally submit the index sitemap along with all the section sitemaps. Soon you will find your web pages indexed in Google. Hope I have answered your query properly.
You could use the Google sitemap generator from Google (oddly enough): google.com/webmasters/tools/docs/en/sitemap-generator.html Another (better) way to do it, assuming your site is powered by a database, would be to write a script that generates a sitemap directly out of your database. If you are using a generic script (e.g. phpBB), check that project's plugins page to see if there is a sitemap generator.