Hi, do any of you have experience with large content websites? I have a website with over 1 million pages and only 200,000 of them are indexed. Any tips?
You need sub-sitemaps. Google may not do a good job of indexing one huge file, so create sitemap 1, 2, 3, and so on, and put the links to those sub-sitemaps at the top of your main sitemap so they get crawled first.
There's no way you can put that many pages in one sitemap. Even if you divide the site into categories and subcategories, it's still not practical to build a single sitemap for a site that large. Make sure you have lots of backlinks so Googlebot visits your site constantly, and every time you add a page, make sure it shows up on the homepage and other internal category pages as a recent addition. If Googlebot visits your site often, it will follow and crawl your internal pages.
500 pages for one sitemap is way too many. I would not recommend having a sitemap with any more than 50 pages; even 100 is pushing it a bit.
I am sure Google has its own special filter for large content sites. I would suggest dividing the content into thousands of categories and subcategories so each subcategory is about 100 pages, then creating a sitemap for each subcategory. Also have a damn good navigation system so Googlebot can find all the pages correctly. And I do hope you're not spamming the SERPs.
There are many good sitemap creator tools out there. We use gSiteCrawler to handle very large sites, but for a site this large I would go with a Python-based (server-side) version. Note: it will take roughly 3 days to create a sitemap set this large. As for the limited indexing of the site, either your on-site navigation is flawed or Google thinks your "template" is creating duplicate content. DP has a very good Sitemap section that you should look at. You also need to set up and use Google's Webmaster Tools to submit the sitemap set; it will show you the status of what is going on. Both Yahoo and MSN have their own versions as well. PM me if you need help...
I wouldn't waste Google's time indexing crap pages. I'd be happier having my top 1,000 pages with the best content indexed first rather than everything else.
You should divide it up into many sitemaps, because Google only allows 50,000 links in each sitemap.
No need; not every page is going to be indexed by search engines. Submit the ones that are actually optimized for your targeted keywords. Obviously you don't have 1 million keywords targeted, right?
Hi there! Google offers a proper way to do this. You can create a Sitemap Index file that contains links to separate sitemap files, each of which may contain a maximum of 50,000 pages. Of course, the quality of those sitemap files matters, and Google will only give them weight if it sees that you are using them to improve the findability of information on your website. You can find more information on Google Sitemap Index files at the following address: https://www.google.com/webmasters/tools/docs/en/protocol.html#sitemapFileRequirements
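Just to illustrate the idea, here's a rough Python sketch of writing such a Sitemap Index file. The example.com domain and the sitemap-N.xml filenames are placeholders, not anything specific to the OP's site:

```python
# Rough sketch only: the domain and filenames below are placeholders.
from datetime import date

SITEMAP_FILES = ["sitemap-1.xml", "sitemap-2.xml", "sitemap-3.xml"]  # one per chunk of URLs

entries = "\n".join(
    "  <sitemap>\n"
    f"    <loc>http://www.example.com/{name}</loc>\n"
    f"    <lastmod>{date.today().isoformat()}</lastmod>\n"
    "  </sitemap>"
    for name in SITEMAP_FILES
)

index = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</sitemapindex>\n"
)

# This index file is what you submit in Webmaster Tools.
with open("sitemap_index.xml", "w", encoding="utf-8") as f:
    f.write(index)
```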
Sitemaps do not ensure indexing; they only help with crawling. Google does not need or want millions of pages of junk and is not likely to include them in its index. To answer the question, though: you need a recursive sitemap set, which can be built with Python, for example. Manually trying to create hundreds of sitemap files would be hard to accomplish and a waste of valuable time.
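As a minimal sketch of that (not a drop-in solution): assuming you can dump your URLs into a urls.txt file, one per line, something like this would split them into sitemap files of at most 50,000 URLs each, which the index file from the post above would then point at:

```python
# Minimal sketch: urls.txt and the sitemap-N.xml output names are assumptions.
from xml.sax.saxutils import escape

MAX_URLS = 50000  # Google's per-sitemap limit
XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"

with open("urls.txt", encoding="utf-8") as f:
    urls = [line.strip() for line in f if line.strip()]

# Write one sitemap file per chunk of 50,000 URLs.
for n, start in enumerate(range(0, len(urls), MAX_URLS), start=1):
    body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>"
                     for u in urls[start:start + MAX_URLS])
    with open(f"sitemap-{n}.xml", "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                  f'<urlset xmlns="{XMLNS}">\n{body}\n</urlset>\n')
```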