I wonder if anybody here has had a good experience with MSN and huge sites. As far as I have seen, MSN indexes pages that are linked directly from the homepage or are two levels away from it, but when a page is several levels deep, MSN just leaves it. When we have a huge site - say 300K pages - we cannot simply put a link to every page at the first or second level. Do you have any suggestions to help me get a huge site indexed at MSN?
I don't think 2 levels would work for 1000K pages. If I put 1000 links on each page, then I need 1 sitemap containing 1000 links to pages that each contain 1000 links. Wouldn't you say 1000 links on each page is not a good idea?
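To put numbers on that trade-off, here is a rough Python sketch. It assumes the same model as the post above: one sitemap page at the top, every hub page carries the same number of links, and only the pages at the deepest level count as content. The function name `levels_needed` is made up for illustration.

```python
def levels_needed(total_pages: int, links_per_page: int) -> int:
    """Return how many link levels below the sitemap are needed so that
    the deepest level can hold at least total_pages content pages."""
    if links_per_page < 2:
        raise ValueError("links_per_page must be at least 2")
    levels = 0
    capacity = 1  # the sitemap page itself
    while capacity < total_pages:
        capacity *= links_per_page  # each extra level multiplies the reach
        levels += 1
    return levels


# Compare a few fan-outs for a 1000K-page site.
for fanout in (1000, 100, 50):
    print(fanout, "links per page ->", levels_needed(1_000_000, fanout), "levels")
```

With 1000 links per page the answer is 2 levels, with 100 it is 3, and with 50 it is 4, so cutting the links per page costs depth rather quickly.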
MSN indexes pretty fast and ranks you higher initially; it also indexes dynamic links better than Google or Yahoo. The best possible way is to make a sitemap.xml and a urllist.txt so all your links can be indexed. Also try searching in other AdSense search boxes.
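For anyone who wants to generate those two files, here is a minimal Python sketch. It assumes you already have the full list of URLs in memory. The `example.com` URLs, the `sitemap-N.xml` file names and the helper names are placeholders. Because the sitemaps.org protocol caps a single sitemap file at 50,000 URLs, a site the size discussed here needs several files plus a sitemap index. Whether a given engine actually reads these files is a separate question this sketch does not answer.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemap(urls):
    """Serialize one <urlset> document for the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return XML_HEADER + ET.tostring(urlset, encoding="unicode")


def build_files(urls, base_url):
    """Return {file name: content} for urllist.txt and the sitemap file(s)."""
    files = {"urllist.txt": "\n".join(urls) + "\n"}  # one URL per line
    chunks = [urls[i:i + MAX_URLS_PER_FILE]
              for i in range(0, len(urls), MAX_URLS_PER_FILE)]
    if len(chunks) <= 1:
        files["sitemap.xml"] = build_sitemap(urls)
        return files
    # Too many URLs for one file: write numbered parts and an index on top.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for number, part in enumerate(chunks, start=1):
        name = f"sitemap-{number}.xml"
        files[name] = build_sitemap(part)
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + name
    files["sitemap.xml"] = XML_HEADER + ET.tostring(index, encoding="unicode")
    return files


# Hypothetical 300K-page site.
urls = [f"http://example.com/page-{n}.html" for n in range(300_000)]
files = build_files(urls, "http://example.com/")
print(sorted(files))
```

The sketch keeps everything in a dictionary rather than writing to disk, so you can inspect the output before uploading anything.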
Interesting thread! I have a site with 11k+ pages, yet the SEs each seem to have only approx. 150 indexed.
A backlink to the homepage is always better from a global point of view. And in this particular case it is better to get more backlinks pointing exactly to your homepage.
So it looks like we need to get the search engines to dig deeper into our sites. How do we do that? A lot of good-quality links pointing to our homepage?
That way of linking will get your homepage ranked highly, but it may take a while for PR to spread to your other pages and get them ranked highly. Another method, deep content link building, has you build links to internal pages so they can rank highly too. This is impossible for every page of a large site, but you could still do it for the internal pages you most want to rank.
It may be worth getting some links that point to pages deeper in your website. That way spiders will follow those links into your site and start indexing from the inside outwards, which will get more of the deeper content indexed. It could also be that they were just doing a light crawl and not going deep.
Structure

For MSN, try to keep the structure of your site as flat as possible. MSNbot seems to skim a site and seems very reluctant to venture any deeper than the second directory. Avoid this from the start by optimizing your site's structure to be as lean, mean and flat as possible.

- example.com/
-- example.com/main-category.html
--- example.com/main-category-subcategory.html
---- example.com/article.html

... rather than ...

- example.com/
-- example.com/main-category/
--- example.com/main-category/subcategory/
---- example.com/main-category/subcategory/article.html

Optimizing Navigation

Use crumbs to provide a navigation funnel / tree where users and bots can move vertically (or horizontally, as your content will all be in the same root directory).

Link back to the homepage from the main category
Link back to the homepage and main category from the subcategory
Link back to the homepage, main category and subcategory from the article

Inbound Links

Point plenty of links to your homepage to keep MSNbot visiting. Install a randomizer on the homepage that will generate something unique (a new article / list of new content or products etc.) so the page always has fresh content. Link to both your main categories and subcategories from the homepage. Moreover, link to prominent articles that will encourage the bot to dig deeper.

Then point good solid inbound links to your category, subcategory and article pages. It is especially important to take advantage of these deep links (with on-topic anchor text) as they will provide a deeper point of entry to your site for the bot. It is typically easier to get 'deeper' content indexed when the bot lands right in the thick of it.

A few other things off the top of my head that might help ...

Always provide a navigation crumb.
Replicate any image-based navigation with a textual equivalent.
Keep it fresh; if you use a blogging platform, schedule posts for release every day.
Use a 301 redirect to establish the site's canonical domain.
When using WordPress with clean URLs, add the following to your robots.txt to avoid confusing the bot ...

User-agent: *
Disallow: /?p

I'm sure there's a load more, and every DP member will know this anyway. But if it helps someone then that's cool!
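The crumb idea above (every page links back to all of its ancestors, even though the files sit flat in the root) could be sketched in Python roughly like this. The page paths come from the flat example in this post. The titles, the `PARENT` table and the function names are invented for illustration, and a real site would pull them from its CMS or database.

```python
from html import escape

# Hypothetical flat site: every file lives in the root directory,
# so the hierarchy exists only in this parent table.
PAGES = {
    "/": "Home",
    "/main-category.html": "Main Category",
    "/main-category-subcategory.html": "Subcategory",
    "/article.html": "Article",
}
PARENT = {
    "/main-category.html": "/",
    "/main-category-subcategory.html": "/main-category.html",
    "/article.html": "/main-category-subcategory.html",
}


def crumb_trail(path):
    """Return the ancestors of a page, from the homepage down to its parent."""
    trail = []
    current = PARENT.get(path)
    while current is not None:
        trail.append(current)
        current = PARENT.get(current)
    trail.reverse()
    return trail


def crumb_html(path):
    """Render a text-link crumb: linked ancestors, then the current page title."""
    parts = [f'<a href="{escape(p)}">{escape(PAGES[p])}</a>'
             for p in crumb_trail(path)]
    parts.append(escape(PAGES[path]))
    return " &gt; ".join(parts)


print(crumb_html("/article.html"))
```

Because the crumb is plain text links, it also doubles as the textual equivalent of any image-based navigation.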
Um ..no.. here is HOW you index LOTS of pages, which actually works and is much simpler than the long rambling explanation above. Get an inbound PR7+ link to the sitemap. Wait a few weeks for the entire site to index. A link will cost 50-150/month, but it is the fastest way and very easy to do.