Hi everybody, I have a real problem with Google crawling my forum. My sitemap contains 40,425 URLs, but only 2,573 are indexed, and the sitemap report shows some problems, like:

Errors for URLs in Sitemaps: 37
HTTP errors: 0
Not found: 88
URLs not followed: 0
URLs restricted by robots.txt: 0
Unreachable URLs: 14

I didn't have a robots.txt, so I made one for my site (http://www.a87a.com/robots.txt), and the errors decreased day after day. But then a friend advised me to remove the old sitemaps, generate a new one for my site with vBSEO, and block the archive from Google's spiders. When I did that, everything went wrong:

Errors for URLs in Sitemaps: 67
HTTP errors: 1
Not found: 93
URLs not followed: 0
URLs restricted by robots.txt: 228
Unreachable URLs: 14

What can I do about these errors? One more thing: how can I stop Google's spiders from crawling user names and the moderators' forum? The moderators' forum link looks like this: http://a87a.com/vb/f7.html. Please help.
I'm trying to learn the robots / spiders / sitemaps game as well. I tried following the steps in Google Webmaster Tools, but I never got very far. I hope this helps you more than it did me. Good luck: http://www.google.com/webmasters/start/
There are certain things that will help you go a long way:
1. Check in Google Analytics which pages your visitors actually reach.
2. Run a site analysis with Xenu so you can see which pages on your site are returning errors.
3. Work out the cause of each problem: wrong link targets, bad URLs, or misnamed pages being called.
4. Implement a robots.txt file to stop bots from crawling pages you don't want indexed.
5. Implement a sitemap (XML version). With as many pages as you have, you should split it into something like sitemap1.xml and sitemap2.xml.
I think this will help you a lot.
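If you do split the sitemap as in step 5, the pieces are tied together with a sitemap index file that you submit to Google in place of the individual sitemaps. A minimal sketch following the sitemaps.org protocol (the domain and file names here are placeholders, not your actual files; each child sitemap is limited to 50,000 URLs):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap_index.xml: points Google at each partial sitemap -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>
```

You then submit only the index file in Webmaster Tools, and Google fetches the child sitemaps from it.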
Thanks for the comments, arnabme and everybody. I tried adding this to the robots.txt file: Disallow: /~vb/f7.html — I hope it works. Thank you.
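One note on that rule: a Disallow value has to match the URL path exactly as it appears after the domain, and http://a87a.com/vb/f7.html has no `~` in it, so the rule above would not match. A sketch of what it would normally look like (the member-pages line is an assumption about vBSEO-rewritten vBulletin URLs, so adjust it to your actual paths):

```text
# robots.txt at http://a87a.com/robots.txt
User-agent: *
# block the moderators' forum
Disallow: /vb/f7.html
# block member profile pages (assumed vBSEO URL pattern -- verify yours)
Disallow: /vb/members/
```

You can check whether a rule actually blocks a given URL with the robots.txt test tool in Google Webmaster Tools before relying on it.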
You can "noindex" all those URLs you don't want sitemap generators / Google to crawl. At least A1 Sitemap Generator obeys noindex, robots.txt, etc. That might help you get Google to crawl only your important pages. Regarding the errors in sitemaps: are you sure those are 404s for URLs in the XML sitemaps, and not just Google reporting a 404 for a URL it followed from another source (e.g. a link from some other website)?
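For reference, "noindex" is a robots meta tag placed in the page's <head>: a crawler that honors it will still fetch the page but keep it out of the index. A minimal sketch of what would go into the template of each page you want excluded:

```html
<head>
  <!-- keep this page out of search results, but still follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```

Note that Google has to be able to crawl the page to see the tag, so don't also block the same URL in robots.txt if you want the noindex to take effect.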