First of all, I want to thank you for this great community and support. This is the first issue I'd like to discuss with you. Two months ago, Google was indexing the community threads within 1-2 hours. There were 18,500 URLs in the sitemap and 18,400 indexed, so let's say: no problems. But over the last month Google has stopped indexing newly posted threads, not even after a week. So I decided to rebuild my sitemap and resubmit it in Webmaster Tools. A week after resubmitting, Google had indexed just 30% of the 18,400 URLs. I can also see in the search results that Google is dropping a lot of posts, and when we create a new thread, it no longer appears on Google. I don't understand it; nothing has changed, it's still the same CMS and URL structure. Website: http://forum.rkempo.nl/
Have you confirmed that you have 0 sitemap errors and 0 crawl errors? These errors can keep your site from being crawled and eventually indexed.
Thank you for your reply, Braulio. I have 300 crawl errors: access errors (response code 403), caused by rules in robots.txt. But I have always had these errors, and I have 0 sitemap errors.
You need to drill down to the root cause of the errors and eliminate it. Please post some of the most common errors so we can help you.
Okay, thank you. Right now I have two kinds of crawl errors:

1) A lot of access denied errors (response code 403) for all profile URLs, like: http://forum.rkempo.nl/members/samir.17/ - these profile URLs are not in the sitemap, and I blocked guest access to profiles (from the XenForo control panel).

2) Not found errors for attachments, like: http://forum.rkempo.nl/attachments/1755/

This is my robots.txt file:

User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Allow: /
Sitemap: http://forum.rkempo.nl/sitemap.php

I have this forum in a subdirectory, and this robots.txt is inside that directory. Do I have to move it to the root? Today I received a message from Google: "Google detected a significant increase in the number of URLs we were blocked from crawling due to authorization permission errors."
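On the robots.txt placement question: crawlers only read robots.txt from the root of the host in the URL. Since your forum URLs are under http://forum.rkempo.nl/, the file just has to be reachable at http://forum.rkempo.nl/robots.txt; where it lives on disk doesn't matter. Only if the forum were instead served from a subdirectory of the main domain (for example a hypothetical http://rkempo.nl/forum/) would the file have to sit at http://rkempo.nl/robots.txt, with the directory prefix on every rule. A rough sketch of that hypothetical layout, using an assumed /forum/ path:

# robots.txt served from http://rkempo.nl/robots.txt (hypothetical subdirectory setup)
# Every rule carries the /forum/ prefix because paths are matched from the host root.
User-agent: *
Disallow: /forum/find-new/
Disallow: /forum/account/
Disallow: /forum/attachments/
Disallow: /forum/goto/
Disallow: /forum/posts/
Disallow: /forum/login/
Disallow: /forum/admin.php
Sitemap: http://rkempo.nl/forum/sitemap.php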
Generally speaking, blocking Google from URLs you do not want indexed is a good idea, but it will of course result in fewer URLs being indexed. You may also want to check whether you have duplicate URLs and then use a mix of canonical tags and robots.txt to solve that as well. That way, Google will only spend time crawling your real content URLs, which means your fresh content may also get crawled faster.
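For the canonical part, the idea is that every duplicate or parameterised variant of a page points at the one preferred URL in its <head>. XenForo should already be emitting this on thread pages, but as a sketch of what the markup looks like (the thread URL below is just a made-up example):

<head>
  ...
  <!-- Tell Google which URL is the preferred version of this page -->
  <link rel="canonical" href="http://forum.rkempo.nl/threads/example-thread.123/" />
</head>

Combined with the robots.txt blocks on the non-content URLs (/goto/, /posts/, /find-new/, and so on), this concentrates Google's crawl budget on the real thread URLs.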