I had a static website (45 pages) a week ago. I redeveloped it in PHP + MySQL and it is now a dynamic website with around 11,000 pages. I have around 400 categories and 3,500 products, where each product is available in many categories and accordingly generates a URL; hence the 11,000 URLs. My questions are:

a. My robots.txt lists around 7,500 URLs and is 650 KB in size. Is this acceptable? Is there a limit (number of URLs or file size)? I did this because I did not want to face duplicate-content issues with Google, though I have also added the meta noindex tag to all these pages. Will that work as well as robots.txt?

b. I deleted all 45 of my previously indexed pages (they were not sending me any traffic anyway, so I did not worry). Did I do something wrong? These URLs now show up under "Errors for URLs in Sitemaps", "Not found", and "URLs not followed" in Webmaster Tools. I used the URL removal tool to remove them, but apparently they are still present: I can still see them in search results, and Webmaster Tools is still reporting them!

c. I also tried removing /sitemap.html, which I deleted from the root directory, but the removal tool reports it as a Denied action.

d. I have submitted and re-submitted the sitemap, but even though Webmaster Tools shows it as "successfully crawled", I don't see the pages indexed yet!

e. MOST IMPORTANT: the "Analyze robots.txt" tool shows me "Googlebot is blocked from http://www.mywebsite.com/"! What could be the possible reason for this? Besides the listed URLs, I am using the matching pattern Disallow: /*?.

Please help me, as I am really worried about all these problems. I have spent many $$$ having this website developed, but it seems that I did something terribly wrong. Thanks in advance. My website is www(dot)sameday-flowerdelivery(dot)com
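For reference, since the duplicate URLs all come from query strings, a pattern-based robots.txt could in principle replace most of the 7,500 individual entries (Googlebot supports the * wildcard in Disallow rules). A sketch, with example paths only — not my site's real structure:

```
# Sketch only; paths below are illustrative, not my actual URLs.
User-agent: *
# Block every URL containing a query string (the duplicated product URLs):
Disallow: /*?
# A specific duplicate path could still be listed individually if needed:
Disallow: /example-duplicate-page.php
```

This would shrink the file far below 650 KB, though I am unsure whether it interacts with the "Googlebot is blocked" message above.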
Hi, there is an option in Google Webmaster Tools to block the old URLs, so you can avoid using robots.txt for your top landing pages.
Wow, lots of questions here. You aren't even close to the maximum number of URLs, so no worries there. Give Webmaster Tools time to catch up with you, a week or so. What are you trying to block with Disallow: /*? The meta noindex tag should work just as well as robots.txt. Cheers,
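One thing to watch when combining the two (a general note, not specific to your files): if a URL is blocked in robots.txt, the crawler never fetches the page, so it cannot read a meta noindex tag on it. The tag only works on crawlable pages, placed in the page head:

```
<!-- Per-page directive; only effective if the URL is NOT also blocked
     in robots.txt, since the crawler must fetch the page to read it. -->
<meta name="robots" content="noindex, follow">
```

So pick one mechanism per URL: robots.txt to stop crawling, or meta noindex to stop indexing.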