Hi all, I have a website xyz.com that contains a directory xyz.com/abc/. I want to remove xyz.com/abc/ from Google using the GWT removal option. Can anyone suggest how to remove the entire category from the Google SERPs?
I have already done that and submitted a removal request through GWT. This category has about 4 lakh (400,000) pages, so do I have to submit each page individually for removal, or is there a way to remove the entire category? Can anyone suggest?
Just submit xyz.com/abc/ in the GWT removal option and it will remove the directory and all of its internal pages. Make sure to block the directory using robots.txt before submitting the request.
Manish, that is the first thing I did. In my robots.txt file I blocked the directory with:

User-agent: *
Disallow: /abc/

and then submitted it to Google via GWT - Site configuration - Crawler access - Remove URL, but GWT is treating /abc/ as a web page. So how do I remove /abc/ from the Google SERPs?
The easiest way is to simply make sure that all of the pages you want removed contain a <meta name="robots" content="noindex"> element in the head. This tells Google: never show my URL in the SERPs, and if you have it indexed, remove it. It's really the only way to guarantee Google will never show your URL in the SERPs.

The second-best option is to first put something in place to prevent Google from re-indexing the pages, such as adding the following to your robots.txt:

User-agent: *
Disallow: /abc

Then log in to Google's WMT, go to Site Configuration -> Crawler Access, and click the New Removal Request button. Enter "/abc" in the text box and press Enter. When the page refreshes, select Remove Directory from the dropdown and press the Submit Request button.

This does NOT, however, guarantee Google will never display your URLs in the SERPs. Even though Google can no longer crawl the pages to re-index them because of the Disallow in robots.txt, it can STILL display a URL in the results if enough sites link to that URL and it can infer from the link text used in those links that the URL is relevant to the search. Learn more about how to prevent Google indexing.
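If you go the noindex route across 400,000 pages, it's worth verifying the tag is actually being emitted. Here is a minimal sketch using only Python's standard library (the helper names `RobotsMetaFinder` and `has_noindex` are my own, not from any tool mentioned above):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", "").lower())

def has_noindex(html):
    """Return True if the HTML carries a robots meta tag containing noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in d for d in finder.directives)

page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex(page))  # True
```

You could point this at a sample of URLs from the /abc/ category (fetching each page body first) to confirm the template change took effect before filing the removal request.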
If I am correct, he wants to remove all of the pages under the category /abc/, and per your robots.txt instruction User-agent: * Disallow: /abc, crawlers will block only the abc page (if there is one). That code won't block the internal pages of the directory, like /abc/pqr.html.
I am not sure how you did that, or how Google could treat a directory as a page. Did you select 'Remove directory' from the dropdown as the reason?
Here is the solution for you from Google Webmaster Tools. If you own the site, to request removal of the outdated cached version of the page from search results:

1. Verify your ownership of the site in Webmaster Tools.
2. On the Webmaster Tools home page, click the site you want.
3. On the Dashboard, click Site configuration in the left-hand navigation.
4. Click Crawler access, and then click Remove URL.
5. Click New removal request.
6. Type the URL of the page you want removed, and then click Continue. Note that the URL is case-sensitive; you will need to submit the URL using exactly the same characters and the same capitalization that the site uses.
7. Select Remove page from cache only.
8. Select the checkbox to confirm that you have completed the requirements listed in this article, and then click Submit Request.

Source: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=164734
Manish, Disallow: /abc should do exactly what they want. It disallows the /abc/ folder AND every URL whose path begins with /abc, just as Disallow: / disallows every path from the root down, as you can see in the de facto official robots reference in Google's Webmaster Help. You can think of ALL Disallow statements as having a wildcard (*) on the end, so Disallow: /abc can be thought of as Disallow: /abc*.

As long as the /abc folder is the ONLY file or folder within the root that begins with "abc", Disallow: /abc will disallow the folder /abc/ AND all sub-folders and files contained in the folder and sub-folders. However, if there is an /abcdef folder, or an /abcd.html or /abc.html file, that you do NOT want excluded, then they would need to use Disallow: /abc/. That disallows the folder when it is referenced as /abc/, along with all of its sub-folders and files, while NOT blocking the /abcdef folder or the /abcd.html or /abc.html files.

However, the /abc folder can still be indexed when it is referenced as "/abc", because that form is missing the trailing slash and is treated by crawlers as a FILE named abc in the root folder. This is why it is important, IMO, to implement a 301 redirect so that folders containing default documents (index.html, default.aspx, etc.) are always referenced using the folder name with a trailing '/': 301 redirect requests for /folder to /folder/, and 301 redirect requests for /folder/sub-folder to /folder/sub-folder/. If that redirect is in place, then Disallow: /abc/ WOULD also disallow crawling links to /abc, because the server will redirect the request to /abc/, which is blocked.
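For what it's worth, if the site runs Apache (an assumption; other servers need their own equivalent), that trailing-slash redirect is already built in via mod_dir:

```apache
# mod_dir's DirectorySlash directive (on by default) issues a 301 redirect
# from /abc to /abc/ whenever /abc is a real directory, so requests always
# arrive in the slashed form that Disallow: /abc/ covers.
DirectorySlash On
```

So on a default Apache setup you may already get this behavior for free; the directive only needs attention if someone has turned it off.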
It doesn't sound like you have ever actually tried robots.txt. Tell me: I have a website with a page /company and a directory /company/. How would you block only the directory? According to you, you would use the following instructions:

User-agent: *
Disallow: /company

Am I correct? Now tell me, wouldn't this code also block my company page?
Disallow: /company will block BOTH the /company page AND the /company/ folder. In that case, you would want to use Disallow: /company/, which will NOT block the /company page. It will, however, block the /company/ folder and all sub-folders and files beneath the /company/ folder. This is exactly what I was trying to explain above.
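You can check this exchange yourself with Python's standard-library robots.txt parser, which applies the same prefix-matching rule (the `allowed` helper is just a thin wrapper I'm introducing for the demo):

```python
from urllib import robotparser

def allowed(rules, path):
    """Return whether the generic (*) agent may fetch `path` under `rules`."""
    rp = robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch("*", path)

# Disallow: /company is a prefix match: it blocks the /company page,
# everything under /company/, and even /companynews.html.
broad = "User-agent: *\nDisallow: /company\n"
print(allowed(broad, "/company"))             # False
print(allowed(broad, "/company/about.html"))  # False
print(allowed(broad, "/companynews.html"))    # False

# Disallow: /company/ blocks only the directory, leaving /company crawlable.
narrow = "User-agent: *\nDisallow: /company/\n"
print(allowed(narrow, "/company"))             # True
print(allowed(narrow, "/company/about.html"))  # False
```

This matches the explanation above: the un-slashed form is the broad prefix rule, and the slashed form confines the block to the directory's contents.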