Hello, I do not want Google's spiders to crawl a specific directory, subdirectory, or file on my website. Can you please tell me how this can be done and where it should be done? Please explain in detail. Thanks in advance. Regards
Create a robots.txt file and upload it to the root of your server, then use Disallow directives to instruct the bots to stay away from your specified folders and files. There is no point re-inventing the wheel here when there is an excellent resource on this: go to http://www.robotstxt.org/robotstxt.html. Good luck!
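For example, a minimal robots.txt might look like this (the /private/ directory and /old-page.html file are placeholder names for this sketch; substitute your own paths):

User-agent: *
Disallow: /private/
Disallow: /private/drafts/
Disallow: /old-page.html

The first Disallow already covers everything under /private/, including /private/drafts/, so a subdirectory line like the second one is only needed when its parent directory is not already blocked.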
Perfect! I always seem to forget about the robots.txt file when I create a website. Bookmarking now. Thanks!
Robots.txt CAN be used to prevent certain directories, sub-directories and files from being crawled, but it does NOT guarantee that Google will not show those pages in their SERPs. If those pages have inbound links from other sites, Google can STILL show them in the SERPs even without crawling them: they can infer from the link text of the inbound links whether a page might be relevant to a particular search query.

Robots.txt also will NOT cause Google to remove blocked/disallowed pages from their index if they are already indexed. You'll need to use the URL removal tool in Google's Webmaster Tools to remove them AFTER you have the robots.txt disallows in place.

If you want to guarantee that the pages will never be shown in the SERPs, then you should use a <meta name="robots" content="noindex"> element in the <head> of the pages you don't want to show up. This will not only keep Google from showing the page in the SERPs; if the page is already in their index, it will cause them to remove it. Learn more about how to prevent Google indexing.
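To illustrate, here is a sketch of where that element goes; everything here apart from the meta tag itself is placeholder markup. One caveat: Google has to be able to crawl the page in order to see the tag, so a page carrying noindex should NOT also be blocked in robots.txt, or the tag may never be read.

<!DOCTYPE html>
<html>
<head>
  <title>Page you want kept out of the index</title>
  <!-- Tells compliant crawlers not to index this page -->
  <meta name="robots" content="noindex">
</head>
<body>
  ...
</body>
</html>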
1. Use Disallow: directives in your robots.txt and put the file in your site's root directory. 2. Set noindex, nofollow meta tags in your page files. Have a nice day,
I have one doubt; if you people can help me I would be very thankful. When I use a site: search in Google to check my links, I find some dynamic links that no longer exist on my site. I have tried two ways to deal with them: adding them to robots.txt and requesting removal in Webmaster Tools. But Webmaster Tools gave me an error saying the removal request was denied.
Dear member, it's easy to keep your folders, pages and files from being crawled by using a robots.txt file in your root. Just define, under the relevant user agent, which pages you don't want the search engines to crawl: mark those pages as disallowed and they will not be crawled. For example:

User-Agent: *
Disallow: /*_V
Disallow: /*barpID
Disallow: /resources2.do
Disallow: /resources1.do
Disallow: /*&pID
Disallow: /*Cause
Disallow: /*shop.do?cID=1962
Disallow: /*shop.do?cID=1966

Then those pages will not be crawled.
I have a website set up with a robots.txt file in use. My only error pages come from PHP files on my site. How do I set up the robots.txt file to exclude all my PHP files without having to list each and every page with its own Disallow line?
You might be able to do this with * wildcards in your Disallow directives. Check this thread on WebmasterWorld: http://www.webmasterworld.com/forum93/622.htm; it might point you in the right direction.
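For example, something like this should work for Googlebot; note that * and the $ end-of-URL anchor are pattern-matching extensions honored by Googlebot and some other major crawlers, not part of the original robots.txt standard, so test it in Webmaster Tools before relying on it:

User-agent: *
Disallow: /*.php$

The $ limits the rule to URLs that actually end in .php. Without it, Disallow: /*.php also matches URLs that merely contain .php, such as /page.php?id=1, which may or may not be what you want.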
Yes, that is the way to solve the problem you are facing: use the wildcard sign * in your Disallow directive. For blocking particular pages or the whole lot, see the example I gave earlier.
Compile the directives into a robots.txt file placed at the root of your site, in this format:

User-Agent: *
Disallow: /the folder/file you want blocked
Disallow: /the 2nd file/folder you want blocked