Hi, If I specify within my robots.txt file to disallow specific pages do I still need to include <meta name="robots" content="noindex,nofollow"> on each of those pages? Thanks
Not really required, as the robots.txt rules for allowing or disallowing indexing are the most important ones. However, make sure you use the file wisely; otherwise you might get important pages or folders deindexed.
You can, but it's probably not necessary (though it will definitely ensure that the page doesn't get indexed). You could also try it if there is a page that you want removed from an index.
It's better to use the robots.txt file to block the pages from getting crawled. Moreover, you should always try to keep the amount of code as low as possible to prevent code bloat.
We are developing a portal, and our development team has made 3 or 4 sub-folders on the same server for backup and testing purposes. Google is treating these folders as sub-sites and indexing all of them. Today I disallowed all of these folders (sub-sites) with a robots.txt file, in which I used the following rules:

User-agent: *
Disallow: /

User-agent: Googlebot
Noindex: /

This way, I think search engine crawlers will not index these sub-folders. We are also using the meta tag <meta name="robots" content="index, follow" /> on the site. I can't change it in the sub-folders to disallow indexing, because the developers make all their changes in these folders and can upload them to the site. My question is: I have disallowed the sub-folders in the robots.txt file, but the meta tag <meta name="robots" content="index, follow" /> tells crawlers to index and follow the content. Should I remove the index/follow meta tags from all of these pages? One directive says to follow and the other disallows it. I am totally confused about what to do.
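For what it's worth, if the goal is to block only the backup and testing sub-folders rather than the whole site, the rules can be checked locally before they are uploaded. Below is a minimal Python sketch using the standard library's urllib.robotparser. The folder names /backup/ and /test/ and the host www.example.com are hypothetical stand-ins, since the real names aren't given in this thread, and is_allowed is just an illustrative helper name. It only shows how a robots.txt-compliant parser reads the rules; it says nothing about how a particular search engine treats pages it already has in its index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block two sub-folders, leave the rest of the site open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /backup/
Disallow: /test/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/backup/index.html", "/test/page.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Running the same check against each sub-folder before going live is a cheap way to catch a rule such as "Disallow: /" that blocks more than intended.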
@gravy834 - Both the robots.txt file and the <meta name="robots"> tag are used to control the indexing and caching of your website's pages. If you have already stated NOT to index a page in the robots.txt file, it is not necessary to do so on the page with the meta tag. However, keep in mind that not all spiders are created equal, meaning they don't all use or follow your robots.txt directives. So in my humble opinion it is still good to use the <meta name="robots"> tag even though you have explicitly stated not to index a page in your robots.txt file. Consider also the scenario in which a spider gets to your page via a link that someone else pointed at it directly: will the spider index that content? (Who knows for sure.) Besides, it's not so much code that you should be too concerned about its "weight" on the page. @meri0098, if you consider what I've said above, the set-up you have seems like it could potentially cause a problem for you. I would find a way to have your directives in sync.
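To see which directives a page's meta tag actually carries (for instance, to check that they agree with what the robots.txt file says), something like the following could be used. It is only a sketch built on Python's standard html.parser module; the names RobotsMetaParser and robots_meta_directives are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # "noindex, nofollow" -> {"noindex", "nofollow"}
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_meta_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="index, follow" /></head>'
    print(robots_meta_directives(page))
```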
First of all, we must put "robots.txt" in the top-level directory of our web server. Second, when a robot looks for the "/robots.txt" file for any URL, it takes the path component of the URL (everything from the first single slash) and puts "/robots.txt" in its place. For example, for "http://www.ABC.com/designs/index.html", it will remove "/designs/index.html", replace it with "/robots.txt", and end up with "http://www.ABC.com/robots.txt". So I think there is no need to specify the robots tag again on every page, because whenever a spider comes to any page of our website, it first goes directly to "robots.txt" and only then to the particular page it requested.
Well, thanks for the info. I heard that Bing takes site info from DMOZ and not from robots.txt. So if my site is listed in DMOZ, then there is no point in using robots.txt.
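The lookup described above can be sketched in a few lines of Python. This is just an illustration using urllib.parse from the standard library; robots_txt_url is a hypothetical helper name, not something crawlers expose.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.ABC.com/designs/index.html"))
    # prints http://www.ABC.com/robots.txt
```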
You should remove the meta tag info (it's redundant). These pages and folders are currently blocked by robots.txt, so no robots.txt-compliant crawler can crawl them.