I want to know how to restrict links via the robots.txt file; I want to restrict the URLs below. We did not create a subdomain, but sina.com.cn is using our URL. I have checked in the domain & hosting panel and there is no such file or subdomain. How can I restrict and fix it? Moreover, does anybody know why sina.com.cn is using another website's URL, or how we can protect our website using a robots.txt file or some other way?

https://tool.mykidslunchbox.com.au/forgot-password.aspx
http://www.sina.com.cn.mykidslunchbox.com.au/forgot-password.aspx
https://www.sina.com.cn.mykidslunchbox.com.au/how-it-works.aspx
https://www.sina.com.cn.mykidslunchbox.com.au/contactus.aspx
http://tool.mykidslunchbox.com.au/contactus.aspx
http://tool.mykidslunchbox.com.au/benefits.aspx
http://tool.mykidslunchbox.com.au/
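Extra hostnames like www.sina.com.cn.mykidslunchbox.com.au resolving to your site usually point to a wildcard DNS record rather than a file you can delete. One common fix, since the .aspx pages suggest an IIS/ASP.NET site, is a permanent redirect to the canonical host. This is only a sketch, assuming the IIS URL Rewrite module is installed and that tool.mykidslunchbox.com.au is your canonical host:

```xml
<!-- web.config fragment: redirect any non-canonical hostname -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="CanonicalHost" stopProcessing="true">
        <match url="(.*)" />
        <conditions>
          <!-- matches requests whose Host header is NOT the canonical host -->
          <add input="{HTTP_HOST}" negate="true"
               pattern="^tool\.mykidslunchbox\.com\.au$" />
        </conditions>
        <!-- 301 to the same path on the canonical host -->
        <action type="Redirect" url="https://tool.mykidslunchbox.com.au/{R:1}"
                redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

With this in place, search engines that crawl the spoofed hostname get a 301 and consolidate on the canonical URL. Removing or tightening the wildcard DNS entry at your DNS provider is still the root-cause fix.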
Suppose I want to stop the URLs below from being indexed. Is it OK to write:

Disallow: /tool.mykidslunchbox.com.au/forgot-password.aspx
Disallow: /sina.com.cn.mykidslunchbox.com.au/forgot-password.aspx

Am I right?
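For what it's worth, robots.txt rules are matched against the URL path only, never the hostname; each hostname serves its own robots.txt from its root. So a sketch of the file at https://tool.mykidslunchbox.com.au/robots.txt, if you only wanted to block the forgot-password page there, would be:

```text
User-agent: *
Disallow: /forgot-password.aspx
```

The sina.com.cn.* hostnames would need their own robots.txt (or, better, a redirect), since crawlers request robots.txt separately per host.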
This article has it all, bud: http://www.free-seo-news.com/all-about-robots-txt.htm. I think you're right with the above example, but for files within a whole directory you can just go:

Disallow: /tool.mykidslunchbox.com.au/somedirectory
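One caveat on the directory example: in a standard robots.txt the path is relative to the root of the host serving the file, so a directory block would normally look like this (assuming /somedirectory/ is a real path on the site, used here purely as an illustration):

```text
User-agent: *
Disallow: /somedirectory/
```

The trailing slash keeps the rule scoped to the directory and everything under it.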
This is a really wonderful article and very helpful for me, but in this article the author mentions that you can't use the "Allow" word. Yet when I check Google.com/robots.txt, they are using it. Why? Check this line from the article: "Don't use an "Allow" command in your robots.txt file. Only mention files and directories that you don't want to be indexed. All other files will be indexed automatically if they are linked on your site."
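The short answer is that "Allow" is an extension beyond the original Robots Exclusion Standard, but major crawlers such as Googlebot do support it. It is typically used to carve an exception out of a broader Disallow rule, as in this sketch (the paths are hypothetical):

```text
User-agent: *
Disallow: /private/
Allow: /private/open-page.html
```

Crawlers that don't understand Allow simply ignore it, which is why advice aimed at maximum compatibility says to rely on Disallow alone.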
Please use a robots.txt file with your site so that no search engine crawling is done for the next few weeks, and then verify the site so the error is no longer there.
"Allow" is a nonstandard extension of the protocol. Please use robots.txt only to disallow crawler access.

User-agent: *
Disallow:

equals

User-agent: *
Allow: /

whilst "Allow" is not part of the Robots Exclusion Standard (robots.txt). I have collected a full set of example implementations here: http://rield.com/cheat-sheets/robots-exclusion-standard-protocol
jabz.biz explained it perfectly. But you do not need to put an Allow directive in the robots.txt file; it's not part of the exclusion standard. Robots.txt has never helped webmasters achieve good rankings. This file is used to restrict robots from crawling part or the whole of a website. Remember there are bad robots as well, which do not always follow the directives in robots.txt, so using robots.txt does not amount to a security system either. It is always better to password-protect the folders and directories you do not want crawled. Anyway, this is not what you inquired about. Please keep in mind that robots.txt has nothing to do with ranking on SERPs. Cheers!
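On the point about password-protecting folders: since this is an ASP.NET site, one way to deny anonymous access to a directory is a location element in web.config. A minimal sketch, assuming a folder named "private" (hypothetical) and that some authentication mode is already configured for the site:

```xml
<!-- web.config fragment: require login for everything under /private -->
<location path="private">
  <system.web>
    <authorization>
      <!-- "?" means anonymous (unauthenticated) users -->
      <deny users="?" />
    </authorization>
  </system.web>
</location>
```

Unlike robots.txt, this is enforced by the server, so it also stops the "bad robots" that ignore crawl directives.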
In order to use a robots.txt file, you'll need to have access to the root of your domain (if you're not sure, check with your web host). If you don't have access to the root of a domain, you can restrict access using the robots meta tag.
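For reference, the robots meta tag goes in the head of each page you want kept out of the index, so it works even without root access. A minimal example:

```html
<head>
  <!-- tells compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Note the difference in effect: robots.txt blocks crawling, while the noindex meta tag blocks indexing (and a page blocked by robots.txt can't have its meta tag seen at all).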