Can robots.txt be used to prevent link juice going to some pages, but still allow those pages to be indexed? If so, how do I do this? Thanks
Great question, Fuzz. I asked this in another thread but never saw a response. Hopefully someone can answer it.
If you want a page to be indexed but its links not followed, you can use the following tag in the page: <meta name="robots" content="index,nofollow"> There's no need to use robots.txt.
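For placement, here's a minimal sketch of where that tag goes (the page content and title are hypothetical, just for illustration):

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Tells spiders to index this page but not follow any of its links -->
  <meta name="robots" content="index,nofollow">
  <title>Hypothetical example page</title>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```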
To prevent link juice from flowing to specific pages, use the rel="nofollow" attribute on the links pointing to them. Below is an example of how to insert it: <a href="http://www.abc.com" rel="nofollow">ABC</a> http://en.wikipedia.org/wiki/Nofollow http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html
The robots.txt file will block crawling entirely. I suggest you use nofollow links instead; this will prevent the juice from flowing. But unfortunately, as far as I know, a robots.txt block will stop the URL from being indexed as well.
Lads, I think you're missing the point of the original question. The OP is asking about the robots.txt file, not about the robots meta tag or the rel attribute. Returning to the original question: in my opinion, the robots.txt file is used to exclude certain directories from being crawled by search engine spiders. If these directories are not crawled, they will not be indexed either. And if these directories are not indexed, the links pointing to files in them won't be followed, hence link juice won't flow to those files, even though the robots meta tag and the rel attribute may allow hyperlinks to be followed. Can anyone confirm/deny this?
lawn, that's exactly what I mean. See, my issue is that on some pages I have no access to the meta area, so that option is out of the question. I want to be able to tell SEs not to view a particular page, therefore not to index it and subsequently not pass link juice that way. At the moment I have unnecessary pages getting juice which they basically don't need, and I want to stop that! Gonna put my super robots.txt Superman-like outfit on and stop it happening... (imagination getting carried away)
Those are two different situations, Fuzzbuzz.

If you want to keep spiders out of a directory entirely, use the following robots.txt file:

User-agent: *
Disallow: /your-inaccessible-directory/

Here's a useful tutorial on creating a robots.txt file.

If you want a certain webpage to pass link juice to all linked-to files except a few, then use the following code in the referring file:

<meta name="robots" content="...,follow"> (put this in the header)
<a href="http://www.yourwebsite.com/your-unnecessary-page.htm">Your anchor text goes here</a> (passes link juice; links are followed by default, so no rel attribute is needed)
<a href="http://www.yourwebsite.com/your-unnecessary-page.htm" rel="nofollow">Your anchor text goes here</a> (does not pass link juice)

Note that in this situation, it does not matter whether you let the file be indexed or not.
Hi lawn, thanks for the reply. If I use robots.txt as in your example, wouldn't that just block out every page within that directory? If I want it to be page-specific, can I name the actual page name and file extension? I have some files within directories that should be indexed and some which shouldn't. Thanks
This is why the robots.txt file ISN'T your solution, unless you block each specific page directly. Use the nofollow attribute instead.
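To answer the page-specific question directly: as far as I know, a Disallow rule can name a single file rather than a whole directory. A minimal sketch (the path here is made up):

```
User-agent: *
# Blocks crawling of just this one page; other files in the
# same directory remain crawlable
Disallow: /some-directory/unnecessary-page.htm
```

Keep in mind the trade-off mentioned earlier in the thread: blocking crawling this way also tends to keep the page out of the index, which is the opposite of what the OP wants.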
There's a nice article about the use of the robots meta tag and the robots.txt file on the Official Google Webmaster Central Blog, which I suggest you read.