I've been made aware that a site I'm linking to uses links.php to build its link index pages, but it has an interesting robots.txt entry:

    User-agent: *
    Disallow: /links

As I'm not an expert on robots.txt, I'm not sure if this is just stupid or whether it actually prevents crawling of mydomain.com/links.php etc. Do I shoot at these "*******" or is it just plain ignorance (mine or theirs)? Thanks and cheers M
They are cheating... List the site on a blacklist (Put their name on it too if you know it)... Guys like this piss me off.
Yep, I reckon they are cheating. The page won't be getting indexed, which means you won't be getting any PR value from their links page. He's obviously just trying to get a whack of inbound links without giving anything back.
I've never had to do it either... They aren't easy to find (Or so it appears)... If I find some I'll let you know...
Is the links.php page under the links directory? If not, then I don't see any problem. They are cheating if:

1. The links.php file is under the links directory; or
2. They have Disallow: /links.php in the robots.txt file; or
3. They have the robots meta tag (with nofollow) on the links.php page.

Examples of what 2 and 3 look like are below.
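For anyone checking a partner's pages, the tell-tale signs in 2 and 3 would look something like this (the file name here is just the one from this thread, not a rule):

    # in robots.txt -- blocks crawlers from the links page directly
    User-agent: *
    Disallow: /links.php

and in the <head> of links.php:

    <!-- tells engines not to follow any links on this page -->
    <meta name="robots" content="nofollow">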
Well, the W3 documentation is not clear about this... To ban a directory it should be (according to the directives):

    Disallow: /links/

But Disallow: /links looks like it is working, as neither the generated pages nor links.php itself is indexed for this site, even though the site has acceptable PR overall, the content has been out for quite a while, etc. M
Now I was intrigued, so I read up on the original convention and protocol for robots exclusion, and here it is, the little-known dirty trick!!!

    Disallow
    The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.

So the line I found, Disallow: /links, will disallow anything in the root where the URL starts with "links"!!!! (links.php, linksandmore.htm, etc.) These *astards M

PS: The original can be found here: http://www.robotstxt.org/wc/norobots.html
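If anyone wants to verify this for themselves, here is a minimal sketch using Python's standard urllib.robotparser, fed the exact rules from that robots.txt (the paths tested are just the example file names from this thread, not real pages I've checked):

    from urllib import robotparser

    # the exact rules found in the site's robots.txt
    RULES = [
        "User-agent: *",
        "Disallow: /links",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(RULES)  # parse() accepts an iterable of lines

    # prefix matching: any URL whose path starts with /links is blocked
    for path in ["/links.php", "/linksandmore.htm", "/links/partners.html", "/index.html"]:
        verdict = "allowed" if rp.can_fetch("*", path) else "BLOCKED"
        print(path, "->", verdict)

Running that prints BLOCKED for everything except /index.html, which matches what the spec says about prefix matching.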
I never thought about using the robots.txt file to exclude a links page or directory. Whenever I consider a link exchange, I just check to see if the page where my link would be is in the Google cache. If so, then I know it's indexed, G knows it's there, and I'll get credit for the link. If not, I delete the link exchange request and fuggedaboutit.
Someone who used to visit here had a list going, I thought. I'll give it a day to see if anyone remembers that list.
Hello Greenhorn, http://www.google.com/help/features.html#cached is what you need to read. Or get the Google Toolbar and enable the little blue i (the Information pull-down button).
Well, I've sent you a PM, as I have no intention of giving such sites even an unlinked mention on a well-regarded forum. Cheers M