I created a basic robots.txt file. Do I also need to include <meta name="ROBOTS" content="ALL"> in the head tags of my HTML pages, or is just having the robots.txt in my root directory enough? Thanks in advance.
The robots.txt file is sufficient. Actually, if you had neither the robots.txt nor the meta tag mentioned above, the entire site would be available for indexing. I use robots.txt more to disallow certain folders for compliant robots. Rogues will not honor robots.txt. Shannon
It's only necessary to have the Meta Title, Keyword, and Description tags; the rest are not required.
Thank you, both. I hope having the file will be conducive to the big 3 indexing more of my pages. I understand that MSN likes robots.txt files.
Getting well indexed in the SE's is just a matter of getting more links to your site. The robots.txt is typically used to tell the SE's NOT to index parts of your site.
Actually, it's only NECESSARY to have the title tag. The meta description is optional (the SE's will skim your page and pull a description for you), but it's highly recommended that you write your own. The keywords tag is usually a waste of time, but I still use it anyway for good form. The robots.txt is especially important for MSN, even if you upload a blank one. It also matters for AWStats, because hits on that file are one of the ways it recognizes bot hits.
To clarify, when I stated they were necessary, I really meant it's advised to include them. As for robots.txt and MSN, I haven't had any problems with any sites on MSN when excluding the file.
I'd like to add one thing here: be careful which websites you exchange links with, as some webmasters use the robots.txt file to disallow the Google spiders from crawling their link pages. Your link will be of no use then.
Hadn't thought of that - it's a good point. Unlike "nofollow," which can be detected, yes, how would you know what's in their robots.txt?
Just look at the file: theirdomain.com/robots.txt. Without checking the source, how do you detect a nofollow?
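On the nofollow question, one way to avoid reading the page source by hand is to let a small script read it for you. This is only a rough sketch using Python's standard html.parser module. The class name NofollowFinder and the sample page are made up for illustration. It looks for rel="nofollow" on individual links and for a page-wide nofollow in the robots meta tag.

```python
from html.parser import HTMLParser


class NofollowFinder(HTMLParser):
    """Collects links from a page and notes which ones carry rel="nofollow".

    Hypothetical helper for illustration; not part of any SEO tool.
    """

    def __init__(self):
        super().__init__()
        self.meta_nofollow = False   # True if <meta name="robots"> contains nofollow
        self.nofollow_links = []     # hrefs marked rel="nofollow"
        self.followed_links = []     # hrefs without rel="nofollow"

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            directives = [d.strip() for d in (attrs.get("content") or "").lower().split(",")]
            if "nofollow" in directives:
                self.meta_nofollow = True
        elif tag == "a" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel_values:
                self.nofollow_links.append(attrs["href"])
            else:
                self.followed_links.append(attrs["href"])


# Made-up sample page standing in for a link partner's links page.
SAMPLE_PAGE = """
<html><head><meta name="ROBOTS" content="ALL"></head>
<body>
<a href="http://example.com/" rel="nofollow">Example</a>
<a href="http://example.org/">Other</a>
</body></html>
"""

finder = NofollowFinder()
finder.feed(SAMPLE_PAGE)

if __name__ == "__main__":
    print("Page-wide nofollow:", finder.meta_nofollow)
    print("Nofollow links:", finder.nofollow_links)
    print("Followed links:", finder.followed_links)
```

To run it against a real page you would fetch the HTML first (for example with urllib.request) and pass the text to feed().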
Excellent! In most cases I use robots.txt more to disallow certain folders for robots. MSNBot complies with the standards for robots.txt.
Yeah, just saw this today: put /robots.txt after the domain and then look for a Disallow on the directory or page you are linked on:
User-agent: Googlebot
Disallow: /TheDirectoryorpageyou'reon/
Thanks.
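If you have a lot of link partners to check, the same lookup can be scripted. Here is a minimal sketch using urllib.robotparser from the Python standard library. The robots.txt text, the /links/ directory, and the URLs are invented for the example, and is_crawlable is just a helper name I picked.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content, as it might appear at theirdomain.com/robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /links/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The page your link sits on (hypothetical URL).
    print(is_crawlable(ROBOTS_TXT, "Googlebot", "http://theirdomain.com/links/partners.html"))
    # A page outside the disallowed directory.
    print(is_crawlable(ROBOTS_TXT, "Googlebot", "http://theirdomain.com/index.html"))
```

To check a live site instead of pasted text, RobotFileParser also has set_url() and read(), which download the file for you.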
Why would it be more important for MSN than for Google or Yahoo? If robots.txt is not present, they will all understand that they are permitted to visit all pages. When the file is not present, AWStats sees the requests for the non-existing robots.txt file. These requests allow AWStats to recognize these bots. Jean-Luc
This is not the way to identify bots; all it will do is fill up your 404 error section. The SE's don't download the robots.txt on every visit, as this would be a waste of resources and would nullify your idea.
I agree that it is not the best way to identify bots and it should certainly not be the only way to do it, but it is used by AWStats and other stats software to discover new bots. AWStats reports them as: Unknown robot (identified by hit on 'robots.txt') Jean-Luc
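To make the idea concrete, here is a rough sketch of how a stats script could flag visitors that requested robots.txt. This is not AWStats' actual code. It assumes the common Apache "combined" log format, and the function name and sample log lines are invented for illustration.

```python
import re

# Assumes Apache "combined" log format; other formats need a different pattern.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def agents_requesting_robots_txt(lines):
    """Return the set of user agents that asked for /robots.txt.

    The request counts whether the file exists (200) or not (404),
    since the request itself is what hints that the visitor is a bot.
    """
    agents = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").split("?")[0] == "/robots.txt":
            agents.add(match.group("agent"))
    return agents


# Invented sample log lines.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2006:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 404 208 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [10/Oct/2006:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [10/Oct/2006:13:56:10 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    print(agents_requesting_robots_txt(SAMPLE_LOG))
```

Note that the first sample line is a 404, so the file does not need to exist for the hit to show up in the log.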
No, that simply means that the bot does not have an official listing as a verified source. It still knows it is a bot; it just doesn't know the name. There could be many reasons for this, and the robots.txt file has nothing to do with it. I get these on sites where I have the file and on ones where I do not. Can you provide documentation on this theory? I would be interested to read more about it.
A line, titled "Unknown robot (identified by hit on 'robots.txt')", appears in the AWStats list of "Robots/Spiders visitors". That seems pretty clear to me. Jean-Luc
As opposed to having hundreds of 404 errors on the file? This may just be due to the fact that the stats programs aren't going to waste overhead by listing out every little bot that stops by. Most likely they just list the larger sources.