Robots.txt question


Moneyfolk
Jan 12th 2006, 6:23 pm
I created a basic robots.txt file. Do I also need to include

<meta name="ROBOTS" content="ALL"> in the head of my HTML pages, or is just having robots.txt in my root directory enough?

Thanks in advance.

Smyrl
Jan 12th 2006, 6:47 pm
The robots.txt file is sufficient. Actually, if you had neither the robots.txt nor the meta tag mentioned above, the entire site would be available for indexing. I use robots.txt more to disallow certain folders for compliant robots. Rogues will not honor robots.txt.

Shannon
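To illustrate the point above, here is a minimal sketch using Python's standard urllib.robotparser. The bot name, the example.com URLs and the /private/ folder are made up for the example. It only shows how a compliant robot would read the rules; a rogue bot simply ignores them.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if a compliant robot may fetch url under these rules."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: keep every compliant robot out of one folder.
RULES = "User-agent: *\nDisallow: /private/\n"

# An empty robots.txt places no restrictions at all.
empty_allows_all = allowed("", "AnyBot", "http://example.com/private/page.html")

# With the rule in place, the folder is off limits but the rest is not.
private_allowed = allowed(RULES, "AnyBot", "http://example.com/private/page.html")
home_allowed = allowed(RULES, "AnyBot", "http://example.com/index.html")
```
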

dcristo
Jan 12th 2006, 6:48 pm
I created a basic robots.txt file. Do I also need to include

<meta name="ROBOTS" content="ALL"> in the head of my HTML pages, or is just having robots.txt in my root directory enough?

Thanks in advance.

It's only necessary to have the Meta Title, Keyword, and Description Tags, the rest are not required.

Moneyfolk
Jan 12th 2006, 7:00 pm
Thank you, both. I hope having the file will be conducive to the big 3 indexing more of my pages. I understand that MSN likes robots.txt files.

dcristo
Jan 12th 2006, 7:04 pm
Thank you, both. I hope having the file will be conducive to the big 3 indexing more of my pages. I understand that MSN likes robots.txt files.

Getting well indexed in the SE's is just a matter of getting more links to your site. The robots.txt is typically used to tell the SE's NOT to index parts of your site.

mdvaldosta
Jan 12th 2006, 7:31 pm
It's only necessary to have the Meta Title, Keyword, and Description Tags, the rest are not required.

Actually, it's only NECESSARY to have the title tag, the meta description is optional (the SE's will skim your page and pull a description for you) but highly recommended you have your own. The keyword is usually a waste of time, but I still use it anyways for good form.

The robots.txt is especially important for MSN, even if you upload a blank one. Also, for awstats, because hits on that file are one of the ways it recognizes bot hits.

dcristo
Jan 12th 2006, 7:55 pm
To clarify, when I stated it was necessary, I really meant it's advised to include them.

As for robots.txt and MSN, I haven't had any problems with any sites on MSN when excluding the file.

seo_expert
Jan 12th 2006, 10:56 pm
I'd like to add one thing here.....

Be careful when exchanging links: some webmasters use the robots.txt file to disallow the Google spiders from crawling their link pages. Your link will be of no use then.

Moneyfolk
Jan 13th 2006, 6:32 am
So should you check the robots.txt file of websites that you want to link to?

northpointaiki
Jan 13th 2006, 6:36 am
I'd like to add one thing here.....

Be careful when exchanging links: some webmasters use the robots.txt file to disallow the Google spiders from crawling their link pages. Your link will be of no use then.

Hadn't thought of that - it's a good point. Unlike "nofollow," which can be detected, yes, how would you know what's in their robots.txt?

BILZ
Jan 13th 2006, 8:12 am
just look at the file... theirdomain.com/robots.txt

Without checking the source, how do you detect a nofollow?
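On the nofollow question: it does come down to reading the page source, but that can be automated. The sketch below uses Python's standard html.parser to list links carrying rel="nofollow" and to flag a page-wide robots meta tag containing nofollow. The class name and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class NofollowFinder(HTMLParser):
    """Collects nofollow links and notes a page-level robots nofollow."""

    def __init__(self):
        super().__init__()
        self.nofollow_links = []
        self.page_nofollow = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        attrs = dict(attrs)
        if tag == "a":
            rel_values = (attrs.get("rel") or "").lower().split()
            if "nofollow" in rel_values:
                self.nofollow_links.append(attrs.get("href"))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "nofollow" in (attrs.get("content") or "").lower():
                self.page_nofollow = True


# Made-up page source standing in for a link partner's page.
SAMPLE_HTML = (
    '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head>'
    '<body><a href="http://a.example/" rel="nofollow">A</a>'
    '<a href="http://b.example/">B</a></body></html>'
)

finder = NofollowFinder()
finder.feed(SAMPLE_HTML)
```

After feed(), finder.nofollow_links holds the hrefs marked nofollow and finder.page_nofollow tells you whether the whole page asks robots not to follow its links.
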

maldives
Jan 13th 2006, 8:23 am
Actually, it's only NECESSARY to have the title tag, the meta description is optional (the SE's will skim your page and pull a description for you) but highly recommended you have your own. The keyword is usually a waste of time, but I still use it anyways for good form.

The robots.txt is especially important for MSN, even if you upload a blank one. Also, for awstats, because hits on that file are one of the ways it recognizes bot hits.

Excellent! In most cases I use robots.txt more to disallow certain folders for robots. MSNBot complies with the standards for robots.txt.

Moneyfolk
Jan 13th 2006, 10:38 am
You can go to Sitename/robots.txt, so for www.w3.org it's:

http://www.w3.org/robots.txt

northpointaiki
Jan 13th 2006, 1:26 pm
Yeah, just saw this today: open their robots.txt and then look for a Disallow on the directory or page your link is on:

User-agent: Googlebot
Disallow: /TheDirectoryorpageyou'reon/

Thanks.
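The manual check described above can also be scripted. This is only a sketch: the function name, the partner.example domain and the /links/ path are hypothetical, and in practice you would first download the text from theirdomain.com/robots.txt and pass it in.

```python
from urllib.robotparser import RobotFileParser


def link_page_blocked(robots_txt, page_url, user_agent="Googlebot"):
    """Return True if robots_txt tells user_agent not to crawl page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, page_url)


# Hypothetical robots.txt of a link partner that hides its links directory
# from Googlebot only.
PARTNER_ROBOTS = "User-agent: Googlebot\nDisallow: /links/\n"
LINK_PAGE = "http://partner.example/links/page1.html"

blocked_for_google = link_page_blocked(PARTNER_ROBOTS, LINK_PAGE)
blocked_for_msn = link_page_blocked(PARTNER_ROBOTS, LINK_PAGE, "msnbot")
```

Because the rule names Googlebot specifically, the page is blocked for Googlebot but still open to other robots, so it is worth checking each spider you care about.
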

Jean-Luc
Jan 13th 2006, 1:34 pm
The robots.txt is especially important for MSN, even if you upload a blank one.

Why would it be more important for MSN than for Google or Yahoo? If robots.txt is not present, they will all understand that they are permitted to visit all pages.

Also, for awstats because hits on that file is one of the ways it recognizes bot hits.

When the file is not present, AWStats sees the requests for the non-existing robots.txt file. These requests allow AWStats to recognize these bots.

Jean-Luc

ServerUnion
Jan 13th 2006, 1:38 pm
When the file is not present, AWStats sees the requests for the non-existing robots.txt file. These requests allow AWStats to recognize these bots.


This is not the way to identify bots; all it will do is fill up your 404 error section. The SE's don't download the robots.txt on every visit, as this would be a waste of resources, and that would nullify your idea.

Jean-Luc
Jan 13th 2006, 1:52 pm
This is not the way to identify bots, all it will do is fill up your 404 error section.

I agree that it is not the best way to identify bots and it should certainly not be the only way to do it, but it is used by AWStats and other stats software to discover new bots. AWStats reports them as Unknown robot (identified by hit on 'robots.txt').

Jean-Luc

ServerUnion
Jan 13th 2006, 1:58 pm
No, that simply means that the bot does not have an official listing as a verified source. It still knows it is a bot, just doesn't know the name. Could be many reasons for this, the robots.txt file has nothing to do with it.

I get these with sites I have the file on, and ones I do not. Can you provide documentation on this theory? I would be interested to read more about it.

Jean-Luc
Jan 13th 2006, 2:36 pm
A line, titled "Unknown robot (identified by hit on 'robots.txt')", appears in the AWStats list of "Robots/Spiders visitors". That seems pretty clear to me.

Jean-Luc
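For what the idea looks like in practice, here is a rough sketch of flagging visitors as robots because they requested /robots.txt. It is not AWStats' actual code; the regular expression assumes Apache combined log format, and the sample log lines and agent names are invented.

```python
import re

# Matches the request, status, size, referrer and user agent fields of a
# combined-format access log line (an assumption about the log layout).
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def agents_requesting_robots(log_lines):
    """Return the set of user agents that asked for /robots.txt."""
    agents = set()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        # The status code is ignored: a 404 on a missing file counts too.
        if match and match.group("path") == "/robots.txt":
            agents.add(match.group("agent"))
    return agents


# Invented sample lines: one unknown bot and one ordinary browser.
SAMPLE_LOG = [
    '1.2.3.4 - - [13/Jan/2006:13:34:00 +0000] "GET /robots.txt HTTP/1.1" '
    '404 208 "-" "ExampleBot/1.0"',
    '5.6.7.8 - - [13/Jan/2006:13:35:10 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

suspected_bots = agents_requesting_robots(SAMPLE_LOG)
```

The request shows up in the log whether the file exists (200) or not (404), so this heuristic works either way; a missing file just adds 404 entries on top.
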

ServerUnion
Jan 13th 2006, 2:45 pm
As opposed to having hundreds of 404 errors on the file?

This may just be due to the fact that the stats programs aren't going to waste overhead by listing every little bot that stops by. They most likely just list the larger sources.