What is robots.txt?


icare
Feb 12th 2006, 12:00 pm
How do I get or create one for my site?

Please advise

Smyrl
Feb 12th 2006, 12:04 pm
Do a Google search for "robots.txt tutorial". Your robots.txt file can be created with any text editor. It spells out which files may or may not be indexed. There are many disobedient robots out there, but Google, Yahoo, and MSN all obey your robots.txt directives.

These two lines allow all robots to index every page
User-agent: *
Disallow:

These two lines keep all robots out.
User-agent: *
Disallow: /
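
If you want to see how a well-behaved robot reads those two rule sets, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot", the helper name can_crawl, and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# "Disallow: /" blocks every path on the site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(can_crawl(ALLOW_ALL, "http://www.example.com/page.html"))  # True
print(can_crawl(BLOCK_ALL, "http://www.example.com/page.html"))  # False
```

This only models robots that choose to obey the file; nothing in robots.txt can force a robot to stay out.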

icare
Feb 12th 2006, 12:10 pm
Even if I Google it, a DP page will show up at the very top, so why not ask here? I tried searching for this on DP but couldn't find an answer that explains it... :D

Smyrl
Feb 12th 2006, 12:14 pm
Here is the number one listing in Google.

http://www.searchengineworld.com/robots/robots_tutorial.htm

Cristian Mezei
Feb 12th 2006, 12:15 pm
I have this one (http://www.searchengineworld.com/robots/robots_tutorial.htm) in my bookmarks, together with this one (http://en.wikipedia.org/wiki/Robots.txt).

It might do you good to read them :)

dashboard
Feb 12th 2006, 7:33 pm
you can also use <meta name=robots.txt content=index,nofollow>

seoaddict
Feb 13th 2006, 2:43 am
Create a plain text file and save it as robots.txt.
In it you can allow and disallow crawlers.
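
If you would rather generate the file than type it by hand, something like the following sketch could work. The function names and the temporary folder standing in for your web root are assumptions for illustration; on a real host you would point it at your own document root.

```python
import tempfile
from pathlib import Path


def build_robots_txt(disallowed, agent="*"):
    """Build robots.txt text with one Disallow line per path."""
    lines = [f"User-agent: {agent}"]
    if disallowed:
        lines += [f"Disallow: {path}" for path in disallowed]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


def save_robots_txt(web_root, disallowed):
    """Write robots.txt into the given folder and return its path."""
    target = Path(web_root) / "robots.txt"
    target.write_text(build_robots_txt(disallowed), encoding="ascii")
    return target


# A temporary directory stands in for the real web root here.
with tempfile.TemporaryDirectory() as web_root:
    saved = save_robots_txt(web_root, ["/cgi-bin/"])
    print(saved.read_text(encoding="ascii"))
```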

mariush
Feb 13th 2006, 3:28 am
I added a robots.txt file just to get rid of the 404 Not Found errors for it. They annoyed me because I kept seeing them in AWStats.

My robots.txt is actually:

User-agent: *
Disallow: /cgi-bin/

JEET
Feb 13th 2006, 4:01 am
dashboard wrote: you can also use <meta name=robots.txt content=index,nofollow>

That's not robots.txt. It's a meta tag.
It isn't right either: you cannot specify a file name in a meta tag.
<meta http-equiv="robots" content="index,follow" />
is the right tag for the content and links on that particular page.

Robots.txt is a simple text file that "good" crawler bots read to see which folders or files they are allowed to index and which they are not.
It goes in the top level of your main web folder ("public_html").

User-agent: *
Disallow: /images

will keep search engines out of your images folder.
If you want everything to be available for indexing, create an empty "robots.txt" and put it in the "public_html" folder.
A blank Notepad file named "robots.txt" will do.
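
Here is a rough way to check both claims (the images rule and the empty file) with Python's standard urllib.robotparser. "ExampleBot", the helper name allowed, and the example.com host are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str) -> bool:
    """Return True if `rules` let a hypothetical bot fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("ExampleBot", "http://www.example.com" + path)


IMAGES_RULES = "User-agent: *\nDisallow: /images\n"
EMPTY_RULES = ""  # what a blank robots.txt contains

print(allowed(IMAGES_RULES, "/images/logo.gif"))  # False: inside the blocked folder
print(allowed(IMAGES_RULES, "/index.html"))       # True: not covered by the rule
print(allowed(EMPTY_RULES, "/images/logo.gif"))   # True: an empty file blocks nothing
```

Note that this parser matches by prefix, so "Disallow: /images" also covers a page such as /images.html. Writing "Disallow: /images/" limits the rule to the folder.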

If you don't have a "public_html" folder, then your host probably already has a robots.txt and you need not do anything. Your site is a folder inside "his" public_html, which already has a robots.txt.

But if you are getting a 404 Not Found error for robots.txt, ask your host whether he has that file. If not, ask him to add one.
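
Robots ask for the file at the root of the host name, not inside a subfolder, which is why the 404 shows up against the host's root. A small sketch of where a robot would look for any given page (the helper name robots_url and the example URL are made up for illustration):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for this page's host."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/~user/site/index.html"))
# http://www.example.com/robots.txt
```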

This is what I have noticed from my logs .
Hope that's right .

Regards
Jeet

lionstarr
Feb 21st 2006, 9:27 am
I know it as
<meta name="robots" content="index, follow">
The first value is index or noindex: allow search engines to index the page, or don't.
The second is follow or nofollow: nofollow stops search engines from following your links, so you don't give away your PageRank :-)
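
For anyone curious how a robot might pick those values out of a page, here is a minimal sketch with Python's standard html.parser. The class and function names are made up for illustration, and it only looks at the name="robots" form of the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None if omitted.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="index, follow"></head></html>'
print(sorted(robots_directives(page)))  # ['follow', 'index']
```
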
greetings,
lionstarr

minstrel
Feb 22nd 2006, 8:48 pm
lionstarr, the meta tag you mention is not as good a solution as robots.txt for most websites:

1. it has to be used on a page by page basis, i.e., for spiders that read and honor that meta tag, it only applies to the page that contains it

2. it does not have the capability for excluding specific spiders or entire directories

The only time you would normally use the meta tag is if you are on free hosting that won't allow you to place a robots.txt file in the root directory.
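
To illustrate point 2, here is a sketch of a robots.txt that shuts out one specific spider and keeps everyone else out of a single directory, checked with Python's standard urllib.robotparser. The bot names "BadBot" and "GoodBot" and the /private/ directory are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# BadBot is excluded from the whole site.
print(parser.can_fetch("BadBot", "http://www.example.com/index.html"))           # False
# Every other robot may crawl, except inside /private/.
print(parser.can_fetch("GoodBot", "http://www.example.com/index.html"))          # True
print(parser.can_fetch("GoodBot", "http://www.example.com/private/notes.html"))  # False
```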

lionstarr
Feb 23rd 2006, 8:01 am
Of course it's not as good as a robots.txt!
I just saw JEET posting about <meta http-equiv and thought I'd tell you that I know it as <meta name="robots">. Maybe I'm wrong and I'll learn something, or he's wrong and he'll learn something!