I am curious whether it's necessary to have a robots.txt in your root directory. Let's say I don't want to disallow anything; I don't have any hidden places. Currently I have this in my robots.txt:

User-agent: *

Basically I am telling the crawlers to crawl my whole site. Isn't that a waste of time? Does it help to have that in the root directory? One of my colleagues says it's not important at all, because there are lots of chances to make an error, and you could lose all your indexed pages from the SERPs. In my opinion, when SE spiders come to any site, the first place they go is robots.txt.

Another thing: we all know that spiders in the future will be able to crawl external CSS and JavaScript files. Would it be correct to add these lines to prohibit crawling those files?

Disallow: /scripts.js
Disallow: /styles.css

So is having User-agent: * in your robots.txt, or <meta name="robots" content="index,follow"> in your pages, just a waste of time? What do you think?
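A quick way to see what the robots.txt variants in the question actually tell a crawler is to feed them to Python's standard-library `urllib.robotparser`. This is only a minimal sketch: that parser follows the original robots exclusion rules (plain prefix matching), and real search engine crawlers may treat edge cases differently. `allowed`, `LONE_AGENT`, `ALLOW_ALL` and `BLOCK_ASSETS` are illustrative names, and "Googlebot" is used only as a sample user-agent string.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return rp.can_fetch(agent, path)


# The question's file: a lone User-agent line with no rules under it.
LONE_AGENT = "User-agent: *\n"
# The conventional "allow everything" file: an empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# The question's proposal for keeping crawlers off the external CSS/JS files.
BLOCK_ASSETS = "User-agent: *\nDisallow: /scripts.js\nDisallow: /styles.css\n"

for name, text in [("LONE_AGENT", LONE_AGENT), ("ALLOW_ALL", ALLOW_ALL), ("BLOCK_ASSETS", BLOCK_ASSETS)]:
    for path in ("/index.html", "/scripts.js", "/styles.css"):
        print(name, path, allowed(text, "Googlebot", path))
```

Under this parser, the lone `User-agent: *` line and the empty `Disallow:` both permit every path, which matches what a crawler does when there is no robots.txt at all; only the explicit `Disallow` lines change anything.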
I agree with you. I don't even have a robots.txt file on any of my websites, but they do well in the SERPs and are indexed well.
You're absolutely right, sites can do very well in the SERPs without a robots.txt. Having a robots.txt, even a very basic one, does stop your error logs from filling up with messages like this:

[error] [client 72.30.252.152] File does not exist: /home2/you/public_html/robots.txt

(This is an error log entry from Yahoo's Inktomi bot looking for a robots.txt and not finding it.)
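If you want to see how often crawlers are asking for a robots.txt you don't have, you can count those entries. A minimal sketch, assuming your error log lines look like the one quoted above; the exact Apache error-log format varies between versions and configurations, so the regex is an assumption to adapt, and `missing_robots_hits` plus the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed format, modelled on the quoted line:
# [error] [client 72.30.252.152] File does not exist: /home2/you/public_html/robots.txt
MISSING_ROBOTS = re.compile(
    r"\[client (?P<ip>[0-9.]+)\] File does not exist: \S*/robots\.txt$"
)


def missing_robots_hits(log_lines):
    """Count 'robots.txt not found' errors per client IP."""
    hits = Counter()
    for line in log_lines:
        match = MISSING_ROBOTS.search(line.strip())
        if match:
            hits[match.group("ip")] += 1
    return hits


# Sample lines for illustration only
sample = [
    "[error] [client 72.30.252.152] File does not exist: /home2/you/public_html/robots.txt",
    "[error] [client 72.30.252.152] File does not exist: /home2/you/public_html/robots.txt",
    "[error] [client 10.0.0.7] File does not exist: /home2/you/public_html/favicon.ico",
]
print(missing_robots_hits(sample))
```

A non-zero count only means crawlers are asking for the file; uploading even a minimal robots.txt makes those entries go away.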
I don't think robots.txt is absolutely necessary, but it's a nice extra if you have one. It's really simple to put one up.
It is good for blocking bad bots that can cause problems, though. One question: on a forum, for example, where there is an admin files folder, would you block that from the spiders?
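Blocking an admin folder is a single `Disallow` line, but robots.txt rules are prefix matches, so the trailing slash matters. A minimal sketch using Python's `urllib.robotparser`; the `/admin/` path and the `blocked` helper are hypothetical, and only crawlers that choose to obey robots.txt are affected.

```python
from urllib.robotparser import RobotFileParser


def blocked(disallow_rules, path, agent="Googlebot"):
    """True if `path` is off-limits to `agent` under a User-agent: * record."""
    lines = ["User-agent: *"] + [f"Disallow: {rule}" for rule in disallow_rules]
    rp = RobotFileParser()
    rp.parse(lines)
    return not rp.can_fetch(agent, path)


# Hypothetical forum layout with its admin files under /admin/
print(blocked(["/admin/"], "/admin/settings.php"))   # folder contents are blocked
print(blocked(["/admin/"], "/administration.html"))  # different path, still crawlable
print(blocked(["/admin"], "/administration.html"))   # no slash: the prefix catches it too
print(blocked(["/admin/"], "/index.php"))            # ordinary forum pages unaffected
```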
Well, I think I block the user/bin/ directory. I don't know much about this Unix stuff; a guy told me to block it, so that's why I did...
"Well, if you have an admin area, then it's important to have one so they do not crawl the admin." Yes, and if ill-minded people (hackers) peek into the robots.txt file (which most do), they can see exactly what you didn't want them to see. So password-protecting the directory is better than listing it in robots.txt, although using both together can give you better security.
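The point about hackers reading robots.txt is easy to demonstrate: the file is served to anyone who requests it, and pulling out the `Disallow` lines takes only a few lines of code. A minimal sketch; the `disallowed_paths` helper and the sample paths are hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return every non-empty Disallow value, i.e. what any reader of the file learns."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample = """\
User-agent: *
Disallow: /secret-admin/
Disallow: /members/   # private area
"""
print(disallowed_paths(sample))
```

That is why the directory itself should be protected (for example with a password), not just listed in robots.txt.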
Well, I use robots.txt, even a basic one, to stop those annoying log entries from appearing. It seems that spiders always look for the robots.txt file first, and if they don't find one, an error is logged. And I have a lot of spiders sniffing at my sites. - MENJ
A missing robots.txt will not prevent your pages from being indexed. The only thing you will notice with a missing robots.txt is a 404 error in the log files. If you do have a robots.txt, however, you can block crawlers from spidering the directories marked "Disallow" in it. In other words, a robots.txt file is used to "disallow" crawlers from indexing your pages, but it is not needed to "allow" them. I use robots.txt to prevent crawlers from indexing private directories and member directories. It can also be done in the HTML page itself with a META Robots tag.
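The META Robots alternative lives inside each HTML page, so a crawler only sees it after fetching that page. This minimal sketch uses Python's standard `html.parser` to read the directives roughly the way a crawler might; `MetaRobotsParser`, `meta_robots` and the sample markup are hypothetical, and real crawlers apply their own parsing rules.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def meta_robots(html: str) -> list:
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


member_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(meta_robots(member_page))
print(meta_robots('<meta name="robots" content="index,follow">'))
```

`index,follow` is what crawlers assume anyway, so that tag changes nothing; `noindex` is the directive that actually keeps a page out.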
I'm glad I found this thread. I was wondering whether robots.txt is important. If I understood correctly, I shouldn't bother with it?
So, regarding "I use robots.txt to prevent the crawlers from indexing private directories and member directories. It can also be done in the html page itself with a META Robots tag.": does that mean you don't need the text file at all?
One or the other is okay. It's just an option, AFAIK. However, if you use robots.txt, you can use some features such as wildcards (supported by some of the SEs' bots, Google's included). If you use the meta tag, you can dynamically generate pages without having to edit robots.txt. Also, I'm not sure, but I think a bot actually has to download the page before it can get at the meta tag. If you don't need to block bots at all, I figure having a robots.txt just wastes bandwidth? Robots.txt is USEFUL for disallowing/de-indexing pages such as login pages or useless pages that suck the PR out of your site.
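Python's `urllib.robotparser` only does plain prefix matching, so it cannot show the wildcard extension mentioned above. Here is a hand-rolled sketch of the `*` and trailing `$` syntax as Google documents it; the `rule_matches` name and the sample patterns are hypothetical, and it deliberately ignores how a crawler chooses between competing Allow and Disallow rules.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern with '*' (any characters) and a trailing '$'
    (end of URL); without '$' the pattern is a prefix match, as in plain robots.txt."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' stand for any run of characters
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


print(rule_matches("/login", "/login.php"))                 # plain prefix
print(rule_matches("/*.php$", "/login.php"))                # ends in .php
print(rule_matches("/*.php$", "/login.php?next=/"))         # query string after .php
print(rule_matches("/*?sessionid=", "/forum/view?sessionid=42"))
```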
Robots.txt: as I see it, if all it does is help search engines block something on my site, it's useless to me. I want search engines to understand and spider every corner of my site.
What is the point? The more it indexes, the more traffic you will get. I only ever used robots.txt once, and I removed it after a month. Robots.txt isn't important, and I don't use it. Rob
It's just a way of regulating spider access to your site. It can actually save you bandwidth! For example, you might want to block certain bots, like some known email harvesters that are actually well-behaved and honour robots.txt. In that case, it's a huge bandwidth saving to have them download only your little robots.txt file rather than crawl your entire site. If you run an SE, you might not want your result pages to be crawled, so you could exclude them in case people link to them. Perhaps you might also want to exclude image folders, etc. It's useful, but not mandatory. If you're not worried about bandwidth, or about which spider visits where, then you may omit it. But if you work out a good one for your site, you can save a considerable amount of bandwidth!
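The setup described above (shut one well-behaved harvester out entirely, keep everyone else away from search result pages and images) can be checked with Python's `urllib.robotparser` before you upload it. A minimal sketch: `EmailHarvester`, `/search/` and `/images/` are hypothetical stand-ins, and a record only stops bots that actually read and obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical names: EmailHarvester stands in for a harvester that honours robots.txt
ROBOTS_TXT = """\
User-agent: EmailHarvester
Disallow: /

User-agent: *
Disallow: /search/
Disallow: /images/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

checks = [
    ("EmailHarvester/2.0", "/contact.html"),  # named bot: whole site off-limits
    ("Googlebot", "/contact.html"),           # everyone else: normal pages allowed
    ("Googlebot", "/search/results"),         # own search result pages excluded
    ("Googlebot", "/images/logo.png"),        # image folder excluded
]
for agent, path in checks:
    verdict = "allowed" if rp.can_fetch(agent, path) else "blocked"
    print(f"{agent:20} {path:18} {verdict}")
```

A harvester that obeys the first record downloads only this small file instead of every page, which is where the bandwidth saving comes from.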