If I want the whole site to be visible to the bots and spiders, do I need the robots.txt? I think the answer is "no". Thanks in advance.
You don't need a robots.txt file if you want a spider to access everything. Personally I create an empty file just to get rid of the 404 errors. Odds are the SEO people are going to say different though. On some of my sites I use the wikipedia robots.txt and it seems to work for me.
Unless you need to block spider access to certain files or folders, a robots.txt is not needed. However, there are other ways to block access when you need to. Keep in mind that if you do use a robots.txt to disallow crawling, not all spiders will 'obey'.
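To illustrate the point that no rules means no restrictions, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The bot name `ExampleBot` and the `example.com` URL are placeholders, and `can_crawl` is a helper made up for this example. It only shows how one well-behaved parser reads the file; a spider that ignores robots.txt is not affected by any of this.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An empty file and an explicit "allow everything" file (both hypothetical).
EMPTY = ""
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Placeholder URL, not a real page.
PAGE = "http://example.com/any/page.html"

print(can_crawl(EMPTY, "ExampleBot", PAGE))
print(can_crawl(ALLOW_ALL, "ExampleBot", PAGE))
```

Both calls report that the page may be fetched, so for a compliant spider an empty robots.txt behaves the same as an explicit allow-all file.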
Thanks for the advice. I've heard that blocking certain folders attracts some bots to the folders. Is this true? I certainly wouldn't think that the big, reputable organisations would do this, though.
It is possible, and anything that is possible is likely. The main purpose for blocking files or folders with robots.txt is just to keep Google and other search engines (SEs) from indexing the pages. You wouldn't want admin folders, for example, to be indexed.
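As a rough sketch of the admin-folder case, the snippet below parses a robots.txt that disallows one folder and asks Python's `urllib.robotparser` what a compliant spider may fetch. The folder name `/admin/`, the bot name `ExampleBot`, and the `example.com` host are assumptions for illustration only. Note that robots.txt is itself a publicly readable file, so any folder named in it is visible to anyone who requests it, which is why it is a request to polite spiders and not a security measure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all spiders to stay out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Return True if a compliant spider may fetch this path on the placeholder host."""
    return parser.can_fetch("ExampleBot", "http://example.com" + path)


print(allowed("/index.html"))       # True: no rule matches this path
print(allowed("/admin/login.php"))  # False: path starts with /admin/
```

For anything that really must stay private, the "other ways to block access" mentioned earlier in the thread (such as password protection on the server) are the safer choice.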
Uhhhh... yes. There is (almost always) something in your /public_html/ or /www/ folder in need of protecting from public (bot - spider) view. Karl http://fastercats.com http://market-match.info
How do robots actually work their way through your site? My limited knowledge says that they start from a home page and then follow all of the links until everything has been reached. Now I know that's simplistic, because how does a bot know which page is your home page? When I submit the URL to engines, they usually only want the root folder. So do they look at every file in the root folder and follow all of the links? If they do, then I can see how they can find files that you don't want them to index (and publish). Is there a good (simple) article somewhere that gives this kind of info? The ones I've found just launch straight into telling you how to create a robots.txt file and are pretty vague on how the bots work. I suppose that not all bots work in the same way, either. So I might be OK as far as the Google bot goes, but the Yahoo one might do something that I really don't want. I'm obviously concerned that the bots keep out of my content management area.
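The link-following idea described above can be sketched in a few lines. This is a simplified toy model, not how any particular search engine works: the site is a made-up dictionary of paths and HTML (`SITE`), `example.com` is a placeholder host, and `crawl` and `LinkExtractor` are names invented for this example. A spider requests a starting URL, pulls the links out of the page it gets back, and queues any it has not seen yet. Over plain HTTP it cannot list the files in a folder unless the server publishes a directory listing, so a page that nothing links to is not found by this kind of crawl.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site: path -> HTML that the server would return for it.
SITE = {
    "/": '<a href="/about.html">About</a> <a href="articles/">Articles</a>',
    "/about.html": '<a href="/">Home</a>',
    "/articles/": '<a href="one.html">One</a>',
    "/articles/one.html": '<a href="/">Home</a>',
    "/cms/login.html": "<p>Content management login</p>",  # no page links here
}


def crawl(start="/"):
    """Breadth-first crawl from start; return paths in the order visited."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        path = queue.popleft()
        html = SITE.get(path)
        if html is None:  # the toy server has no such page (a 404)
            continue
        visited.append(path)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links against the current page's URL.
            target = urlparse(urljoin("http://example.com" + path, href)).path
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


print(crawl())
```

In this toy site the crawl reaches the home page, the about page, and both article pages, but never `/cms/login.html`, because no page links to it. A real content management area can still be discovered if any page anywhere links to it, so an unlinked URL alone is not protection.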
Use a blank HTML file if you don't want that particular folder indexed. robots.txt is bypassed by a few SEs.
Just create an empty robots.txt file. I would advise putting one up, even if it is empty. It will save the 404 errors, and creating an empty robots.txt is advised anyway by the W3C.
If you are a webmaster who watches his server error logs, then it is a good idea to have the files robots.txt and favicon.ico on your server, otherwise the error log will be filled with 404 errors for these two files, and the actual errors will be drowned out.
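To see how much of the 404 noise comes from these two files, something like the following sketch can tally 404 responses per requested path. Everything here is an assumption for illustration: the sample lines are made up, they use the Common Log Format that many servers write to their access logs, and your own server's error log may look quite different, so the regular expression would need adjusting.

```python
import re
from collections import Counter

# Made-up sample lines in Common Log Format (not taken from a real server).
SAMPLE_LOG = """\
192.0.2.1 - - [01/Jan/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 404 209
192.0.2.2 - - [01/Jan/2024:10:00:05 +0000] "GET /favicon.ico HTTP/1.1" 404 209
192.0.2.3 - - [01/Jan/2024:10:00:09 +0000] "GET /index.html HTTP/1.1" 200 5120
192.0.2.1 - - [01/Jan/2024:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 404 209
192.0.2.4 - - [01/Jan/2024:10:02:00 +0000] "GET /old-page.html HTTP/1.1" 404 209
"""

# Capture the requested path and the three-digit status code.
REQUEST = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) ')


def count_404s(log_text: str) -> Counter:
    """Count 404 responses per requested path."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REQUEST.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts


for path, hits in count_404s(SAMPLE_LOG).most_common():
    print(hits, path)
```

In the sample, three of the four 404s are for robots.txt and favicon.ico, leaving only `/old-page.html` as an error worth looking into, which is the point made above about the real errors being drowned out.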