Do I Need A Robots.txt?
rayqsl
Oct 13th 2007, 2:35 am
If I want the whole site to be visible to the bots and spiders, do I need a robots.txt file? I think the answer is "no".
Thanks in advance.
chickens
Oct 13th 2007, 3:55 am
You don't need a robots.txt file if you want a spider to access everything. Personally I create an empty file just to get rid of the 404 errors.
Odds are the SEO people are going to say differently, though. On some of my sites I use the Wikipedia robots.txt and it seems to work for me.
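To illustrate the point about an empty file, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how one rule-following parser reads the file, not how any particular search engine behaves. The helper name `allowed` and the example.com URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a rule-abiding crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URL, used only for illustration.
page = "https://example.com/some/page.html"

empty_file = ""                            # a zero-byte robots.txt
allow_all = "User-agent: *\nDisallow:\n"   # explicit "nothing is blocked" form

print(allowed(empty_file, page))   # True: no rules, so nothing is blocked
print(allowed(allow_all, page))    # True: an empty Disallow blocks nothing
```

Either form leaves the whole site open to crawlers; the only practical difference from having no file at all is that requests for /robots.txt stop returning 404.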
kentuckyslone
Oct 13th 2007, 3:57 am
Unless you need to block spider access to certain files or folders, a robots.txt is not needed. However, there are other ways to block access when you need to.
Keep in mind that if you do use a robots.txt to disallow crawling, not all spiders will 'obey'.
rayqsl
Oct 13th 2007, 4:33 pm
Thanks for the advice. I've heard that blocking certain folders attracts some bots to the folders. Is this true? I certainly wouldn't think that the big, reputable organisations would do this, though.
kentuckyslone
Oct 13th 2007, 4:36 pm
It is possible - and anything that is possible is likely.
The main purpose of blocking files or folders with robots.txt is just to keep Google and other SEs from crawling and indexing the pages. You wouldn't want admin folders, for example, to be indexed.
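As a rough sketch of what blocking an admin folder looks like to a well-behaved crawler, the snippet below feeds a two-line robots.txt to Python's standard `urllib.robotparser`. The `/admin/` path, the example.com host, and the helper name `may_fetch` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask all compliant crawlers to stay out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path: str, agent: str = "*") -> bool:
    """Check a site-relative path against the rules parsed above."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(may_fetch("/index.html"))        # True: no rule matches this path
print(may_fetch("/admin/login.php"))   # False: path starts with /admin/
```

Note that this is only a request. The file is publicly readable and nothing forces a bot to honour it, so anything that really has to stay private needs a password or a server-side restriction as well.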
rayqsl
Oct 13th 2007, 11:36 pm
That's what I thought. Thanks for your help k
countZZero
Oct 25th 2007, 5:04 pm
Uhhhh... yes. There is (almost always) something in your /public_html/ or /www/ folder that needs protecting from public (bot or spider) view.
Karl
http://fastercats.com
http://market-match.info
rayqsl
Oct 26th 2007, 2:45 pm
How do robots actually work their way through your site?
My limited knowledge says that they start from a home page and then follow all of the links until everything has been reached.
Now I know that's simplistic: how does a bot know which page is your home page? When I submit the URL to engines, they usually only want the root folder.
So do they look at every file in the root folder and follow all of the links? If they do, then I can see how they can find files that you don't want them to index (and publish).
Is there a good (simple) article somewhere that gives this kind of info? The ones I've found just launch straight into telling you how to create a robots.txt file and are pretty vague on how the bots work.
I suppose that not all bots work in the same way, either. So I might be OK as far as the Google bot goes, but the Yahoo one might do something that I don't want.
I obviously want the bots to keep out of my content management area.
:)
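The "start at the home page and follow the links" picture can be sketched as a breadth-first walk over a link graph. This is a toy model, not how any particular search engine is implemented: the site map, the `/cms/` rule, and the function name `crawl` are all invented for the example, and no network requests are made.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: each path maps to the paths it links to.
SITE_LINKS = {
    "/": ["/about.html", "/news.html"],
    "/about.html": ["/", "/cms/login.html"],
    "/news.html": ["/about.html"],
    "/cms/login.html": ["/cms/edit.html"],
    "/cms/edit.html": [],
    "/old-draft.html": [],  # exists on the server, but nothing links to it
}

# Hypothetical robots.txt asking crawlers to stay out of the CMS area.
ROBOTS_RULES = "User-agent: *\nDisallow: /cms/\n"


def crawl(start: str = "/", obey_robots: bool = True) -> list:
    """Breadth-first crawl from `start`, following links between pages."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_RULES.splitlines())

    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        path = queue.popleft()
        # A polite crawler checks the rules before fetching each page.
        if obey_robots and not rules.can_fetch("*", "https://example.com" + path):
            continue
        visited.append(path)
        for link in SITE_LINKS.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


polite = crawl()                  # skips everything under /cms/
rude = crawl(obey_robots=False)   # ignores robots.txt and walks into /cms/
print(polite)
print(rude)
```

In this model the crawler never lists the files in a folder; it only learns about pages that something links to, which is why `/old-draft.html` is never reached in either run. The second run also shows why robots.txt alone does not keep a misbehaving bot out of a content management area.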
inworx
Oct 27th 2007, 3:30 am
Use a blank HTML file if you don't want that particular folder to be indexed.
robots.txt is ignored by a few SEs.
webrepair
Oct 29th 2007, 7:54 am
Just create an empty robots.txt file. I would advise putting one up even if it is empty: it saves the 404 errors, and creating an empty robots.txt is advised anyway by the W3C.
rayqsl
Nov 2nd 2007, 3:09 am
Thanks again for your words of wisdom.
Kuldeep1952
Nov 3rd 2007, 12:34 am
If you are a webmaster who watches his server error logs, then it is a good idea to have the files robots.txt and favicon.ico on your server. Otherwise the error log will be filled with 404 errors for these two files, and the actual errors will be drowned out.