I've been doing that. I'll take a look at the ones you mentioned. Heck, I have to enable Xenu every time I use it on myself.
OK, I would like to get some more knowledge about this robots.txt, so I would like to ask a couple of questions. If I wanted the regular bots, such as Google and Yahoo, to visit and index my site, while denying access to the bad bots, I would have to put in this type of syntax, right? I know the really bad bots will ignore robots.txt, but everything I can manage to block with a simple text file is well worth it. What could it hurt, right? (Or maybe I am missing something here.) OK, here goes:

User-agent: *
Disallow:

Disallow: /
User-agent: Alexibot

Disallow: /
User-agent: Alexibot

Disallow: /
User-agent: Alexibot

I used Alexibot as a sample bot for the example. But would this mean that all robots are granted access except for the ones listed with Disallow: /, like Alexibot? I know there are a lot of bots that will ignore the robots.txt file, but every bot on the list that can be stopped is one that bites the dust and saves me bandwidth. I will also look into the botsense beta service, which I know is also not foolproof, but everything helps, and hopefully it will improve even more in time, as they say.
No. First, you have the order of lines reversed; the user-agent comes first, before the disallow:

User-agent: Alexibot
Disallow: /

User-agent: *
Disallow:

Second, repeating the name of the user-agent won't help; once is enough. Third, construct your robots.txt file for the GOOD bots. Filling it full of stuff for the bad bots won't help you, because the bad bots aren't even going to read the robots.txt file.
Oh man, you're right. Yeah, I have to list it as you said and not the other way around; why I even put it like that, anyway, copy-and-paste error. Yeah, I know repeating is not necessary, but I did it as an example to illustrate various bot names. Whether putting in a list would actually help deter all the bots is another question, but if you don't shoot, you miss for certain. But in this manner:

User-agent: *
Disallow:

User-agent: Alexibot
Disallow: /

I would allow all bots and make an attempt to block Alexibot, right?
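If it helps, here's a rough sketch of that same pattern stretched into a short block list; Alexibot comes from this thread, and the other user-agent name is just a placeholder for whatever bots you decide to block:

# Catch-all record: any bot not named below may crawl everything
User-agent: *
Disallow:

# Bots to turn away entirely (only works if they honor robots.txt)
User-agent: Alexibot
Disallow: /

User-agent: SomeOtherBadBot
Disallow: /

Each blocked bot gets its own User-agent line followed by Disallow: /, and the blank lines separate the records.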
I think the usual order is to disallow bots first and allow all the others at the end, though it may not matter at all. I really don't disallow bots, only certain directories I don't want indexed. As I said, the good bots I want to let in, and the bad bots are going to ignore the robots.txt file anyway, so all that does is clutter up the file for the good bots. Why make it any harder than it has to be for Googlebot and the other Good Witches?
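For what it's worth, a minimal sketch of that directory-only approach could look like this; the directory names are just made-up examples, not anything from an actual site:

# Applies to every bot: stay out of these directories, crawl the rest
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/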
I keep a nice big fat robots.txt to help reduce bandwidth utilization from annoying and useless web robots.
To be clear on something, guys: when I use that syntax, would I also need to specify each bot that I want to grant access? I am not sure about this. Or would the above suffice? Will Spencer, I see in your robots.txt, for instance, no indication of this, only an indication of the ones that are disallowed. So can I presume I only have to put in the file which bots to block, and that the ones that aren't listed, such as Googlebot, would crawl the site since they don't encounter any reference to themselves?
No. This part:

User-agent: *
Disallow:

applies to ALL bots not specifically mentioned. Yes. Although it's best to include the two lines above to state that clearly. It may not be necessary for the bots, but if nothing else it will remind you, the human, what the robots.txt file is doing.
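As a side note on how the matching works: under the robots exclusion standard, a bot looks for the record that names its own user-agent and only falls back to the * record when no specific match exists. So in a sketch like this (Alexibot again standing in for a blocked bot), Googlebot gets the open catch-all even though it is never named:

# Alexibot matches its own record and is shut out of the whole site
User-agent: Alexibot
Disallow: /

# Googlebot and anything else not named above fall back to this
# record; the empty Disallow means nothing is off limits
User-agent: *
Disallow: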
A while back, I installed Opera specifically because it seems to be the browser most likely to break pages these days. I don't use it for general browsing, but it can be instructive to get a copy to view your site with... much like in the old days, when I used to keep a copy of Netscape 4.7x around as a worst-case-scenario browser.