robots.txt disallow
girbaud
Jan 13th 2006, 6:54 am
Hi.
I just want to ask if my code is correct?
User-agent: *
Disallow:
User-agent: *
Disallow: /public_html/configure/
User-agent: *
Disallow: /public_html/images/
Jean-Luc
Jan 13th 2006, 6:57 am
Your three robots.txt files are correct.
The first version does not disallow anything. The second and third versions disallow access to the mentioned directory.
Jean-Luc
girbaud
Jan 13th 2006, 7:01 am
Is robots.txt important to my site?
Do I need to have one?
ServerUnion
Jan 13th 2006, 7:10 am
You do not need the first entry. Good luck
Jean-Luc
Jan 13th 2006, 7:11 am
You do not need to have one. You only need one if you want to ask the robots not to visit some parts of your site. If you don't have a robots.txt, the robots will be quite happy :D, because it means they are allowed to visit your whole site.
If there is no robots.txt on your site, each attempt by a robot to read it will return a 404 error (file not found). This is not a problem at all for the robots, but it will show up in your stats as a series of 404 errors. To avoid that, you can use an empty robots.txt file or a robots.txt file that does not disallow anything, like your first example.
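For anyone who wants to check this themselves, here is a minimal sketch using Python's standard urllib.robotparser module. It feeds an empty file and an "allow everything" file to the parser and asks whether a well-behaved robot may fetch a page. The helper name allowed() and the example.com URL are only placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a compliant robot may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


EMPTY = ""                                # an empty robots.txt file
ALLOW_ALL = "User-agent: *\nDisallow:\n"  # a file that disallows nothing

# Placeholder URL, not a real site from this thread.
page = "http://www.example.com/any/page.html"

for rules in (EMPTY, ALLOW_ALL):
    print(allowed(rules, page))
```

Both files should be treated the same way by this parser: nothing is disallowed.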
Some robots do not follow the instructions in robots.txt.
Jean-Luc
girbaud
Jan 13th 2006, 7:15 am
Okay. Thanks for all the responses.
Is it true that robots can get you banned from search engines?
Jean-Luc
Jan 13th 2006, 7:24 am
"is it true that robots can get you banned in search engines?"
What do you mean?
A robot is a computer program operated by a search engine, a research organization, a university, or an individual. To some extent, this computer program visits your web site like a human user would.
Search engines use robots to find out what the content of your pages is. So you had better allow them to visit your site if you want search engines to let the world know that your site exists.
Some "bad robots" search for email addresses or known vulnerabilities in your web site. These robots do not respect the robots.txt standard anyway. For the time being, you should probably not worry about them.
Hidden content, some types of "unfair" link exchanges, and similar tricks can get you banned from search engines. "Bad robots" cannot.
Jean-Luc
girbaud
Jan 13th 2006, 7:40 am
I've used a site that generates robots.txt. When I was done filling in the fields, it said:
" Now you just copy the text above, create a new .txt file called robots.txt, paste the above text in it and upload it to the root your server ! That's all there is, be careful with editing, because a mistake can ban your site to the search engines for a long time. Last but not least, we are never responsible for the results of our tool, we do our best to make quality tools, but after all it's up to you to use our generated robots.txt file."
What does it mean by "That's all there is, be careful with editing, because a mistake can ban your site to the search engines for a long time"?
Jean-Luc
Jan 13th 2006, 7:56 am
User-agent: *
Disallow: /
means that you do not want any robot in any directory of your site.
A search engine will not "ban" your site if it sees this, but it will respect your disallow instruction. At the same time, it will probably decide to postpone its next visit to your site to an undefined date.
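As a rough check of what a single "Disallow: /" does, the sketch below parses that two-line file with Python's standard urllib.robotparser and tests a few paths. The robot name "AnyBot", the paths and the example.com host are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line file that shuts out every compliant robot.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())


def blocked_everywhere(paths):
    """Return True if every given path is off limits to a compliant robot."""
    base = "http://www.example.com"  # placeholder host
    return all(not parser.can_fetch("AnyBot", base + p) for p in paths)


print(blocked_everywhere(["/", "/index.html", "/images/logo.gif"]))
```

This only describes robots that honour robots.txt. It says nothing about how a particular search engine schedules its next visit.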
Jean-Luc
girbaud
Jan 13th 2006, 8:55 am
thanks a lot Jean-Luc :)
minstrel
Jan 14th 2006, 8:18 am
i just want to ask if my codes are correct??
User-agent: *
Disallow:
User-agent: *
Disallow: /public_html/configure/
User-agent: *
Disallow: /public_html/images/
Do it like this:
User-agent: *
Disallow: /public_html/configure/
Disallow: /public_html/images/
But also check that "public_html" is a real directory on your site. Many sites these days use virtual hosting, so that a domain like www.somesite.com is physically situated in a directory like /~somesite/public_html/, but you wouldn't use that in the robots.txt file. You'd just use the root directory and subdirectories for your domain. Thus, for a typical site, the robots.txt file would look like this for your example above:
User-agent: *
Disallow: /configure/
Disallow: /images/
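To see why the path matters, here is a small sketch with Python's standard urllib.robotparser. It assumes the rules are matched as prefixes of the URL path, which is how this parser treats them, and uses made-up file names on a placeholder host.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /configure/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

BASE = "http://www.example.com"  # placeholder host


def may_fetch(path: str) -> bool:
    """Return True if a compliant robot may fetch this URL path."""
    return parser.can_fetch("*", BASE + path)


# Hypothetical paths: rules are compared with the URL path, not the disk path.
for path in ("/images/logo.gif", "/configure/setup.php",
             "/index.html", "/public_html/images/logo.gif"):
    print(path, may_fetch(path))
```

The last path is not covered by these rules, because "/public_html/images/" does not start with "/images/". A rule only works when it matches the path as it appears in the URL.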
girbaud
Jan 16th 2006, 7:53 am
That's nice advice, minstrel! Thanks for that one.
"public_html" is a real directory on my site, so I'll be using the smaller code. It's simpler than the one I used.
genny2006
Mar 8th 2006, 6:54 am
Hey guys,
I have a forum and it has been getting so much spam I can barely keep up :(. I just wanted to know, do I really only need a robots.txt file to block it? That's it?
This would save me so much time.
I need an answer quick please.
Thank you
Genny2006
Jean-Luc
Mar 8th 2006, 7:08 am
Hi,
robots.txt does not block anything. It "asks" robots not to visit some pages. Ill-intentioned robots either don't read it, or read it to try to discover targets for their attacks.
You need other methods to fight spam. By the way, what kind of spam do you have in your forum?
Jean-Luc
genny2006
Mar 8th 2006, 7:18 am
I get all kinds of spam, from weird symbols to porn.
How can I fight this?
Genny
Jean-Luc
Mar 8th 2006, 7:26 am
Do these robots register in your forum and then post all kinds of stuff?
Try to find out if there are repetitive elements in these multiple registrations: same IP range, same words used, and so on. Then your actions will depend on what you are able to do as admin. Maybe you can block access to some of these IP ranges for a while, or force people to copy some text to register.
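A minimal sketch of the IP range idea, using Python's standard ipaddress module. The ranges below are reserved documentation addresses standing in for whatever ranges you actually find in your logs, and is_blocked() is a hypothetical helper, not part of any forum software.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), not real spammer networks.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)


print(is_blocked("192.0.2.17"))   # inside the first range
print(is_blocked("203.0.113.5"))  # outside both ranges
```

Where this check goes depends on your setup. It could sit in the posting script or in the web server configuration.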
Jean-Luc
genny2006
Mar 8th 2006, 7:50 am
They don't register, they just go in and post.
I now spend half my days just removing all the junk they post.
I have 2 ideas of things to do, but I'm trying to find the pros and cons. The site is made in PHP. I was thinking of doing an image validation when the user clicks post. Or someone said that I can use the microtime() function to tell apart the time robots take and the time real users take to post, and then either let the post through or not depending on how long it took.
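The timing idea could look roughly like the sketch below. It is written in Python only for illustration (on a PHP site, microtime() would play the role of time.time()). The 3-second threshold and the name looks_automated() are assumptions, not tested values.

```python
import time

# Assumed threshold: a human rarely fills in and submits a post this fast.
MIN_SECONDS = 3.0


def looks_automated(form_rendered_at: float, submitted_at: float,
                    min_seconds: float = MIN_SECONDS) -> bool:
    """Return True if the post arrived suspiciously soon after the form was shown."""
    return (submitted_at - form_rendered_at) < min_seconds


# Usage: remember when the form was served, compare when the post arrives.
rendered = time.time()
# ... the visitor (or robot) fills in the form ...
submitted = time.time()
print(looks_automated(rendered, submitted))
```

The render time has to be kept on the server, for example in the session, because a robot can fake any value it sends back itself. A robot that simply waits before posting will also get past this check, so it probably works best together with something like the image validation.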
Genny
minstrel
Mar 8th 2006, 7:56 am
Is this a forum or a blog? And what software are you using?
genny2006
Mar 8th 2006, 8:40 am
It's a forum. This is the site: homeressources.com/forum/index.php. I don't know about any software.
minstrel
Mar 8th 2006, 8:50 am
You are using software called "Phorum": see http://dev.phorum.org for more information. Maybe start with Getting rid of spam entries (http://www.phorum.org/phorum5/read.php?12,54912) and spam invasion on our site forum (http://www.phorum.org/phorum5/read.php?12,54792)...
I believe DigitalPoint member Sarahk is experimenting with this - you might send her a private message asking her to look at this thread and advise you.