How to create a robots.txt file? How to stop indexing files from cgi-bin?
poseidon
Jan 23rd 2006, 11:54 pm
Hi,
I am new to robots.txt. What exactly does it do, and how do I use it? Also, I don't want search engines to crawl and index my cgi-bin directory. How can I do that?
Some code will be really helpful.
Regards.
Cristian Mezei
Jan 24th 2006, 12:06 am
You should read this (http://www.searchengineworld.com/robots/robots_tutorial.htm).
Jean-Luc
Jan 24th 2006, 12:16 am
Also I don't want search engines to crawl and index my cgi-bin directory. How can I do it.
Use this robots.txt:
User-agent: *
Disallow: /cgi-bin/
Jean-Luc
poseidon
Jan 24th 2006, 2:21 am
So all I have to do is create a robots.txt file containing
User-agent: *
Disallow: /cgi-bin/
Is that right?
Jean-Luc
Jan 24th 2006, 2:32 am
Exactly. Make sure you upload the robots.txt in the right directory. You have to be able to view it at www.your-site.com/robots.txt.
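One way to double-check the rules before uploading is to run them through Python's standard urllib.robotparser module. This is only a sketch: form.pl and index.html are made-up file names, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# form.pl and index.html are hypothetical paths used only for illustration.
blocked = is_allowed(ROBOTS_TXT, "http://www.your-site.com/cgi-bin/form.pl")
allowed = is_allowed(ROBOTS_TXT, "http://www.your-site.com/index.html")
print(blocked, allowed)
```

The check works on the text of the file, so it can be run before anything is uploaded.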
Jean-Luc
noiprox
Jan 25th 2006, 10:52 am
I have read the tutorial and have a robots.txt file, but when I run a sitemap generator, it still indexes the pages I want to disallow:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /cache/
Disallow: /class/
Disallow: /images/
Disallow: /include/
Disallow: /install/
Disallow: /kernel/
Disallow: /language/
Disallow: /templates_c/
Disallow: /themes/
Disallow: /uploads/
This is an example. The file is called robots.txt and is in the site root.
Any thoughts?
GoGlobal
Jan 27th 2009, 3:03 am
Yeah,
it's good, but what is the most important use of a robots.txt file?
manish.chauhan
Jan 27th 2009, 4:23 am
I have read the tutorial and have a robots.txt file, but when I run a sitemap generator, it still indexes the pages I want to disallow.
Many sitemap generator tools don't read robots.txt. They collect all the page URLs and put them into a single file, so you have to remove the disallowed pages from that file manually.
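If the generator gives you a plain list of URLs, the manual clean-up can be scripted. A minimal sketch, assuming the URLs are already in a Python list; the page names below are invented, and the rules are a shortened version of the file posted above:

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the robots.txt from the earlier post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /uploads/
"""


def filter_sitemap_urls(robots_txt, urls, agent="*"):
    """Keep only the URLs that robots.txt allows `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


# Hypothetical output of a sitemap generator.
candidate_urls = [
    "http://www.your-site.com/",
    "http://www.your-site.com/about.html",
    "http://www.your-site.com/tmp/scratch.html",
    "http://www.your-site.com/uploads/photo.jpg",
]
kept = filter_sitemap_urls(ROBOTS_TXT, candidate_urls)
print(kept)
```

Only the URLs left in `kept` would then go into the sitemap file.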
udayns
Jan 28th 2009, 4:57 am
You can get help from http://www.robotstxt.org/. It should answer all your questions.
manish.chauhan
Jan 28th 2009, 5:04 am
You can get help from http://www.robotstxt.org/. It should answer all your questions.
This is crap..:)
NickR25
Feb 1st 2009, 7:58 am
Don't put the disallowed URLs in your sitemap; otherwise Google Webmaster Tools (if you use it) will report that some URLs in your sitemap are restricted by robots.txt.
sriraj46
Feb 2nd 2009, 1:51 am
Don't put the disallowed URLs in your sitemap; otherwise Google Webmaster Tools (if you use it) will report that some URLs in your sitemap are restricted by robots.txt.
How would one put robots.txt in a sitemap? First of all, should the robots file be included in the sitemap at all? I guess not. Correct me if I'm wrong.
manish.chauhan
Feb 2nd 2009, 1:57 am
How would one put robots.txt in a sitemap? First of all, should the robots file be included in the sitemap at all? I guess not. Correct me if I'm wrong.
You can reference sitemap.xml from robots.txt. For more information, see
http://www.sitemaps.org/protocol.php#submit_robots
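As described at that link, the reference is a single "Sitemap:" line with the full URL of the sitemap. The sketch below shows such a file and reads the line back with Python's urllib.robotparser; the sitemap URL is an assumed example, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# robots.txt with a Sitemap line; the sitemap URL is an assumed example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.your-site.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

The Sitemap line is independent of the User-agent sections, so it can go anywhere in the file.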
shailendra
Feb 4th 2009, 3:02 am
This is crap..:)
How can you say that this is crap?
manish.chauhan
Feb 4th 2009, 3:25 am
How can you say that this is crap?
Just because it only offers general instructions that one can get anywhere else. When it comes to specific points like the use of regular expressions, it doesn't provide solid information.
ggmittal
Feb 17th 2009, 5:54 am
How would one put robots.txt in a sitemap? First of all, should the robots file be included in the sitemap at all? I guess not. Correct me if I'm wrong.
Yes, I agree. robots.txt doesn't need to be included in the sitemap.
proson
Feb 17th 2009, 6:37 am
when it comes to specific points like use of regular expressions, it doesn't provide solid information.
Hi manish, I guess you want to exclude certain files?
If you want to exclude a certain file, why don't you use a meta robots tag on that page instead?
Why complicate things when you can do it easily?
DareDevils
Feb 23rd 2009, 4:33 pm
Yes, this is the one:
User-agent: *
Disallow: /cgi-bin/
infomalaya
Mar 30th 2009, 5:02 pm
Thanks for the tips!
3drendering
Apr 9th 2009, 10:49 pm
Hey Shailendra,
Manish is absolutely right.
Why do you oppose him?
Manish is right.
MrPJH
Apr 10th 2009, 9:26 pm
I came here to ask the same question but found it already answered, which was most helpful.
Can I disallow filename.php too, or can only directories be disallowed?
linkdealer
Jun 9th 2009, 12:41 am
You can disallow individual pages as well as directories.
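For example, a rule such as "Disallow: /filename.php" sits next to the directory rule in the same block. A minimal sketch that checks this with Python's urllib.robotparser; other.php is a made-up name used for comparison:

```python
from urllib.robotparser import RobotFileParser

# One directory rule and one single-file rule in the same block.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /filename.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# other.php is a hypothetical page that no rule mentions.
file_blocked = not parser.can_fetch("*", "http://www.your-site.com/filename.php")
other_allowed = parser.can_fetch("*", "http://www.your-site.com/other.php")
print(file_blocked, other_allowed)
```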
kiwin
Jun 14th 2009, 4:53 pm
Hello, I'm new to this stuff too. You can check bigger sites like http://www.cnn.com/robots.txt for more info.
Also, what is the difference between "/cgi-bin/" and "/cgi-bin"? Which is the right one?
sgtcory
Jun 14th 2009, 6:16 pm
.....what is the difference between "/cgi-bin/" and "/cgi-bin" which is the right one?
That depends on how it's being used. For example, if you have a Perl script you want to call, you can't call it without giving the full directory, i.e. /cgi-bin/. The server needs to know which directory to look in for the xxxxxx.pl or xxxxx.cgi file you want to run. Otherwise your call would look like:
/cgi-binxxxxx.pl
When you really want :
/cgi-bin/xxxxx.pl
However, leaving the trailing slash off is sometimes required, for instance if you are running a Perl script in a server environment that already appends the slash (which is not typically the case), or if you call a Perl script from a PHP script that looks like this:
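$script_call = '/xxxxx.pl';   // the script path already starts with a slash
$condition = true;            // placeholder for whatever check you need
if ($condition) {
    // whatever needs to happen here
}
echo "/cgi-bin$script_call";  // double quotes so PHP expands the variable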
Make sense?
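As for robots.txt itself, a Disallow value is matched as a prefix of the URL path, so the two spellings are not equivalent: "/cgi-bin/" covers only what is inside the directory, while "/cgi-bin" also covers any path that merely starts with those characters. A small sketch using Python's urllib.robotparser; apart from xxxxx.pl, the paths are made up for illustration:

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow_value, paths):
    """Return the paths that a single Disallow rule blocks for all agents."""
    robots_txt = f"User-agent: *\nDisallow: {disallow_value}\n"
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch("*", path)]


# Hypothetical paths, chosen to show where the two rules differ.
paths = [
    "/cgi-bin/xxxxx.pl",
    "/cgi-bin.html",
    "/cgi-binary/tool.pl",
    "/index.html",
]
with_slash = blocked_paths("/cgi-bin/", paths)
without_slash = blocked_paths("/cgi-bin", paths)
print(with_slash)
print(without_slash)
```

To keep crawlers out of the directory only, the form with the trailing slash is the narrower and usually the intended one.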