I have a site, www.widgets.com. A second site, www.site2.com, is actually parked at http://www.widgets.com/site2 but is a domain in its own right. So I have a domain parked on a subfolder of my widgets.com account (I'm confusing myself). I don't want Google to index "http://www.widgets.com/site2", so can I use Disallow: /site2 in my robots.txt file without affecting www.site2.com? Hope that makes sense...
Yes. Do a Google search for "robots.txt tutorial" and follow the directions. I would tell you exactly what to do, but if I made a mistake I wouldn't want you mad at me. Here are a tutorial and a validator: http://www.searchengineworld.com/robots/robots_tutorial.htm http://www.searchengineworld.com/cgi-bin/robotcheck.cgi Remember, not all bots are obedient creatures. Good luck. Shannon
No worries, Shannon. I understand robots.txt syntax; it's just because I have a second domain parked on a subfolder of the first that I'm worried.
I did a similar thing when a client needed a one-page website up the next day, in time for a press release. I created a page in the root directory of a website he owned and pointed the second domain to the newly created page in the root directory of his existing website. Things rocked along nicely for a couple of years, until the owner of the parked domain gave the domain name to an artist, who linked to the page using the second domain name. We were then hit with a duplicate content penalty. Shannon
If I understand you correctly, you have this configuration:
website1.com > /physical/path/webroot
website2.com > /physical/path/webroot/site2/
There are two ways to access website2.com (as http://website2.com/ and as http://website1.com/site2/), and you will need to protect it twice. If you want to use robots.txt for this, you'd need two of them: one in the root of website1.com for the /site2/ directory and one in the root of website2.com. I wouldn't use robots.txt for this, though, because it reveals the protected path. If it's a temporary location that you use just for testing, use .htaccess to return 404 (not found) for everybody except you (say, based on the IP address range or user agent). J.D.
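To illustrate the per-domain robots.txt point above: crawlers request robots.txt separately for each host name, so a Disallow rule served by website1.com says nothing about website2.com. Below is a minimal sketch using Python's standard urllib.robotparser. The two rule sets are assumptions made up for this example (not anyone's real files), and it assumes website2.com should stay fully crawlable, which is what the original poster wants.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at http://website1.com/robots.txt
site1_rules = RobotFileParser()
site1_rules.parse([
    "User-agent: *",
    "Disallow: /site2/",
])

# Hypothetical robots.txt served at http://website2.com/robots.txt
# (physically /site2/robots.txt). An empty Disallow permits everything.
site2_rules = RobotFileParser()
site2_rules.parse([
    "User-agent: *",
    "Disallow:",
])

# The subfolder URL on the first domain is blocked...
blocked = site1_rules.can_fetch("Googlebot", "http://website1.com/site2/index.html")
# ...while the same page reached through the second domain is not.
allowed = site2_rules.can_fetch("Googlebot", "http://website2.com/index.html")

print(blocked, allowed)  # False True
```

Keep in mind this only models well-behaved crawlers; it does nothing about the point that robots.txt reveals the path it lists.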
Yeah JD, that's right. But I don't want to hide it; I just don't want Google to index http://website1.com/site2/. I want them to index http://website2.com. So is robots.txt OK for that?
Are you on a *nix (Unix/Linux) server, THT? If so, use a redirect in your .htaccess file so that all requests for http://website1.com/site2/ are sent to the second domain. I'd recommend redirecting to http://www.website2.com rather than http://website2.com, though.
Note: The .htaccess file MUST be placed in the root directory of www.website1.com -- i.e., at http://www.website1.com/.htaccess
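For anyone who wants to check their redirect rule before going live, here is a rough sketch in Python of the URL mapping such a redirect is meant to perform. It is not .htaccess syntax, just the intended behaviour written out so it can be tested. The function name and constants are hypothetical, and it assumes the rule should only fire for the website1.com host names, since the /site2 folder is also what website2.com itself serves.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values for illustration only.
OLD_HOSTS = {"website1.com", "www.website1.com"}
OLD_PREFIX = "/site2"
NEW_HOST = "www.website2.com"


def redirect_target(url):
    """Return the URL a request should be redirected to, or None if it stays put."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path

    # Only requests arriving on the first domain are redirected.
    if host not in OLD_HOSTS:
        return None
    # Match /site2 and anything below it, but not e.g. /site20.
    if path != OLD_PREFIX and not path.startswith(OLD_PREFIX + "/"):
        return None

    # Strip the folder prefix; an empty remainder becomes the site root.
    new_path = path[len(OLD_PREFIX):] or "/"
    return urlunsplit((parts.scheme, NEW_HOST, new_path, parts.query, ""))


print(redirect_target("http://website1.com/site2/about.html"))
```

Requests that already arrive on website2.com return None here, which is the case a real rule has to leave alone to avoid a redirect loop.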
Then hide the directory /site2 with something like .htaccess (you can do a similar thing with IIS through configuration). There's no need to redirect /site2 to the second website, since search engines will be able to access the second site through its domain name. Like I said, I wouldn't use robots.txt for this. Both websites will behave as if they are independent. In fact, unless you share some code between these websites, I would place the second website in a separate directory (since you would have to add a virtual website anyway, it shouldn't be a problem to create another directory on this machine). J.D.