Robots.txt question

THT Peon

#1

I have a site www.widgets.com

www.site2.com is actually parked at http://www.widgets.com/site2 but is a domain in its own right.

So I have a domain parked on a subfolder of my widgets.com account.

(I'm confusing myself.)

I don't want Google to index "http://www.widgets.com/site2", so can I use Disallow: /site2 in my robots.txt file without affecting www.site2.com?

Hope that makes sense...

THT, Jun 5, 2005

Smyrl Tomato Republic Staff

#2

Yes. Do a Google search for a robots.txt tutorial and follow the directions. I would tell you what to do, but if I made a mistake I wouldn't want you mad at me.

Here are a tutorial and a validator:

http://www.searchengineworld.com/robots/robots_tutorial.htm
http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

Remember, not all bots are obedient creatures.

Good luck.
Shannon

Smyrl, Jun 5, 2005

THT Peon

#3

No worries, Shannon.

I understand robots.txt syntax; it's just because I have a second domain parked on a subfolder of the first that I'm worried.

THT, Jun 5, 2005

Smyrl Tomato Republic Staff

#4

I did a similar thing when a client needed a one-page website up the next day in time for a press release. I created a page in the root directory of a website he owned and pointed the second domain to the newly created page in the root directory of his existing site. Things rocked along nicely for a couple of years until the owner of the parked domain gave the domain name to an artist, who linked to the page using the second domain name. We were then hit with a duplicate content penalty.

Shannon

Smyrl, Jun 5, 2005

J.D. Peon

#5

If I understand you correctly, you have this configuration:

website1.com > /physical/path/webroot
website2.com > /physical/path/webroot/site2/

There are two ways to access website2.com (as http://website2.com/ and as http://website1.com/site2/), and you will need to protect it twice. If you want to use robots.txt for this, you'd need two of them: one in the root of website1.com for the /site2/ directory and one in the root of website2.com.
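
A minimal sketch of why the two robots.txt files don't interfere, assuming the layout above and hypothetical www. hostnames: a crawler fetches robots.txt separately for each host, so a rule in http://www.website1.com/robots.txt only applies to URLs on www.website1.com. It uses Python's standard urllib.robotparser as a stand-in for a real crawler's matcher (Googlebot's rules are richer, but plain path prefixes like these should behave the same way); the robots.txt contents and the can_crawl helper are made up for illustration.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, keyed by hostname. A crawler fetches
# robots.txt separately for each host, so each hostname has its own rules.
ROBOTS_BY_HOST = {
    # Served from /physical/path/webroot/robots.txt
    "www.website1.com": [
        "User-agent: *",
        "Disallow: /site2",
    ],
    # Served from /physical/path/webroot/site2/robots.txt
    "www.website2.com": [
        "User-agent: *",
        "Disallow:",  # empty Disallow = everything may be crawled
    ],
}


def can_crawl(url: str, agent: str = "Googlebot") -> bool:
    """Check a URL against the robots.txt of its own host only."""
    host = urlsplit(url).hostname
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST.get(host, []))  # no file: nothing is blocked
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.website1.com/",
        "http://www.website1.com/site2/",
        "http://www.website2.com/",
    ):
        print(url, can_crawl(url))
```

website2.com's robots.txt would physically sit in the site2 directory, so it is also reachable as http://www.website1.com/site2/robots.txt, but crawlers only look for robots.txt at the root of each host.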

I wouldn't use robots.txt for this, though, because it reveals the protected path. If it's a temporary location that you use just for testing, use .htaccess to return 404 (Not Found) for everybody except you (say, by checking the IP address range or user agent).
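
The rule described above boils down to a small decision: requests under /site2 get a 404 unless they come from an allowed address. Here is a rough Python model of that logic (not .htaccess syntax), assuming a hypothetical allowed range 203.0.113.0/24 (a range reserved for documentation) and using 200 as a stand-in for "serve the page normally"; status_for is an illustrative name only.

```python
import ipaddress

# Hypothetical values: substitute your own directory and address range.
# 203.0.113.0/24 is a range reserved for documentation examples.
PROTECTED_DIR = "/site2"
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def status_for(path: str, client_ip: str) -> int:
    """Return 404 for the protected directory unless the client is allowed;
    200 stands in for 'serve the page normally'."""
    in_protected_dir = path == PROTECTED_DIR or path.startswith(PROTECTED_DIR + "/")
    if in_protected_dir:
        addr = ipaddress.ip_address(client_ip)
        if not any(addr in net for net in ALLOWED_NETWORKS):
            return 404  # pretend the directory doesn't exist
    return 200
```

Requests for http://www.website2.com/ arrive with the path /, not /site2/, so a rule like this would only hide the website1.com copy; the second domain keeps working.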

J.D.

J.D., Jun 5, 2005

THT Peon

#6

Yeah, J.D., that's right.

But I don't want to hide it; I just don't want Google to index http://website1.com/site2/.

I want them to index http://website2.com.

So is robots.txt OK for that?

THT, Jun 5, 2005

minstrel Illustrious Member

#7

Are you on a *nix server, THT?

If so, use a redirect in your .htaccess file so that all requests for http://website1.com/site2/ are sent to http://website2.com. I'd recommend you use http://www.website2.com rather than http://website2.com, though.

minstrel, Jun 5, 2005

THT Peon

#8

Yeah, I am using the www. version.

What would be the syntax for this?

THT, Jun 5, 2005

minstrel Illustrious Member

#9

Try this:

Redirect 301 /site2 http://www.website2.com
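
A simplified Python model of how a prefix redirect such as Redirect 301 /site2 http://www.website2.com maps request paths, assuming mod_alias-style matching on whole path segments, with the rest of the path after /site2 appended to the target URL. The function redirect_target is hypothetical, for illustration only, and expects the prefix without a trailing slash.

```python
from typing import Optional


def redirect_target(url_path: str, prefix: str, target: str) -> Optional[str]:
    """Simplified model of 'Redirect 301 <prefix> <target>'.

    Assumes `prefix` has no trailing slash. The rule applies when the request
    path equals the prefix or continues past it with a '/'; whatever follows
    the prefix is appended to the target. Returns None when it doesn't apply."""
    if url_path != prefix and not url_path.startswith(prefix + "/"):
        return None
    return target + url_path[len(prefix):]


if __name__ == "__main__":
    for path in ("/site2/", "/site2/page.html", "/site2abc", "/index.html"):
        print(path, "->", redirect_target(path, "/site2", "http://www.website2.com"))
```

Passing http://www.website2.com/ (with a trailing slash) as the target would turn /site2/page.html into http://www.website2.com//page.html, which is why the slash is usually left off both sides or kept on both. Requests made to www.website2.com itself don't start with /site2, so they are not redirected.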

minstrel, Jun 5, 2005

minstrel Illustrious Member

#10

Note:

The .htaccess file MUST be placed in the root directory of www.website1.com, i.e., at http://www.website1.com/.htaccess

minstrel, Jun 5, 2005

J.D. Peon

#11

THT said:

Yeah, J.D., that's right.

But I don't want to hide it; I just don't want Google to index http://website1.com/site2/.

I want them to index http://website2.com.

So is robots.txt OK for that?

Then hide the /site2 directory with something like .htaccess (you can do a similar thing with IIS through configuration). There's no need to redirect /site2 to the second website, since search engines will be able to access the second site through its own domain name. Like I said, I wouldn't use robots.txt for this. Both websites will behave as if they are independent. In fact, unless you share some code between these websites, I would place the second website in a separate directory (since you would have to add a virtual website anyway, it shouldn't be a problem to create another directory on this machine).

J.D.

J.D., Jun 5, 2005

THT Peon

#12

Except I'm on a *nix server, as previously discussed, so no IIS.

THT, Jun 6, 2005

ZuraX Active Member

#13

Would it be wise to also use this 301 redirect for directories that subdomains point to?

ZuraX, Jun 6, 2005
