Hi, I created a subdomain for testing a WordPress site. It is a huge site, so it will take up a lot of space. I want to ban robots, stop Google from indexing the site, etc. Basically, ban everything except human visits, so that I can still see the page myself. If I put this into the robots.txt and upload it to the subdomain's files, will it only affect the subdomain or the entire site?

    User-agent: *
    Disallow: /

The main domain should be left alone. I know this code bans robots from visiting. But should I put anything else in it? Will this also block Google from indexing any pages or images found on the subdomain?
Using that code ONLY bans bots that RESPECT your wishes; ROGUE bots will still index your site. The ONLY sure ways not to get indexed are: 1) password protect your subdomain, AND/OR 2) NEVER access your subdomain publicly, AND/OR 3) make sure there are NO links from your public domain to your subdomain.
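For what it's worth, the password protection is usually only a few lines of server config. A minimal sketch, assuming your host runs Apache and allows .htaccess overrides (the file paths and realm name below are placeholders):

    # .htaccess in the subdomain's document root
    AuthType Basic
    AuthName "Staging area"
    AuthUserFile /full/path/to/.htpasswd
    Require valid-user

The .htpasswd file itself can be created with Apache's htpasswd utility, e.g. htpasswd -c /full/path/to/.htpasswd yourname, and should live outside the web root so it can't be downloaded.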
The original question was: does this code only ban bots from the subdomain, or does it also apply to the main domain? I want the main domain to be indexed (it already is), but I want to exclude the subdomain. As for password protection: I still need to access the subdomain myself, so I do not want to lock it down entirely. I just want to block unwanted traffic and robots. The subdomain is a staging environment for testing the new version of the site.
Read what I said again. I answered your question and then added some other comments to help you, all of which still allow you to access your whole site but limit what the bots can access. And I know what I said works, because I have a public-facing website of 13,000 pages and an even larger private website with additional technical information that is useless to the public.
Hi, I found this step-by-step guide on allowing and disallowing subdomains of a website: https://www.theproche.com/2020/05/03/robots-txt-to-disallow-subdomains/
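In short, robots.txt is read per host, so each (sub)domain serves its own file and one never affects the other. The setup the guide describes looks roughly like this (example.com and test.example.com are placeholders for your domains):

    # https://example.com/robots.txt  (main site: allow everything)
    User-agent: *
    Disallow:

    # https://test.example.com/robots.txt  (staging: block everything)
    User-agent: *
    Disallow: /

An empty Disallow: line means nothing is blocked, so the file you upload to the subdomain's root will leave the main domain alone.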
Hello! mmerlinn is right - only password protection will keep the subdomain away from Google for sure. If Google finds a single link pointing to this subdomain, it can still index the URL (even without crawling the page), so robots.txt alone is not the solution.
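If password protection is not practical right away, a noindex header is the next strongest option, because Google honors it at indexing time rather than just at crawl time. A minimal sketch for Apache via .htaccess, assuming mod_headers is available on your host:

    # Ask search engines not to index anything served from this host
    <IfModule mod_headers.c>
      Header set X-Robots-Tag "noindex, nofollow"
    </IfModule>

One caveat: bots have to be able to fetch a page to see this header, so do not combine it with Disallow: / in robots.txt, or Google will never read the noindex.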