I'm creating a website that I don't want indexed. Well, actually I want the main site, mysite.com, NOT indexed, and I'll make a blog on a sub-domain (blog.mysite.com), which I DO want indexed. There is a setting in WP called "Discourage search engines from indexing this site". However, can I be sure that if I tick it, the site will definitely not be indexed?
I don't know anything about WP, sorry. But I know how to do it directly within the robots.txt or the HTML output. Don't worry about your blog: each hostname serves its own robots.txt and its own pages, so if you tell Google not to index www., blog. will not be affected.

Use the following methods with caution. Both are only requests, so it's up to each spider whether it actually honors your 'noindex' settings. Google and all the other big ones do respect them.

The best way would be setting the right rules within the robots.txt. Keep in mind that Disallow stops crawling, so a blocked URL can still show up in results (without a description) if other sites link to it.

Disallowing everything for everyone:

    User-agent: *
    Disallow: /

OR disallowing everything for Googlebot only:

    User-agent: Googlebot
    Disallow: /

Google Support: https://support.google.com/webmasters/answer/6062608?hl=en

Another way would be using meta tags:

    <meta name="robots" content="noindex">
    <meta name="googlebot" content="noindex">

Important! For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file, because a spider that is not allowed to fetch the page never sees the tag.

Google Support: https://support.google.com/webmasters/answer/93710?hl=en

So, just enable the setting within WP and see which method they use. Call www.mysite.com/robots.txt. If this file does not exist, is empty, or contains nothing that actually says 'Disallow', then take a look at the HTML response of www.mysite.com and check for the proper meta tags.
Let us know if you need further help. Btw: There is a decent robots.txt checker within Google Webmaster Tools: google.com/webmasters/tools/robots-testing-tool
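If you'd rather script that check than eyeball it, here is a minimal sketch in Python using only the standard library. It assumes you have already fetched the robots.txt body and the page HTML as strings (for example with your browser or curl). The function names `robots_blocks` and `has_noindex_meta` are made up for this example and have nothing to do with WP. Note that Python's parser matches rules a bit differently from Googlebot in edge cases, so treat the result as a hint, not a verdict.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def robots_blocks(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if this robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


class _RobotsMetaFinder(HTMLParser):
    """Records whether a robots/googlebot meta tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def has_noindex_meta(html: str) -> bool:
    """Return True if the HTML carries a noindex robots meta tag."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex
```

Call `robots_blocks(text, "https://www.mysite.com/")` first. If it returns False, pass the page source to `has_noindex_meta` to see whether the meta tag approach is in use instead.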
I imagine you can use all the tricks there are to avoid being indexed. At the end of the day, it is up to Google and the other engines to honor your request; Google will do as it pleases.
Using the noindex meta tag is the only thing that makes sense to me. Via robots.txt, you can't allow access to a subdirectory if the main directory is blocked. Bots need to access the main directory to get to the subdirectory! Nevertheless, both approaches only work with bots that honor these tools. There are a number of bots that don't.
You can do this with tons of free WordPress plugins: https://wordpress.org/plugins/ultimate-noindex-nofollow-tool-ii/ https://wordpress.org/plugins/easy-noindex-and-nofollow/ Hope this helps!
Thanks for explaining. So maybe it's best to do it the other way round? Have the main site indexed and put the part I don't want to get indexed under a sub-domain?