Hi, can anyone help me out with this? How do I use robots.txt to let search engines index only the homepage and block all the other pages? I can't seem to find a proper answer for this anywhere. Thanks in advance.
No, that shouldn't work, since index.html is in the root (the slash). It would also make search engines rank the homepage under the file index.html. Unless you rewrite from the root to index.html anyway (via .htaccess, for example), this is confusing for search engine crawlers and bots. Show me your website and I'll give you a solution.
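As a rough sketch of what I mean, assuming Apache with mod_rewrite enabled (and that the homepage file really is index.html), you could serve index.html at the root and 301-redirect direct requests for /index.html back to /, so search engines only ever see one URL for the homepage:

# Serve index.html when the bare root URL is requested
DirectoryIndex index.html

# Redirect explicit requests for /index.html back to /
# (checking THE_REQUEST avoids a redirect loop on the
# internal DirectoryIndex subrequest)
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html
RewriteRule ^index\.html$ / [R=301,L]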
Thanks for the replies. I was actually testing out a WordPress theme on my personal blog, which I find is not very search engine friendly, but it's not really a big problem because it's just my personal site and I don't put much content on it. I've used my robots.txt to block search engines from indexing the wp-content/themes directory, because the theme somehow doesn't point post pages to their original URLs but to certain URLs within the theme's directory instead. Here's the link to my site: Azure Haze. Let me know if you have any ideas for making it more search engine friendly. By the way, the theme is Folio Elements from Press75.com. Thanks.
Everything but your homepage is in wp-content, so the robots.txt should look like this:

User-agent: *
Disallow: /wp*
Disallow: /feed/

This should allow indexing of your homepage but not the rest of the content.
Thanks for the advice. So the star sign * in /wp* will block every directory that starts with wp, including /wp-content, /wp-admin and /wp-includes? Do crawlers other than Googlebot recognize this wildcard? So far I don't see any major problems in my Google Webmasters account; I'll wait a few more days to see if anything changes. One more question: if I don't use the URL removal tool in Google Webmasters, will the old/unused pages that have been indexed disappear after a certain period of time?
They won't be deindexed. Instructions in robots.txt stop the robots from further crawling, but they don't tell search engines to remove pages that are already in the index. A meta noindex tag is needed for that.
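For example, each page you want removed would carry something like this in its <head>. One catch: the page must not be blocked in robots.txt, or the crawler will never fetch it and never see the tag:

<!-- Ask search engines to drop this page from the index
     but still follow its links -->
<meta name="robots" content="noindex, follow">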
Thanks for the advice. To be more specific, I meant pages that don't exist anymore: do they get de-indexed after a period of time?
Pages that do not exist anymore need to return a 404 error to the search engine crawler. You can set that up using .htaccess. If you put your error pages in a folder (e.g. one named 404), then you need to add something like this to your .htaccess file:
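Assuming the error page lives at /404/index.html (the exact path here is just an example):

# Serve the custom error page for missing URLs
# while keeping the 404 status code intact
ErrorDocument 404 /404/index.html

Make sure the server still answers with a 404 status code and not a 200 or a redirect. After the search engines have picked that up, the non-existing pages stop appearing in the SERPs. Be patient.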
I used .htaccess to redirect my 404s to my homepage. Is it okay to do so? Does it make any difference if I point them to a custom 404 page instead? Thanks for the advice.
I have a similar question: how can I remove all pages except the homepage? It's not a WordPress site, otherwise I would have used the code posted by jabz.biz. Any ideas?
What he did was disallow the directories and other files individually. So if your root directory looks something like this:

folder1/
folder2/
test/
index.html
anotherpage.html

you should enter something like:

User-agent: *
Disallow: /folder*
Disallow: /test/
Disallow: /anotherpage.html
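There's also a shorter pattern if you really only want the homepage indexed: block everything and whitelist just the root. Note that Allow and the $ end-anchor are extensions honored by Googlebot and Bing, but not guaranteed for every crawler:

User-agent: *
Allow: /$
Disallow: /

Here /$ matches only the bare root URL, so everything else on the site stays blocked.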
No, this way a search engine does not understand that the page no longer exists; redirecting missing pages to the homepage makes them look like they still exist (a "soft 404"). Set up a proper 404 error page and offer the user some links to where they can find what they're looking for.
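A bare-bones sketch of such a page, served via the ErrorDocument directive mentioned above (the paths and links are just placeholders for your own):

<!-- /404/index.html: custom error page; the server still
     returns a 404 status because ErrorDocument points here -->
<!DOCTYPE html>
<html>
<head><title>Page not found</title></head>
<body>
  <h1>Sorry, that page doesn't exist anymore.</h1>
  <p>You might find what you were looking for on the
     <a href="/">homepage</a> or in the <a href="/archive/">archive</a>.</p>
</body>
</html>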