Disallow All Pages Except Homepage

Discussion in 'robots.txt' started by AzureHaze, Jul 22, 2009.

  1. #1
    Hi, can anyone help me out with this?

    How can I use robots.txt to let search engines index only the homepage and disallow/block all the other pages?

    I can't seem to find a proper answer for this anywhere.

    Thanks in advance.
     
    AzureHaze, Jul 22, 2009 IP
  2. dmi

    dmi Well-Known Member

    #2
    
    User-agent: * 
    Disallow: /
    Allow: /index.html
    
    Code (markup):
    Try that and let us know if it works.
     
    dmi, Jul 23, 2009 IP
  3. jabz.biz

    jabz.biz Active Member

    #3
    No, this should not work: index.html is in the root, which the slash already disallows, and Allow is not part of the original robots.txt standard, so not every crawler honors it. Also, this would make search engines rank the homepage as the file index.html rather than the root URL. Unless you rewrite the root to index.html anyway (via .htaccess, for example), this is confusing for search engine crawlers and bots.
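
    For crawlers that do support Allow plus wildcards and the end-of-URL anchor $ (Googlebot does; this is a Google extension, and bots that ignore it will simply fall back to the Disallow), a sketch of the usual "homepage only" pattern is:

    
    User-agent: *
    Allow: /$
    Disallow: /
    
    Code (markup):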

    Show me your website and I'll give you a solution.
     
    jabz.biz, Jul 28, 2009 IP
  4. AzureHaze

    AzureHaze Peon

    #4
    Thanks for the replies. I was actually testing out a WordPress theme on my personal blog, which I find is not very search engine friendly. It's not that big of a problem, though, because it's just my personal site and I don't put much content on it.

    I've used my robots.txt to block search engines from indexing the wp-content/themes directory, because the theme somehow doesn't point post pages to their original URLs but to certain URLs within the theme's directory instead.

    Here's the link to my site: Azure Haze
    Let me know if you have any ideas to make it more search engine friendly. By the way, the theme is Folio Elements from Press75.com.

    Thanks.
     
    AzureHaze, Jul 29, 2009 IP
  5. jabz.biz

    jabz.biz Active Member

    #5
    Everything but your homepage is in wp-content, so the robots.txt should look like this:

    
    User-agent: *
    Disallow: /wp*
    Disallow: /feed/
    
    Code (markup):
    This should allow indexing of your homepage but not the rest of the content.
     
    jabz.biz, Jul 30, 2009 IP
  6. AzureHaze

    AzureHaze Peon

    #6
    Thanks for the advice.

    So the asterisk (*) in /wp* will block every directory that starts with wp, including /wp-content, /wp-admin, and /wp-includes? Do crawlers other than Googlebot recognize this wildcard syntax?

    So far I don't see any major problems in my Google Webmasters account. I'll wait a few more days to see if there are any changes.

    One more question: if I'm not using the URL removal tool in Google Webmasters, will the old/unused pages that have been indexed disappear after a certain period of time?
     
    AzureHaze, Jul 30, 2009 IP
  7. jabz.biz

    jabz.biz Active Member

    #7
    After big spy Google picks up your new robots.txt, those pages should be de-indexed.
     
    jabz.biz, Jul 30, 2009 IP
  8. dmi

    dmi Well-Known Member

    #8
    They won't be de-indexed. Instructions in robots.txt stop robots from crawling further, but they don't tell search engines to de-index pages that are already indexed. A meta noindex tag is needed for that.
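
    For example, in the <head> of each page you want removed (the page has to stay crawlable in robots.txt, or the robot never sees the tag):

    
    <meta name="robots" content="noindex">
    
    Code (markup):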
     
    dmi, Aug 1, 2009 IP
  9. AzureHaze

    AzureHaze Peon

    #9
    Thanks for the advice. To be more specific, I meant pages that don't exist anymore: do they get de-indexed after a period of time?
     
    AzureHaze, Aug 1, 2009 IP
  10. jabz.biz

    jabz.biz Active Member

    #10
    Pages that do not exist anymore need to give the search engine crawler a 404 error page. You can solve that problem using .htaccess. If you put your error pages in a folder (e.g. /404/), then you need to add something like this to your .htaccess file:
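
    A minimal sketch, assuming the error page lives at /404/index.html:

    
    ErrorDocument 404 /404/index.html
    
    Code (markup):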

    After the search engines have picked that up, the non-existent pages will stop appearing in the SERPs. Be patient. ;)
     
    Last edited: Aug 5, 2009
    jabz.biz, Aug 5, 2009 IP
  11. AzureHaze

    AzureHaze Peon

    #11
    I used .htaccess to redirect my 404s to my homepage. Is it okay to do so? Does it make any difference if I direct them to a custom 404 page instead?

    Thanks for the advice.
     
    AzureHaze, Aug 5, 2009 IP
  12. premiumscripts

    premiumscripts Peon

    #12
    Well, the user experience will obviously be better if you show them a custom 404 page.
     
    premiumscripts, Aug 5, 2009 IP
  13. Professional Dude

    Professional Dude Prominent Member

    #13
    I have a similar question: how can I remove all pages except the homepage? It's not a WordPress site, otherwise I would have used the code by jabz.biz.

    Any ideas?
     
    Professional Dude, Aug 23, 2009 IP
  14. Exa

    Exa Active Member

    #14
    What he did was disallow the directories and other files one by one. So if your root directory contains something like

    folder1/
    folder2/
    test/
    index.html
    anotherpage.html
    
    Code (markup):
    you should enter something like this:
    User-agent: *
    Disallow: /folder*
    Disallow: /test/
    Disallow: /anotherpage.html
    Code (markup):
     
    Exa, Sep 1, 2009 IP
  15. jabz.biz

    jabz.biz Active Member

    #15
    No, this way a search engine does not understand that the page does not exist anymore. Set up a 404 error page and offer users some links to help them find what they are looking for.
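
    In .htaccess terms, that means an ErrorDocument instead of a redirect. A sketch (www.example.com stands in for your domain):

    
    # redirecting missing pages to the homepage sends a 302, not a 404 ("soft 404"):
    # ErrorDocument 404 http://www.example.com/
    
    # serving a local error page keeps the real 404 status:
    ErrorDocument 404 /404.html
    
    Code (markup):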
     
    jabz.biz, Sep 8, 2009 IP