Robots.txt code to disallow by default but allow domain name

Discussion in 'robots.txt' started by Steviebone, Mar 4, 2015.

  #1
    I am trying to reconfigure a robots.txt file. I know this approach may be frowned upon, but... I want to exclude everything except certain specified directories (instead of allowing everything except certain paths/files).

    Consider this block:

    User-agent: *
    Disallow: /
    Allow: /Dir1/
    Allow: /Dir2/
    Allow: /Dir3/
    Allow: /Dir4/
    
    Code (markup):
    This works except for one fatal flaw: it blocks the default home page reached via the domain name alone, such as:

    www.domainname.com
    Code (markup):
    Since 'index.htm' (or whatever default file the web server returns) is implied rather than explicit in the URL, the rule fails for the domain name by itself. I don't care much for the idea of allowing everything by default and then having to hunt down everything I don't want indexed/crawled. Whoever came up with this idea was creating crawlers.
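
    To be concrete, as I understand it the crawler only matches these rules against the URL path, so the bare domain is seen as the path "/", which the blanket Disallow catches before any of the Allow lines come into play:

    www.domainname.com/            ->  path "/"            (blocked by Disallow: /)
    www.domainname.com/Dir1/x.htm  ->  path "/Dir1/x.htm"  (allowed by Allow: /Dir1/)
    Code (markup):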

    I know you can allow subdirectories after a disallow statement, but how do you then handle anything in the root? Hell, that's the one place I most want to limit. It seems like it would be much simpler to just list the areas of a site you want crawled, not the other way around. Am I crazy? Or is this just stupid?
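
    The only partial fix I've been able to come up with is an extra Allow line using the $ end-of-URL anchor, something like the sketch below. As far as I know $ is an extension honored by Google and Bing rather than part of the original robots.txt standard, so I'm not sure how widely it works:

    User-agent: *
    Disallow: /
    Allow: /$
    Allow: /Dir1/
    Allow: /Dir2/
    Allow: /Dir3/
    Allow: /Dir4/
    Code (markup):
    That would let the bare domain through (the path is exactly "/") while everything else in the root stays blocked, though an explicit request for /index.htm would presumably still need its own Allow line.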

    Any workarounds I can't see?
     
    Steviebone, Mar 4, 2015