I am doing SEO for a friend's site and I've encountered an issue I don't quite understand. The company that put the site together flattened the homepage URL so that it does not show the index.htm file. Now Google is indexing the page twice, once as www.domain.net and once as www.domain.net/index.htm. I obviously can't canonicalise the index.htm file, as both the flat URL and the index.htm page would then show the canonical tag. Would it be possible to use robots.txt or something similar to block indexing of the index.htm file? Would this prevent the homepage from being indexed at all, or would Google still index the flat URL, just not index.htm? I hope this is explained sufficiently; if you need further info, please ask.
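On the robots.txt part of the question, here is a minimal sketch using Python's standard urllib.robotparser to show how a rule for /index.htm would be matched. The robots.txt content is hypothetical, and the result only reflects this parser's path matching, not how Google decides what to keep in its index. robots.txt restricts crawling rather than indexing, so a blocked URL can still appear in results if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the index.htm path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules are matched against the URL path, so "/" and "/index.htm" are
# treated as different paths even though the server returns the same file.
flat_allowed = parser.can_fetch("*", "http://www.domain.net/")
index_allowed = parser.can_fetch("*", "http://www.domain.net/index.htm")

print("flat URL crawlable:", flat_allowed)
print("index.htm crawlable:", index_allowed)
```

Under this parser the flat URL stays crawlable while index.htm does not, but a crawler that cannot fetch index.htm also cannot read any canonical tag served at that address.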
I'm afraid I don't know how to solve the problem, but first up we should ask: does it really matter that it's indexed twice? If the only concern is about duplicate content and SE penalisation (is that a word?) then I'd be willing to bet that the SEs know the content is on the same page and therefore wouldn't penalise you.
That's really what I'm concerned about. Webmaster Tools is flagging both pages, which makes me think it's more likely than not that the site will get penalised.
Add the following rel=canonical tag to the page header and Google will only consider domain.net as the index page: <link rel="canonical" href="http://domain.net" />
Like I said, if I do that, both URLs will show the canonical tag. There aren't two pages; there is one page, index.htm, and the flattened domain.
Are the pages at domain.net and domain.net/index.htm different? If not, then Google will understand which URL to keep in its index and which one to throw out!
Because some (including your friend) might be linking to domain.net and others are linking to domain.net/index.htm, causing Google to index both URLs! This is exactly the reason why rel canonical was introduced!
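To make that concrete, here is a minimal Python sketch of what a crawler would see. The served mapping and the find_canonical helper are made up for illustration; the tag is the one suggested earlier in the thread. Because one file answers at both addresses, both responses carry the same tag and point at a single URL.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attrs.get("href")


def find_canonical(html):
    """Hypothetical helper: return the canonical href in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# The same index.htm file is returned for both addresses (assumed setup).
PAGE = (
    '<html><head><link rel="canonical" href="http://domain.net" />'
    "</head><body>Home</body></html>"
)
served = {
    "http://domain.net/": PAGE,
    "http://domain.net/index.htm": PAGE,
}

canonicals = {url: find_canonical(html) for url, html in served.items()}
for url, target in canonicals.items():
    print(url, "->", target)
```

Both entries resolve to http://domain.net, so having the tag appear on "both pages" is the intended outcome, not a conflict.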
Using the bare domain as the canonical URL would be the better choice; that way you won't have to worry if you change the home page to PHP or some other extension.