Index.html

Discussion in 'Search Engine Optimization' started by finucane, Jun 30, 2009.

  1. #1
    I heard someone say that if you enter www.yourdomain.com/index.html you should not see the same page as at yourdomain.com.
    He said that it could be counted as duplicate content. Is this correct, and if so, how can I fix it? He mentioned something about htaccess and re-running a mod, whatever that means. Can someone please explain what he was talking about? Thanks!
     
    finucane, Jun 30, 2009 IP
  2. finucane

    finucane Peon

    Messages:
    37
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #2
    By the way, yourdomain.com is just an example even though it does seem to exist.
     
    finucane, Jun 30, 2009 IP
  3. ApexSEORM

    ApexSEORM Peon

    Messages:
    58
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #3
    What this person is talking about is canonicalization. Basically, the idea is to show visitors and search engines (SEs) only one version of any page on your site. SEs see the following as all being "different":

    domain.com/
    domain.com/index.html
    www.domain.com/
    www.domain.com/index.html

    So what you want to do is pick one and stick with it. I recommend www.domain.com/, as this is the way most people are going to link to your site. The bit about htaccess refers to setting up 301 redirects for the other options, so that the URL is always rewritten to your chosen format. This ensures that when someone links to your site using one of the other options, the person clicking the link will be sent to the proper version. It also helps ensure that all the link juice from that link is passed to the proper place.

    What you want to do is set up a mod rewrite rule in the .htaccess file. Additionally, you may want to include the canonical link element (<link rel="canonical">) in your document head. This tells Google and other SEs which version you prefer them to index and can help with duplicate content problems.

    For more information on mod rewrite and canonicalization check out some of these articles.

    Canonicalization:
    http://en.wikipedia.org/wiki/Canonicalization
    http://www.mattcutts.com/blog/seo-advice-url-canonicalization/
    http://www.seobook.com/canonicalization-missing-manual

    Mod rewrite:
    http://www.modrewrite.co.uk/mod-rewrite/canonical-urls-with-mod-rewrite.html
    http://httpd.apache.org/docs/2.0/misc/rewriteguide.html

    Hope that helps answer your question :)
     
    ApexSEORM, Jun 30, 2009 IP
  4. Canonical

    Canonical Well-Known Member

    Messages:
    2,223
    Likes Received:
    141
    Best Answers:
    0
    Trophy Points:
    110
    #4
    Here is an excerpt from a previous post of mine that attempts to explain it in layman's terms:

     
    Canonical, Jun 30, 2009 IP
  5. RoninEMS

    RoninEMS Peon

    Messages:
    35
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #5
    How does one accomplish the redirects? Are we talking about creating redirects in the domain manager of your host? For example, I am using cPanel to manage my hosting. Is this where I would set this up?
     
    RoninEMS, Jun 30, 2009 IP
  6. Canonical

    #6
    If you are hosted on an Apache web server (Linux or some other flavor of Unix) then you likely have access to Mod Rewrite and several other tools that could be used to accomplish this. I prefer Mod Rewrite. It requires learning about regular expressions but it's not that difficult once you get the hang of it.

    There are lots of books that you can pick up at Borders or Barnes & Noble to learn this... Or you can fumble through the Apache docs online. I would look at the 1.3 docs for an explanation of the order in which RewriteCond and RewriteRule directives are executed... I tried to explain this in a recent post here.

    For example, if I chose http://www.example.com/ as my canonical URL in the post above, I could create a .htaccess file for Mod Rewrite and place it in the root of the site. It would look something like this:

    DISCLAIMER: I haven't tested this... Just winged it off the top of my head... but it should be close. It should take care of the 3 redirects I mentioned above for the home page AND any folder or subfolder that has an index.html in it as the default document.
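
    The rules described above could be sketched roughly as follows. This is an illustrative reconstruction rather than the snippet originally posted; it assumes Apache 2.x with mod_rewrite enabled, index.html as the default document, and www.example.com as a placeholder for the chosen canonical host.

```apache
RewriteEngine On

# 1) Externally requested index.html (in the root or any folder) -> the folder URL.
#    Checking THE_REQUEST (the raw request line) limits this to URLs the client
#    actually asked for, so the server's own internal lookup of the default
#    document does not cause a redirect loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/([^?\s]*/)?index\.html[?\s]
RewriteRule ^(.*/)?index\.html$ http://www.example.com/$1 [R=301,L]

# 2) Non-www host -> www host, keeping the requested path.
#    example.com is a placeholder for the bare domain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

    Putting the index.html rule first means a request for example.com/index.html reaches www.example.com/ in one redirect instead of two.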
     
    Canonical, Jun 30, 2009 IP
  7. anhbloginc

    anhbloginc Well-Known Member

    Messages:
    1,288
    Likes Received:
    15
    Best Answers:
    0
    Trophy Points:
    175
    #7
    Simply add:

    <link rel="canonical" href="http://anhblog.net/">

    Replace http://anhblog.net/ with your own URL. It helps keep Googlebot from treating the page as a duplicate version.
     
    anhbloginc, Jun 30, 2009 IP
  8. Canonical

    #8
    <link rel="canonical" href="http://www.example.com/"> in the <head> of your home page HTML will fix the canonical issues... Ummm... at Google, Yahoo!, and MSN/Live/Bing. But what about all of those other, less sophisticated engines that don't support it? You're screwed.

    <link rel="canonical"> was designed for sites that don't have access to something like Mod Rewrite or server-side scripting to implement 301 redirects (such as those running pure HTML sites on IIS), or for very large ecommerce sites where it would be very difficult to implement canonical URLs w/ 301 redirects because they have LOTS of query string parameters (and the same query string name/value pairs in a different order are seen as different URLs even though they render the same page). I heard Matt Cutts say this in person at Pubcon in November 2008, and he added that the new <link rel="canonical"> element should basically be used as a last resort.

    Besides only being supported by basically 3 engines, another major problem w/ <link rel="canonical"> is that it still shows the non-canonical URL in the browser. How do you think most webmasters who want to link to a page on your site get the URL for the link? They:

    1) go to your site, navigate to the URL they want to link to, copy the address out of the browser, and paste it into an <a href="paste it here"> element on their site OR
    2) follow a link from another site to the URL on your site they want to link to, copy the address out of the browser, and paste it into an <a href="paste it here"> element on their site.

    Depending on how they navigate to the page or which link they follow to the site, they may copy a canonical or non-canonical URL to make their link. By NOT implementing 301 redirects and using <link rel="canonical"> instead, you are perpetuating the use of non-canonical URLs because you continue to show them in the browser address bar. 301 redirects solve this problem. No matter which link they follow on your site or another... the canonical URL is ALWAYS displayed in their browser, so going forward almost every link to your site will likely be created with the canonical URL. If it's not, the 301 redirect sends the browser to the canonical URL anyway.

    301 redirects are STILL the preferred way to implement canonical URLs. You should learn to do it the right way instead of taking the easy way out by using something meant for those who cannot implement 301s.

    What are you going to do when you move a page from URLA to URLB? Putting <link rel="canonical" href="URLB"> in the head of URLA might give URLB credit for the inbound links, but it is NOT going to redirect the user from URLA to URLB. They will still see the old page... So credit for inbound links to URLA is transferred to URLB... Great! But the browser is still showing the old URLA AND the content displayed will be that of URLA, NOT the new page at URLB. If you implement a 301 redirect when a page moves from URLA to URLB, not only does URLB get credit for all of URLA's inbound links, but URLB is ALWAYS shown in the browser address bar regardless of whether URLA or URLB is requested.
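
    For the moved-page case, a single permanent redirect is enough. A minimal sketch, assuming Apache with mod_alias usable from .htaccess; /old-page.html and /new-page.html are hypothetical stand-ins for URLA and URLB, and www.example.com is a placeholder host.

```apache
# Hypothetical paths: /old-page.html plays the role of URLA, /new-page.html of URLB.
# Requests for the old path receive a 301 (permanent) redirect to the new URL.
Redirect 301 /old-page.html http://www.example.com/new-page.html
```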

    Learn to do 301 redirects... It's the proper way to fix canonical issues. Use <link rel="canonical"> ONLY as a last resort.
     
    Canonical, Jun 30, 2009 IP
  9. merlinseo

    merlinseo Well-Known Member

    Messages:
    1,686
    Likes Received:
    54
    Best Answers:
    0
    Trophy Points:
    130
    #9
    Canonical, repped u +++. Excellent info, this is the best thread for understanding the redirect and canonical issues. I have bookmarked it too.
     
    merlinseo, Jun 30, 2009 IP
  10. Canonical

    #10
    Thanks... hope it helps... sorry so long winded! lol ;)
     
    Canonical, Jun 30, 2009 IP
  11. ApexSEORM

    #11
    Harsh. I pretty much gave all the same info and he gets the rep :(
     
    ApexSEORM, Jun 30, 2009 IP
  12. Canonical

    #12
    Actually I got no rep... I don't do this for rep. Just to help out where I can.
     
    Canonical, Jun 30, 2009 IP
  13. finucane

    #13
    Thank you very much for explaining it to me. If I am not mistaken, I need to add about 2 lines to my htaccess file.
    Could someone please tell me exactly what I should write for http://www.footballreporter.co.uk?
    Also, I need to know that the index pages I have in other folders on that site will not be redirected to the home page of the site.
    Thanks!
     
    finucane, Jul 2, 2009 IP
  14. Canonical

    #14
    It looks like you are already redirecting requests for:

    http://footballreporter.co.uk/index.html
    http://www.footballreporter.co.uk/index.html

    to:

    http://www.footballreporter.co.uk/

    However, when someone requests a non-www URL (for example, http://footballreporter.co.uk) you are still rendering the page under the non-www URL. You should be 301 redirecting all such requests to the www version of the URL.

    So I would recommend adding the following to the bottom of the .htaccess file in the root of your site:

    Now if someone requests any of these (I used example.co.uk so you wouldn't end up w/ a bunch of 404s in your Google WMT):

    http://example.co.uk
    http://example.co.uk/folder/
    http://example.co.uk/folder/subfolder
    http://example.co.uk/page.html
    http://example.co.uk/folder/page.html
    http://example.co.uk/folder/subfolder/page.html

    they will be 301 redirected to:

    http://www.example.co.uk
    http://www.example.co.uk/folder/
    http://www.example.co.uk/folder/subfolder
    http://www.example.co.uk/page.html
    http://www.example.co.uk/folder/page.html
    http://www.example.co.uk/folder/subfolder/page.html

    respectively.
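
    A rule set with the behaviour listed above might look like the sketch below. It is not the snippet originally posted; it assumes mod_rewrite is enabled and uses example.co.uk as a placeholder for the real domain.

```apache
# Harmless if mod_rewrite is already switched on earlier in the file.
RewriteEngine On

# Any request whose Host header is the bare (non-www) domain is 301 redirected
# to the same path on the www host. example.co.uk is a placeholder domain.
RewriteCond %{HTTP_HOST} ^example\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://www.example.co.uk/$1 [R=301,L]
```

    Because the requested path is carried over in $1, index pages in subfolders stay in their own folders rather than being sent to the home page.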
     
    Canonical, Jul 3, 2009 IP
  15. finucane

    #15
    Thank you for all the information, especially a big thanks to Canonical for the detailed explanations. I spoke with some nice people at an SEO company called High Position at a fair and they helped me solve the index problem, but then I decided that I wanted all the options to go to http://www and as I could not figure it out myself, the help I got here was exactly what I needed and much appreciated. So once again thank you very much.
    I hope other readers will find this useful too.
     
    finucane, Jul 4, 2009 IP