Robots.txt help?

Discussion in 'Search Engine Optimization' started by Khaludi92, Dec 25, 2011.

  1. #1
    Hello Forum,

    I am working on a WordPress blog that will have both original content and syndicated (or copied) content.

    For instance,

    All the posts in the category "Apple" are copied/syndicated from other sites, while all the originally written articles will go in the category "News". These are just examples; the real design is a little different.

    I am planning to display all the categories on the homepage. In other words, if an article/blog post is published in the category News or Apple, either way it will appear on the homepage. Moreover, the article will also be accessible on the archive pages, which are MYURL/Category/Apple or MYURL/Category/News.

    I know that syndicated/copied content will harm my website's SEO to a great extent, but to reduce the harm to my blog I am planning to add a robots.txt file to keep crawlers/search engine spiders away from the syndicated content. If I have got all of it wrong, please correct me :)

    Here is the robots.txt I wrote to keep search engines away from the copied content on my site/blog:


    User-agent: *
    Disallow: /date/2011/01/23
    Disallow: /category/APPLE/
    
    
    Sitemap: http://www.MYURL/sitemap.xml
    The date line above is an example of a date on which I copied content into my site.

    Can someone tell me if I have written a correct robots.txt file? What corrections, if any, do I need to make?
     
    Khaludi92, Dec 25, 2011
  2. filegrasper

    #2
    Why would you need to pay for such a simple job? Just disallow the category that holds the copied content.
     
    filegrasper, Dec 26, 2011
  3. mhovingh

    #3
    Does the content in your "Apple" category dominate the % of content that appears on your front page? If not, I wouldn't worry about disallowing it.

    Does the "Apple" content that appears on your front page include a link to its own page? By this, I mean do you create /category/APPLE/an-apple-post and your CMS automatically puts it on your front page with a link to the /an-apple-post page? If so, I wouldn't worry about disallowing it.

    As long as you disallow the actual page, you should be fine.
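
    If you want to sanity-check which URLs the posted rules actually cover before uploading the file, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules are copied from the first post; the `is_blocked` helper and the test paths are hypothetical, made up only for illustration. Python's parser does plain prefix matching, so treat this as an approximation of how a real crawler reads the file, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Rules from the first post, minus the Sitemap line (it does not affect matching).
RULES = """\
User-agent: *
Disallow: /date/2011/01/23
Disallow: /category/APPLE/
"""


def is_blocked(path, rules=RULES, agent="Googlebot"):
    """Return True if `agent` is not allowed to crawl `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to show how the matching behaves.
    for path in (
        "/",
        "/category/APPLE/an-apple-post",
        "/category/apple/an-apple-post",
        "/date/2011/01/23/",
    ):
        print(path, "blocked" if is_blocked(path) else "crawlable")
```

    Two things this shows: the homepage stays crawlable even though it lists the same posts, and the match is case-sensitive, so `/category/APPLE/` does not cover `/category/apple/`. Check which spelling your blog really uses in its URLs.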

    Keep in mind that the effect of duplicate content penalties tends to get overblown when people talk about them in forums and on their SEO blogs. Google doesn't penalize as much as lots of people seem to believe. I view it as a "the sky is falling" effect.

    Google will index anything, duplicate or not, and does not automatically kill your site's position in the SERPs because you have some duplicate content on it. Google is looking for relevancy and has no problem with syndication. What Google wants to see is a link to the original source and value from your site. Google doesn't want you to copy content just for the sake of getting more pages on your site. If you provide added value to Google's searchers, you are doing it right.

    An example would be a product, let's say Green Widgets. The Widgets Corporation has their own website where they have a page for Green Widgets. They have their factory specs, product details, and some other information about the Green Widgets on that page. You have a website and copy the content for Green Widgets. You add a sentence or two saying something that adds no real value, like "These are green widgets. They are not blue widgets." Your site is going to suffer in Google because you are copying without the intent to provide additional value. Now let's say that you copy all the Green Widgets information, give a professional review, provide some details about how Green Widgets compare to Yellow Widgets, and let users comment to ask questions or review the Green Widgets. None of the things you added were available on the official Widgets Corporation website, meaning you are giving real value to certain searchers.

    Taking the second instance where you provided value, Google should treat search results like this:
    - I search for "Green Widgets". Google serves me the Widgets Corporation website as the top result and may include you in a lower placing on the results page.
    - I search for "Green Widgets Reviews". Google serves me a results page that is much more likely to include your site and put you closer to the top where the Widgets Corporation website appears.
    - I search for "Comparison of Green and Yellow Widgets Review". Google is very likely to serve me your website above the Widgets Corporation website because your site, though containing some duplicate content, is much more likely to give me the information that I as the searcher want to see.

    Google wants to please the person searching, not the websites, and tries to recognize acceptable duplicate content. Too many webmasters get into the mentality of needing to please Google. You do need to make sure you do that, but at the core is pleasing the person performing a search. If your site adds to the value of the /APPLE content, you are doing what Google wants you to do. If you are just copying content for the sake of having it, adding no value for a searcher, your /APPLE content is likely to hurt you.
     
    Last edited: Dec 26, 2011
    mhovingh, Dec 26, 2011
  4. weboved@gmail.com

    #4
    Sometimes Google doesn't look at robots.txt, so be careful with it...
     
    weboved@gmail.com, Dec 26, 2011