Robots.txt help please

Discussion in 'Search Engine Optimization' started by edwardo, Feb 1, 2007.

  1. #1
    Hi there,

    I have a site based on osCommerce, which I am trying to optimise for the engines. My problem is that on some pages Google has indexed both:

    http://www.mp3extras.co.uk/product_info.php/products_id/54

    and

    http://www.mp3extras.co.uk/product_info.php/products_id/54/language/en

    which have the exact same content. I am fairly sure I can use robots.txt to stop Google from doing this, but what code exactly do I use? And as I am using a rewrite for SEO-friendly URLs, does this affect it in any way?

    I think I am looking at something along the lines of:

    User-Agent: Googlebot
    Disallow: */language/en

    Or I could list each product page individually; however, that would take ages.
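    For what it's worth, here is a rough sketch in Python of how a wildcard Disallow line like that gets matched against a URL path. This assumes Google's documented `*` wildcard behaviour (the standard robots.txt spec has no wildcards); the function name is made up for illustration and the real matcher may differ:

    ```python
    import re

    def matches_disallow(pattern: str, path: str) -> bool:
        """Return True if a Googlebot-style Disallow pattern matches a URL path."""
        # An optional trailing '$' anchors the pattern to the end of the path.
        anchored = pattern.endswith("$")
        if anchored:
            pattern = pattern[:-1]
        # '*' matches any run of characters; everything else is literal.
        regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
        if anchored:
            regex += "$"
        return re.match(regex, path) is not None

    print(matches_disallow("*/language/en",
                           "/product_info.php/products_id/54/language/en"))  # True
    print(matches_disallow("*/language/en",
                           "/product_info.php/products_id/54"))              # False
    ```

    So a single `Disallow: */language/en` line should cover every product page without listing them one by one.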

    Many thanks,
    Eddy
     
    edwardo, Feb 1, 2007 IP
  2. Dudibob

    #2
    That way might work, or even just:

    User-Agent: *
    Disallow: /language/en

    Sign up for Google Sitemaps and use their robots.txt tester or something similar, just so you don't shoot yourself in the foot ;)
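    One thing worth checking with a tester: a plain Disallow line without a leading `*` is a prefix match, so `Disallow: /language/en` only blocks paths that *start* with `/language/en` and would not catch the product URLs above. A quick sketch using Python's standard `urllib.robotparser` (which implements only the basic prefix matching, no wildcards) shows the difference:

    ```python
    from urllib.robotparser import RobotFileParser

    # Feed the parser the rules directly, line by line.
    rp = RobotFileParser()
    rp.parse([
        "User-Agent: *",
        "Disallow: /language/en",
    ])

    # Blocked: the path starts with /language/en.
    print(rp.can_fetch("*", "http://www.mp3extras.co.uk/language/en"))
    # Still allowed: the path starts with /product_info.php, so the
    # prefix rule never matches it.
    print(rp.can_fetch(
        "*",
        "http://www.mp3extras.co.uk/product_info.php/products_id/54/language/en"))
    ```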
     
    Dudibob, Feb 1, 2007 IP
  3. edwardo

    #3
    Brilliant,

    Worked a treat.

    Used this if anyone is interested:

    User-Agent: *
    Disallow: */language/en
    Disallow: */osCid

    Thanks
     
    edwardo, Feb 1, 2007 IP
  4. thegypsy

    #4
    I would also look into the SEF URL contributions for osCommerce.
     
    thegypsy, Feb 1, 2007 IP