Robots.txt?

Discussion in 'robots.txt' started by mortgage-pro-seo, Feb 28, 2006.

  1. #1
    I am using phpBB for one site. The problem is that it generates multiple URLs for the same topic. One of the URLs is crap: mortgagesaver.org/forum/refi-w-540-fico-vt1539.html?start=0&postdays=0&postorder=asc&highlight=

    Can I use robots.txt to prevent URLs from being indexed that have a ? or highlight in them?
     
    mortgage-pro-seo, Feb 28, 2006 IP
  2. Jean-Luc

    Jean-Luc Peon

    Messages:
    601
    Likes Received:
    30
    Best Answers:
    0
    Trophy Points:
    0
    #2
    You can use robots.txt to prevent URLs starting with certain text from being indexed.

    Example :
    User-agent: *
    Disallow: /forum/refi-w-540-fico-vt1539.html?
    This will prevent URLs starting with /forum/refi-w-540-fico-vt1539.html? from being indexed.

    This is probably not practical in your case if you have hundreds of these URLs.
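
    For thousands of URLs, one option worth checking is the wildcard extension that Googlebot supports. Note this is not part of the original robots.txt standard, so other crawlers may ignore it:

    User-agent: Googlebot
    Disallow: /*?
    Disallow: /*highlight=

    The first rule blocks every URL containing a ?, and the second blocks any URL containing the highlight parameter, but only for crawlers that understand the * wildcard.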

    Jean-Luc
     
    Jean-Luc, Feb 28, 2006 IP
  3. mortgage-pro-seo

    mortgage-pro-seo Peon

    Messages:
    170
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #3
    I need a solution for thousands of pages.
     
    mortgage-pro-seo, Feb 28, 2006 IP
  4. chengfu

    chengfu Well-Known Member

    Messages:
    113
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    108
    #4
    There is no solution for excluding files based on URL parameters using standard robots.txt. To ban those you will have to change the forum code and insert a robots meta tag into the output when the highlight parameter is given.

    Something like this at the right place should do the job:
    
    if (isset($_GET['highlight']) && $_GET['highlight'] !== '') {
        echo '<meta name="robots" content="noindex,follow,noarchive">';
    }
    
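
    The same check can be generalised to the other parameters phpBB appends. A minimal sketch, assuming the parameter list below matches the duplicate URLs your forum actually emits (adjust it as needed):

    <?php
    // Parameters that produce duplicate URLs (assumed list; adjust as needed).
    $noindex_params = array('highlight', 'postdays', 'postorder', 'start');

    foreach ($noindex_params as $param) {
        if (isset($_GET[$param]) && $_GET[$param] !== '') {
            // Tell crawlers not to index this duplicate, but still follow its links.
            echo '<meta name="robots" content="noindex,follow,noarchive">';
            break;
        }
    }
    ?>

    This would go in the page-header code so the meta tag lands inside the HTML head of every affected page.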
     
    chengfu, Feb 28, 2006 IP
  5. seolion

    seolion Active Member

    Messages:
    1,495
    Likes Received:
    97
    Best Answers:
    0
    Trophy Points:
    90
    #5
    Even I have a similar problem on my new forum, but here I am experimenting with something else.

    I have the bot indexing mod installed (I still don't know whether it is working properly).

    For those URLs which are already indexed with session IDs, I have site-wide noarchive tags. This will prevent my pages from going into the supplemental results.

    I have a sitemap with all the major URLs without session IDs; I hope that over a period of time the site will get crawled normally.
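
    The sitemap approach above can be sketched as a minimal Sitemaps-protocol file that lists only the clean, session-free URLs (the URL shown is an example):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- List only canonical URLs, with no session IDs or query parameters. -->
      <url>
        <loc>http://mortgagesaver.org/forum/refi-w-540-fico-vt1539.html</loc>
      </url>
    </urlset>

    One url entry per canonical page; crawlers that fetch the sitemap then have a parameter-free list of pages to crawl.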
     
    seolion, Feb 28, 2006 IP
  6. mortgage-pro-seo

    mortgage-pro-seo Peon

    Messages:
    170
    Likes Received:
    1
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Any other creative ideas here?
     
    mortgage-pro-seo, Mar 1, 2006 IP
  7. alifan

    alifan Peon

    Messages:
    46
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #7
    You could use a JPG PIN-number function, but that could really annoy users.
     
    alifan, Mar 30, 2006 IP