regex - extract data from web page

Discussion in 'PHP' started by php_techy, Jun 11, 2009.

  1. #1
    Hi,
    My HTML looks like this:

    <meta name="description" content="New info! Code: http://www.example/index.html Code: http://testing.com/fil" />
    <!-- message -->
    <div id="post_message_510223" class="vb_postbit"><font color="green"><font size="3">Temp</font></font><br />
    <br />
    <br />
    <img src="http://sample/test.jpg" border="0" alt="" onload="NcodeImageResizer.createOn(this);" /><br />
    <br />
    <br />
    info!<br />
    <br />

    <div style="margin:20px; margin-top:5px">
    <div class="smallfont" style="margin-bottom:2px">Code:</div>
    <pre class="alt2" dir="ltr" style="
    margin: 0px;
    padding: 6px;
    border: 1px inset;
    width: 470px;
    height: 34px;
    text-align: left;
    overflow: auto">http://www.sample1.com/part1.html
    http://www.sample1.com/part1.html
    http://www.sample1.com/part1.html</pre>
    </div><br />

    <div class="smallfont" style="margin-bottom:2px">Code:</div>
    <pre class="alt2" dir="ltr" style="
    margin: 0px;
    padding: 6px;
    border: 1px inset;
    width: 470px;
    height: 1490px;
    text-align: left;
    overflow: auto">http://www.sample1.com/part1/sample_code.part01.rar
    http://www.sample1.com/part1/sample_code.part01.rar</pre>

    </div></div>
    I want all the values that come after Code:</div> and sit between the pre tags,
    e.g. http://www.sample1.com/part1.html
    http://www.sample1.com/part1.html
    http://www.sample1.com/part1.html
    and
    http://www.sample1.com/part1/sample_code.part01.rar
    http://www.sample1.com/part1/sample_code.part01.rar

    Please note that the meta tag at the start also contains the string Code: and I don't want the value from it.
    Thanks in advance.
    Regards
     
    php_techy, Jun 11, 2009
  2. matthewrobertbell

    #2
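    // Capture the contents of each pre block that directly follows Code:</div>; the matches land in $output[1].
    preg_match_all('#Code:</div>\s*<pre[^>]*>(.*?)</pre>#s', $html, $output);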
     
    matthewrobertbell, Jun 13, 2009
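
    To make the request above concrete, here is a minimal sketch of one way to pull the URLs out with preg_match_all. It assumes the page source is already in a string and that every wanted pre block directly follows a Code:</div> label, as in the sample markup. The helper name extract_code_urls() and the trimmed $html sample are hypothetical, made up for illustration only.

```php
<?php
// Hypothetical sample input, trimmed from the markup in the question.
$html = <<<'SAMPLE'
<meta name="description" content="New info! Code: http://www.example/index.html" />
<div class="smallfont" style="margin-bottom:2px">Code:</div>
<pre class="alt2" dir="ltr" style="
margin: 0px;
overflow: auto">http://www.sample1.com/part1.html
http://www.sample1.com/part1.html</pre>
<div class="smallfont" style="margin-bottom:2px">Code:</div>
<pre class="alt2" dir="ltr">http://www.sample1.com/part1/sample_code.part01.rar</pre>
SAMPLE;

// extract_code_urls() is a hypothetical helper name, not something from the thread.
function extract_code_urls(string $html): array
{
    // Match only <pre> blocks that directly follow a "Code:</div>" label,
    // which skips the "Code:" text inside the meta description.
    // The s modifier lets . span the line breaks inside each block.
    preg_match_all('#Code:</div>\s*<pre[^>]*>(.*?)</pre>#s', $html, $matches);

    $urls = [];
    foreach ($matches[1] as $block) {
        // Split each block on whitespace so every URL becomes its own entry.
        foreach (preg_split('/\s+/', trim($block), -1, PREG_SPLIT_NO_EMPTY) as $url) {
            $urls[] = $url;
        }
    }
    return $urls;
}

print_r(extract_code_urls($html));
```

    Anchoring the pattern on Code:</div> rather than on http:// is what keeps the meta tag out of the results. If the markup varies more than the sample suggests, an HTML parser such as DOMDocument is likely to be more robust than a regex.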