Need regex script for data mining

Discussion in 'Scripts' started by caslab, Jul 31, 2008.

  1. #1
    Hello, I'm trying to find a regex that will let me scrape data from 1,000+ table rows like the one below, capturing the label (Weight) as $1 and the value (155 lbs) as $2. Do you know the right expression for this?

    The best I could come up with is:
    <th[^>]*><[^>]>([^<]*)</th><td[^>]*>([^<]*)</td>

    But I have something wrong in the first part (between the <th> tags). The <td> part works fine on its own.
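
One likely cause, sketched here in Python (an assumption, since the post does not name a language): `<[^>]>` allows exactly one character inside the inner tag, and nothing in the pattern consumes the closing `</a>` before `</th>`. The `th` string is a shortened stand-in for the header cell in the sample row, and the `adjusted` pattern is only an illustration of the fix, not a tested answer for all 1,000+ rows.

```python
import re

# Shortened stand-in for the header cell from the post (attributes trimmed).
th = '<th class="theader" ><a class="heading" href="?r=sourcecode">Weight</a></th>'

# Original first half: <[^>]> permits exactly one character inside the inner
# tag, and the closing </a> before </th> is never consumed.
original = re.compile(r'<th[^>]*><[^>]>([^<]*)</th>')

# Adjusted: <[^>]*> accepts an inner tag of any length, and </a> is explicit.
adjusted = re.compile(r'<th[^>]*><[^>]*>([^<]*)</a></th>')

original_match = original.search(th)
adjusted_match = adjusted.search(th)
label = adjusted_match.group(1) if adjusted_match else None
print(original_match, label)
```

In Python, `group(1)` plays the role of `$1`.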


    <tr><th class="theader" ><a class="heading" href="?r=sourcecode"
    title="Display in a new window"
    onclick="if (popInfo(this.href, 300, 300)) return false;"
    onkeypress="if (popInfo(this.href, 300, 300)) return false;">Weight</a></th><td class="data"> <a class="ext-link" href="?r=sourcecode" title="Show Details" onClick="if (popInfo(this.href, 450, 300)) return false;" onKeyPress="if (popInfo(this.href, 450, 300)) return false;">155 lbs</a></td></tr>


    What I want to get:
    -------
    $1 = Weight
    $2 = 155 lbs

    Another complication is that not all of the table rows have the <a class=...> wrapper, so I'm wondering whether a single regex can account for that.
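
A minimal sketch of one way to handle both cases, assuming Python (the post does not name a language) and assuming each cell holds at most one `<a>` wrapper around plain text. The `html` string is a shortened, made-up pair of rows modeled on the sample, and `row` is a hypothetical name. The optional non-capturing groups `(?:<a[^>]*>)?` and `(?:</a>)?` let the same pattern match rows with or without the link.

```python
import re

# Two shortened rows modeled on the post: one with <a> wrappers, one without.
html = (
    '<tr><th class="theader" ><a class="heading" href="?r=sourcecode"\n'
    'title="Display in a new window">Weight</a></th>'
    '<td class="data"> <a class="ext-link" href="?r=sourcecode">155 lbs</a></td></tr>\n'
    '<tr><th class="theader">Height</th><td class="data">70 in</td></tr>'
)

# Group 1 is the header text ($1), group 2 is the cell text ($2).
# [^>]* also spans newlines, so attributes split across lines still match.
row = re.compile(
    r'<th[^>]*>\s*(?:<a[^>]*>)?\s*([^<]*?)\s*(?:</a>)?\s*</th>'
    r'\s*<td[^>]*>\s*(?:<a[^>]*>)?\s*([^<]*?)\s*(?:</a>)?\s*</td>',
    re.IGNORECASE,
)

pairs = row.findall(html)
for label, value in pairs:
    print(f'{label} = {value}')
```

A pattern like this is fragile if the markup varies further (nested tags inside a cell, a `>` inside an attribute value), in which case an HTML parser is usually the safer route.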

    Thanks for any help you have!
     
    caslab, Jul 31, 2008