I am trying to craft two regular expressions that I can use with preg_match and preg_match_all to grab information off a webpage. So far I have been partially successful, but I was hoping someone could guide me to a solution. First, I need to grab a number in parentheses. This appears just once on the webpage and comes in this form: Client Age (##), where ## is the number I need to grab. Second, I need to grab the content of a span tag. This occurs several times on the webpage, but it always follows the same form: <span class="blue-highlight">TEXT</span>, where TEXT is the information I need to collect. I should also note that TEXT contains other HTML tags, but I will worry about parsing them out later. So far I have tried a couple of different expressions, but I can't get the first one working at all, and none of my attempts at the second work as intended. This expression produces the best results on the second one, but it is still far from what I need: $regexp = '(\<span(.+)\>)'; Can anyone offer some help? Many thanks in advance.
I figured out half of the solution, so I wanted to share my results in case anyone runs into the same problem. I wanted to create a regular expression that would grab a number from the following form: Client Age (##), where ## is the number of interest. The regular expression that worked for me was: $regexp = '/Client Age \(([0-9]+)\)/'; Let me break down this solution so you can understand it.
1. / - This delimiter starts the regular expression.
2. Client Age - Simply the literal string that we are searching for.
3. \( - Also part of the string we are searching for, but you must put the escape character "\" before "(" so it is interpreted literally.
4. ([0-9]+) - Next we are searching for a number. Because we don't know the number or its length, we use [0-9] to denote that any digit may appear and "+" to signify that there may be one or more digits. The surrounding parentheses capture the digits as a group.
5. \) - Also part of the string we are searching for, but you must put the escape character "\" before ")" so it is interpreted literally.
6. / - This delimiter ends the regular expression.
So now when I use preg_match($regexp, $input, $output), $output becomes an array that holds the results. For instance, echo $output[0] might print "Client Age (24)", while $output[1] holds just the captured number, "24". Hope this helps, will post the other half if I figure it out.
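To make the breakdown above concrete, here is a minimal sketch of the full call. The sample string in $input is made up for illustration; in practice it would be the downloaded page source.

```php
<?php
// Hypothetical sample input; the real page content would come from elsewhere.
$input = '<p>Client Age (24)</p>';

$regexp = '/Client Age \(([0-9]+)\)/';

$age = null;
// preg_match returns 1 when the pattern matches, 0 when it does not.
if (preg_match($regexp, $input, $output) === 1) {
    // $output[0] is the whole match, $output[1] is the first capture group.
    $age = (int) $output[1];
}

var_dump($age);
```

Checking the return value before reading $output avoids an undefined index notice when the page does not contain the expected text.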
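Hi, For the client age you can use a regexp like: $regexpr = '#client age\s*\((\d+)\)#im'; The i flag makes the match case-insensitive, \s* allows optional whitespace before the opening parenthesis, and the parentheses around \d+ capture the number so it ends up in index 1 of the matches array. As for the other part of your question, this class makes it easy to grab webpages and find specific elements. Just take a look at the example and you'll figure it out yourself. Regards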
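The class recommended above is not identified in this thread, so the following is only a rough sketch of the same idea using PHP's built-in DOM extension (DOMDocument and DOMXPath) instead. The sample markup in $html is invented for illustration, and the XPath expression assumes each span carries exactly the one class blue-highlight.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<div><span class="blue-highlight">First <b>item</b></span>'
      . '<span class="other">skip me</span>'
      . '<span class="blue-highlight">Second item</span></div>';

$doc = new DOMDocument();
// Collect libxml warnings internally; real-world HTML often triggers them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Exact match on the class attribute (assumes no additional classes).
$nodes = $xpath->query('//span[@class="blue-highlight"]');

$results = [];
foreach ($nodes as $node) {
    // textContent drops nested tags; $doc->saveHTML($node) would keep them.
    $results[] = $node->textContent;
}

print_r($results);
```

A parser-based approach like this sidesteps the greedy-match problem of a pattern such as (.+), and it copes with the nested tags inside TEXT that a regular expression would trip over.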
Thanks for the information! That HTML parser class works great, and I would really recommend it to anyone who is in need of a similar solution.