picking out phrases using REGEX
gfreeman
Jun 8th 2007, 11:15 am
Hi all,
I have a really long string called $html which contains a whole web page from which I want to grab some info.
The format is sort of like this:
blahblahblah
<tag>info-1</tag>
<tag>info-2</tag>
<tag>info-3</tag>
tum-te-tum-te-tum
<tag>info-4</tag>
<tag>info-5</tag>
foobarfoobar
I want 2 arrays - one array that contains the info-n between blahblah and tum-te-tum, and another array that contains the info-n between tum-te-tum and foobar.
Can a master of regex show me how this is done please?!
Thanks!!
krakjoe
Jun 8th 2007, 11:59 am
A regex has to be written against the exact input, so please post a link to the actual page you want data from and point out the actual data you want.
gfreeman
Jun 8th 2007, 12:21 pm
The page is at:
http://gfreeman.com/files/regex.zip
About a quarter of the way down the page there is a line that says "This team has visited the following leagues for international training matches", followed by a list of flag images. I need an array containing the numbers that prefix each flag.gif filename.
Again, about halfway down, there is a line that says "This team has been visited by teams from the following leagues for international training matches", and I need another array containing the numbers that prefix those flag.gif filenames.
If this helps - then a HUGE thanks!
krakjoe
Jun 8th 2007, 1:09 pm
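<?php
// Returns array( 'visited' => ..., 'visitedby' => ... ), each a list of flag numbers.
function retrieve_data( $location )
{
    // Initialise both lists so the caller always gets arrays back.
    $returns = array( 'visited' => array(), 'visitedby' => array() );
    // file_get_contents() returns false on failure, so no separate fopen() check is needed.
    $buffer = @file_get_contents( $location );
    if( $buffer === false )
    {
        printf( "Cannot open %s", $location );
        return $returns;
    }
    // Matches one flag image and captures the number in front of "flag.gif".
    $flag = '~SRC="/Common/images/([0-9]+)flag\.gif"~';
    // Section 1: everything between the "has visited" and "has been visited by" headings.
    if( preg_match( '~This team has visited the following leagues for international training matches:(.*?)This team has been visited by teams from the following leagues for international training matches:~si', $buffer, $leagues ) )
    {
        // explode() replaces split(), which was removed in PHP 7.
        foreach( explode( "\n", $leagues[1] ) as $line )
        {
            if( preg_match( $flag, $line, $num ) )
            {
                $returns['visited'][ ] = (int) $num[1];
            }
        }
    }
    // Section 2: everything between the "has been visited by" heading and the next <H2>.
    if( preg_match( '~This team has been visited by teams from the following leagues for international training matches(.*?)<H2>International Players</H2><BR>~si', $buffer, $visitedby ) )
    {
        foreach( explode( "\n", $visitedby[1] ) as $line )
        {
            if( preg_match( $flag, $line, $num ) )
            {
                $returns['visitedby'][ ] = (int) $num[1];
            }
        }
    }
    return $returns;
}
echo "<pre>";
print_r( retrieve_data('test.html') );
?>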
Your arrays ......
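For anyone reading later, here is a minimal sketch of the same two-step idea (cut out the section between two markers, then pull the values out of it) applied to the simplified format from the first post. The helper name values_between() and the sample $html string are assumptions made up for illustration, not part of the real page, and preg_match_all() is swapped in for the line-by-line loop.

```php
<?php
// Hypothetical helper (not from the thread): returns every <tag>...</tag>
// value found between the marker strings $start and $end in $html.
function values_between( $html, $start, $end )
{
    // Step 1: isolate the section between the two markers.
    // preg_quote() escapes the markers; the "s" modifier lets "." span newlines.
    $section = '~' . preg_quote( $start, '~' ) . '(.*?)' . preg_quote( $end, '~' ) . '~s';
    if( !preg_match( $section, $html, $m ) )
    {
        return array();
    }
    // Step 2: collect every tag value inside that section in one call.
    preg_match_all( '~<tag>(.*?)</tag>~s', $m[1], $found );
    return $found[1];
}

// Sample input written to mirror the format described in the first post.
$html = "blahblahblah\n<tag>info-1</tag>\n<tag>info-2</tag>\n<tag>info-3</tag>\n"
      . "tum-te-tum-te-tum\n<tag>info-4</tag>\n<tag>info-5</tag>\nfoobarfoobar";

$first  = values_between( $html, 'blahblahblah', 'tum-te-tum-te-tum' );
$second = values_between( $html, 'tum-te-tum-te-tum', 'foobarfoobar' );

print_r( $first );
print_r( $second );
```

Because preg_match_all() scans the whole section, it also picks up several matches on one line, which a one-preg_match()-per-line loop would miss.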
gfreeman
Jun 8th 2007, 1:58 pm
1- That's some excellent coding!
2- It doesn't break my current PHP when I change it slightly and embed it
3- Alas it returns zero lines, but that's probably more to do with my tweaks than anything.
I'll check further when I get home this weekend, but you've probably cracked this for me!
Thanks!!!
gfreeman
Jun 13th 2007, 6:56 am
OK, all working perfectly thanks.
That was a great solution!