Parsing link content with preg

Argento Active Member

#1

Hi, I'm trying to parse the director name (Darren Aronofsky) out of this HTML:
<div id="director-info" class="info">
<h5>Director:</h5>
<a href="/name/nm0004716/">Darren Aronofsky</a><br/>
</div>
I tried this, but I can't get it to work. How should I do it?
preg_match('/director:<\/h5><a href=\"([^\"]*)\">(.*)<\/a>/i', $file, $matches)
Thanks a lot!

Argento, May 26, 2009

koko5 Active Member

#2

Hi,

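You have to strip the line breaks (\r\n) from the input string ($file) first, because </h5> and <a> are on separate lines and your pattern expects them to be adjacent: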
preg_match('/director:\<\/h5\>\<a href=\"([^\"]*)\"\>(.*)?\<\/a\>/i', preg_replace('#(\r?\n)+#','',$file),$matches);
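An alternative to deleting the line breaks is to let the pattern itself allow whitespace between the tags. This is a minimal sketch, assuming the markup looks exactly like the snippet in the first post; extract_director() is a hypothetical helper name, ([^<]*) assumes the link text contains no nested tags, and the ?string return type needs PHP 7.1 or later.

```php
<?php
// Sketch: match across line breaks with \s* instead of stripping them first.
// extract_director() is a hypothetical helper used only for illustration.
function extract_director(string $html): ?string
{
    // '#' as delimiter, so the '/' in </h5> and </a> needs no escaping.
    // \s* accepts any whitespace (including \r\n) between </h5> and <a>;
    // ([^<]*) captures the link text and cannot run past the closing </a>.
    $pattern = '#Director:</h5>\s*<a href="[^"]*">([^<]*)</a>#i';
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null; // no director block found
    }
    return trim($matches[1]);
}

$html = '<div id="director-info" class="info">
<h5>Director:</h5>
<a href="/name/nm0004716/">Darren Aronofsky</a><br/>
</div>';

echo extract_director($html), "\n"; // prints "Darren Aronofsky"
```

The trade-off: \s* keeps the original text untouched, whereas preg_replace() rewrites the whole page before matching.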
Regards

koko5, May 27, 2009

Argento Active Member

#3

Thanks, koko. I have a problem: I'm really bad with regular expressions. I took some patterns from the internet, but I don't really understand them.

The array returns two values. The first one is "/name/nm0004716/" (the href content), not Darren Aronofsky (the value I need), and the second value contains all the rest of the page.

How can I solve it? And is there a good tutorial for learning regular expressions to parse content? Thanks, and sorry for my English!

Argento, May 27, 2009

koko5 Active Member

#4

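Hi, Argento,

The result is an array, and its size depends on the round brackets (capturing groups) in your regular expression: [0] holds the whole match, and each pair of brackets adds one more element, in order. Your pattern has two groups, so the href is in [1] and the director's name is in [2].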
Here is an example:
$file='<div id="director-info" class="info">
<h5>Director:</h5>
<a href="/name/nm0004716/">Darren Aronofsky</a><br/>
</div>';
$matches=array();
preg_match('/director:\<\/h5\>\<a href=\"([^\"]*)\"\>(.*)?\<\/a\>/i', preg_replace('#(\r?\n)+#','',$file),$matches);
print_r($matches);
Array
(
[0] => Director:</h5><a href="/name/nm0004716/">Darren Aronofsky</a>
[1] => /name/nm0004716/
[2] => Darren Aronofsky
)

Now, let's drop the round brackets from href=\"([^\"]*)\" (so it becomes href=\"[^\"]*\"), because we don't need the href value, only the link text (innerHTML). The result is:
Array
(
    [0] => Director:</h5><a href="/name/nm0004716/">Darren Aronofsky</a>
    [1] => Darren Aronofsky
)
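The one-group pattern written out in full, as a sketch assuming the same $file string as in the example. Two changes go beyond the answer here and are assumptions on my part: (.*?) is lazy, so on a full page it stops at the first </a> instead of the last one, and the second call uses a named group (?P<director>...) so the value can be read by name instead of by index.

```php
<?php
$file = '<div id="director-info" class="info">
<h5>Director:</h5>
<a href="/name/nm0004716/">Darren Aronofsky</a><br/>
</div>';

// Remove line breaks so </h5> and <a> become adjacent.
$flat = preg_replace('#(\r?\n)+#', '', $file);

// No brackets around [^"]*: the href is matched but not captured.
// (.*?) is lazy, so it stops at the first </a> it reaches.
preg_match('#director:</h5><a href="[^"]*">(.*?)</a>#i', $flat, $matches);
print_r($matches); // [0] = whole match, [1] = link text

// A named group: the value is also available under $named['director'].
preg_match('#director:</h5><a href="[^"]*">(?P<director>.*?)</a>#i', $flat, $named);
echo $named['director'], "\n"; // Darren Aronofsky
```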
Hope it's a little clearer now.
Regards

koko5, May 27, 2009

Argento Active Member

#5

Yeah, it works fine in the example. I see that my problem is with the full code, when I load the URL content into a string:

$url = "http://www.imdb.com/title/tt1125849/";

function get_imdb($url)
{
   if (!($file = file_get_contents($url)))
      trigger_error('Impossible to return imdb page', E_USER_ERROR);
   if (!preg_match('/director:\<\/h5\>\<a href=\"([^\"]*)\"\>(.*)?\<\/a\>/i', preg_replace('#(\r?\n)+#','',$file),$matches))
      trigger_error('Unable to parse IMDB response', E_USER_ERROR);
   return $matches[1];
}

$resultado = get_imdb($url);
echo $resultado;

Why doesn't it work in this case?

Thanks, koko!

Argento, May 27, 2009

JDevereux Peon

#6

You could also do this using DOM and XPath:

$html = file_get_contents('http://www.imdb.com/title/tt1125849/');
$dom = new DOMDocument();
// @ hides the warnings loadHTML() emits for malformed real-world HTML
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// every <a> inside <div id="director-info">
$hrefs = $xpath->evaluate("/html/body//div[@id='director-info']//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    echo $href->firstChild->data . '<br />'; // link text (the director's name)
    echo $href->getAttribute('href');        // link target (/name/...)
}
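A variant of the same DOM/XPath idea, run on a local copy of the snippet so it can be tried without fetching the page. The choices here are assumptions, not part of the post: libxml_use_internal_errors() instead of @, foreach over the DOMNodeList, and textContent (which still works if the link text contains nested tags). It needs PHP's dom extension, which is usually enabled.

```php
<?php
// Sketch: DOM/XPath extraction on a local HTML string.
$html = '<html><body>
<div id="director-info" class="info">
<h5>Director:</h5>
<a href="/name/nm0004716/">Darren Aronofsky</a><br/>
</div>
</body></html>';

// Collect libxml parser warnings internally instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Search the whole document; /html/body does not need to be spelled out.
$links = $xpath->query("//div[@id='director-info']//a");

$directors = [];
foreach ($links as $link) {
    // textContent concatenates all text inside <a>, including nested elements.
    $directors[] = [
        'name' => trim($link->textContent),
        'href' => $link->getAttribute('href'),
    ];
}

print_r($directors);
```

Collecting the results in an array, rather than echoing inside the loop, keeps the separators between multiple directors under the caller's control.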

JDevereux, May 27, 2009

koko5 Active Member

#7

Argento said:

Why doesn't it work in this case?

Because the incoming data comes escaped, you have to run it through stripslashes():
preg_match('/director:\<\/h5\>\<a href=\"([^\"]*)\"\>(.*)?\<\/a\>/i', stripslashes(preg_replace('#(\r?\n)+#','',$file)),$matches);
By the way, as JDevereux wrote, it's better to use DOM in this case.
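Separately from the escaping question, two details in the get_imdb() function above are worth checking: it returns $matches[1], which is the href rather than the name, and its greedy (.*)? group can run on to the last </a> on the page once the line breaks are removed. This is a minimal sketch of only the parsing step with both points changed (lazy (.*?) and $matches[2]); parse_director() is a hypothetical name, the sample HTML with a second link is made up, and the real IMDB markup may differ.

```php
<?php
// Sketch: the parsing step of get_imdb(), separated from the download so it
// can be tried on a string. parse_director() is a hypothetical name.
function parse_director(string $file): string
{
    $flat = preg_replace('#(\r?\n)+#', '', $file);
    // (.*?) is lazy: it stops at the first </a> after the director link.
    if (!preg_match('/director:<\/h5><a href="([^"]*)">(.*?)<\/a>/i', $flat, $matches)) {
        trigger_error('Unable to parse IMDB response', E_USER_WARNING);
        return '';
    }
    // $matches[1] is the href, $matches[2] is the link text.
    return $matches[2];
}

// Made-up sample with a second link, as a full page would have.
$page = '<h5>Director:</h5>
<a href="/name/nm0004716/">Darren Aronofsky</a><br/>
<a href="/some/other/link/">Another link</a>';

echo parse_director($page), "\n"; // Darren Aronofsky
```

With the greedy (.*)? on the same sample, group 2 would instead capture everything up to the final </a>, which matches the "all the web content" symptom described earlier in the thread.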

Regards

koko5, May 27, 2009
