I have seen a few posts here on DP about how to get movie data from the IMDb website. No laughing, but here's what I came up with. Would anyone like to clean it up?

PHP:
<?php
// A form in an admin panel would be a better way to supply this URL.
$url = 'http://www.imdb.com/title/tt0499549/';
$imdb = get_data($url);

// Poster: capture the <img> tag inside the poster link, then strip it down to the src URL.
$movie_pic = get_match('/<a name="poster".+title=".+">(.+)<\/a>/', $imdb);
$dvd_image = preg_replace('#<img(.*)src="#', '', $movie_pic);
$image = preg_replace('#" />#', '', $dvd_image);

// Each field sits between an <h5>Label:</h5> heading and the closing </div>.
$genres = strip_tags(get_match('/<h5[^>]*>Genre:<\/h5>(.*)<\/div>/isU', $imdb));
$genre = preg_replace('#See more(.*)#', '', $genres);

$name = get_match('/<title>(.*)<\/title>/isU', $imdb);
$director = strip_tags(get_match('/<h5[^>]*>Director:<\/h5>(.*)<\/div>/isU', $imdb));

$about = strip_tags(get_match('/<h5[^>]*>Plot:<\/h5>(.*)<\/div>/isU', $imdb));
// Drop the trailing "Full summary" / "Full synopsis" link text, whichever comes first.
$plot = preg_replace('#(Full summary|Full synopsis).*#s', '', $about);

$release_dates = strip_tags(get_match('/<h5[^>]*>Release Date:<\/h5>(.*)<\/div>/isU', $imdb));
$release_date = preg_replace('#See more(.*)#', '', $release_dates);

$mpaa = get_match('/<a href="\/mpaa">MPAA<\/a>:<\/h5>(.*)<\/div>/isU', $imdb);
$rating = preg_replace('#<div[^>]*>#', '', $mpaa);

$run_time = get_match('/Runtime:<\/h5>(.*)<\/div>/isU', $imdb);
$runtime = preg_replace('#<div[^>]*>#', '', $run_time);

echo "$image<br />";
echo "$name<br />";
echo "$genre<br />";
echo "$director<br />";
echo "$plot<br />";
echo "$release_date<br />";
echo "$rating<br />";
echo "$runtime<br />";
// Once you have the variables above, I guess you could INSERT them into a database and echo them elsewhere.

// Return the first captured group, or an empty string when the pattern does not match.
function get_match($regex, $content)
{
    preg_match($regex, $content, $matches);
    return isset($matches[1]) ? $matches[1] : '';
}

// Fetch a URL with cURL and return the response body ('' on failure).
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data === false ? '' : $data;
}
?>
I've seen that code before, on SourceForge or somewhere else. It looks OK to me, but consider using DOM instead of regular expressions; you may find it easier to work with.
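To give an idea of what the DOM approach might look like, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The sample markup is made up; it only imitates the `<h5>Label:</h5> ... </div>` layout that the regular expressions in the first post assume, so the real page may need different queries. The helper name `get_field` is also just an illustration.

```php
<?php
// Hypothetical sample markup, modelled on the <h5>Label:</h5> ... </div>
// pattern that the regular expressions in the first post rely on.
$html = <<<HTML
<html><head><title>Sample Movie (2009)</title></head>
<body>
<div class="info"><h5>Director:</h5> <a href="/name/nm0000001/">Jane Doe</a></div>
<div class="info"><h5>Genre:</h5> <a href="/genre/Action">Action</a> | <a href="/genre/Sci-Fi">Sci-Fi</a></div>
</body></html>
HTML;

// Hypothetical helper: return the text that follows <h5>$label</h5>
// inside the same parent element, or '' if the heading is not found.
// $label is assumed to be a fixed string without double quotes.
function get_field(DOMXPath $xpath, $label)
{
    $nodes = $xpath->query('//h5[normalize-space(.)="' . $label . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }
    // textContent of the parent includes the label itself, so cut it off.
    $text = trim($nodes->item(0)->parentNode->textContent);
    return trim(substr($text, strlen($label)));
}

// Real-world HTML is rarely valid, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$title    = trim($xpath->evaluate('string(//title)'));
$director = get_field($xpath, 'Director:');
$genre    = get_field($xpath, 'Genre:');

echo "$title\n$director\n$genre\n";
```

With a real page you would pass the string returned by `get_data($url)` to `loadHTML()` instead of the sample. The advantage over the regex version is that a missing field just gives an empty string, and there are no leftover tags to strip afterwards.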
Yes, as danx10 says, try using DOM. It will speed things up for you and will be easier to understand technically. Read more at www.w3schools.com/htmldom/default.asp or www.w3schools.com/jsref/default.asp