I need a PHP script that will go to http://www.gumtree.com/business-services, gather the URLs of all the adverts, and load each one through a scraper that extracts phone numbers from the pages. I only want UK mobile numbers, which start with 07 and are 11 digits long. E.g.:
07593 354084
0754 492 2461
07529392193
07523 45 56 32
07564-435-239
07639-232-432
I need it to pick up all of the mobile numbers, however they are formatted. Please give me a quote.
Do you want to store them in a database, download them as a CSV or text file, or just display them on screen? If the job is still available, you can contact me with your budget. It's really easy to do, so it can be done within hours.
You can try this:
PHP:
// 07 followed by 9 more digits, each optionally preceded by a space or hyphen
preg_match_all('/07(?:[\s-]?[0-9]){9}/', file_get_contents($url), $matchesarray);
var_dump($matchesarray);
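One way to catch every formatting in the original post is to allow an optional space or hyphen before each digit and then strip the separators afterwards. This is only a minimal sketch, assuming the page text is already in a string; the function name extractUkMobiles is made up for illustration.

```php
<?php
// Hypothetical helper: returns every UK mobile number found in $text,
// normalised to 11 digits with spaces and hyphens stripped.
function extractUkMobiles($text) {
    // 07 followed by 9 more digits, each optionally preceded by one space or hyphen.
    // The lookarounds stop a match from starting or ending inside a longer digit run.
    preg_match_all('/(?<!\d)07(?:[ -]?\d){9}(?!\d)/', $text, $matches);
    $numbers = array();
    foreach ($matches[0] as $raw) {
        $numbers[] = str_replace(array(' ', '-'), '', $raw);
    }
    // Drop duplicates that only differed in formatting.
    return array_values(array_unique($numbers));
}

$sample = 'Call 07593 354084, 0754 492 2461, 07529392193, 07523 45 56 32 or 07564-435-239.';
print_r(extractUkMobiles($sample));
```

Normalising before de-duplicating means the same number written two different ways is only reported once.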
PHP:
<?php
set_time_limit(0);

// Fetch a page with cURL and return its body ('' on failure).
function getInfo($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
    curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result === false ? '' : $result;
}

// True for 11 digits starting with 07 once spaces and hyphens are removed.
function isPhone($phone) {
    $phone = str_replace(array(' ', '-'), '', $phone);
    return strlen($phone) == 11 && substr($phone, 0, 2) == '07' && ctype_digit($phone);
}

// Collect the advert URLs from the category page.
function getList() {
    $info = getInfo('http://www.gumtree.com/business-services');
    preg_match_all('|href="http://www.gumtree.com/p/business-services/(.*)" name|', $info, $final);
    $matrix = array();
    foreach ($final[1] as $item) {
        $item = explode('"', $item);
        $matrix[] = 'http://www.gumtree.com/p/business-services/' . $item[0];
    }
    return $matrix;
}

// Read the phone meta tag; fall back to searching the description.
function getPhone($url) {
    $info = getInfo($url);
    $phone = '';
    if (preg_match('|<meta name="og:phone_number" content="(.*)"/>|', $info, $final)) {
        $phone = $final[1];
    }
    // ereg() was removed in PHP 7, so use strpos() instead.
    if (strpos($phone, 'on ') !== false) {
        $phone = explode('on ', $phone);
        $phone = $phone[1];
    }
    if (empty($phone) && preg_match('|<meta name="description" content="(.*)" />|msU', $info, $final)) {
        $phone = '';
        $pregs = array(
            '|07[0-9]{9}|',
            '|07[0-9]{3} [0-9]{5}|',
            '|07[0-9]{3}-[0-9]{5}|',
            '|07[0-9]{2} [0-9]{3} [0-9]{4}|',
            '|07[0-9]{2}-[0-9]{3}-[0-9]{4}|',
            '|07[0-9]{3} [0-9]{2} [0-9]{2} [0-9]{2}|',
            '|07[0-9]{3}-[0-9]{2}-[0-9]{2}-[0-9]{2}|',
            '|07[0-9]{3} [0-9]{3} [0-9]{3}|',
            '|07[0-9]{3}-[0-9]{3}-[0-9]{3}|'
        );
        foreach ($pregs as $preg) {
            // Only read $finalx[0] when the pattern actually matched.
            if (preg_match($preg, $final[1], $finalx) && isPhone($finalx[0])) {
                $phone = $finalx[0];
                break;
            }
        }
    }
    return $phone;
}

$x = 0;
foreach (getList() as $item) {
    $phone = getPhone($item);
    if ($phone !== '') print $phone . "<br />\n";
    if ($x == 30) break; # Remove this line to get all numbers
    $x++;
}
?>
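On the earlier question of where the numbers should go: instead of printing them, they could be de-duplicated and written to a file with one number per line. A rough sketch only, assuming the numbers have already been collected into an array; writeNumbersCsv and the file name mobiles.csv are placeholders, not part of the script above.

```php
<?php
// Hypothetical helper: writes one normalised number per line to $path
// and returns how many lines were written (0 if the file cannot be opened).
function writeNumbersCsv(array $numbers, $path) {
    $clean = array();
    foreach ($numbers as $n) {
        // Strip separators so differently formatted duplicates collapse.
        $clean[] = str_replace(array(' ', '-'), '', $n);
    }
    $unique = array_values(array_unique($clean));

    $fh = fopen($path, 'w');
    if ($fh === false) {
        return 0;
    }
    foreach ($unique as $n) {
        // Digits only, so no CSV quoting or escaping is needed.
        fwrite($fh, $n . "\n");
    }
    fclose($fh);
    return count($unique);
}

$file = sys_get_temp_dir() . '/mobiles.csv';
$count = writeNumbersCsv(array('07593 354084', '07593-354084', '07529392193'), $file);
```

A single-column file like this opens directly in a spreadsheet; a database insert would follow the same normalise-then-deduplicate step.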