Hi,

I want to know how we can scrape or mine data from business directories and white pages / yellow pages websites with Excel. Currently I am looking to do it with the website 411.ca. These are the values I want to get from the website:

Business Name
Address
Phone
Fax
Email
Website

Please help me with how to do this.

Regards
Hassan
You can use cURL and a regular expression if your language is PHP, like this:

```php
<?php
// Download a page with cURL and return its HTML as a string (false on failure).
function get_data($url)
{
    $ch = curl_init();
    $timeout = 5; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$page = "http://m.411.ca/business/profile/7673631?source=Suggestion";
$returned_content = get_data($page);

// Match from "streetAddress" up to the next "<", then strip the leftover markup.
$pattern = "/streetAddress.*?</";
if ($returned_content !== false && preg_match($pattern, $returned_content, $matches)) {
    $address = str_replace(array('<', 'streetAddress', '"', '>'), '', $matches[0]);
    echo $address;
}
?>
```

This is an example of how to get the address "70 Six Point Rd" from "http://m.411.ca/business/profile/7673631?source=Suggestion".

Note: a better regex pattern would remove the need for str_replace, but you get the idea!
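One way to follow that note is a capture group, so preg_match hands back only the text between the tag and the next "<". This is a minimal sketch assuming the profile markup looks like itemprop="streetAddress">70 Six Point Rd< (a hypothetical sample; the real 411.ca HTML may differ or may have changed), and extract_street_address is a hypothetical helper name.

```php
<?php
// Sketch: pull the street address out of a profile page with one capture
// group instead of chained str_replace calls. The sample markup below is
// hypothetical; check the real page source before relying on the pattern.

function extract_street_address(string $html): ?string
{
    // Match 'streetAddress', an optional closing quote, the '>' that ends the
    // tag, then capture everything up to the next '<'.
    $pattern = '/streetAddress"?>([^<]*)</';
    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null; // pattern not found on this page
}

$sample = '<span itemprop="streetAddress">70 Six Point Rd</span>';
echo extract_street_address($sample), PHP_EOL; // prints "70 Six Point Rd"
```

Because the capture group already excludes the surrounding markup, no cleanup step is needed; the same pattern shape could be adapted for the phone, fax, email and website fields once you have checked how each one is marked up.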
Thanks for the detailed explanation. Sorry if my question was not clear: I want to extract the data with Excel 2010.

Regards
Hassan
Excel's VBA is such a dog to use; why not get smart and use PHP? Is someone telling you VBA is the way to go, or do you just not have hosting to run the script on?
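If you go the PHP route, you can still end up in Excel: write the scraped fields to a CSV file, which Excel 2010 opens directly. A minimal sketch, assuming each business has already been scraped into an associative array keyed by the column names from the question; write_businesses_csv, businesses.csv and the sample row are hypothetical.

```php
<?php
// Sketch: save scraped listings as CSV so they can be opened in Excel.
// Column names come from the question; the sample data is made up.

const COLUMNS = ['Business Name', 'Address', 'Phone', 'Fax', 'Email', 'Website'];

/**
 * Write a header row plus one row per business to an open stream.
 * Missing fields are written as empty cells.
 */
function write_businesses_csv(array $businesses, $handle): void
{
    // Explicit separator, enclosure and (empty) escape character: plain
    // RFC 4180-style quoting, which Excel reads correctly.
    fputcsv($handle, COLUMNS, ',', '"', '');
    foreach ($businesses as $business) {
        $row = [];
        foreach (COLUMNS as $column) {
            $row[] = $business[$column] ?? '';
        }
        fputcsv($handle, $row, ',', '"', '');
    }
}

// Hypothetical data; in practice each row would come from scraping one
// profile page (e.g. with cURL and preg_match).
$businesses = [
    ['Business Name' => 'Example Ltd', 'Address' => '70 Six Point Rd', 'Phone' => '555-0100'],
];

$out = fopen('businesses.csv', 'w');
write_businesses_csv($businesses, $out);
fclose($out);
```

Fields containing commas, quotes or spaces are wrapped in double quotes by fputcsv, so a business name like "Acme, Inc." stays in one cell when the file is opened in Excel.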
If you want all info from 411.ca, we have it. We even have hidden profiles. 2.38 million records with 108,000 non-duplicate, verified emails. I will be nice and sell for $499. Email: