Hey there.

Scraping is usually really easy these days with regular expressions and cURL in PHP. However, I ran into a problem recently with a client: I couldn't grab some values because they were being generated by JavaScript. I figured, okay, I'll try DOM, but because DOM doesn't let me set custom request headers, I couldn't get the info from the following site; the DOM request didn't know how to handle the page correctly. I'll post how I got cURL to return proper results at the bottom of this, so you can experiment and save some time.

http://equestrian.en.alibaba.com/trustpass_profile.html

On that page is an example of a company. I've got all the other information recorded fine, but take a look at the source code for that page and look for 'Selling Leads (171)' and 'Products (130)'. I'm trying to capture those numbers. You will notice that the numbers are generated by JavaScript, and the source shows the JavaScript instead of the numbers =(

If you can find a way to do it or point me in the right direction, I don't mind paying you for your help (by PayPal preferably).

Best,
- Joe

As promised, here is the code to help you connect to their pages. It works by making the site think you are Googlebot (it sends Googlebot's user-agent string and a Google referer).

function getResultFromURL($url)
{
    // Disguises the request as Googlebot so most sites serve it the normal page
    $curl = curl_init();

    $header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[]   = "Cache-Control: max-age=0";
    $header[]   = "Connection: keep-alive";
    $header[]   = "Keep-Alive: 300";
    $header[]   = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[]   = "Accept-Language: en-us,en;q=0.5";
    $header[]   = "Pragma: "; // browsers don't send this; an empty value tells cURL not to send a Pragma header

    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects...
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);    // ...AUTOREFERER only has an effect when redirects are followed
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);

    $html = curl_exec($curl); // execute the request
    curl_close($curl);        // close the connection

    return $html;             // the page's HTML, or false on failure
}
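If DOM is only wanted for parsing, one way around the header problem is to let cURL make the request and hand the resulting string to DOMDocument::loadHTML(), so DOM never fetches anything itself. Below is a minimal sketch, assuming the HTML has already been fetched (for example with getResultFromURL() above); the helper name extractScriptBodies() is made up for illustration. It also shows why DOM alone can't produce the numbers: a script element's textContent is the raw JavaScript source, which DOM never executes.

```php
<?php
// Hypothetical helper: return the raw source of every <script> element
// in an HTML string that was fetched elsewhere (e.g. by cURL).
function extractScriptBodies(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bodies = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        // textContent is the JavaScript text itself, not its result.
        $bodies[] = $script->textContent;
    }
    return $bodies;
}

// Usage with a made-up snippet shaped like the page in question:
$html = '<html><body><p>Selling Leads</p><script>var sellLeadsCount = ""+171;</script></body></html>';
print_r(extractScriptBodies($html));
```

The script text still has to be searched with a regular expression afterwards, so for this particular page DOM mostly adds a step; it becomes worthwhile when the same document is also being mined for ordinary markup.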
You don't need to do anything. The information is right there in the page source:

var sellLeadsCount = ""+171;
var productsCount = ""+28;

Peace,
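Following that reply, here is a minimal sketch that pulls both counters out of the raw HTML with a regular expression, assuming the page keeps the exact var name = ""+N; form quoted above. extractJsCount() is a hypothetical helper, and the sample string stands in for the page; in practice $html would be whatever getResultFromURL() returns for the profile URL.

```php
<?php
// Hypothetical helper: find `var <name> = ""+<digits>` in the page source
// and return the number, or null if the variable isn't there.
function extractJsCount(string $html, string $varName): ?int
{
    // Matches e.g.  var productsCount = ""+28;
    $pattern = '/var\s+' . preg_quote($varName, '/') . '\s*=\s*""\s*\+\s*(\d+)/';
    if (preg_match($pattern, $html, $m) === 1) {
        return (int) $m[1];
    }
    return null; // not found: the site may have renamed or removed the variable
}

// Stand-in for the fetched page source:
$html = 'var sellLeadsCount = ""+171; var productsCount = ""+28;';

$sellingLeads = extractJsCount($html, 'sellLeadsCount');
$products     = extractJsCount($html, 'productsCount');
echo "Selling Leads ($sellingLeads), Products ($products)\n";
```

Keying the pattern on the variable names rather than on the surrounding markup means it keeps working if the page layout changes, but it will return null if the site ever renames those variables or switches to a different assignment form.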