Hi there, how can I pull data from other websites and store it in my database? I need the data for a reporting process. Thanks.
In order to get data from other websites in PHP, you can use cURL. You can find more info at http://au.php.net/curl.
But don't bother, use cURL because it's faster and easier.

    $url = "http://anything";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    // you can do something with $data, like explode() or a preg_match() regex, to get the exact information you need
    echo $data;
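A minimal sketch of the same cURL fetch with basic error checking, since curl_exec() returns false when the transfer fails. The function name fetch_url and the 10-second timeouts are assumptions for illustration, not something from the posts above.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body,
// or null if the transfer failed.
function fetch_url(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // assumed: seconds to wait for the connection
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed: seconds allowed for the whole transfer

    $data = curl_exec($ch);
    if ($data === false) {
        // curl_error() describes what went wrong (DNS failure, timeout, ...)
        error_log('cURL error: ' . curl_error($ch));
        $data = null;
    }
    curl_close($ch);
    return $data;
}

// Usage (needs network access, so it is left commented out):
// $data = fetch_url('http://anything');
// if ($data !== null) { echo $data; }
```

Checking for false before parsing avoids running explode() or a regex on a failed download.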
If you have access to the function, file_get_contents() is faster (to type) than all those extra lines of cURL. Check it out ...

    $text = file_get_contents('http://www.mypage.com/'); // scrape page into variable
    preg_match("/<!--start product-->([^`]*?)<!--end product-->/", $text, $temp); // get data out of the page
    echo htmlentities($temp[0]); // spits out the 1st occurrence of your data

It can get more complicated than the above code, but it really depends on what you need harvested. If you don't have access to file_get_contents(), you could write a function that automates all the cURL stuff and works the same as file_get_contents(). I think cURL is a bit faster, so it might be smart to go ahead and use it.

    function file_get_the_contents($url) {
        $ch = curl_init();
        $timeout = 10; // set to zero for no timeout
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $file_contents = curl_exec($ch);
        curl_close($ch);
        return $file_contents;
    }

    // now start your data harvesting
    $text = file_get_the_contents('http://www.mypage.com/'); // scrape page into variable
    preg_match("/<!--start product-->([^`]*?)<!--end product-->/", $text, $temp); // get data out of the page
    echo htmlentities($temp[0]); // spits out the 1st occurrence of your data
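In the preg_match() calls above, $temp[0] holds the whole match including the comment markers, while $temp[1] holds only the captured text between them. Here is a small sketch of pulling out just the captured part and handling the no-match case. The function name extract_product and the sample HTML are made up for illustration, and the pattern swaps in (.*?) with the s modifier in place of the [^`]*? trick.

```php
<?php
// Hypothetical helper: return the text between the two HTML comment
// markers, or null if the markers are not found.
function extract_product(string $html): ?string
{
    // The "s" modifier lets "." match newlines, so the block may span lines.
    $pattern = '/<!--start product-->(.*?)<!--end product-->/s';
    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[1]); // [0] is the whole match, [1] is the captured part
    }
    return null;
}

// Made-up sample page, used here instead of a live download.
$sample = "<p>intro</p><!--start product--> Blue widget <!--end product--><p>footer</p>";
echo htmlentities((string) extract_product($sample)), "\n"; // prints: Blue widget
```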
The benefit of cURL over file_get_contents() is that cURL lets you do things like POST data, follow redirects, spoof the user agent, accept cookies, etc.
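As a rough sketch, those features map onto cURL options like the ones below. The helper names, the user agent string, the redirect limit, and the cookie file location are all assumptions for illustration.

```php
<?php
// Hypothetical helper: build a cURL option array that follows redirects,
// sends a custom user agent, keeps cookies, and optionally POSTs data.
function build_curl_options(string $url, array $postFields = [], ?string $cookieFile = null): array
{
    // Assumed location for the cookie file if none is given.
    $cookieFile = $cookieFile ?? (sys_get_temp_dir() . '/scraper_cookies.txt');

    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,           // assumed limit
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleBot/1.0)', // assumed string
        CURLOPT_COOKIEJAR      => $cookieFile, // write received cookies here
        CURLOPT_COOKIEFILE     => $cookieFile, // send stored cookies from here
    ];

    if ($postFields !== []) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }
    return $options;
}

// Run a request with the given options; returns the body, or false on failure.
function fetch_with_options(array $options)
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Usage (needs network access, so it is left commented out):
// $html = fetch_with_options(build_curl_options('http://www.mypage.com/', ['q' => 'widgets']));
```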
Hi all, I used curl_init() and it worked. I used a regular expression to get the particular piece of information I needed. Special thanks to Andy Peters and ErectADirectory.
Can you please give me some advice on this problem? I am building a site in a CMS. I have boxes on the site http://www.istoots.com, and these boxes must automatically pull information from http://www.dodtracker.com/, but only in a couple of the boxes. Can somebody advise me on how I can do this? Thanks.
Also, there is something called WPRobot for automated posting of content on your WordPress site. Are you talking about something like that?
Hi Andy, hope you're still in here sometimes. What would the code look like if I need to log in to a website? I have this code, but it doesn't work.

    $username = 'xxxxxx';
    $password = 'yyyyy1';
    $url = 'http://www.fracsoft.com';
    $context = stream_context_create(array(
        'http' => array(
            'header' => "Authorization: Basic " . base64_encode("$username:$password")
        )
    ));
    $data = file_get_contents($url, false, $context);
    // echo $data
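For comparison, here is a sketch of the same Basic authentication done with cURL, plus the HTTP status code, which can help narrow down why a login "doesn't work". The helper names are hypothetical. Whether this particular site uses HTTP Basic authentication at all is unknown; if it uses a login form, a POST with cookies would be needed instead.

```php
<?php
// Hypothetical helper: cURL options for HTTP Basic authentication.
function basic_auth_options(string $url, string $username, string $password): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,
        CURLOPT_USERPWD        => $username . ':' . $password,
    ];
}

// Returns [HTTP status code, body]; the body is false if the transfer failed.
function fetch_with_status(array $options): array
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return [$status, $body];
}

// Usage (needs network access, so it is left commented out):
// [$status, $body] = fetch_with_status(
//     basic_auth_options('http://www.fracsoft.com', 'xxxxxx', 'yyyyy1')
// );
// A 401 status means the server rejected the Basic credentials.
// A 200 status with a login page in $body suggests a form login instead.
```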
I have created a program in Java that pulls data from a website. To read in the page:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public static BufferedReader read(String url) throws Exception {
        return new BufferedReader(
            new InputStreamReader(
                new URL(url).openStream()
            )
        );
    }

Then I find instances of particular exclusive chars to find the starting point of my data:

    int start = line.indexOf(">") + 1;

Afterward, I find instances of the next char to mark the end of the information I am looking for:

    int end = line.indexOf("/") - 4;

Then I run a loop from the start to the end and append to a String:

    String whatIwant = "";
    for (int i = start; i < end; i++) {
        whatIwant = (whatIwant + line.charAt(i));
    }

Then I finally print that data to a file or the screen. This may be slow, but I have not had any trouble getting all the data in situations where the pages expose the changing value in the URL: I increment the value (or pull the values from a predefined text file) and reinitiate the URL from another section of code. The advantage is that it actually loads the entire page to gather the data, so you are able to capture anything that is sent to the presentation layer without risking 'hacking' the website. Simply put, for them to block this, they would have to block an address for accessing their website too many times. As it stands, I am increasing their ranking anyway. Any questions will not be answered unless written on a $20 bill and sent to my address. (For those not familiar with Java, this will be wasted. If you are, you have enough information to do what I have done.)
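For readers following the PHP part of this thread, the same "find a start marker, find an end marker, take what is in between" idea can be sketched with strpos() and substr(). The function name extract_between, the markers, and the sample line are made up for illustration.

```php
<?php
// Hypothetical helper: return the text between the first $startMarker and
// the next $endMarker after it, or null if either marker is missing.
function extract_between(string $line, string $startMarker, string $endMarker): ?string
{
    $start = strpos($line, $startMarker);
    if ($start === false) {
        return null;
    }
    $start += strlen($startMarker); // move past the start marker itself

    $end = strpos($line, $endMarker, $start); // search only after the start
    if ($end === false) {
        return null;
    }
    return substr($line, $start, $end - $start);
}

echo extract_between('<td>19.99</td>', '<td>', '</td>'), "\n"; // prints: 19.99
```

Searching for the full closing marker avoids the hand-tuned offset (the "- 4" above), which breaks as soon as the markup changes length.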
Hi gang, sorry to resurrect this thread, but I would also like to pull certain snippets of text from a web page that are wrapped in a tag (such as a div or span) that has a class. Something similar to this:

    <span class="OutOfStock">Out of stock</span>

Is this possible? Thanks, gabstero
Way too much to teach you for free, bro. You need to learn regular expressions (regex) and functions like preg_match_all(). Any links I sent would come from Google, which you are capable of searching yourself.
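As a starting point, here is a minimal sketch of preg_match_all() applied to the span from the question. The function name spans_with_class is hypothetical, and the pattern assumes that class is the first attribute on the tag and that its value matches exactly; real pages are often messier than that.

```php
<?php
// Hypothetical helper: return the inner text of every <span> whose class
// attribute (assumed to be the first attribute) is exactly $class.
function spans_with_class(string $html, string $class): array
{
    // preg_quote() escapes any regex metacharacters in the class name.
    $pattern = '/<span\s+class="' . preg_quote($class, '/') . '"[^>]*>(.*?)<\/span>/is';
    preg_match_all($pattern, $html, $matches);
    return array_map('trim', $matches[1]); // $matches[1] holds the captured inner text
}

// Made-up sample markup, used here instead of a live download.
$html = '<span class="OutOfStock">Out of stock</span> <span class="InStock">5 left</span>';
print_r(spans_with_class($html, 'OutOfStock'));
```

For markup that varies a lot, PHP's DOMDocument together with DOMXPath is usually more robust than a regex.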