I have used PHP and cURL to screen-scrape a web page into another web page, but the page called was a very basic .txt file. What I need to do now is pull a div from an external web page. The div has an ID of "mobile" and contains all of the page copy: no navigation, footer, header etc.

I own both sites. The reason for doing this is that the copy on the scraped page will be updated frequently, and as this copy is duplicated on the other site, it means not having to do the update twice.

I have spent six hours wandering the web trying to work out how it's done, and I see other people asking the same question. The answers agree it is possible, but no one gives a clear answer as to how. I am not a PHP programmer, so partial instructions won't be enough, sadly.

I have found a popular answer: "Download the page using cURL (there are a lot of examples in the documentation). Then use a DOM parser, for example Simple HTML DOM or PHP's DOM, to extract the value from the div element." But I've tried this and it's too advanced for me to work out. I've asked my service provider and they have pointed me towards grabber v.01, which I've downloaded, but it isn't well commented enough for me to adjust.

1) Can this be done by inserting something between divs on the page that is calling the screen grab?
2) Or does this need to be a program that runs and then delivers the result to my page?
3) Or is this something called in the header of my page? I'm lost.

I have tried:

```php
<div id="page">
<?php
$html = file_get_contents('http://www.mysite.com');
$dom = new DOMDocument();
$dom->loadHTML($html);
$dom_element = $dom->getElementById('mobile');
$inner_html = $dom_element->textContent;
echo $inner_html;
?>
</div>
```

I've also tried this (which needs the Simple HTML DOM library):

```php
<div id="page">
<?php
$html = file_get_html('http://www.mysite.com');
$ret = $html->find('div[id=mobile]');
?>
</div>
```

and that didn't work either. Any pointers much appreciated!
Try this:

```php
<?php
$url = "http://www.mysite.com";
$d = new DOMDocument();
$d->loadHTMLFile($url);
$xpath = new DOMXPath($d);
$myMobile = $xpath->query('//*[@id="mobile"]')->item(0);
?>
```
Hi Lee, thanks for the suggestion, but this one returns a stream of page errors, e.g.:

    Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: ID download already defined in http://mysite.com, line: 139 in /home/mysite/www/m/test.php on line 20

I did get my hopes up with this one:

```php
<?php
$html = file_get_contents('http://www.trailrun.co.nz/aucklandseries/hunua.php');
$dom = new DOMDocument('1.0', 'iso-8859-1');
// Suppress any warnings from invalid HTML markup
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$query = '//div[@id="mobile"]';
$nodes = $xpath->query($query);
foreach ($nodes as $node) {
    echo $node->nodeValue;
}
?>
```

This one above does actually pull the information from the div into my page! But sadly it strips all of the layout inside that div and just displays it as one massive paragraph, so it must be stripping titles, images and classes within the div...
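For anyone hitting the same wall: `nodeValue` returns only the text of the node, which is why all the markup disappears. A minimal sketch of keeping the inner HTML instead is below; it serializes each child node of the div with `saveHTML()`. The sample string here just stands in for the page you'd fetch with `file_get_contents()`:

```php
<?php
// Stand-in for the fetched page; in practice this comes from file_get_contents().
$html = '<html><body><div id="mobile"><h2>Title</h2><p>Some <b>bold</b> copy.</p></div></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);            // suppress warnings from loose markup
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@id="mobile"]');

$inner = '';
foreach ($nodes->item(0)->childNodes as $child) {
    $inner .= $dom->saveHTML($child);   // keeps the tags, unlike nodeValue
}
echo $inner;
?>
```

This prints the div's contents with the `<h2>`, `<p>` and `<b>` tags intact, so the page's own CSS can style it again.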
When you say "strips all of the layout inside that div" do you mean inner HTML syntax? Or do you mean CSS styling?
Doing what you want is advanced, so you'll either have to advance your skills, find someone who wants to play and has the skills or pay someone to do it. It's really as simple as using cURL or opening the foreign page as a file, but either one is "advanced".
Thanks Rukbat - I came to that conclusion last night. I've worked around my lack of knowledge by removing the div from the page I'm trying to screen-scrape and placing it in its own file, using PHP to include it back into its old file, and cURL to call it into my other file. So I have achieved the end result of only having to update one file when the copy changes. All good, if a little roundabout, lol. Mobile app here we come.
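The workaround described above can be sketched roughly like this (file names and paths are made up for illustration). The shared copy lives in one fragment file; the source site includes it locally, and the second site fetches the same fragment over HTTP, here stubbed with a local read so the sketch is self-contained:

```php
<?php
// Hypothetical shared fragment: the only file that ever needs editing.
$fragment = sys_get_temp_dir() . '/copy.html';
file_put_contents($fragment, '<h2>Race info</h2><p>Updated weekly.</p>');

// Source site: pull the fragment back into the original page locally.
ob_start();
include $fragment;
$localCopy = ob_get_clean();

// Second site: fetch the same fragment. In production this would be the
// full URL to the fragment on the source site, not a local path.
$remoteCopy = file_get_contents($fragment);

echo $localCopy === $remoteCopy ? "same copy on both sites\n" : "mismatch\n";
?>
```

Either way, one edit to the fragment updates both pages.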
Thanks for opening this thread Deliwasista! And thanks for answering it Lee Stevens! I needed to scrape the ATP Rank (tennis ranking) from the ATP site for a players' site I was doing. This works perfectly! Thanks, and good luck with your project!
It stripped out all HTML syntax and styling and displayed all of the copy as one run-together massive paragraph.
Excellent! I'm glad you found an answer too. FYI, I'm using cURL to pull the external file holding the copy into the frame of my page, and I'm looking for other options as this seems to slow my page download time down considerably.

```php
<?php
$data = file_get_contents("http://www.mysite.co.nz/training/training.php");
echo $data;
?>
```

```php
<?php
$url = "http://www.mysite.co.nz/training/training.php";
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
echo $file_contents;
?>
```

PS: In my travels I came across this answer from someone else on the subject of PHP includes. I'm only calling a simple text file, so this has not affected me, but I'm adding it to my thread just in case it helps anyone else trying to include a more complicated page:

"Something not previously stated here - but found elsewhere - is that if a file is included using a URL and it has a '.php' extension, the file is parsed by PHP, not just included as it would be if it were linked to locally. This means the functions and (more importantly) classes it contains will NOT work. For example:

```php
<?php include "http://example.com/MyInclude.php"; ?>
```

would not give you access to any classes or functions within the MyInclude.php file. To get access to the functions or classes you need to include the file with a different extension, such as '.inc'. This way the PHP interpreter will not 'get in the way' and the text will be included normally."
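On the slowdown: since the remote fetch happens on every page view, one common option is to cache the fetched fragment on disk and only hit the remote site every few minutes. A rough sketch, assuming a writable cache path (the function name, URL and paths are placeholders, not anything from the thread):

```php
<?php
// Return the remote fragment, refreshing a disk cache at most every $ttl seconds.
function fetch_cached($url, $cacheFile, $ttl = 600) {
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);   // cache is fresh: skip the HTTP call
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    $data = curl_exec($ch);
    curl_close($ch);
    if ($data !== false) {
        file_put_contents($cacheFile, $data);   // refresh the cache
    } elseif (is_file($cacheFile)) {
        $data = file_get_contents($cacheFile);  // fetch failed: a stale copy beats nothing
    }
    return $data;
}

// Usage (placeholder URL and cache path):
// echo fetch_cached('http://www.mysite.co.nz/training/training.php', '/tmp/training.cache');
?>
```

With a 10-minute TTL, most visitors get the local copy instantly and only one request every 10 minutes pays the remote round-trip.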