Hi, does anyone know how to build an HTML grabber for a third-party website? I know it needs a few of these: curl, fopen, PHP, etc. My script should then read a given URL as its source, for example Yahoo, and look for something like <input type="text" name="test">, so that I can use the script for some additional functions. I need help, please guide me.
With curl you will get the entire page. Then you need some regular expressions to extract the content you want from it. Check this link for a page grabber class: http://www.php.net/manual/en/ref.curl.php#65700. Another good reference for you: http://blog.lejer.ro/tag/curl/, which contains a short article on curl books and basic, everyday curl usage; it will surely help.
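A minimal sketch of the curl-then-regex approach described above, assuming the curl extension is installed. `fetch_page()` and `find_inputs_by_name()` are hypothetical names, and the regex is deliberately rough: it only handles a quoted `name` attribute and assumes no stray `>` inside attribute values.

```php
<?php
// Sketch only: fetch_page() and find_inputs_by_name() are hypothetical helpers.

// Returns the page body as a string, or false on failure.
function fetch_page($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Returns every <input ...> tag whose name attribute equals $name.
// Assumes the attribute value is quoted with " or '.
function find_inputs_by_name($html, $name)
{
    $quoted = preg_quote($name, '~'); // escape regex metacharacters in the name
    preg_match_all('~<input\b[^>]*\bname\s*=\s*["\']' . $quoted . '["\'][^>]*>~i', $html, $m);
    return $m[0]; // full matches, i.e. the whole tags
}
```

Usage would look like `$html = fetch_page('http://www.yahoo.com/'); if ($html !== false) { print_r(find_inputs_by_name($html, 'test')); }`.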
I have only given you some advice (which articles to read and where to get example files), not a finished script. You want a website scraper, that's all, and I don't provide such things.
I mean, I tried some of the code, but it said it needs some sort of curl scripting to view a third party's source. Is that true?
I don't use curl very often; fopen() is generally enough. Just fopen($url, 'r') will set it up, and then you run a loop to read the data. If you're on Linux, you can also exec() wget and have the page stored as a file (this is also a workaround if your host blocks opening URLs with fopen).
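Both options from this post can be sketched as follows, assuming `allow_url_fopen` is enabled for the `fopen()` route and `wget` is installed for the `exec()` route. The function names are hypothetical. The loop tests the return value of `fgets()` rather than calling `feof()` first, so it stops as soon as there is nothing left to read.

```php
<?php
// Sketch only: read_all_lines(), grab_with_fopen() and grab_with_wget() are hypothetical.

// Reads an already-open stream line by line and returns everything read.
function read_all_lines($handle)
{
    $data = '';
    while (($line = fgets($handle, 4096)) !== false) { // fgets() returns false at EOF
        $data .= $line;
    }
    return $data;
}

// Opens $url with fopen() (needs allow_url_fopen) and reads it; false on failure.
function grab_with_fopen($url)
{
    $handle = fopen($url, 'r');
    if ($handle === false) {
        return false;
    }
    $data = read_all_lines($handle);
    fclose($handle);
    return $data;
}

// Linux-only fallback: let wget save the page to $file, then read that file.
function grab_with_wget($url, $file)
{
    exec('wget -q -O ' . escapeshellarg($file) . ' ' . escapeshellarg($url), $output, $status);
    return $status === 0 ? file_get_contents($file) : false;
}
```

For example, `grab_with_fopen('http://www.baddot.com')` returns the page source as one string, or false if the URL could not be opened.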
Erm, with fopen I managed to write my own code, but I can't get it to detect the pictures. Can anyone help out?

    $filename = "http://www.baddot.com" ;
    $dataFile = fopen( $filename, "r" ) ;
    if ( $dataFile ) {
        while (!feof($dataFile)) {
            $buffer = fgets($dataFile, 4096);
            //$myfile = html_entity_decode($buffer); full link without picture
            $myfile = htmlentities($buffer);
            if($myfile == "%.jpg"){
                echo $myfile . "<br>";
                echo "I got a picture named";
            }
        }
        fclose($dataFile);
    } else {
        die( "fopen failed for $filename" ) ;
    }
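In the code above, `$myfile == "%.jpg"` is an exact string comparison: `%` is not a wildcard in PHP, so the test is only true when the line is literally `%.jpg`. A sketch that uses a regular expression instead; `find_pictures()` is a hypothetical helper. It matches the raw line, before `htmlentities()`, which would turn the quotes around attribute values into `&quot;`.

```php
<?php
// Sketch only: find_pictures() is a hypothetical helper.
// Returns every run of characters ending in .jpg on one line of HTML.
// The character class excludes quotes, whitespace, <, >, (, ) and =,
// so a match stops at the edges of an attribute value or CSS url(...).
function find_pictures($line)
{
    preg_match_all('~[^"\'\s<>()=]+\.jpg\b~i', $line, $m);
    return $m[0];
}
```

Inside the existing `while` loop, the `if` block could become `foreach (find_pictures($buffer) as $picture) { echo "I got a picture named " . htmlentities($picture) . "<br>"; }`. Because `fgets($dataFile, 4096)` reads at most 4095 bytes at a time, a name that straddles two reads can be missed; reading the whole page with `file_get_contents()` avoids that.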
    <?php
    function html_to_array( $url, $element = null ) {
        if( !( $data = file_get_contents( $url ) ) ) return false;
        preg_match_all( '~<img.*?>(</img>)?~si', $data, $page['img'] );
        preg_match_all( '~<div.*?>.*?[^<]</div>~', $data, $page['div'] );
        preg_match_all( '~<style.*?>.*?[^<]</style>~', $data, $page['Inline_Css'] );
        preg_match_all( '~<link.*?>~', $data, $page['Linked_Css'] );
        preg_match_all( '~<meta.*?[^>]>~', $data, $page['Meta'] );
        // lazy .*? so each match stops at the first </a>, not the last one on the line
        preg_match_all( '~<a\s.*?</a>~si', $data, $page['Link'] );
        return !is_null( $element ) ? $page[ $element ] : $page ;
    }

    // Builds and returns the list as a string, so it can be used as a printf() argument
    function display_links( $links, $htmlentities = true ) {
        $out = '';
        foreach( $links as $number => $link ) {
            $out .= sprintf("Link number %d : [ %s ]<br />\n", $number + 1, $htmlentities ? htmlentities( $link ) : $link );
        }
        return $out;
    }

    $page = html_to_array( 'http://forums.digitalpoint.com' );
    if( $page === false ) die( 'Could not fetch the page' );

    foreach( $page as $element => $html ) {
        // $html[0] holds the full matches for this tag type
        printf( "I see %d %s tags<br />\n", count( $html[0] ), str_replace('_', ' ', $element ) );
    }

    $links = $page['Link'][0];
    printf("I found %d links, here they are :<br />\n%s", count( $links ), display_links( $links ) );
    ?>

Something like that. I wouldn't use curl if you don't have to; it's marginally quicker than file_get_contents() but totally uncalled for in most cases.
Using curl is, in most cases, no different from using fopen(), file_get_contents() or file(). curl is only useful if you need special control over the HTTP request you are making, such as setting a user agent or a referer string. It's also helpful for large files, since you can use callbacks to write the data as it becomes available instead of waiting until the server has the whole file in its temporary filesystem. That code DOES work; I tested it before I posted it. If you tell me exactly what doesn't work for you, and post the exact code that fails, I'm sure someone can get it working.
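The curl-only features mentioned here (a custom user agent, a referer, and a callback that writes data as it arrives) can be sketched like this. `download_with_curl()` and `make_chunk_writer()` are hypothetical names, and the default user-agent string is an arbitrary placeholder. A `CURLOPT_WRITEFUNCTION` callback receives the curl handle and a chunk of data and must return the number of bytes it handled; any other value makes curl abort the transfer.

```php
<?php
// Sketch only: download_with_curl() and make_chunk_writer() are hypothetical.

// Returns a CURLOPT_WRITEFUNCTION callback that appends each chunk to $stream.
// curl calls it every time a chunk arrives; it returns the bytes written.
function make_chunk_writer($stream)
{
    return function ($ch, $chunk) use ($stream) {
        return fwrite($stream, $chunk);
    };
}

// Saves $url to $destPath chunk by chunk instead of buffering the whole body.
function download_with_curl($url, $destPath, $userAgent = 'MyGrabber/0.1', $referer = null)
{
    $out = fopen($destPath, 'wb');
    if ($out === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);  // User-Agent header sent to the server
    if ($referer !== null) {
        curl_setopt($ch, CURLOPT_REFERER, $referer);  // Referer header
    }
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_chunk_writer($out));
    $ok = curl_exec($ch);                             // true on success when a write callback is set
    curl_close($ch);
    fclose($out);
    return $ok !== false;
}
```

A call might look like `download_with_curl('http://www.example.com/big.zip', '/tmp/big.zip', 'MyGrabber/0.1', 'http://www.example.com/')`, where both URLs and the path are placeholders.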
Hi guys, I've already written code that can display the GIF and JPG images, but how do I get the value out of each array element? For instance:

    Array
    (
        [0] => Array
            (
                [0] => src="http://images.friendster.com/200703E/js/headernav.js"></script><script type="text/javascript" src="http://images.friendster.com/200703E/js/friendster_v1.js"></script><script type="text/javascript" src="http://images.friendster.com/200703E/js/home.js"></script><script type="text/javascript" src="http://images.friendster.com/200703E/js/modules_friendster.js"></script><style type="text/css">body {background-color:#000000; background-image:url(http://i26.photobucket.com/albums/c106/drmzer/DRM2ER/th80.gif
                [1] => src="http://images.friendster.com/images/global/friendster_nav_logo.gif" border="0" class="logo" width="130" height="18"></a><script type="text/javascript">if(typeof correctPNGImage == 'function') {correctPNGImage(document.getElementById('f_logo'), 130, 18, 'http://images.friendster.com/images/friendster_nav_logo.png
                [2] => src="http://images.friendster.com/images/global/search_go_on.png" alt="Search" border="0" class="globnav_inputbtn fakeLink" width="19" height="19"></a><script type="text/javascript">if(typeof correctPNGImage == 'function') {correctPNGImage(document.getElementById('globnav_search_img'), 19, 19, 'http://images.friendster.com/images/search_go_on.png
                [3] => src="http://images.friendster.com/images/spacer.gif
                [4] => src="http://images.friendster.com/images/spacer.gif
                [5] => src="http://images.friendster.com/images/spacer.gif
                [6] => src="http://photos.friendster.com/photos/02/73/20373720/738411726m.jpg
                [7] => src="http://photos.friendster.com/photos/02/73/20373720/654616460m.jpg
                [8] => src="http://photos.friendster.com/photos/02/73/20373720/280126803m.jpg
                [9] => src="http://photos.friendster.com/photos/02/73/20373720/841893688m.jpg

How do I use the script to detect photos.friendster.com/photos?
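On the array question: with its default `PREG_PATTERN_ORDER`, `preg_match_all()` puts the full matches in `$matches[0]` and each capture group in `$matches[1]`, `$matches[2]`, and so on, so the values are read by looping over one of those inner arrays. A sketch with a hypothetical `filter_by_host()` helper; the two-tag string is sample input standing in for the real page.

```php
<?php
// Sketch only: filter_by_host() is a hypothetical helper.
// Keeps the URLs that contain $needle anywhere in them.
function filter_by_host(array $urls, $needle)
{
    $keep = array();
    foreach ($urls as $url) {
        if (strpos($url, $needle) !== false) { // !== because strpos() can return 0
            $keep[] = $url;
        }
    }
    return $keep;
}

// Sample input; group 1 of the pattern is the URL inside src="...".
$html = '<img src="http://images.friendster.com/images/spacer.gif">'
      . '<img src="http://photos.friendster.com/photos/02/73/20373720/738411726m.jpg">';
preg_match_all('~src="([^"]+)"~i', $html, $matches);

// $matches[0] = whole src="..." strings, $matches[1] = just the URLs
$photos = filter_by_host($matches[1], 'photos.friendster.com/photos/');
```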
    $data = file_get_contents("http://www.friendster.com/baddot");
    $pattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";
    //$pattern="photos.friendster.com/photos";
    preg_match_all($pattern, $data, $images);
    print_r($images);
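Entry `[0]` in the dump runs across several `<script>` tags because `.*` in this pattern is greedy: it consumes the rest of the line and then backs off only as far as the last `png`, `jpg` or `gif` on that line. A sketch of a tighter pattern, assuming the URLs sit in `src` attributes; `image_sources()` is a hypothetical name. The class `[^"'\s>]+` cannot cross a quote, whitespace or `>`, so each match ends inside its own attribute.

```php
<?php
// Sketch only: image_sources() is a hypothetical helper.
// Returns the URL part of each src="..." (or src='...' / unquoted) image value.
function image_sources($html)
{
    // Group 1: characters up to a quote, space or ">", ending in an image extension.
    preg_match_all('~src=["\']?([^"\'\s>]+\.(?:png|jpe?g|gif))~i', $html, $m);
    return $m[1];
}
```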
    <?php
    // Returns the photo URLs found on one member's Friendster page.
    function grab_friendster_photos( $name ) {
        $html = file_get_contents( sprintf('http://www.friendster.com/%s', $name ) );
        if( $html === false ) return array();
        preg_match_all( '~src="(http://photos\.friendster\.com/photos/(.*?)\.jpg)"~si', $html, $img );
        return $img[1];
    }

    // Returns return[index or name] => array( photos ) for several members.
    function grab_several_friendster_photos( $names, $assoc = false ) {
        $returns = array();
        foreach( array_values( $names ) as $i => $name ) {
            $returns[ $assoc ? $name : $i ] = array_unique( grab_friendster_photos( $name ) );
        }
        return $returns;
    }

    /** print_r( grab_friendster_photos( 'baddot' ) ); **/

    /** foreach( grab_friendster_photos( 'baddot' ) as $image ) { printf("<img src='%s' />\n", $image ); } **/

    /** // I don't have more than one name to work with, but this will work when you do; returns an associative array return[name] => array( photos )
    print_r( grab_several_friendster_photos( array( 'baddot', 'baddot', 'baddot' ), true ) ); **/

    /** // returns array return[int] => array( photos )
    print_r( grab_several_friendster_photos( array( 'baddot', 'baddot', 'baddot' ), false ) ); **/
To do this kind of scraping you need to learn regular expressions. You can find some tutorials at http://www.regular-expressions.info and http://weblogtoolscollection.com/regex/regex.php. Regular expressions are standard and supported in many programming languages, so they are valuable knowledge. ~ Thomas
Erm, what if I only need the values for photos.friendster.com? How do I do that? Is $pattern = "/([^\"']?.*(photos.friendster.com))[\"']?/i"; correct?
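The proposed pattern would not isolate the URL: the leading `[^\"']?.*` is greedy, so the capture starts at the beginning of the line and stops right after the last `photos.friendster.com`, dropping the `/photos/...` path, and the unescaped dots match any character. A sketch that anchors on `src=` and escapes the host with `preg_quote()`; `photos_from_host()` is a hypothetical helper.

```php
<?php
// Sketch only: photos_from_host() is a hypothetical helper.
// Returns the unique src URLs that live on exactly $host.
function photos_from_host($html, $host)
{
    // preg_quote() escapes the dots, so "." in the host only matches a literal dot.
    $pattern = '~src=["\']?(https?://' . preg_quote($host, '~') . '/[^"\'\s>]+)~i';
    preg_match_all($pattern, $html, $m);
    return array_values(array_unique($m[1])); // drop repeats, renumber from 0
}
```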
For example, I managed to get the code from www.friendster.com/photos/memberid:

    $data = file_get_contents("http://www.friendster.com/photos/20373720");
    // "~" as delimiter, since the pattern itself contains "/" (http://)
    $pattern = "~class=[\"']?http://([^\"']?.*(png|jpg|gif|\"))[\"']?~i";
    //$pattern="photos.friendster.com/photos";
    preg_match_all($pattern, $data, $images);
    //print_r($images);
    foreach ($images[0] as $key => $value) {
        // case-insensitive substring check (eregi() no longer exists in current PHP)
        if (stripos($value, "photos.friendster.com") !== false) {
            echo "<img $value\"><BR>\n";
        }
    }

But how do I remove the JavaScript code, so that I only detect the http://filename.jpg?
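If the goal is only the bare `http://...jpg` addresses, one option is to delete every `<script>...</script>` block first and then match plain URLs, so nothing inside JavaScript can leak into the results. A sketch; `jpg_urls_outside_scripts()` is a hypothetical helper. The lazy `.*?` with the `s` flag makes each removal stop at the nearest `</script>`, even across line breaks.

```php
<?php
// Sketch only: jpg_urls_outside_scripts() is a hypothetical helper.
// Removes <script> blocks, then returns the unique .jpg URLs left in the HTML.
function jpg_urls_outside_scripts($html)
{
    $withoutScripts = preg_replace('~<script\b.*?</script>~is', '', $html);
    // A URL ends at a quote, whitespace, <, >, ( or ).
    preg_match_all('~https?://[^"\'\s<>()]+\.jpg\b~i', $withoutScripts, $m);
    return array_values(array_unique($m[0]));
}
```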