I don't have any PHP experience, so I can't write my own script to gather text from another site. What should I use to do this? Is there a program for someone like me without much programming experience?
To get the HTML of another website, use file_get_contents(); it's pretty easy. For example:

<?php
$siteurl = "http://anysite.com";
$getsite = file_get_contents($siteurl);
echo $getsite;
?>

Hope it helps.
There are several tools, like the cURL extension, that let you download the HTML of a given site. You can then work with that HTML inside a variable to filter, modify, or extract information into other variables, and then print the results on your own site, following your layout and design.
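For instance, here is a minimal sketch of that fetch-then-extract flow, assuming all you want is the page title (the URL and the surrounding markup are just placeholders). DOMDocument ships with PHP, so there is nothing extra to install:

<?php
// sketch of the idea above: fetch raw HTML, extract one piece, print it in your own layout
$html = file_get_contents('http://anysite.com');   // fetch the raw HTML

$dom = new DOMDocument();
@$dom->loadHTML($html);                            // @ hides warnings from sloppy real-world markup

$titles = $dom->getElementsByTagName('title');
$title  = $titles->length ? $titles->item(0)->nodeValue : '(no title found)';

// wrap the extracted text in your own markup
echo '<div class="scraped-title">' . htmlspecialchars($title) . '</div>';
?>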
Learn the following functions and you can already achieve a lot: file_get_contents(), explode(), and preg_replace() (the old eregi_replace() is deprecated, so use preg_replace() with the /i modifier instead). Also make yourself familiar with what arrays are and how to reverse and sort them. PHP is a very big language, but you can achieve a lot with a few of the most basic commands.
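A rough sketch of how those pieces fit together (the URL is made up, and splitting on newlines is just one way to break the page apart):

<?php
// fetch a page, split it into lines, clean each line, then sort and reverse the result
$html  = file_get_contents('http://anysite.com');
$lines = explode("\n", $html);                 // break the HTML into an array of lines

$clean = array();
foreach ($lines as $line) {
    // strip tags and collapse whitespace; preg_replace covers what eregi_replace used to do
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($line)));
    if ($text !== '') {
        $clean[] = $text;
    }
}

sort($clean);                   // alphabetical order
$clean = array_reverse($clean); // or reverse it, as mentioned above

print_r($clean);
?>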
cURL is well worth learning, because many hosts don't allow the file functions such as file_get_contents($siteurl) to fetch remote URLs. A simple cURL script to grab a page:

<?php
$url_l = 'http://google.com';

$c2 = curl_init();
curl_setopt($c2, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($c2, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($c2, CURLOPT_HEADER, 1);
curl_setopt($c2, CURLOPT_URL, $url_l);
curl_setopt($c2, CURLOPT_RETURNTRANSFER, true);

$output2 = curl_exec($c2);
curl_close($c2);

// do some extraction using explode, strstr, str_replace, preg_replace, etc.
echo $output2;
?>
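One note on that snippet: with CURLOPT_HEADER set to 1, $output2 contains the HTTP response headers followed by the body. A small sketch of how you might split them apart; it has to run before curl_close(), since curl_getinfo() needs the live handle:

<?php
// assumes $c2 and $output2 from the snippet above, before curl_close($c2) has been called
$header_size = curl_getinfo($c2, CURLINFO_HEADER_SIZE); // length of the header block
$headers     = substr($output2, 0, $header_size);       // raw response headers
$body        = substr($output2, $header_size);          // the actual HTML you want to parse

curl_close($c2);
echo $body;
?>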
Check out htmlSQL. It lets you run SQL-like queries against HTML pages, which is very useful for scraping information from other websites: http://www.jonasjohn.de/lab/htmlsql.htm
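From memory, the examples on that page look roughly like this; treat the class and method names as approximate and check the link above for the exact API:

<?php
// rough sketch based on the htmlSQL examples
include_once('snoopy.class.php');    // htmlSQL uses Snoopy for the HTTP fetching
include_once('htmlsql.class.php');

$wsql = new htmlsql();

// connect to a URL, then query its tags with SQL-like syntax
if (!$wsql->connect('url', 'http://www.example.com/')) {
    die('Connect error: ' . $wsql->error);
}
if (!$wsql->query('SELECT href, title FROM a')) {
    die('Query error: ' . $wsql->error);
}

// each row is one matching tag with the selected attributes
foreach ($wsql->fetch_array() as $row) {
    print_r($row);
}
?>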
cURL is not a language, it's a tool. cURL can crawl just about anything and mimic a real user's browser, but to weed out the content you still need PHP or some other scripting language with decent string handling.