How do I Scrape Content Off Another Site?

Shadowplay Peon

#1

I don't have any PHP experience, so I can't write my own script to gather text from another site. What should I use to do this? Is there a program for someone like me without much programming experience?

Shadowplay, Oct 26, 2008

Sillysoft Active Member

#2

The Snoopy class (a PHP class that simulates a web browser) is a good choice for scraping content from another site.

Sillysoft, Oct 26, 2008

mehdi Peon

#3

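To get the HTML code of another website, use file_get_contents(). It's pretty easy.

For example:
<?php
// Download the raw HTML of a page into a string.
$siteurl = "http://anysite.com";
$getsite = file_get_contents($siteurl);
if ($getsite === false) {
    die("Could not fetch $siteurl");
}
echo $getsite;
?>
Hope it helps.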
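
If you want a timeout and a custom user agent on top of the basic call, something like the following might work. This is only a rough sketch: the function name fetch_page, the 10-second default, and the 'ExampleScraper/0.1' user agent are made up for illustration and are not part of the post above.

```php
<?php
// Hypothetical helper: fetch a URL (or local path) and return its contents,
// or null on failure. The name and defaults are illustrative assumptions.
function fetch_page($url, $timeoutSeconds = 10)
{
    // Options for the http:// and https:// wrappers; other wrappers ignore them.
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleScraper/0.1', // placeholder user agent
        ),
    ));

    // @ suppresses the warning file_get_contents() raises on failure,
    // so the caller only has to check for null.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}
```

You would call it as fetch_page('http://anysite.com') and check the result for null before working with the HTML.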

mehdi, Oct 27, 2008

six.sigma Peon

#4

There are a lot of functions, such as the cURL functions, that let you download the HTML of a given site.
Then you can work with that HTML inside a variable: filter, modify, or extract information into several other variables, and then print the results on your own site, following your layout and design.
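
As a rough illustration of that workflow, here is a sketch that uses PHP's built-in DOMDocument class to pull pieces out of a page. The HTML string is an invented stand-in for a page you have already downloaded, and the "price" class is an assumption about its markup.

```php
<?php
// Stand-in for HTML that was downloaded earlier with cURL or file_get_contents().
$html = '<html><head><title>Example Store</title></head><body>'
      . '<h1>Today\'s deals</h1>'
      . '<p class="price">Widget: $9.99</p>'
      . '<p class="price">Gadget: $19.50</p>'
      . '</body></html>';

$doc = new DOMDocument();
// Real-world pages are rarely valid; @ hides the parser warnings.
@$doc->loadHTML($html);

// Extract pieces of the page into separate variables.
$title = $doc->getElementsByTagName('title')->item(0)->textContent;

$prices = array();
foreach ($doc->getElementsByTagName('p') as $p) {
    if ($p->getAttribute('class') === 'price') {
        $prices[] = trim($p->textContent);
    }
}

// Print the results inside your own layout.
echo '<h2>' . htmlspecialchars($title) . "</h2>\n<ul>\n";
foreach ($prices as $price) {
    echo '  <li>' . htmlspecialchars($price) . "</li>\n";
}
echo "</ul>\n";
```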

six.sigma, Oct 31, 2008

happpy Well-Known Member

#5

Learn the following functions and you can already achieve a lot:

file_get_contents()
explode()
eregi_replace() (deprecated since PHP 5.3 and removed in PHP 7; use preg_replace() with the i modifier instead)

Familiarize yourself with what arrays are and how to reverse and sort them.

PHP is a very big language, but you can achieve a lot with some of the most basic functions.
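
To give an idea of how far those basics go, here is a small sketch on an invented HTML fragment. It uses preg_replace() in place of eregi_replace(), which was removed in PHP 7; the sample list items are made up.

```php
<?php
// Stand-in for a fragment of downloaded HTML.
$html = "<li>Cherry</li><LI>apple</LI><li>Banana</li>";

// preg_replace() with the "i" modifier is the case-insensitive
// replacement for the old eregi_replace().
$text = preg_replace('/<\/li>/i', "\n", $html);   // closing tags become line breaks
$text = preg_replace('/<li>/i', '', $text);       // opening tags are dropped

// explode() splits the string into an array, one item per line.
$items = explode("\n", trim($text));

sort($items, SORT_FLAG_CASE | SORT_STRING);  // case-insensitive A-Z
$reversed = array_reverse($items);           // Z-A

echo implode(', ', $items) . "\n";
echo implode(', ', $reversed) . "\n";
```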

happpy, Oct 31, 2008

Conello Member

#6

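cURL is well worth learning, because many hosting providers do not allow file functions such as file_get_contents($siteurl) to open remote URLs on their servers (for example, by disabling allow_url_fopen).

A simple cURL script to grab a page:
<?php
  $url_l = 'http://google.com';
  $c2 = curl_init();
  // Pass along the visitor's user agent; fall back to a generic one when run from the command line.
  curl_setopt($c2, CURLOPT_USERAGENT, isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Mozilla/5.0');
  curl_setopt($c2, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
  curl_setopt($c2, CURLOPT_HEADER, false);         // set to true to include response headers in $output2
  curl_setopt($c2, CURLOPT_URL, $url_l);
  curl_setopt($c2, CURLOPT_RETURNTRANSFER, true);  // return the page as a string instead of printing it
  $output2 = curl_exec($c2);
  if ($output2 === false) {
    die('cURL error: ' . curl_error($c2));
  }
  curl_close($c2);

  // do some extraction using explode, strstr, str_replace, preg_replace, etc.
  echo $output2;
?>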
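
If you are not sure what your host allows, a quick check along these lines may help. It is only a sketch: the variable names and the preference for cURL over file_get_contents() are my own assumptions.

```php
<?php
// allow_url_fopen controls whether file_get_contents() may open remote URLs.
$canUseFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

// curl_init() only exists when the cURL extension is loaded.
$canUseCurl = function_exists('curl_init');

if ($canUseCurl) {
    $method = 'curl';
} elseif ($canUseFopen) {
    $method = 'file_get_contents';
} else {
    $method = 'none';
}

echo "Fetch method available on this server: $method\n";
```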

Conello, Nov 1, 2008

exodus Well-Known Member

#7

Check this out. It's called htmlSQL: it lets you query HTML pages with an SQL-like syntax, much like writing MySQL queries, and it is very useful for scraping information from other websites.

http://www.jonasjohn.de/lab/htmlsql.htm
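
I can't vouch for htmlSQL's exact API from memory, so the sketch below shows the same query-style idea with PHP's built-in DOMXPath instead of htmlSQL itself. The HTML and the "article" class are invented for the example.

```php
<?php
// Stand-in for a downloaded page.
$html = '<html><body>'
      . '<a class="article" href="/news/1">First story</a>'
      . '<a class="nav" href="/about">About</a>'
      . '<a class="article" href="/news/2">Second story</a>'
      . '</body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);   // @ hides warnings about imperfect markup
$xpath = new DOMXPath($doc);

// Roughly "SELECT href, text FROM a WHERE class = 'article'".
$links = array();
foreach ($xpath->query('//a[@class="article"]') as $a) {
    $links[$a->getAttribute('href')] = $a->textContent;
}

foreach ($links as $href => $label) {
    echo $href . ' => ' . $label . "\n";
}
```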

exodus, Nov 1, 2008

techcone Banned

#8

Curl is the master of all scraping languages

techcone, Nov 1, 2008

Calon Peon

#9

You don't even need PHP to do it; you can use curl on its own.

Calon, Nov 2, 2008

happpy Well-Known Member

#10

curl is not a language; it is a tool.

curl can fetch anything and mimic a real user's browser, but to pick out the content you want, you have to use PHP or some other scripting language or tool with string handling.

happpy, Nov 2, 2008