Pull data from other website

kichus Peon

#1

Hi there,

How can I pull data from other websites and store it in my database? I need the data for a reporting process.

Thanks

kichus, Aug 25, 2007 IP

Coder Banned

#2

What type of data do you need to pull?

Coder, Aug 25, 2007 IP

Kuldeep1952 Active Member

#3

In order to get data from other websites in PHP, you can use cURL. You can find more info at http://au.php.net/curl.

Kuldeep1952, Aug 25, 2007 IP

greenrob Peon

#4

Or you can use the PHP file() function and then post-process the data.
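
A rough sketch of that file() approach, for what it's worth. file() reads a path or URL into an array of lines, which you can then filter. The temp file, the sample markup and the `price` class are all made up here so the snippet runs offline; with allow_url_fopen enabled you could pass an http:// URL to file() instead.

```php
<?php
// Write a small stand-in "page" to a temp file (hypothetical content).
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents(
    $path,
    "<h1>Report</h1>\n<p class=\"price\">19.99</p>\n\n<p class=\"price\">5.00</p>\n"
);

// file() returns the content as an array, one element per line.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

// Post-process: keep only the lines that contain the data of interest.
$prices = array();
foreach ($lines as $line) {
    if (strpos($line, 'class="price"') !== false) {
        $prices[] = trim(strip_tags($line));
    }
}

print_r($prices);
```

This only works well when each value sits on its own line; for anything messier a regex or a DOM parser is probably a better fit.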

greenrob, Aug 26, 2007 IP

Andy Peters Peon

#5

But don't bother, use curl because it's faster and easier.

$url="http://anything";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec ($ch);
curl_close ($ch);
// you can do something with $data like explode(); or a preg match regex to get the exact information you need
echo $data;

PHP:
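
To illustrate the comment about explode() and preg_match(), here is a minimal sketch that works on a hard-coded $data string instead of a live cURL response. The markup and the "Visitors" label are invented for the example.

```php
<?php
// Pretend this came back from curl_exec(); the markup is made up.
$data = '<html><body><div id="stats">Visitors: 1234</div></body></html>';

// Option 1: explode() on the text just before and just after the value.
$parts      = explode('Visitors: ', $data); // [0] = before, [1] = value + rest
$pieces     = explode('</div>', $parts[1]); // [0] = value
$viaExplode = $pieces[0];

// Option 2: preg_match() with a capture group for the digits.
$viaRegex = null;
if (preg_match('/Visitors:\s*(\d+)/', $data, $m)) {
    $viaRegex = $m[1]; // $m[0] is the whole match, $m[1] the captured group
}

echo $viaExplode, "\n", $viaRegex, "\n";
```

explode() is fine when the surrounding text never changes; the regex version tolerates small differences in whitespace.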

Andy Peters, Aug 26, 2007 IP

ErectADirectory Guest

#6

If you have access to the function, file_get_contents() is faster (to type) than all those extra lines of cURL. Check it out ...
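$text = file_get_contents('http://www.mypage.com/'); // scrape page into variable
// NOTE: the tags around the capture group appear to have been lost from this post;
// <title>...</title> is only an example, use whatever surrounds your data.
preg_match("/<title>([^`]*?)<\/title>/", $text, $temp); // get data out of the page
echo htmlentities($temp[0]); // spits out the 1st occurrence of your data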
PHP:
It can get more complicated than the above code but it really depends on what you need harvested.

If you don't have access to file_get_contents() you could write a function to automate all the cURL stuff that will work the same as file_get_contents. I think cURL is a bit faster so it might be smart to go ahead and use it.
function file_get_the_contents($url) {
  $ch = curl_init();
  $timeout = 10; // set to zero for no timeout
  curl_setopt ($ch, CURLOPT_URL, $url);
  curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $file_contents = curl_exec($ch);
  curl_close($ch);
  return $file_contents;
}

// now start your data harvesting
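$text = file_get_the_contents('http://www.mypage.com/'); // scrape page into variable
// NOTE: the tags around the capture group appear to have been lost from this post;
// <title>...</title> is only an example, use whatever surrounds your data.
preg_match("/<title>([^`]*?)<\/title>/", $text, $temp); // get data out of the page
echo htmlentities($temp[0]); // spits out the 1st occurrence of your data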
PHP:
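
On "if you have access to the function": file_get_contents() can only open URLs when the allow_url_fopen setting is on. A sketch that picks whichever method is available might look like the following. fetch_url() is a hypothetical name, not something from the code above.

```php
<?php
// fetch_url() is a hypothetical helper name used for illustration.
function fetch_url($url, $timeout = 10) {
    // file_get_contents() can only open URLs when allow_url_fopen is enabled.
    if (ini_get('allow_url_fopen')) {
        $context = stream_context_create(array(
            'http' => array('timeout' => $timeout),
        ));
        return file_get_contents($url, false, $context);
    }

    // Otherwise fall back to cURL, if the extension is loaded.
    if (function_exists('curl_init')) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        return curl_exec($ch); // handle is released when $ch goes out of scope
    }

    return false; // neither method is available
}

// Usage (placeholder URL):
// $text = fetch_url('http://www.mypage.com/');
```

Both branches return false on failure, so the caller can check the result the same way whichever one ran.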

ErectADirectory, Aug 26, 2007 IP

ritadebock Peon

#7

nice, good to know

ritadebock, Aug 26, 2007 IP

ssanders82 Peon

#8

The benefit of curl over file_get_contents is curl allows you to do stuff like post data, follow redirects, spoof user agent, accept cookies, etc.
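
A sketch of what those extra cURL options look like in practice. build_options(), the URL, the form fields, the user agent string and the cookie file path are all placeholders for illustration; the CURLOPT_* constants are the standard ones from the PHP cURL extension.

```php
<?php
// build_options() is a hypothetical helper; every value passed in is a placeholder.
function build_options($url, array $postFields, $cookieFile) {
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,                          // send a POST request
        CURLOPT_POSTFIELDS     => http_build_query($postFields), // form-encoded body
        CURLOPT_FOLLOWLOCATION => true,                          // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleBot/1.0)',
        CURLOPT_COOKIEJAR      => $cookieFile,                   // save cookies here
        CURLOPT_COOKIEFILE     => $cookieFile,                   // send saved cookies
    );
}

$options = build_options(
    'http://www.example.com/search',
    array('q' => 'report', 'page' => 1),
    sys_get_temp_dir() . '/cookies.txt'
);

$ch = curl_init();
curl_setopt_array($ch, $options);
// $data = curl_exec($ch); // uncomment to actually send the request
```

Keeping the options in one array makes it easy to see at a glance what the request will do before it is sent.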

ssanders82, Aug 27, 2007 IP

kichus Peon

#9

Hi All,

I used curl_init() and it worked.

I used some regular expressions to get the particular piece of information I needed.

Special thanks to Andy Peters and ErectADirectory.
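
For anyone landing here later, a minimal sketch of the "store it in my database" half. It assumes the pdo_sqlite extension and uses an in-memory SQLite database so it runs anywhere; the table, the column names and the sample markup are made up, and the hard-coded $data stands in for whatever curl_exec() returned.

```php
<?php
// Assumes the pdo_sqlite extension; table and column names are made up.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE scraped (url TEXT, value TEXT, fetched_at TEXT)');

// Stand-in for the result of curl_exec(); the markup is hypothetical.
$url  = 'http://www.example.com/stats';
$data = '<div id="total">Total sales: 987</div>';

if (preg_match('/Total sales:\s*(\d+)/', $data, $m)) {
    // A prepared statement keeps the scraped text from being read as SQL.
    $stmt = $db->prepare(
        'INSERT INTO scraped (url, value, fetched_at) VALUES (?, ?, ?)'
    );
    $stmt->execute(array($url, $m[1], date('Y-m-d H:i:s')));
}

$rows = $db->query('SELECT url, value FROM scraped')->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```

For MySQL or another database only the PDO connection string should need to change.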

kichus, Sep 16, 2007 IP

dados Greenhorn

#10

Can you please give me some advice on this problem? I'm building a site in a CMS. I have boxes on the site http://www.istoots.com, and a couple of these boxes need to automatically pull information from http://www.dodtracker.com/. Can somebody advise me on how I can do this?

Thanks.

dados, Jun 3, 2009 IP

radiotiger Peon

#11

Why do you want to pull data from other websites? Won't it be duplicate content?

radiotiger, Jan 29, 2010 IP

radiotiger Peon

#12

Also, there is something called WPRobot for automated posting of content to your WordPress site. Are you talking about something like that?

radiotiger, Jan 29, 2010 IP

GeorgeBaker Peon

#13

Hi Andy,

Hope you're still around here sometimes.

What would the code look like if I need to log in to a website?

I have this code but it doesn't work.
$username = 'xxxxxx';
$password = 'yyyyy1';

$url = 'http://www.fracsoft.com';

$context = stream_context_create(array(
    'http' => array(
        'header' => "Authorization: Basic " . base64_encode("$username:$password")
    )
));
$data = file_get_contents($url, false, $context);
// echo $data
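
It's hard to say why that fails without seeing the response. One guess is that the site uses an HTML login form rather than HTTP Basic authentication, in which case the Authorization header is simply ignored and a POST plus cookies would be needed instead. A sketch for checking what the server actually answers, using cURL's own Basic auth support, is below. fetch_with_basic_auth() is a hypothetical helper name.

```php
<?php
// fetch_with_basic_auth() is a hypothetical helper; it only helps if the
// site really uses HTTP Basic authentication.
function fetch_with_basic_auth($url, $username, $password) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($ch, CURLOPT_USERPWD, $username . ':' . $password);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);

    $body   = curl_exec($ch);                        // false on failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // e.g. 200, 401, 403
    $error  = curl_error($ch);                       // empty string if none

    return array('status' => $status, 'body' => $body, 'error' => $error);
}

// Usage (placeholder credentials, as in the post above):
// $result = fetch_with_basic_auth('http://www.fracsoft.com', 'xxxxxx', 'yyyyy1');
// if ($result['status'] == 401) { echo "Basic auth was rejected\n"; }
```

A 200 status with a login page in the body would suggest form-based login; a 401 would suggest the credentials or the auth scheme are wrong.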

GeorgeBaker, Jun 20, 2010 IP

JavaDeveloper Peon

#14

I have created a program in Java that pulls data from a website.
To read in page:
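// needs: import java.io.BufferedReader; import java.io.InputStreamReader; import java.net.URL;
public static BufferedReader read(String url) throws Exception
{
    // open the URL as a stream and wrap it so it can be read line by line
    return new BufferedReader(new InputStreamReader(new URL(url).openStream()));
}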
Code (markup):
Then I look for a particular distinctive character to find the starting point of my data:
int start = line.indexOf(">") + 1;
Code (markup):
and then afterward, I find the next marker character to mark the end of the information I am looking for:
int end = line.indexOf("/") - 4;
Code (markup):
then I run a loop from the start to the finish and append a String
String whatIwant = "";

for (int i = start; i < end; i++)
{
       whatIwant = (whatIwant + line.charAt(i));
}
Code (markup):
Then I finally print that data to a file or screen.
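
Since the thread is mostly PHP, the same start/end marker idea can be sketched with strpos() and substr(). The sample line and the choice of `>` and `</` as markers are assumptions for illustration, not the markers used above.

```php
<?php
// Hypothetical line of markup; the markers '>' and '</' are assumptions.
$line = '<td>42.50</td>';

$start = strpos($line, '>') + 1;      // first character after the opening tag
$end   = strpos($line, '</', $start); // where the closing tag begins

// substr() takes a length, not an end index, hence $end - $start.
$whatIWant = substr($line, $start, $end - $start);

echo $whatIWant, "\n";
```

In real code strpos() can return false when a marker is missing, so both results should be checked with `=== false` before calling substr().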

This may be slow, but I have not had any trouble getting all the data in situations where the pages expose the changing value in the URL. I increment the value (or pull it from a predefined text file) and request the URL again from another section of code. The advantage is that it actually loads the entire page to gather the data, so you are able to capture anything that is sent to the presentation layer without risking 'hacking' the website. Simply put, for them to block this, they would have to block an address for accessing their website too many times. As it stands, I am increasing their ranking anyway.

Any questions will not be answered unless written on a $20 bill and sent to my address.
(for those not familiar with Java... this will be wasted... if you are, you have enough information to do what I have done.)

JavaDeveloper, Jan 8, 2012 IP

kris.baj422 Peon

#15

file_get_contents() is the best option.

kris.baj422, Jan 8, 2012 IP

kris.baj422 Peon

#16

I think he only needs PHP code.

kris.baj422, Jan 8, 2012 IP

gabstero Peon

#17

hi gang,

sorry to resurrect this thread, but I'd also like to pull certain snippets of text from a web page that are wrapped in a tag that has a class. Something similar to this:
<span class="OutOfStock">Out of stock</span>
HTML:
is this possible?

thanks,
gabstero

gabstero, Sep 4, 2012 IP

DomainerHelper Well-Known Member

#18

Yes. It is possible to get any data from a page.
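
One way to do it, as a sketch, is PHP's DOM extension (DOMDocument plus DOMXPath) rather than a regex. The $html string below stands in for a fetched page and is made up apart from the span from the question; this assumes the dom extension is available.

```php
<?php
// Sample markup standing in for a fetched page (hypothetical wrapper div).
$html = '<div class="product"><span class="OutOfStock">Out of stock</span></div>';

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match any <span> whose class attribute is exactly "OutOfStock".
$nodes = $xpath->query('//span[@class="OutOfStock"]');

$texts = array();
foreach ($nodes as $node) {
    $texts[] = trim($node->textContent);
}

print_r($texts);
```

If the element carries several classes, the exact-match test would miss it; `contains(@class, "OutOfStock")` in the XPath is a looser alternative.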

DomainerHelper, Sep 4, 2012 IP

gabstero Peon

#19

DomainerHelper said: ↑

Yes. It is possible to get any data from a page.

can you point me to a source, or code, please?

thanks,
gabstero

gabstero, Sep 4, 2012 IP

DomainerHelper Well-Known Member

#20

Way too much to teach you for free bro. You need to learn regular expressions (regex) and functions like preg_match_all(). Any links I send would be via google, which you are capable of.
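
For reference, a minimal preg_match_all() sketch for the span in question. The surrounding list markup is invented for the example, and the pattern assumes the tag is written exactly as shown (same attribute, same quoting).

```php
<?php
// Hypothetical page fragment with two matching spans and one non-matching.
$html = '<ul>'
      . '<li><span class="OutOfStock">Out of stock</span></li>'
      . '<li><span class="InStock">In stock</span></li>'
      . '<li><span class="OutOfStock">Out of stock</span></li>'
      . '</ul>';

// '#' is used as the pattern delimiter so '/' in '</span>' needs no escaping.
// (.*?) lazily captures the text between the opening and closing tag.
$count = preg_match_all(
    '#<span class="OutOfStock">(.*?)</span>#s',
    $html,
    $matches
);

echo $count, "\n";
print_r($matches[1]); // the captured text of each match
```

$matches[0] holds the full matches including the tags, while $matches[1] holds only the captured text.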

DomainerHelper, Sep 4, 2012 IP
