How can I get the HTML source code of any webpage a browser can see, with PHP? This includes dynamic pages! If I use fopen, scripts can somehow be blocked from seeing the code... try getting the source code of mogaard.ath.cx for example (with a script).
If you want that, you need to ask someone to make a script for you that does the same thing. Clone it, as they say.
You can... just try this code:

    <?php
    $host = "www.site.com";
    $path = "/insidefolder/page.html";

    if ($fp = @fsockopen($host, 80, $errno, $errstr, 5)) {
        fputs($fp, "GET $path HTTP/1.1\r\n");
        fputs($fp, "Host: $host\r\n");
        fputs($fp, "User-Agent: {$_SERVER['HTTP_USER_AGENT']}\r\n");
        fputs($fp, "Connection: close\r\n"); // without this, HTTP/1.1 keep-alive makes feof() hang until timeout
        fputs($fp, "\r\n");
        $content = '';
        while (!feof($fp)) {
            $content .= fgets($fp, 1024);
        }
        fclose($fp);
        // grab every tag pair from the response
        preg_match_all('|<[^>]+>(.*)</[^>]+>|U', $content, $output);
        for ($i = 0; $i < count($output[0]); $i++) {
            echo $output[0][$i];
            echo "<br>";
        }
    } else {
        print "Unable to connect: $errno :: $errstr";
    }
    ?>

Just try it... change

    $host = "www.site.com";
    $path = "/insidefolder/page.html";

to match your link, i.e. if your link is www.site.com/insidefolder/page.html. PHP files work too, but they will appear as HTML output, not PHP source.

To control how the code is written to your page, change this piece of the script:

    for ($i = 0; $i < count($output[0]); $i++) {
        echo $output[0][$i];
        echo "<br>";
    }

Regards, Almrshal
You can get the HTML code with file_get_contents() (assuming you have PHP 5; otherwise you will have to use fopen()).
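For example, a minimal sketch (the URL is a placeholder, and allow_url_fopen must be enabled in php.ini):

    <?php
    // Fetch the rendered HTML of a remote page.
    $html = file_get_contents('http://www.example.com/page.html');

    if ($html === false) {
        // file_get_contents() returns FALSE on failure
        echo 'Could not fetch the page.';
    } else {
        // show the source instead of rendering it
        echo '<pre>' . htmlspecialchars($html) . '</pre>';
    }
    ?>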
He wants to view the PHP code. That's not possible; whenever you request a PHP file from a webserver, it'll be executed and only the output will be sent to the browser.
Yes, I agree. All server-side script is executed on the server before you see anything. That's basic security, as frankcow said.
NO... I want the HTML code, but I want it done by a script! Using fopen, the script doesn't work all the time. Someone I was talking to said something like the admin of the server could block scripts from getting the HTML code. So how do I get the HTML code of a page 100% of the time? I have a script and it could get the HTML of W3Schools and my own site, but not my friend's site (http://mogaard.ath.cx).
Another site can block your IP address from accessing it if they want to. Otherwise you can just use fopen or file_get_contents. If you were screen-scraping my site, I would block your IP.
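Worth knowing, too: some servers block the default PHP user agent string rather than your IP. A sketch of sending a browser-like User-Agent with a stream context (the UA string here is made up, and passing a context to file_get_contents() needs PHP 5):

    <?php
    // Some servers reject requests whose User-Agent looks like a script.
    $context = stream_context_create(array(
        'http' => array(
            'header' => "User-Agent: Mozilla/5.0 (compatible; MyCrawler/0.1)\r\n"
        )
    ));

    $html = file_get_contents('http://www.example.com/', false, $context);
    echo ($html === false) ? 'Blocked or unreachable.' : htmlspecialchars($html);
    ?>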
I think he is talking about the admin blocking the use of fopen() to open external URLs by disabling allow_url_fopen: http://in2.php.net/manual/en/ref.filesystem.php#ini.allow-url-fopen

You can try using cURL instead: http://www.php.net/curl

Here is an example of how to do it:

    <?php
    $ch = curl_init(); // create new cURL handle

    // set the URL to fetch
    curl_setopt($ch, CURLOPT_URL, "http://www.site.com/blah.php");
    // do not output the HTTP reply header
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // return output in a variable instead of sending it directly to the browser
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $output = curl_exec($ch); // send the request and get the output

    if ($output === false) {
        // read the error message before closing the handle;
        // curl_error() is useless on a closed handle
        echo curl_error($ch);
    }

    curl_close($ch); // close the cURL handle
    ?>

After that, $output contains the HTML code if the data transfer worked without errors, and FALSE if there was any error. I hope that was what you wanted.

Thomas
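If you go the cURL route, a couple of extra options are often useful too. A hedged sketch with the same placeholder URL (the user agent string is arbitrary):

    <?php
    $ch = curl_init('http://www.site.com/blah.php');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return output instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MyCrawler/0.1)');
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);        // give up after 10 seconds

    $output = curl_exec($ch);
    if ($output === false) {
        echo 'cURL error: ' . curl_error($ch);
    }
    curl_close($ch);
    ?>

CURLOPT_FOLLOWLOCATION matters for a crawler because many dynamic pages answer with a redirect rather than content.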
Indeed, cURL or simply file_get_contents() are easiest. It works in PHP 4 as well, btw: http://uk2.php.net/file_get_contents
Correction: if you want that, you need to pay someone to make a script for you that does the same thing.
Well, I'm developing a small PHP search engine... I was just running the ranking part of it and it had some problems getting the source code of a friend's site, so I knew he didn't block me (we even ran the script on his server). I'm the admin of my server... it's in my house! So I'm not blocking anything from myself.
If you ran it on the same server you wanted to crawl, then it's likely the DNS issue I had on my server when I tried to do the exact same thing. I forget the details, but it has to do with the firewall routing only external traffic to port 80; "internal" requests (you crawling your own site) can get blocked that way. Not really blocked, there's just no route to the content. It's beyond my knowledge of, and interest in, DNS stuff, but that might well be it. If so, you'll find that if you run the same script from a different server, it indexes that site just fine.
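If that's what's happening, one workaround is to skip the router entirely: connect to 127.0.0.1 and send the public host name in the Host header so the right virtual host still answers. A sketch, assuming the site is served from the same box on port 80 (using the thread's example host):

    <?php
    // Talk to the local web server directly, but keep the public
    // Host header so virtual hosting resolves to the right site.
    $fp = fsockopen('127.0.0.1', 80, $errno, $errstr, 5);
    if ($fp) {
        fputs($fp, "GET / HTTP/1.1\r\n");
        fputs($fp, "Host: mogaard.ath.cx\r\n");
        fputs($fp, "Connection: close\r\n\r\n");
        $content = '';
        while (!feof($fp)) {
            $content .= fgets($fp, 1024);
        }
        fclose($fp);
        echo htmlspecialchars($content);
    } else {
        echo "Unable to connect: $errno :: $errstr";
    }
    ?>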
Cool, that would most likely be the problem then! Can fopen() or file_get_contents() get dynamic pages, like http://example.com?page=013012 ?

-edit- I just tested it and it worked, so that's a yes... why do some people say search engines can't get dynamic pages (the stuff after the ?)?
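For what it's worth, the query string is just part of the URL as far as these functions are concerned. A small sketch using the URL from the post above (the parameter is made up, and http_build_query() needs PHP 5):

    <?php
    // Build the query string safely instead of concatenating by hand.
    $params = http_build_query(array('page' => '013012'));
    $html = file_get_contents('http://example.com/?' . $params);
    echo ($html === false) ? 'Fetch failed.' : htmlspecialchars($html);
    ?>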
It's not that they can't, it's that they won't. For various reasons, but one big one is that you can generate links and pages dynamically without limit, making your site look bigger even though it's all the same basic stuff. Somewhere on the Google site (don't make me look it up), they say they won't spider any page that has an 'id' parameter in it, for example.
I don't think so... check this out http://www.google.com/search?source...B2GGGL_en__177&q=site:forums.digitalpoint.com
I won't agree with you, Twist. My site uses id as a parameter, and so far my pages have been indexed in all search engines, including Google.
If you want to do something in PHP, it's probably been done already. I found this web spider written in PHP:

http://www.phpdig.net/
http://www.phpdig.net/navigation.php?action=download
http://sourceforge.net/projects/phpdig

All the above links go to the same thing.

Thomas
@PinoyIto and born2win: you might be right, but I'm just saying what Google says. See the last point under 'Technical guidelines' at http://www.google.com/support/webmasters/bin/answer.py?answer=35769