getting source code

Discussion in 'PHP' started by dcole07, Jul 27, 2006.

  1. #1
    how can I get the html source code of any webpage a browser can see with php? This includes dynamic pages!

    if I use fopen, scripts can somehow be blocked from seeing the code... try getting the source code of mogaard.ath.cx for example (with a script)
     
    dcole07, Jul 27, 2006 IP
  2. frankcow

    frankcow Well-Known Member

    Messages:
    4,859
    Likes Received:
    265
    Best Answers:
    0
    Trophy Points:
    180
    #2
    you can't get the source code of a server side script, that's a very basic security precaution
     
    frankcow, Jul 27, 2006 IP
  3. TheSyndicate

    TheSyndicate Prominent Member

    Messages:
    5,410
    Likes Received:
    289
    Best Answers:
    0
    Trophy Points:
    365
    #3
    If you want it you need to ask someone to make a script for you that does the same thing.

    Clone it as they say
     
    TheSyndicate, Jul 27, 2006 IP
  4. almrshal

    almrshal Guest

    Messages:
    2
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #4
    you can ... just try this code

    <?php

    $host = "www.site.com";
    $path = "/insidefolder/page.html";

    // open a connection to the web server (5 second timeout)
    if ($fp = @fsockopen($host, 80, $errno, $errstr, 5)) {
        fputs($fp, "GET $path HTTP/1.1\r\n");
        fputs($fp, "Host: $host\r\n");
        fputs($fp, "User-Agent: {$_SERVER['HTTP_USER_AGENT']}\r\n");
        fputs($fp, "Connection: close\r\n"); // without this, HTTP/1.1 keep-alive makes feof() wait forever
        fputs($fp, "\r\n");

        $content = '';
        while (!feof($fp)) {
            $content .= fgets($fp, 1024);
        }
        fclose($fp);

        // grab every <tag>...</tag> run from the page
        preg_match_all('|<[^>]+>(.*)</[^>]+>|U', $content, $output);

        for ($i = 0; $i < count($output[0]); $i++) {
            echo $output[0][$i];
            echo "<br>";
        }

    } else {
        print "Unable to connect: $errno :: $errstr";
    }
    ?>
    PHP:


    just try it ... change

    $host = "www.site.com";
    $path = "/insidefolder/page.html";

    that is for when your link is :: www.site.com/insidefolder/page.html

    PHP files work too, but you will get the rendered HTML output, not the PHP source ...


    ---------------

    
    for ($i = 0; $i < count($output[0]); $i++)
    {
        echo $output[0][$i];
        echo "<br>";
    }

    PHP:
    change this piece of code in the previous script ... to control how the grabbed code is written to your page ...

    regards

    Almrshal
     
    almrshal, Jul 27, 2006 IP
  5. SexualChocolate

    SexualChocolate Peon

    Messages:
    111
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #5
    You can get the html code with file_get_contents() (available since PHP 4.3; on older versions you will have to use fopen() )
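    For example, something like this (the URL in the comment is just a placeholder), wrapped in a little function so the failure case is handled:

```php
<?php
// fetch_page: returns the page contents as a string, or false on failure.
// Works on URLs (when allow_url_fopen is on) and on local paths alike.
function fetch_page($url) {
    // @ suppresses the PHP warning so we can handle failure ourselves
    $html = @file_get_contents($url);
    if ($html === false) {
        return false; // could not open the URL/file
    }
    return $html;
}

// usage: $html = fetch_page('http://www.example.com/');
```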
     
    SexualChocolate, Jul 27, 2006 IP
  6. gastongr

    gastongr Well-Known Member

    Messages:
    421
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    108
    #6
    He wants to view the php code.
    That's not possible: whenever you request a php file from a webserver it'll be executed and only the output will be sent to the browser.
     
    gastongr, Jul 27, 2006 IP
  7. PinoyIto

    PinoyIto Notable Member

    Messages:
    5,863
    Likes Received:
    170
    Best Answers:
    0
    Trophy Points:
    260
    #7
    Yes I agree, all server-side scripts are executed on the server before you see the output. That is basic security, as frankcow said.
     
    PinoyIto, Jul 27, 2006 IP
  8. dcole07

    dcole07 Peon

    Messages:
    135
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #8
    NO... I want the html code. But I want it done by a script! Using fopen the script doesn't work all the time; someone I was talking to said the admin of the server could block scripts from getting the html code or something.

    so how do I get the html code of a page 100% of the time? I have a script and it could get the HTML of W3Schools and my own site but not my friend's site (http://mogaard.ath.cx)
     
    dcole07, Jul 27, 2006 IP
  9. mad4

    mad4 Peon

    Messages:
    6,986
    Likes Received:
    493
    Best Answers:
    0
    Trophy Points:
    0
    #9
    Another site can block your IP address from accessing the site if they want to. Otherwise you can just use fopen or file_get_contents.

    If you were screen scraping my site I would block your IP.
     
    mad4, Jul 28, 2006 IP
  10. coderlinks

    coderlinks Peon

    Messages:
    282
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    0
    #10
    I think he is talking about the admin blocking the use of fopen() to open external URLs by disabling allow_url_fopen
    http://in2.php.net/manual/en/ref.filesystem.php#ini.allow-url-fopen

    You can try using cURL instead.
    http://www.php.net/curl

    Here is an example of how to do it:
    
    $ch = curl_init(); // create new curl handle
    //set the URL to fetch
    curl_setopt($ch,CURLOPT_URL,"http://www.site.com/blah.php");
    // do not output the HTTP reply header
    curl_setopt($ch,CURLOPT_HEADER,0);
    // return output in variable and not directly to browser
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    $output = curl_exec($ch); // send the request and get the output
    
    PHP:
    After that, $output would contain the HTML code if the data transfer worked without errors. It would contain FALSE if there was any error. In that case you can get the error message, as long as you do it before closing the handle:
    
    echo curl_error($ch); // must be called before curl_close()
    curl_close($ch); // close the curl handle when you are done
    
    PHP:
    I hope that was what you wanted.
    Thomas
     
    coderlinks, Jul 28, 2006 IP
  11. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #11
    T0PS3O, Jul 28, 2006 IP
  12. frankcow

    frankcow Well-Known Member

    Messages:
    4,859
    Likes Received:
    265
    Best Answers:
    0
    Trophy Points:
    180
    #12
    correction: If you want it you need to pay someone to make a script for you that does the same thing.
     
    frankcow, Jul 28, 2006 IP
  13. dcole07

    dcole07 Peon

    Messages:
    135
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #13
    Well I'm developing a small php search engine...

    I was just running the ranking part of it and it had some problems getting the source code of a friend's site... so I knew he didn't block me (we even ran the script on his server)

    I'm the admin of my server... it's in my house! so I'm not blocking anything from me.
     
    dcole07, Jul 28, 2006 IP
  14. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #14
    If you ran it on the same server you wanted to crawl, then it's likely the DNS issue I had on my server when I tried to do the exact same thing. I forget the details, but it's to do with the firewall routing only external traffic to port 80; 'internal' requests (you crawling your own site) can get blocked that way. Not really blocked, there's just no route to the content. It's beyond my knowledge and interest of DNS stuff, but that might well be it. If so, you'll find that if you run the same script from a different server, it indexes that site just fine.
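    One possible workaround for that situation is to talk to the web server on 127.0.0.1 directly and pass the real hostname in the Host header, so the right virtual host answers without the request ever leaving the machine. A rough sketch, assuming the cURL extension is available (mysite.example is a placeholder):

```php
<?php
// Request the local web server directly, but identify the site we want
// via the Host header, bypassing the external-only firewall route.
$ch = curl_init('http://127.0.0.1/');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: mysite.example'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return output as a string
$html = curl_exec($ch); // false if the request failed
if ($html === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);
```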
     
    T0PS3O, Jul 28, 2006 IP
  15. dcole07

    dcole07 Peon

    Messages:
    135
    Likes Received:
    3
    Best Answers:
    0
    Trophy Points:
    0
    #15
    cool, that would most likely be the problem then!

    Can fopen() or file_get_contents() get dynamic pages... like http://example.com?page=013012

    -edit- I just tested it and it worked, so that's a yes... why do some people say search engines can't get dynamic pages? (like the stuff after the ?)
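    (the query string is just part of the URL; to build one safely you can do something like the following, with placeholder values:)

```php
<?php
// http_build_query URL-encodes each parameter for you
$base   = 'http://example.com/';
$params = array('page' => '013012');
$url    = $base . '?' . http_build_query($params);
// $url is now "http://example.com/?page=013012"
$html   = @file_get_contents($url); // the rendered output, or false on failure
```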
     
    dcole07, Jul 28, 2006 IP
  16. TwistMyArm

    TwistMyArm Peon

    Messages:
    931
    Likes Received:
    44
    Best Answers:
    0
    Trophy Points:
    0
    #16
    It's not that they can't, it's that they won't. For various reasons, but one big reason is that you can continually make links dynamically, making pages dynamically, making your site look bigger and so on, even though it's all the same basic stuff.

    Somewhere on the Google site (don't make me look it up), they say that they won't spider any page that has an 'id' parameter in it, for example.
     
    TwistMyArm, Jul 28, 2006 IP
  17. PinoyIto

    PinoyIto Notable Member

    Messages:
    5,863
    Likes Received:
    170
    Best Answers:
    0
    Trophy Points:
    260
    #17
    PinoyIto, Jul 28, 2006 IP
  18. born2win

    born2win Well-Known Member

    Messages:
    559
    Likes Received:
    13
    Best Answers:
    0
    Trophy Points:
    128
    #18
    I won't agree with you, Twist. My site uses id as a parameter and so far my pages have been indexed in all search engines including Google.
     
    born2win, Jul 28, 2006 IP
  19. coderlinks

    coderlinks Peon

    Messages:
    282
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    0
    #19
    If you want to do something in PHP, it's probably been done already. I found this web spider written in PHP.

    http://www.phpdig.net/
    http://www.phpdig.net/navigation.php?action=download
    http://sourceforge.net/projects/phpdig

    All the above links go to the same thing.
    Thomas
     
    coderlinks, Jul 28, 2006 IP
  20. TwistMyArm

    TwistMyArm Peon

    Messages:
    931
    Likes Received:
    44
    Best Answers:
    0
    Trophy Points:
    0
    #20
    TwistMyArm, Jul 29, 2006 IP