Getting whole source from a website (HTML)

Discussion in 'PHP' started by imhawk, Mar 24, 2010.

  1. #1
    Hello,

    I can't find this on Google, so I'll ask you guys here.
    I know it is possible to get the whole source of a website with cURL or a similar function.

    But I have a problem: I only see the plain HTML source, with no JavaScript in it. For example, if you visit a website that uses JavaScript in your browser and look at the source with Firebug or a similar tool, you can see the WHOLE source of the page as it is after it has loaded.

    As far as I know, PHP CAN show the whole source (including JavaScript source and other content)! But I don't know how, because as I understand it PHP can't run JavaScript, since JavaScript runs in the browser. But as I said, it is possible ...

    So, is anyone willing to tell me how it can be done? If it is hard to do, I can even pay ...
     
    imhawk, Mar 24, 2010 IP
  2. MyVodaFone

    MyVodaFone Well-Known Member

    Messages:
    1,048
    Likes Received:
    42
    Best Answers:
    10
    Trophy Points:
    195
    #2
    <?php
    $page = file_get_contents('http://www.example.com/');
    echo $page;
    ?>
    PHP:
     
    MyVodaFone, Mar 24, 2010 IP
  3. shockworks

    shockworks Peon

    Messages:
    14
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    If you need the source code of the JS used on a page, you need to find the src of the included JS files (you can use a regexp). Then you can download those JS files using fopen() or a similar function.
     
    shockworks, Mar 24, 2010 IP
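
What shockworks describes could be sketched roughly as follows. This is only an illustration: the function name scriptSources() and the sample markup are made up here, and it uses DOMDocument instead of a regexp because that tends to cope better with real-world HTML. It lists the JavaScript files a page includes, not the markup those scripts generate later.

```php
<?php
// Hypothetical helper: list the src attribute of every external <script> tag.
function scriptSources(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $sources = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src');
        if ($src !== '') {              // inline scripts have no src
            $sources[] = $src;
        }
    }
    return $sources;
}

// Made-up sample markup standing in for a fetched page.
$sample = '<html><head><script src="/js/ads.js"></script>'
        . '<script>var inline = true;</script></head><body></body></html>';
print_r(scriptSources($sample));
```

Each returned path could then be downloaded with file_get_contents() or fopen(), after resolving relative paths against the page URL.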
  4. imhawk

    imhawk Well-Known Member

    Messages:
    615
    Likes Received:
    44
    Best Answers:
    0
    Trophy Points:
    140
    #4
    @MyVodaFone, I know this function and I have already used it, but it does not show me the content that JavaScript adds.
    This function only gives you the same source you would get by pressing Ctrl+U in the browser. That is not what I need.

    But thanks for trying.

    @shockworks, I don't mean the source of the JS file but the source of the HTML page after the JS has run. As an example, look at the source of this page: do you see the source of the ads at the top right? No, you don't, so you need to use Firebug or a similar app to see it. Do you get what I mean?
     
    imhawk, Mar 24, 2010 IP
  5. ThomasTwen

    ThomasTwen Peon

    Messages:
    113
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #5
    imhawk, I think there is a misunderstanding. You don't want to see any JavaScript code; you want to see the website's source code with the changes JavaScript has made to it.

    As you said, this is not possible, because PHP cannot execute JavaScript. Besides, even if it could, Google would probably detect that it is not running in a browser and not show you any Google ad code.

    You might want to tell us what you are planning to do with this; there might be a workaround for the problem.
     
    ThomasTwen, Mar 24, 2010 IP
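
The distinction ThomasTwen draws can be made concrete with a small sketch. The markup below is a made-up stand-in for a fetched page, and textOfElement() is a hypothetical helper: PHP parses the markup exactly as the server sent it, so an element that a script would fill in the browser is still empty.

```php
<?php
// Made-up page: the <div> is empty in the served markup and is only
// filled in once a browser runs the inline script.
$served = '<html><body><div id="ad"></div>'
        . '<script>document.getElementById("ad").textContent = "Ad text";</script>'
        . '</body></html>';

// Hypothetical helper: return the text of the element with the given id,
// or null if there is no such element.
function textOfElement(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $node = (new DOMXPath($doc))->query("//*[@id='$id']")->item(0);
    return $node === null ? null : $node->textContent;
}

// The script is parsed as text but never executed, so the div stays empty.
var_dump(textOfElement($served, 'ad'));
```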
  6. imhawk

    #6
    Yes, ThomasTwen, you are right. This is what I need.
    Well, from my own site I want to see the changes that happen on another website I'm parsing data from (that website uses AJAX to fill some divs).
     
    imhawk, Mar 24, 2010 IP
  7. shockworks

    #7
    You have to find the URL that the AJAX call uses (via Firebug etc.). Then you have to request this URL and parse the response. I think.
     
    shockworks, Mar 24, 2010 IP
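
A minimal sketch of requesting such an AJAX URL from PHP with cURL. The function names are made up for illustration, and the headers reflect a common guess rather than anything confirmed in this thread: some endpoints only answer when the request looks like the browser's own XMLHttpRequest.

```php
<?php
// Headers a browser-side XMLHttpRequest typically carries. Whether the target
// site checks any of them is an assumption, not something the thread confirms.
function ajaxRequestHeaders(string $referer): array
{
    return [
        'X-Requested-With: XMLHttpRequest',
        'Referer: ' . $referer,
        'Accept: application/json, text/html;q=0.9, */*;q=0.8',
    ];
}

// Request the endpoint seen in Firebug and return the raw body, or null on failure.
function fetchAjaxResponse(string $endpoint, string $referer): ?string
{
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_HTTPHEADER     => ajaxRequestHeaders($referer),
        CURLOPT_TIMEOUT        => 15,
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

With placeholder URLs, a call might look like fetchAjaxResponse('http://www.example.com/ajax/data', 'http://www.example.com/'). If the site also depends on cookies or tokens set by the page, those would have to be sent as well.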
  8. imhawk

    #8
    I can't; it is somehow protected, and the page is blank.
    So I want to do it with PHP and parse the whole site, including this new data, directly from the website.
     
    imhawk, Mar 24, 2010 IP
  9. shockworks

    #9
    Maybe the response isn't HTML but JSON.
     
    shockworks, Mar 24, 2010 IP
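
If the response does turn out to be JSON, handling it could look roughly like this. The helper name parseAjaxBody() and the sample payload with its field names are made up for illustration.

```php
<?php
// Decode a response body as JSON when possible; otherwise hand back the raw text.
function parseAjaxBody(string $body)
{
    $decoded = json_decode($body, true);   // true = associative arrays
    if (json_last_error() === JSON_ERROR_NONE && is_array($decoded)) {
        return $decoded;
    }
    return $body;                          // probably HTML or plain text
}

// Made-up sample payload; the field names are purely illustrative.
$sample = '{"items":[{"id":1,"title":"First"},{"id":2,"title":"Second"}]}';
$data = parseAjaxBody($sample);
foreach ($data['items'] as $item) {
    echo $item['id'], ': ', $item['title'], "\n";
}
```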
  10. imhawk

    #10
    I get no response. At least not in the HTML or in the headers.
    That's the only chance: to get the data with PHP.
     
    imhawk, Mar 24, 2010 IP
  11. shockworks

    #11
    You can try it, but I have never heard of any PHP function for this.

    But it's strange that Firebug doesn't catch any response to a request while something on the website changes.
     
    shockworks, Mar 24, 2010 IP
  12. MyVodaFone

    #12
    MyVodaFone, Mar 24, 2010 IP
  13. imhawk

    #13
    There is no real example, so I will point to the ads for this one.

    Well, there is "Moneygram" at the top of the parsed page you posted (in the header) ... there should be Google ads. My script should be able to read the ads.
     
    imhawk, Mar 24, 2010 IP
  14. MyVodaFone

    #14
    If it's a case of find and replace, you could use something like this. See the LINK now: it says "This could be your Advert".

    
    <?php 
    $page = file_get_contents('http://forums.digitalpoint.com/showthread.php?t=1743456');
    $ads = preg_replace('#Moneygram#','This could be your Advert',$page);
    echo $ads;
    ?>
    
    PHP:
    If you wanted to target Google ads specifically, this code will cover all types of Google ads.

    
    <?php
    // Fetch the page and swap the AdSense publisher ID for your own.
    $page = file_get_contents('ENTER A STATIC URL');
    $adsense = 'PUB NUMBER HERE';
    $final_adsense = preg_replace('#google_ad_client = "(.*?)";#', 'google_ad_client = "pub-'.$adsense.'";', $page);
    // Comment out the original publisher's slot, channel and host settings.
    $final_adsense = preg_replace('#google_ad_(slot|channel|host)#', '//google_ad_$1', $final_adsense);
    $final_adsense = preg_replace('#partner-pub-(.*?):#', 'partner-pub-'.$adsense.':', $final_adsense);
    // Add the matching google_ad_format to each known width/height pair.
    // \R matches any line ending, so \r\n pages are handled as well.
    foreach ([[728, 15], [468, 15], [200, 90], [180, 90], [160, 90], [120, 90]] as list($w, $h)) {
        $final_adsense = preg_replace(
            '#google_ad_width = '.$w.';\Rgoogle_ad_height = '.$h.';#',
            'google_ad_width = '.$w.';google_ad_height = '.$h.';google_ad_format = "'.$w.'x'.$h.'_0ads_al";',
            $final_adsense
        );
    }

    echo $final_adsense;
    ?>
    
    PHP:
     
    MyVodaFone, Mar 24, 2010 IP
  15. imhawk

    #15
    It's not that; I'll show you a screenshot.

    [IMG]

    This is the source of the ad, which I can't get if I parse the page with PHP. Maybe you can?
     
    imhawk, Mar 24, 2010 IP
  16. MyVodaFone

    #16
    DP in particular has its own ad server, which most likely puts its own restrictions on its ads. I would say one of those restrictions is that if the URL doesn't match forums.digitalpoint.com, it just shows a default ad (in this case the Moneygram ad). However, you don't need to see what the ad would or could show; you can place your own ads into the div tags.

    Please be specific in your reply so I can help you better.
     
    MyVodaFone, Mar 25, 2010 IP
  17. danx10

    danx10 Peon

    Messages:
    1,179
    Likes Received:
    44
    Best Answers:
    2
    Trophy Points:
    0
    #17
    In that case:

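    <?php
    $url = 'http://www.example.com';
    $html = @file_get_contents($url);
    if ($html === false) {
        exit('Could not fetch ' . $url);
    }
    // Collect the src attribute of every <iframe> tag on the page.
    preg_match_all('/<iframe[^>]*\ssrc\s*=\s*["\']([^"\']*)["\']/i', $html, $frames);

    foreach ($frames[1] as $frame) {
        // Relative src values would need resolving against $url first.
        highlight_string(file_get_contents($frame));
        echo '<br>';
    }
    ?>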
    PHP:
    This will show the source of the iframe'd content (which I presume is the ad source?).
     
    danx10, Mar 25, 2010 IP
  18. imhawk

    #18
    @MyVodaFone, digitalpoint's ads are just an example I can use to show you what kind of information I need.

    @danx10, this would work if an actual iframe were used in the source code, but there is just <script src="....."> and some parameters for the ads, so after you load the page, new code is generated with JavaScript. I can't see this code in PHP, just as you can't find it if you press CTRL+U to view this page's source.
     
    imhawk, Mar 25, 2010 IP
  19. MyVodaFone

    #19
    OK, let's start again.

    Can you PM me a URL whose ad source code you want to see? Please also explain what you want to do with the ads.

    Just remember that JavaScript runs in your browser and PHP runs on the website's server, but that doesn't mean we can't manipulate what is shown in the browser; we can alter text/HTML, replacing almost anything.

    So when you're ready, send me a URL as described above, along with what you would like changed to your advantage on that URL.
     
    MyVodaFone, Mar 25, 2010 IP
  20. ThomasTwen

    #20
    imhawk - It's just not possible with PHP. I'm sorry, but that's it. You might want to develop a Firefox extension that does the same thing; it can access the website's contents after the JavaScript has been executed!
     
    ThomasTwen, Mar 27, 2010 IP
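
In the same spirit as the Firefox extension idea, PHP can hand the page to a real browser engine and read back the result. The sketch below assumes a Chrome or Chromium build with headless mode is installed on the server under the name chromium (the binary name and both function names are assumptions made here); its --headless and --dump-dom switches print the DOM after the page's scripts have run. PHP itself still executes no JavaScript, and content that arrives late may still be missing from the dump.

```php
<?php
// Build the shell command; both arguments are escaped before use.
function dumpDomCommand(string $url, string $binary = 'chromium'): string
{
    return escapeshellarg($binary) . ' --headless --dump-dom ' . escapeshellarg($url);
}

// Run the browser and return the serialized DOM, or null if nothing came back.
function fetchRenderedHtml(string $url, string $binary = 'chromium'): ?string
{
    $output = shell_exec(dumpDomCommand($url, $binary));
    return is_string($output) && $output !== '' ? $output : null;
}

// Show the command that would be run for a placeholder URL.
echo dumpDomCommand('http://www.example.com/'), "\n";
```

The string returned by fetchRenderedHtml() could then be parsed like any other HTML, for example with DOMDocument.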