view generated source with php

Discussion in 'PHP' started by dramiditis, Aug 9, 2009.

  1. #1
    Hi,
    I would like to know whether it is possible to "view generated source" of an HTML page with PHP.
    Thanks
     
    Last edited: Aug 9, 2009
    dramiditis, Aug 9, 2009
  2. kblessinggr

    #2
    Can you be more specific? Any HTML output by PHP can be viewed in nearly any browser with "view source", or fetched from PHP using cURL. Unless you meant something else...
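For illustration, fetching a page the way "view source" sees it might look like this sketch, assuming the PHP cURL extension is installed. fetch_source() is a hypothetical helper name, and the 15-second timeout is an arbitrary choice.

```php
<?php
// Minimal sketch (fetch_source is a hypothetical helper): download the HTML
// exactly as the server sends it, i.e. what "view source" shows.
// cURL does not execute JavaScript, so script-generated content is missing.
function fetch_source(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up after 15 seconds

    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $html;
}
```

Usage: echo htmlspecialchars(fetch_source('https://example.com/')); prints the markup as text. Because no JavaScript runs, anything a script writes into the page afterwards will not appear.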
     
    kblessinggr, Aug 9, 2009
  3. dramiditis

    #3
    I can get the HTML source with file_get_contents() in my PHP script, but if the page has JavaScript that hides some information, like an email address, phone number or location, then the output contains something like &#xA0; instead of the real value. And if I turn off JavaScript in the browser, I don't see anything at all.
    I have googled around and found a Mozilla add-on that can "view generated source" of a web page, and it works: it gives the real address. The bookmarklet javascript:'<xmp>'%20+%20window.document.body.outerHTML+%20'</xmp>' also works in IE, but I would like to do this in my PHP script.

    Thanks
     
    dramiditis, Aug 9, 2009
  4. Kazumael

    #4
    As far as I know, this is not possible: JavaScript runs on the client, not on the server.

    What you could do is reverse-engineer the JavaScript code that encodes these values and decode them with a PHP script. But this can be a tricky job. :eek:

    I'm not trying to poke my nose into others' business, but information like email addresses and phone numbers is encoded for a reason (SPAM!). Remember that in some cases scraping is illegal. ;)
     
    Kazumael, Aug 9, 2009
  5. dramiditis

    #5
    Yes, but I have noticed that I can see it with the Chrome Inspector. I would like to do the same thing with PHP.
     
    dramiditis, Aug 9, 2009
  6. Kazumael

    #6
    I guess Chrome runs the JavaScript first, before the page is displayed in the inspector.

    Just check the URL with something like cURL or the Lynx browser, which show the page as real plain text, without running any JavaScript.
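A rough PHP analogue of that plain-text view, assuming the DOM extension is available: visible_text() is a hypothetical helper that parses already-fetched HTML, drops <script> and <style> elements, and returns the remaining text. Whatever a script would have inserted is simply absent, just as in cURL or Lynx.

```php
<?php
// Rough sketch of a Lynx-like "plain text" view of fetched HTML.
// visible_text is a hypothetical helper, not a built-in.
function visible_text(string $html): string
{
    if (trim($html) === '') {
        return '';
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // Scripts are never executed here, so just remove them (and styles).
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//script | //style') as $node) {
        $node->parentNode->removeChild($node);
    }

    $root = $doc->documentElement;
    if ($root === null) {
        return '';
    }
    // Collapse whitespace, including non-breaking spaces (U+00A0), to one space.
    return trim(preg_replace('/[\s\x{A0}]+/u', ' ', $root->textContent));
}
```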
     
    Kazumael, Aug 9, 2009
  7. dimitar christoff

    #7
    To do that you need to fetch the URL via cURL through the listener URL. If fopen on remote URLs is allowed (the allow_url_fopen setting), that would also work.
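A minimal sketch of the "fopen on remote URLs" option: allow_url_fopen is the php.ini setting that controls it, so the hypothetical get_html() tries file_get_contents() when it is enabled and falls back to cURL otherwise. Either route returns only the server-side HTML.

```php
<?php
// Sketch: prefer file_get_contents() when allow_url_fopen is on, otherwise
// fall back to the cURL extension. get_html is a hypothetical helper name.
function get_html(string $url): string
{
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $html = @file_get_contents($url); // @ hides the warning; we check false below
    } elseif (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $html = curl_exec($ch);
    } else {
        throw new RuntimeException('Neither allow_url_fopen nor the cURL extension is available.');
    }

    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html; // server-side HTML only; no JavaScript has been run
}
```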
     
    dimitar christoff, Aug 9, 2009
  8. kblessinggr

    #8
    Just like in Safari and Firefox, the JavaScript is run when the document is ready. PHP, whether it uses cURL or file_get_contents() (which some servers don't allow for remote URLs), cannot execute JavaScript, so it only sees the initial markup the server sends. This is also why it matters for SEO, especially if you use a lot of Ajax.
     
    kblessinggr, Aug 9, 2009
  9. dramiditis

    #9
    Yes, I'm trying with cURL and file_get_contents(), but I still get the altered text. Here is an example:

    http://yellow.local.ch/en/d/Stabio/...psFf8HGW7eDCw?what=luigi&where=Ticino+(Canton)

    As a human I can see the email address, but with the normal "view page source" it becomes &#xA0;. With Firefox add-ons or the Chrome Inspector I can see <a href=mailto:xxx@xxxxxx.com>xxxxx</a>.

    So, any suggestions? How can I do something like this with PHP, or how can I connect my PHP script to the Chrome Inspector and then scrape the real content from there?

    Thanks
     
    dramiditis, Aug 9, 2009
  10. dannywwww

    #10
    So basically you want to view the HTML code of a web page?

    If so, you can use either file_get_contents() or cURL to fetch the page, then use htmlentities() to escape it so the page is displayed as HTML code instead of being rendered.
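A minimal sketch of that display step, where source_as_text() is a hypothetical helper and the markup is assumed to have been fetched already. The <pre> wrapper is just one way to keep the line breaks visible.

```php
<?php
// Escape fetched markup so the browser shows it as source code instead of
// rendering it. This is display only: nothing is parsed or executed.
function source_as_text(string $html): string
{
    return '<pre>' . htmlentities($html, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Usage with a hypothetical $html fetched via file_get_contents() or cURL:
// echo source_as_text($html);
```

This only changes how the source is shown; a value that a script replaces in the browser (like the &#xA0; placeholder in this thread) is still the placeholder in the escaped output.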
     
    dannywwww, Aug 9, 2009
  11. dramiditis

    #11
    So basically I want to grab the email address. It doesn't work with htmlentities(). In the example above, when you visit the web page you can see that the email is farmacia_pestoni@bluewin.ch, but when I use cURL or file_get_contents() I get &#xA0; instead of the email address. I also tried turning off JavaScript, but then I don't get anything.
     
    dramiditis, Aug 9, 2009
  12. dimitar christoff

    #12
    Er, you won't experience the page the way a browser with JavaScript does; you will only get the server-side generated source. Any further modifications to the rendered HTML are applied by JavaScript.

    For instance, with the email it looks like it may be escape()/unescape()'d in JavaScript in order to obfuscate it from bots. My own "mail me" links look like this in the source code:

    <a href="mailto:" class="mailLink" data-user="christoff" data-domain="gmail.com" data-subject="link exchange">mail me</a>
    I then concatenate the parts through JavaScript so the user can click the link without a problem. The site in question uses something similar (http://fragged.org/masking-your-email-address-from-links-to-prevent-data-capture-by-bots_522.html).
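For comparison, a sketch of how a PHP script could reassemble a split-attribute link like the one above, assuming the DOM extension. mail_links_from_data_attrs() is a hypothetical name, and the mailLink class and data-user/data-domain attributes come only from that example snippet; a script that knows the scheme can join the parts just as the JavaScript does.

```php
<?php
// Rebuild user@domain from <a class="mailLink" data-user="..." data-domain="...">.
// Class and attribute names are taken from the example link; other sites differ.
function mail_links_from_data_attrs(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " mailLink ")]';
    $addresses = [];
    foreach ($xpath->query($query) as $a) {
        $user = $a->getAttribute('data-user');
        $domain = $a->getAttribute('data-domain');
        if ($user !== '' && $domain !== '') {
            $addresses[] = $user . '@' . $domain; // same join the JavaScript performs
        }
    }
    return $addresses;
}
```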

    <span class="obfuscml" title="hc.niweulb@inotsep_aicamraf">hc.niweulb@inotsep_aicamraf</span> looks like the email address written backwards: hc. becomes .ch, so it is farmacia_pestoni @ bluewin . ch. You CAN write a parser for this in PHP: look for spans with class="obfuscml", grab the title="" attribute and reverse it, and you have the address.
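A minimal sketch of that parser, assuming the DOM extension and that the page still uses the obfuscml span with the address reversed in its title attribute, as in the 2009 markup quoted above. decode_obfuscml() is a hypothetical helper name.

```php
<?php
// Find <span class="obfuscml" title="..."> elements and reverse each title.
// Matches only the scheme seen in this thread; the site may have changed it.
function decode_obfuscml(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $query = '//span[contains(concat(" ", normalize-space(@class), " "), " obfuscml ")]';
    $emails = [];
    foreach ($xpath->query($query) as $span) {
        $reversed = $span->getAttribute('title');
        if ($reversed !== '') {
            // The letters are stored back to front, so one strrev() restores them.
            // strrev() works on bytes, which is fine for ASCII addresses like this one.
            $emails[] = strrev($reversed);
        }
    }
    return $emails;
}
```

Usage: $emails = decode_obfuscml(file_get_contents($url)); where $url is the listing page.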
     
    Last edited: Aug 9, 2009
    dimitar christoff, Aug 9, 2009
  13. dramiditis

    #13
    It was right in front of my eyes, but I couldn't see it. Yes, the letters are in reverse order. It's always simple once you figure it out. Thank you, man, now I can do it without any problems.

    Thank you a lot.

    Regards
     
    dramiditis, Aug 10, 2009
  14. regster

    #14
    Cheers, I was looking for a solution to this as well.
     
    regster, Aug 10, 2009