How to do perfect browser simulation?

Discussion in 'PHP' started by aliweb, Aug 19, 2010.

  1. #1
    Hi,

    I am using the Snoopy class to read the contents of a web page, but it doesn't perfectly simulate a browser: the content I get when fetching the page with Snoopy differs from what I see when I view the page directly in a browser.
    I even set the user agent string in Snoopy, but still no luck.
    How can we perfectly simulate a browser, so that the site cannot tell whether a script or a real browser is requesting the page?

    Thanks
     
    aliweb, Aug 19, 2010
  2. ThomasTwen

    #2
    What do you mean by "different content"? Does it show nothing, or a warning advising you to stop using a bot?

    A possible explanation is that Snoopy can't execute the JavaScript code on the site.
     
    ThomasTwen, Aug 19, 2010
  3. danx10

    #3
    I believe Snoopy's dependencies are cURL and/or fsockopen().

    cURL is PHP's effort at browser simulation; therefore, that's the only way to simulate a browser using pure PHP.

    However, you can take advantage of the options cURL accepts, such as the referer, the IP address (by using a proxy), and the user agent, to make the request seem more realistic to the remote site.
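
    As a rough sketch of those options (not a guaranteed way to pass as a browser), something like the following sets the user agent, referer, and an optional proxy through PHP's cURL extension. The function names browser_like_curl_options() and fetch_like_browser(), the Accept headers, and any URLs or proxy addresses you pass in are my own assumptions for illustration.

```php
<?php
// Hypothetical helper: builds a cURL option array that imitates a browser request.
// Requires the PHP cURL extension for the CURLOPT_* constants.
function browser_like_curl_options(
    string $url,
    string $userAgent,
    string $referer,
    ?string $proxy = null
): array {
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects like a browser does
        CURLOPT_USERAGENT      => $userAgent,  // User-Agent request header
        CURLOPT_REFERER        => $referer,    // Referer request header
        CURLOPT_ENCODING       => '',          // accept every encoding cURL supports
        CURLOPT_HTTPHEADER     => [
            // Assumed values; copy the headers your own browser sends.
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ];

    if ($proxy !== null) {
        // "host:port" of a proxy, so the request comes from a different IP address.
        $options[CURLOPT_PROXY] = $proxy;
    }

    return $options;
}

// Hypothetical helper: performs the request and returns the body, or false on failure.
function fetch_like_browser(array $options)
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    return curl_exec($ch);
}
```

    You would call it as fetch_like_browser(browser_like_curl_options($url, $userAgent, $referer)). Even with all of this set, the remote site can still tell the difference if it relies on JavaScript or cookies.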
     
    danx10, Aug 19, 2010
  4. Narrator

    #4
    If you want to PM me the URL, I'll take a look.
     
    Narrator, Aug 19, 2010
  5. aliweb

    #5
    Got it. It's happening because the site tries to set cookies, and since Snoopy can't accept them (or maybe it can and I don't know how!?), the site shows different content. I tested this by disabling cookies in my browser and then loading the site: the content was the same as what Snoopy gets.
    So how do I allow sites to set cookies when using Snoopy?
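
    To make the cookie round trip concrete, here is a minimal hand-rolled sketch of what any client has to do: read the Set-Cookie lines from the response headers and send the values back in a Cookie header on the next request. The functions collect_cookies() and cookie_header() and the sample header lines are hypothetical, this is not Snoopy's own API, and the sketch ignores expiry, path, and domain rules.

```php
<?php
// Hypothetical helper: collects cookie name => value pairs from raw response header lines.
// Cookie attributes (expires, path, domain, secure) are deliberately ignored in this sketch.
function collect_cookies(array $headerLines, array $jar = []): array
{
    foreach ($headerLines as $line) {
        // Matches "Set-Cookie: name=value; attributes..." and captures name and value.
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=([^;]*)/i', $line, $m)) {
            $jar[$m[1]] = trim($m[2]);
        }
    }
    return $jar;
}

// Hypothetical helper: builds the Cookie request header to send on the next request.
function cookie_header(array $jar): string
{
    $pairs = [];
    foreach ($jar as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

// Made-up response headers, for illustration only.
$responseHeaders = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
    'Set-Cookie: PHPSESSID=abc123; path=/',
    'Set-Cookie: lang=en; path=/',
];

$jar = collect_cookies($responseHeaders);
echo cookie_header($jar), "\n"; // prints "Cookie: PHPSESSID=abc123; lang=en"
```

    If the fetching is done with cURL directly, the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options handle this storing and resending automatically.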
     
    aliweb, Aug 19, 2010