Login to another website from my own

Discussion in 'PHP' started by benny306, Nov 15, 2013.

  1. #1
    Hi Everyone,


    I’m trying to figure out a way to log in to my account at https://users.premierleague.com/PremierUser/account/login.html remotely from my own website, and then scrape my fantasy team data and league data.


    Would anyone be able to point me in the right direction for this? Basically, what I want to do is:


    1. I enter my email and password on a form on my website

    2. That form information is then passed to https://users.premierleague.com/PremierUser/account/login.html to log in.

    3. I scrape details like my team name, current points, players, leagues etc.

    4. Return the scraped details to my website where I can manipulate the data.


    Is something like this possible?
     
    benny306, Nov 15, 2013 IP
  2. Vooler

    Vooler Well-Known Member

    Messages:
    1,146
    Likes Received:
    64
    Best Answers:
    4
    Trophy Points:
    150
    #2
    Yes, it is possible.
    - Your PHP script logs you in using cURL, with cookies enabled.
    - With a second request (or the same request), you scrape the page you want.

    I hope it helps.
     
    Vooler, Nov 15, 2013 IP
  3. benny306

    benny306 Active Member

    Messages:
    65
    Likes Received:
    0
    Best Answers:
    1
    Trophy Points:
    51
    #3
    Hi, I think I have my head around how to scrape the information once I'm logged in. But I just can't seem to find a way to log in.

    This is what I've tried (I'm a beginner):

    <?php

    $login_url = 'https://users.premierleague.com/PremierUser/account/login.html';

    //These are the post data username and password
    $post_data = 'j_username=usernamehere&j_password=mypasswordhere';

    //Create a curl object
    $ch = curl_init();

    //Set the useragent
    $agent = $_SERVER["HTTP_USER_AGENT"];
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);

    //Set the URL
    curl_setopt($ch, CURLOPT_URL, $login_url );

    //This is a POST query
    curl_setopt($ch, CURLOPT_POST, 1 );

    //Set the post data
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);

    //We want the content after the query
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    //Follow Location redirects
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    /*
    Set the cookie storing files
    Cookie files are necessary since we are logging in and session data needs to be saved
    */

    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');

    //Execute the action to login
    $postResult = curl_exec($ch);

    ?>
     
    benny306, Nov 15, 2013 IP
  4. Vooler

    #4
    To me the code looks exactly as it should. Make sure 'cookie.txt' already exists and is writable: create an empty cookie.txt in the same folder and chmod the file to 777 through your FTP client.

    Next step: try fetching data from the listing page (using cURL again), pointing the next cURL request at the same cookie file.
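
A minimal sketch of that second request, assuming the login script has already written its session cookies to cookie.txt. The helper names sessionCurlOptions() and fetchWithSession() are made up for illustration, and the URL you pass in is whichever page you want to scrape.

```php
<?php
// Hypothetical helper: the cURL options every request in the session shares.
function sessionCurlOptions(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies saved by earlier requests
        CURLOPT_COOKIEJAR      => $cookieFile, // save new cookies when the handle is freed
    ];
}

// Hypothetical helper: GET a page using the cookies from the login request.
function fetchWithSession(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, sessionCurlOptions($cookieFile));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

Passing an absolute path such as __DIR__ . '/cookie.txt' keeps the login request and the follow-up request on the same file, whatever the working directory is.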
     
    Vooler, Nov 15, 2013 IP
  5. benny306

    #5
    Hi, thanks for your help. Do you know if there are any tutorials on this sort of thing?

    I keep getting an error that says curl_init() has been disabled for security reasons.
     
    benny306, Nov 15, 2013 IP
  6. Vooler

    #6
    You need to have cURL enabled on the server to do these operations. Ask your hosting company whether they can enable it for you, although on shared hosting they usually won't.
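
A quick way to confirm this from your own script is sketched below. It assumes the "disabled for security reasons" warning comes from the disable_functions setting in php.ini, which is the usual cause; isFunctionUsable() is a made-up helper name.

```php
<?php
// Hypothetical helper: true when $name exists and is not listed in
// php.ini's disable_functions directive.
function isFunctionUsable(string $name): bool
{
    if (!function_exists($name)) {
        return false; // extension not loaded (or function removed)
    }
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return !in_array($name, $disabled, true);
}

echo isFunctionUsable('curl_init')
    ? "cURL is available\n"
    : "cURL is missing or disabled\n";
```

If this reports that cURL is missing or disabled, only the host can change it, since disable_functions cannot be overridden from a script.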
     
    Vooler, Nov 15, 2013 IP
  7. stephan2307

    stephan2307 Well-Known Member

    Messages:
    1,277
    Likes Received:
    33
    Best Answers:
    7
    Trophy Points:
    150
    #7
    hostgator.com has cURL enabled by default on shared hosting, just in case you need to look for new hosting.
     
    stephan2307, Nov 15, 2013 IP
  8. kutchbhi

    kutchbhi Active Member

    Messages:
    130
    Likes Received:
    4
    Best Answers:
    2
    Trophy Points:
    70
    #8
    It's called scraping :). I learned it all from a book called "Webbots, Spiders, and Screen Scrapers", or something like that.
     
    kutchbhi, Nov 15, 2013 IP
  9. benny306

    #9
    Hi, appreciate the help and advice. I'll have to look into that book. I keep getting this error message when I run the above script. Any ideas?
    Forbidden (403)
    CSRF verification failed. Request aborted.

    More information is available with DEBUG=True.
     
    benny306, Nov 19, 2013 IP
  10. nico_swd

    nico_swd Prominent Member

    Messages:
    4,153
    Likes Received:
    344
    Best Answers:
    18
    Trophy Points:
    375
    #10
    Try setting the referrer to the login URL: CURLOPT_REFERER
    Disable SSL checks: CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYHOST = false
    Try adding the redirectURL field, even if you leave it blank.
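
"CSRF verification failed" usually means the POST did not carry a token that the login page hands out. Below is a rough sketch of one way to pick that token up, assuming the login form stores it in a hidden input: GET the login page first with the same cookie file, then POST every hidden field back along with the credentials. The name csrfmiddlewaretoken in the comments is Django's default and only a guess for this site, and extractHiddenFields() and buildLoginBody() are made-up helper names.

```php
<?php
// Hypothetical helper: collect name => value for every hidden input on a page,
// e.g. a CSRF token (Django's default name is csrfmiddlewaretoken) or redirectURL.
function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep sloppy HTML from raising warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Hypothetical helper: merge the hidden fields with the credentials into a
// URL-encoded body suitable for CURLOPT_POSTFIELDS.
function buildLoginBody(string $loginPageHtml, string $user, string $pass): string
{
    $fields = extractHiddenFields($loginPageHtml);
    $fields['j_username'] = $user;
    $fields['j_password'] = $pass;
    return http_build_query($fields);
}
```

The HTML passed to buildLoginBody() would come from a plain GET of the login URL made with the same CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE, so that any CSRF cookie set by that page is sent back with the POST.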
     
    nico_swd, Nov 19, 2013 IP
  11. benny306

    #11
    Hi, thanks for everyone's help. I've managed to get the login to work. Just one more query: when I put in echo $postResult; the entire page content gets returned. Is there a way to parse the HTML so that I can extract only the bits of information I need, rather than returning the entire page?
     
    benny306, Nov 21, 2013 IP
  12. nico_swd

    #12
    Well, the answer is yes, but you need to be a little more specific if you need further help.
     
    nico_swd, Nov 21, 2013 IP
  13. benny306

    #13
    Sorry for being so vague. Let's say that I only want to grab the data within a table, not the entire page content. For example, this is the table with my fantasy leagues:
    <h2 class="ismTableHeading">Classic leagues</h2>
    <table class="ismTable ismLeagueTable">
    <colgroup>
    <col class="ismCol1">
    <col class="ismCol2">
    <col class="ismCol3">
    </colgroup>
    <thead class="ismHideContent">
    <tr>
    <th scope="col">&nbsp;</th>
    <th scope="col">Rank</th>
    <th scope="col">League</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <td><img width="10" height="10" alt="up" src="image"></td>
    <td>1</td>
    <td>
    <a href="/my-leagues/3xxxx/standings/">league 1</a>
    </td>
    </tr>
    <tr>
    <td><img width="10" height="10" alt="up" src="image"></td>
    <td>1</td>
    <td>
    <a href="/my-leagues/1xxxx/standings/">League 2</a>
    </td>
    </tr>
    <tr>
    <td><img width="10" height="10" alt="up" src="image"></td>
    <td>3</td>
    <td>
    <a href="/my-leagues/4xxx/standings/">League 3</a>
    </td>
    </tr>
    </tbody>
    </table>
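
One possible way to pull just those rows out is PHP's built-in DOMDocument and DOMXPath, sketched below. It assumes the page always marks the table with the ismLeagueTable class shown above; extractLeagues() is a made-up helper name.

```php
<?php
// Hypothetical helper: return rank, league name and link for each data row
// of the table whose class attribute contains "ismLeagueTable".
function extractLeagues(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep sloppy HTML from raising warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Rows that contain <td> cells; the header row only has <th> cells.
    $rows = $xpath->query('//table[contains(@class, "ismLeagueTable")]//tr[td]');

    $leagues = [];
    foreach ($rows as $row) {
        $cells = $xpath->query('td', $row);
        $link  = $xpath->query('td/a', $row)->item(0);
        if ($cells->length < 3 || $link === null) {
            continue; // skip rows that do not match the expected layout
        }
        $leagues[] = [
            'rank' => (int) trim($cells->item(1)->textContent),
            'name' => trim($link->textContent),
            'href' => $link->getAttribute('href'),
        ];
    }
    return $leagues;
}
```

Calling extractLeagues($postResult) on the fetched page should give one array entry per league row, which can then be looped over or stored instead of echoing the whole page.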
     
    benny306, Nov 21, 2013 IP