Hi everyone, I'm trying to figure out a way to log in to my account at https://users.premierleague.com/PremierUser/account/login.html remotely from my own website, then scrape my fantasy team data and league data. Would anyone be able to point me in the right direction? Basically, what I want to do is:
1. I enter my email and password in a form on my website.
2. That form information is passed to https://users.premierleague.com/PremierUser/account/login.html to log in.
3. I scrape details like my team name, current points, players, leagues, etc.
4. The scraped details are returned to my website, where I can manipulate the data.
Is something like this possible?
Yes, it is possible.
- Your PHP script logs you in using cURL, with cookies enabled.
- With a second request (or in the same request), you scrape the page you need.
I hope it helps.
Hi, I think I have my head around how to scrape the information once I'm logged in, but I just can't seem to find a way to log in. This is what I've tried (I'm a beginner):

<?php
$login_url = 'https://users.premierleague.com/PremierUser/account/login.html';

// These are the post data: username and password
$post_data = 'j_username=usernamehere&j_password=mypasswordhere';

// Create a curl object
$ch = curl_init();

// Set the user agent
$agent = $_SERVER["HTTP_USER_AGENT"];
curl_setopt($ch, CURLOPT_USERAGENT, $agent);

// Set the URL
curl_setopt($ch, CURLOPT_URL, $login_url);

// This is a POST query
curl_setopt($ch, CURLOPT_POST, 1);

// Set the post data
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);

// We want the content after the query
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// Follow Location redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

/*
 Set the cookie storing files.
 Cookie files are necessary since we are logging in
 and session data needs to be saved.
*/
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');

// Execute the action to log in
$postResult = curl_exec($ch);
?>
To me the code looks exactly as it should. Make sure 'cookie.txt' already exists and is writable: create an empty cookie.txt in the same folder and chmod the file to 777 through your FTP client. Next step: try fetching data from the listing page (using cURL again) by enabling the same cookie file in the next cURL request.
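As a rough sketch of that next step: a second request can point at the same cookie file so the session from the login is sent along. The function names `sessionOptions` and `fetchWithSession` are made up for illustration, and the page URL you pass in is whatever page you want to scrape; this is one possible shape, not the only way to do it.

```php
<?php
// Hypothetical helper: cURL options for a GET request that reuses a saved session.
function sessionOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow Location redirects
        CURLOPT_COOKIEFILE     => $cookieFile, // send the cookies saved by the login request
        CURLOPT_COOKIEJAR      => $cookieFile, // write any updated cookies back to the same file
    ];
}

// Hypothetical helper: fetch a page using the cookies stored in $cookieFile.
function fetchWithSession(string $url, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, sessionOptions($url, $cookieFile));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    unset($ch); // releasing the handle lets cURL flush cookies to the jar file
    return $body;
}
```

After the login script has run, something like `fetchWithSession($pageUrl, 'cookie.txt')` should return the HTML of the logged-in page, provided the login itself succeeded.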
Hi, thanks for your help. Do you know if there are any tutorials on this sort of thing? Also, I keep getting an error that says curl_init() has been disabled for security reasons.
You need to have cURL enabled on the server to do these operations. Ask your hosting company whether they can enable it for you; on shared hosting they usually won't.
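Before contacting the host, a quick check like the one below can confirm whether `curl_init()` is actually usable from your script. The helper name `isFunctionUsable` is made up for this sketch; it simply checks that the function exists and is not listed in the `disable_functions` setting.

```php
<?php
// Returns true when a function exists and is not listed in disable_functions.
function isFunctionUsable(string $name): bool
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists($name) && !in_array($name, $disabled, true);
}

echo isFunctionUsable('curl_init')
    ? "cURL is available\n"
    : "cURL is missing or disabled\n";
```

If it reports that cURL is missing or disabled, the fix has to come from the server configuration rather than from your script.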
hostgator.com has cURL enabled by default on shared hosting, in case you need to look for new hosting.
It's called scraping. I learned it all from a book called "Webbots, Spiders, and Screen Scrapers", or something like that.
Hi, I appreciate the help and advice. I'll have to look into that book. I keep getting an error message when I run the above script. Any ideas?

Forbidden (403)
CSRF verification failed. Request aborted.
More information is available with DEBUG=True.
A few things to try:
- Set the referrer to the login URL: CURLOPT_REFERER
- Disable SSL checks: CURLOPT_SSL_VERIFYPEER = false, CURLOPT_SSL_VERIFYHOST = false
- Add the redirectURL field, even if you leave it blank.
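The wording of that 403 message matches the one Django shows when a form is posted without its CSRF token, so another thing worth trying is to GET the login page first, read the hidden fields out of the form, and post them back with the credentials (using the same cookie file for both requests). The sketch below shows only the extraction step. The field name `csrfmiddlewaretoken` is Django's default and is an assumption about this site, the sample markup is invented, and `extractHiddenFields` is a made-up helper name.

```php
<?php
// Collect name => value pairs for every hidden <input> in an HTML page.
function extractHiddenFields(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for messy markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Invented markup standing in for the fetched login page.
$loginPage = '<form method="post">'
    . '<input type="hidden" name="csrfmiddlewaretoken" value="abc123">'
    . '<input type="hidden" name="redirectURL" value="">'
    . '<input type="text" name="j_username">'
    . '</form>';

// Hidden fields first, then the credentials from the original script.
$postFields = extractHiddenFields($loginPage) + [
    'j_username' => 'usernamehere',
    'j_password' => 'mypasswordhere',
];

// This string could be passed to CURLOPT_POSTFIELDS.
echo http_build_query($postFields), "\n";
```

Whether the real login form uses these exact field names is something to confirm by viewing the page source.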
Hi, thanks for everyone's help. I've managed to get the login to work. Just one more query: when I put in echo $postResult; the entire page content gets returned. Is there a way to parse the HTML so that I can extract only the bits of information I need, rather than returning the entire page?
Sorry for being so vague, but let's say that I only want to grab data within a table? I don't want the entire page content. For example, this is the table with my fantasy leagues:

<h2 class="ismTableHeading">Classic leagues</h2>
<table class="ismTable ismLeagueTable">
  <colgroup>
    <col class="ismCol1">
    <col class="ismCol2">
    <col class="ismCol3">
  </colgroup>
  <thead class="ismHideContent">
    <tr>
      <th scope="col"> </th>
      <th scope="col">Rank</th>
      <th scope="col">League</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><img width="10" height="10" alt="up" src="image"></td>
      <td>1</td>
      <td> <a href="/my-leagues/3xxxx/standings/">league 1</a> </td>
    </tr>
    <tr>
      <td><img width="10" height="10" alt="up" src="image"></td>
      <td>1</td>
      <td> <a href="/my-leagues/1xxxx/standings/">League 2</a> </td>
    </tr>
    <tr>
      <td><img width="10" height="10" alt="up" src="image"></td>
      <td>3</td>
      <td> <a href="/my-leagues/4xxx/standings/">League 3</a> </td>
    </tr>
  </tbody>
</table>
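One way to pull just that table out of the page is PHP's built-in DOMDocument and DOMXPath classes. The sketch below assumes the table keeps the `ismLeagueTable` class and the three-column layout shown above; `parseLeagueTable` is a made-up name, and the sample string is a shortened copy of the posted table standing in for `$postResult`.

```php
<?php
// Return the rank, league name and link for each body row of the league table.
function parseLeagueTable(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them for messy markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = $xpath->query('//table[contains(@class, "ismLeagueTable")]/tbody/tr');

    $leagues = [];
    foreach ($rows as $row) {
        $cells = $xpath->query('./td', $row);
        if ($cells->length < 3) {
            continue; // skip rows that do not match the expected layout
        }
        $link = $xpath->query('.//a', $cells->item(2))->item(0);
        if ($link === null) {
            continue; // skip rows without a league link
        }
        $leagues[] = [
            'rank' => (int) trim($cells->item(1)->textContent),
            'name' => trim($link->textContent),
            'url'  => $link->getAttribute('href'),
        ];
    }
    return $leagues;
}

// Shortened copy of the table from the post, standing in for $postResult.
$html = '<table class="ismTable ismLeagueTable"><tbody>'
    . '<tr><td><img alt="up" src="image"></td><td>1</td>'
    . '<td> <a href="/my-leagues/3xxxx/standings/">league 1</a> </td></tr>'
    . '<tr><td><img alt="up" src="image"></td><td>3</td>'
    . '<td> <a href="/my-leagues/4xxx/standings/">League 3</a> </td></tr>'
    . '</tbody></table>';

print_r(parseLeagueTable($html));
```

In the real script you would call `parseLeagueTable($postResult)` instead of using the sample string, and adjust the XPath if the site's markup changes.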