Hey all, I'm trying to implement a Twitter application using the Twitter API. What I would like to do is implement a crawler in PHP that gathers all the data for me. So far I can gather the data I want up until I run out of requests. As soon as I run out of requests, I put the thread to sleep. My intention is for the script to continue where it left off once the requests are replenished. Trouble is, it doesn't seem to start back up again: when the requests are replenished, it doesn't keep crawling through Twitter. Here is a basic outline of what my code looks like:

    function sleepIfNecessary() {
        $url = sprintf('http://api.twitter.com/1/account/rate_limit_status.xml');
        $content = $this->OAuthRequest($url, 'GET', array());
        try {
            $xml = new SimpleXMLElement($content);
        } catch (Exception $e) {
            $sec = "1";
            $page = $_SERVER['PHP_SELF'];
            header("Refresh: $sec; url=$page");
        }
        $hits = $xml->{'remaining-hits'};
        if ($hits < 1) {
            $d = $xml->{'reset-time-in-seconds'};
            settype($d, "integer");
            sleep($d);
        }
    }

    function gatherInformation($user_name) {
        $queue = array();
        settype($user_name, "string");
        $queue[] = $user_name;
        while (count($queue) > 0) {
            /* Inside here I gather data from $user_name (their status, number of
               friends and followers, etc.) and put it into a MySQL database. */
            /* Then, as this data is gathered, I add each friend and follower to
               $queue so they can be processed in the same way. */
            /* In doing so, I make requests, and I call sleepIfNecessary() before
               each request, so that if there are no more requests left, the
               thread is put to sleep. */
        }
    }

Any tips?
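One thing that may be worth checking in the outline above: as far as I recall, the `reset-time-in-seconds` value in the v1 `rate_limit_status` response is an absolute Unix timestamp (the moment the limit resets), not a number of seconds to wait. If that is right, `sleep($d)` would sleep for decades, which would look exactly like a page that loads forever. Below is a minimal sketch of turning the reset time into a bounded wait. `secondsUntilReset` and its `$maxWait` cap are hypothetical names introduced for illustration, not part of the original code or of the Twitter API.

```php
<?php
// Hypothetical helper: converts a reset time into a bounded sleep duration.
// ASSUMPTION: $resetEpoch is an absolute Unix timestamp, not a duration.
function secondsUntilReset(int $resetEpoch, int $now, int $maxWait = 3600): int
{
    // Add one second of slack so we wake up just after the reset.
    $wait = $resetEpoch - $now + 1;

    // Never return a negative value, and never wait longer than $maxWait.
    return max(0, min($wait, $maxWait));
}

// Possible use inside sleepIfNecessary(), replacing sleep($d):
//   $reset = (int) $xml->{'reset-time-in-seconds'};
//   sleep(secondsUntilReset($reset, time()));
```

Separately, a script that sleeps for up to an hour is probably better started from the command line (`php crawler.php`) than from a browser, since the browser or web server may give up on the request long before the script wakes up.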
My problem is that I can't seem to get my crawler to work. I want it to gather data up until I run out of requests. Then I put the thread to sleep for an hour, or until the requests are replenished, and after that I want it to continue gathering data. However, it doesn't continue to crawl through Twitter; after it sleeps, the page just keeps showing that it's loading. Any ideas? I'm not even sure that putting the thread to sleep is the best approach, so if you have any alternatives, that would be great too.
Sure. What I'm gathering at any point in time is user-centric. I look at a single user, take their name, status, number of friends, and number of followers, and calculate how many times they mention their friends or are mentioned by their followers; then I put that into a database. After that, I put these friends and followers into a queue so I can gather the same data for each of them. This happens in the while loop in my first post.
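The queue-based traversal described here is essentially a breadth-first crawl. A rough sketch of it follows, with one addition the description does not mention: a "seen" set, so that users who follow each other are not queued over and over. `crawl`, `$fetchNeighbours`, and `$maxUsers` are hypothetical names; `$fetchNeighbours` stands in for whatever API calls return a user's friends and followers.

```php
<?php
// Breadth-first crawl sketch. $fetchNeighbours is a hypothetical callable that
// returns an array of user names (friends and followers) for a given user.
function crawl(string $start, callable $fetchNeighbours, int $maxUsers = 1000): array
{
    $queue = new SplQueue();
    $queue->enqueue($start);
    $seen = [$start => true];   // users already queued at some point
    $order = [];                // users processed, in visiting order

    while (!$queue->isEmpty() && count($order) < $maxUsers) {
        $user = $queue->dequeue();
        $order[] = $user;       // this is where the per-user data would be stored

        foreach ($fetchNeighbours($user) as $next) {
            if (!isset($seen[$next])) {
                $seen[$next] = true;
                $queue->enqueue($next);
            }
        }
    }

    return $order;
}
```

The `$maxUsers` cap is there only because the follower graph is effectively unbounded; without some limit the loop would never finish on real data.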
Oh! This seems to be a complex one! If you're only going to calculate these figures yourself from data taken from Twitter, then you could use the DOM to extract it. But I'm not very confident about this idea!