Twitter API and PHP crawler

Discussion in 'PHP' started by meekotherogue, Jun 3, 2010.

  1. #1
    Hey all,
    I'm trying to build a Twitter application using the Twitter API. I'd like to implement a crawler in PHP that gathers all the data for me. So far it gathers the data I want until I run out of requests. As soon as that happens, I put the thread to sleep, intending for the script to continue where it left off once the requests are replenished. The trouble is that it never starts back up: when the requests are replenished, it doesn't resume crawling Twitter.

    Here is a basic outline of what my code looks like:

    function sleepIfNecessary()
    {
        $url = sprintf('http://api.twitter.com/1/account/rate_limit_status.xml');
        $content = $this->OAuthRequest($url, 'GET', array());

        try {
            $xml = new SimpleXMLElement($content);
        }
        catch (Exception $e) {
            $sec = "1";
            $page = $_SERVER['PHP_SELF'];
            header("Refresh: $sec; url=$page");
        }

        $hits = $xml->{'remaining-hits'};
        if ($hits < 1)
        {
            $d = $xml->{'reset-time-in-seconds'};
            settype($d, "integer");
            sleep($d);
        }
    }

    function gatherInformation($user_name)
    {
        $queue = array();
        settype($user_name, "string");
        $queue[] = $user_name;

        while (count($queue) > 0)
        {
            /* Inside here I gather data for $user_name (i.e. their status, number of friends and followers, etc.) and put it into a MySQL database. */

            /* Then, as this data is gathered, I add each friend and follower to $queue so they can be processed in the same way. */

            /* In doing so I make requests, and I call sleepIfNecessary() (above) before each request, so that if there are no more requests, the thread is put to sleep. */
        }
    }

    Any tips?
     
    meekotherogue, Jun 3, 2010 IP
  2. roopajyothi

    roopajyothi Active Member

    #2
    Your description is confusing; please state your problem more clearly
    and I'll try to help!
     
    roopajyothi, Jun 3, 2010 IP
  3. meekotherogue

    meekotherogue Peon

    #3
    My problem is that I can't get my crawler to work. I want it to gather data until I run out of requests, then sleep for an hour (or until the requests are replenished), and then continue gathering data. However, it doesn't continue to crawl through Twitter; after it sleeps, the page just keeps showing that it's loading.

    In short, I want to crawl through Twitter to gather data. When I run out of requests I put the thread to sleep. After it sleeps I want to continue gathering data, but it doesn't continue.

    Any ideas? I'm not sure putting the thread to sleep is even the best idea, so if you have any alternatives, that would be great too :)
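
    One thing that may be worth checking in the sleepIfNecessary() outline from the first post: if reset-time-in-seconds is an absolute Unix timestamp rather than a duration (that is an assumption about the rate_limit_status response, so please verify it against a real reply), then sleep($d) would pause for decades instead of minutes. A minimal sketch of computing the wait as a difference follows. The helper name secondsUntilReset(), the 60-second fallback, the 5-second margin, and the sample XML are all made up for illustration.

```php
<?php
// Hypothetical helper (not part of the original crawler): turns the
// rate-limit XML into a number of seconds to wait.
// Assumption: <reset-time-in-seconds> holds an absolute Unix timestamp.
function secondsUntilReset($xmlString, $now)
{
    // Collect parse errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        // Unparseable response: back off briefly and let the caller retry.
        return 60;
    }

    $hits = (int) $xml->{'remaining-hits'};
    if ($hits > 0) {
        return 0; // requests still available, no need to wait
    }

    $reset = (int) $xml->{'reset-time-in-seconds'};
    // Wait only for the remaining time (never a negative amount),
    // plus a small margin in case the two clocks disagree.
    return max(0, $reset - $now) + 5;
}

// Made-up sample response, for illustration only.
$sample = '<hash>'
        . '<remaining-hits type="integer">0</remaining-hits>'
        . '<reset-time-in-seconds type="integer">1275600000</reset-time-in-seconds>'
        . '</hash>';

// Ten minutes before the reset time: 600 seconds plus the 5-second margin.
echo secondsUntilReset($sample, 1275599400), "\n"; // 605
```

    In the crawler this would be used as sleep(secondsUntilReset($content, time())). A long-running crawl like this is probably better started from the command line than from a browser request, and set_time_limit(0) may be needed if the script has an execution time limit.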
     
    meekotherogue, Jun 4, 2010 IP
  4. roopajyothi

    roopajyothi Active Member

    #4
    What type of data are you going to gather?
    May I know?
     
    roopajyothi, Jun 4, 2010 IP
  5. meekotherogue

    meekotherogue Peon

    #5
    Sure. What I'm gathering at any point in time is user-centric. I look at a single user, take their name, status, number of friends, and number of followers, calculate how many times they mention their friends or are mentioned by their followers, and then put that into a database. Next I put these friends and followers into a queue, so I can gather the same data for each of them. This happens in the while loop in my first post.
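
    The queue-based traversal described here can be sketched roughly as below. This is only an illustration under assumptions: crawl(), countMentions(), the $fetchConnections callback, and the sample data are hypothetical names standing in for the real API calls and database writes. The sketch adds a "seen" set, because friends and followers overlap and without one the same user would be queued (and cost requests) many times.

```php
<?php
// Counts how often each @username appears in a list of status texts.
// Names are lower-cased so "@Bob" and "@bob" count as the same user.
function countMentions(array $statuses)
{
    $counts = array();
    foreach ($statuses as $text) {
        if (preg_match_all('/@(\w+)/', $text, $matches)) {
            foreach ($matches[1] as $name) {
                $name = strtolower($name);
                $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
            }
        }
    }
    return $counts;
}

// Breadth-first crawl starting from one user. $fetchConnections is a
// hypothetical callback returning the friends and followers of a user
// as an array of names; in the real crawler it would wrap API requests.
function crawl($start, $fetchConnections, $maxUsers)
{
    $queue = array($start);
    $seen = array($start => true); // users already queued or processed
    $order = array();              // users in the order they were processed

    while (count($queue) > 0 && count($order) < $maxUsers) {
        $user = array_shift($queue);
        $order[] = $user;          // the database insert would happen here
        foreach ($fetchConnections($user) as $other) {
            if (!isset($seen[$other])) {
                $seen[$other] = true;
                $queue[] = $other;
            }
        }
    }
    return $order;
}

// Stand-in data instead of real Twitter requests.
$graph = array(
    'alice' => array('bob', 'carol'),
    'bob'   => array('alice', 'carol'),
    'carol' => array('alice'),
);
$fetch = function ($user) use ($graph) {
    return isset($graph[$user]) ? $graph[$user] : array();
};

print_r(crawl('alice', $fetch, 10));
print_r(countMentions(array('hi @Bob and @carol', '@bob again')));
```

    The $maxUsers cap is there because the follower graph is effectively unbounded, so the loop would otherwise never finish.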
     
    meekotherogue, Jun 4, 2010 IP
  6. roopajyothi

    roopajyothi Active Member

    #6
    Oh! That seems to be a complex one!
    If you only need to take the data from Twitter and do the calculations yourself, you could use the DOM to extract it.
    But I'm not very familiar with this approach!
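
    For what it's worth, pulling fields out of an XML response with PHP's DOM extension might look something like the sketch below. The element names (user, screen_name, friends_count, followers_count, status/text) and the sample document are assumptions for illustration and should be checked against the actual response; extractUserFields() is a made-up helper name.

```php
<?php
// Hypothetical helper: reads a few fields from a user XML document.
// Returns null if the XML cannot be parsed.
function extractUserFields($xmlString)
{
    // Collect parse errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    if (!$doc->loadXML($xmlString)) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // string(...) yields '' for a missing node, which casts to 0 below.
    return array(
        'name'      => $xpath->evaluate('string(/user/screen_name)'),
        'friends'   => (int) $xpath->evaluate('string(/user/friends_count)'),
        'followers' => (int) $xpath->evaluate('string(/user/followers_count)'),
        'status'    => $xpath->evaluate('string(/user/status/text)'),
    );
}

// Made-up sample document, for illustration only.
$userXml = '<user>'
         . '<screen_name>example_user</screen_name>'
         . '<friends_count>12</friends_count>'
         . '<followers_count>34</followers_count>'
         . '<status><text>hello @someone</text></status>'
         . '</user>';

print_r(extractUserFields($userXml));
```

    SimpleXML, which the first post already uses, would work just as well here; the DOM version is mainly useful when XPath queries are wanted.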
     
    roopajyothi, Jun 4, 2010 IP