Delay EMAIL scraper

Discussion in 'PHP' started by dramiditis, Jan 29, 2009.

  1. #1
    Hi there:

    I'm working on an EMAIL scraper and I have an array with 15000 URLs (my target group).
    When I start this script it scrapes around 200 URLs and then stops. I think the server times out, because if I manually feed the script only 200 URLs at a time, everything works fine.
    So, I would like to pause the script after every 200 scrapes, wait 2 sec, and then continue with the next 200, and so on.

    Ok, this is my script:

    <?php

    include ("database.php");

    // 15000 URLs total
    $url2 = array ("http://example.com", "http://example2.com", "http://example3.com", ...);

    foreach ($url2 as $url)
    {
        // fetch the page and pull any email addresses out of it
        $html = file_get_contents($url);
        $emails = get_emails($html);

        foreach ($emails as $emails3)
        {
            // check whether this address is already in the table
            $check = mysql_query("SELECT * FROM table WHERE mail = '$emails3'")
                or die(mysql_error());
            $check2 = mysql_num_rows($check);

            // insert only if the address is not there yet
            if ($check2 == 0)
            {
                $insert2 = "INSERT INTO table (mail) VALUES ('$emails3')";
                $add_member2 = mysql_query($insert2);
            }
        }
    }

    ?>
    PHP:
    So, any suggestions?

    Best Regards
     
    dramiditis, Jan 29, 2009 IP
  2. crivion

    crivion Notable Member

    #2
    For stopping, use the sleep() function; also use the set_time_limit() function.
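
    A minimal sketch of where those two calls could sit (the placement and the 2-second pause are assumptions based on the first post; triggering the pause only every 200 URLs comes up later in the thread):

    <?php
    set_time_limit(0); // lift PHP's max execution time so the run is not killed

    // $url2 is the 15000-URL array from the first post
    foreach ($url2 as $url) {
        $html = file_get_contents($url);
        // ... extract and insert emails as in the original script ...
        sleep(2); // wait 2 seconds before the next request
    }
    ?>
    PHP: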
     
    crivion, Jan 29, 2009 IP
  3. dramiditis

    dramiditis Peon

    #3
    Where is it better to use sleep()? I have two foreach loops, one for getting contents and a second for adding emails. And for set_time_limit(), where should I put it? I'm a little bit confused at the moment. :(
     
    dramiditis, Jan 29, 2009 IP
  4. dramiditis

    dramiditis Peon

    #4
    Actually, how can I make it sleep() after every 200 scrapes?
     
    dramiditis, Jan 29, 2009 IP
  5. dramiditis

    dramiditis Peon

    #5
    Ok, I have set set_time_limit() to set_time_limit(0), i.e. unlimited, but how can I stop after every 200 URLs from the array, given that my code is a foreach? Where should I use sleep(), or should I use something else?
     
    dramiditis, Jan 29, 2009 IP
  6. darkmessiah

    darkmessiah Peon

    #6
    I believe it has something to do with your script not finishing within a certain amount of time.

    It might be a limitation of the server your script is hosted on.

    Outside of your foreach, declare a counter variable: $cnt = 0;

    Inside the foreach, do ++$cnt; if ($cnt == 200) { sleep(2); $cnt = 0; }
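
    A sketch of that counter dropped into the loop from the first post (sleep(2) matches the 2-second pause the OP asked for; the extraction and insert code is unchanged and elided here):

    <?php
    set_time_limit(0); // as suggested above, lift the execution time limit

    $cnt = 0; // counter lives outside the loop

    // $url2 is the 15000-URL array from the first post
    foreach ($url2 as $url) {
        $html = file_get_contents($url);
        // ... email extraction and inserts as in the original script ...

        ++$cnt;
        if ($cnt == 200) { // after every 200 URLs...
            sleep(2);      // ...pause for 2 seconds
            $cnt = 0;      // ...and reset the counter
        }
    }
    ?>
    PHP: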
     
    darkmessiah, Jan 29, 2009 IP
  7. Danltn

    Danltn Well-Known Member

    #7
    Just insert ++$i (remember to initialize it to 0 first: $i = 0;).

    Then:

    if ($i % 200 == 0) sleep(1); /* pause a second after every 200th URL */
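
    The same idea in context (note the increment happens before the test, otherwise $i == 0 would trigger a sleep on the very first URL; sleep(1) is the figure above, where the OP asked for 2 seconds):

    <?php
    set_time_limit(0);

    $i = 0;

    // $url2 is the 15000-URL array from the first post
    foreach ($url2 as $url) {
        $html = file_get_contents($url);
        // ... email extraction and inserts as in the original script ...

        ++$i;                        // increment first...
        if ($i % 200 == 0) sleep(1); // ...then pause after every 200th URL
    }
    ?>
    PHP: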
     
    Danltn, Jan 29, 2009 IP