Help! Runaway Crawler!!

Discussion in 'PHP' started by Phil.2007, Dec 2, 2007.

  1. #1
    I've just been developing a simple web crawler today, but it went "off-piste" and started crawling the entire web (well it got as far as Google before I pressed 'stop' in my browser).

    The thing is I'm worried that the script's still running even though I did press stop - can anyone clarify this?

    The setup is this: the script starts at a certain URL, gets the links out of that page, then follows them one by one, getting more links, etc. Each time it finds a new link it print()s it to the browser, so I sit there watching the links appear as the script runs. The core component is a recursive loop, so I'm worried that it'll never stop... :eek:
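
    For illustration, here is a minimal sketch of one way to keep a crawl like this from running away. It is not the original script: it swaps the recursive loop for an iterative queue, and it assumes PHP 7.1 or later. The function name crawl(), the $fetchLinks callback (which is assumed to return the links found on a page), the same-host restriction, and the $maxDepth and $maxPages limits are all hypothetical choices made for the example.

```php
<?php
// Hypothetical sketch: a bounded crawl that stays on the starting host.
// $fetchLinks is an assumed callback: given a URL, it returns an array of link URLs.
function crawl(string $startUrl, callable $fetchLinks, int $maxDepth = 2, int $maxPages = 100): array
{
    $visited   = [];                                   // URLs already processed, stored as array keys
    $queue     = [[$startUrl, 0]];                     // pending [url, depth] pairs
    $startHost = parse_url($startUrl, PHP_URL_HOST);   // host we refuse to leave

    while ($queue && count($visited) < $maxPages) {
        [$url, $depth] = array_shift($queue);
        if (isset($visited[$url])) {
            continue;                                  // already seen: this prevents cycles
        }
        $visited[$url] = true;
        if ($depth >= $maxDepth) {
            continue;                                  // record the page but do not follow its links
        }
        foreach ($fetchLinks($url) as $link) {
            // Only queue unseen links that point at the starting host.
            if (parse_url($link, PHP_URL_HOST) === $startHost && !isset($visited[$link])) {
                $queue[] = [$link, $depth + 1];
            }
        }
    }
    return array_keys($visited);
}
```

    The visited list stops the crawler from looping between pages that link to each other, the host check keeps it from wandering off to other sites, and the two limits give a hard upper bound on the work done even if the other checks miss something.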
     
    Phil.2007, Dec 2, 2007 IP
  2. krampus

    #2
    Well, as far as I know, there should be a server time limit that stops the script. By default it is 30 seconds: if you start a script and it hasn't finished within 30 seconds, the server will stop it automatically.
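
    To see which limit actually applies on a given server, the max_execution_time setting can be read at run time. The snippet below is a small sketch; the helper name describeTimeLimit() is made up for the example. Note that a value of 0 means "no limit", and that is the default when PHP runs from the command line.

```php
<?php
// Hypothetical helper: turn a max_execution_time value into a readable message.
function describeTimeLimit(int $limit): string
{
    return $limit === 0
        ? 'No execution time limit is set'
        : "Scripts are stopped after {$limit} seconds";
}

// ini_get() returns the setting as a string, so cast it to an integer first.
echo describeTimeLimit((int) ini_get('max_execution_time')), "\n";
```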
     
    krampus, Dec 2, 2007 IP
  3. steb

    #3
    You can set the execution time to unlimited. If you did this, it's probably still going :)

    Stop and restart PHP, or reboot the server... that'll kill it!
     
    steb, Dec 2, 2007 IP
  4. mojojuju

    #4
    Log in to your host via SSH, type 'ps aux', find the process ID of the runaway script, and KILL it!!
     
    mojojuju, Dec 2, 2007 IP
  5. wmtips

    #5
    You can change the maximum script execution time via set_time_limit(). BUT regarding "pressing STOP in browser": by default the script is terminated automatically when you press STOP, though possibly with some delay, since PHP only notices the disconnect the next time it tries to send output (for example, after downloads already in progress have finished). If you need the script to continue after the client disconnects, use ignore_user_abort(true).
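
    A rough sketch of how these functions fit together is below. It is one possible arrangement, not the only one: the script ignores the disconnect itself, then checks connection_aborted() after each piece of output so it can stop at a clean point. The function name printLinks(), the placeholder URLs, and the 30-second value are assumptions made for the example.

```php
<?php
// Hypothetical sketch: print links one by one and stop cleanly once the client is gone.
function printLinks(array $links, int $secondsAllowed = 30): int
{
    ignore_user_abort(true);           // do not kill the script mid-step on disconnect
    set_time_limit($secondsAllowed);   // restart the timer: allow this many seconds from now

    $printed = 0;
    foreach ($links as $link) {
        echo $link, "\n";
        flush();                       // PHP only detects a disconnect when it sends output
        if (connection_aborted()) {
            break;                     // client pressed STOP: leave the loop at a safe point
        }
        $printed++;
    }
    return $printed;                   // number of links sent while the client was connected
}

printLinks(['http://example.com/a', 'http://example.com/b']);
```

    If output buffering is enabled, flush() alone may not push anything to the client, so the disconnect can go unnoticed for longer. With ignore_user_abort(false), which is the default, the explicit check is unnecessary because PHP ends the script itself.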
     
    wmtips, Dec 2, 2007 IP
  6. Phil.2007

    #6
    Sorry, didn't check this thread until now - thanks to everyone who replied. I think the default 30-second timeout would have stopped it.

    I'm doing it in Java now anyway. PHP is totally not the right language to use for this.
     
    Phil.2007, Dec 12, 2007 IP
  7. Estevan

    #7
    Hello, don't run PHP through the browser for this. Running it from the command line is faster!
     
    Estevan, Dec 12, 2007 IP
  8. tonybogs

    #8
    Yup, all you need to do is check 'ps'.

    It's very doubtful that it's still running at this stage. Surely you'd have noticed it hogging all your resources by now, assuming you are collecting some sort of info from all these sites.
     
    tonybogs, Dec 12, 2007 IP