Scraping Webpages Resource Usage

Discussion in 'PHP' started by Kennedy, Feb 29, 2008.

  1. #1
    Hypothetically, if you had a site that scraped around a thousand pages a day, could a basic hosting plan handle it?

    I have no idea how severe the load is on a shared server.
     
    Kennedy, Feb 29, 2008
  2. Panzer

    Panzer Active Member

    #2
    Probably not. You'd need to tell the script not to time out, and it would then use as much of the system's resources as it could to finish quickly, so you'd most likely get kicked off.

    I would suggest getting your own VPS, or just running the scraper script on a localhost server and then uploading the database/XML data (whatever storage method you use) to your engine.
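
To illustrate the trade-off described above, here is a minimal sketch of a scraper loop that lifts PHP's time limit but paces its requests instead of running flat out. The function names `delayBetweenRequests` and `scrapeThrottled` and the injectable `$fetch` callback are hypothetical, not from any library; whether a given host tolerates this is still up to the host.

```php
<?php
// Hypothetical sketch: function names and the $fetch callback are illustrative only.

/**
 * Seconds to wait between requests so that $pagesPerDay fetches are
 * spread evenly over $windowSeconds (24 hours by default).
 */
function delayBetweenRequests(int $pagesPerDay, int $windowSeconds = 86400): float
{
    if ($pagesPerDay < 1) {
        throw new InvalidArgumentException('pagesPerDay must be at least 1');
    }
    return $windowSeconds / $pagesPerDay;
}

/**
 * Fetch each URL in turn, pausing between requests.
 * $fetch can be swapped out so the loop can be tried without network access.
 */
function scrapeThrottled(array $urls, float $delaySeconds, ?callable $fetch = null): array
{
    // Lift PHP's execution time limit; some shared hosts disable this call.
    set_time_limit(0);

    $fetch = $fetch ?? function (string $url) {
        // file_get_contents() returns false on failure.
        return @file_get_contents($url);
    };

    $urls = array_values($urls);
    $last = count($urls) - 1;
    $pages = [];

    foreach ($urls as $i => $url) {
        $pages[$url] = $fetch($url);

        if ($i < $last && $delaySeconds > 0) {
            // sleep() handles whole seconds, usleep() the sub-second remainder.
            $whole = (int) floor($delaySeconds);
            sleep($whole);
            usleep(min(999999, (int) round(($delaySeconds - $whole) * 1000000)));
        }
    }

    return $pages;
}
```

At a thousand pages a day, `delayBetweenRequests(1000)` works out to one request roughly every 86 seconds, which is a very different load profile from fetching everything in one burst.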
     
    Panzer, Feb 29, 2008
  3. Kennedy

    Kennedy Peon

    #3
    About how many pages could a basic hosting plan scrape per day before it raises a red flag?
     
    Kennedy, Feb 29, 2008
  4. crazyryan

    crazyryan Well-Known Member

    #4
    That would depend on how big the pages you were scraping are.
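
As a rough way to see why page size matters, the daily transfer can be estimated from the page count and an average page size. This is only a sketch: `dailyTransferMegabytes` is a hypothetical helper, and the 50 KB figure is a placeholder, not a measurement of any page in this thread.

```php
<?php
// Hypothetical sketch: the 50 KB average page size is a placeholder, not a measurement.

/**
 * Approximate megabytes downloaded per day for a given page count
 * and average page size in kilobytes (1 MB = 1024 KB).
 */
function dailyTransferMegabytes(int $pagesPerDay, float $avgPageKilobytes): float
{
    return $pagesPerDay * $avgPageKilobytes / 1024;
}

// 1000 pages * 50 KB / 1024 = 48.828125
printf("%.1f MB/day\n", dailyTransferMegabytes(1000, 50.0)); // prints 48.8 MB/day
```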
     
    crazyryan, Mar 1, 2008
  5. Kennedy

    Kennedy Peon

    #5
    About the size of this one.
     
    Kennedy, Mar 1, 2008
  6. lephron

    lephron Active Member

    #6
    I think you should be OK, although it really depends on what you do with the data once you've scraped it. If you're putting it into a DB, there will be lots of DB writes, which will probably throw up a red flag. Scraping itself isn't very intensive.
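
One common way to soften the write load mentioned above is to reuse a single prepared statement and wrap a batch of inserts in one transaction, rather than committing each row separately. The sketch below assumes PDO with an in-memory SQLite database purely for illustration; the `pages` table, its columns, and the `saveScrapedPages` helper are hypothetical, and a shared host's MySQL setup may behave differently.

```php
<?php
// Hypothetical sketch: table name, columns and the SQLite DSN are assumptions.

/**
 * Insert all scraped pages in a single transaction.
 * $pages maps URL => page body. Returns the number of rows written.
 */
function saveScrapedPages(PDO $db, array $pages): int
{
    // Prepare once, execute many times.
    $stmt = $db->prepare('INSERT INTO pages (url, body) VALUES (:url, :body)');

    $db->beginTransaction();
    try {
        foreach ($pages as $url => $body) {
            $stmt->execute([':url' => $url, ':body' => $body]);
        }
        $db->commit();
    } catch (Throwable $e) {
        // Undo the partial batch and let the caller see the error.
        $db->rollBack();
        throw $e;
    }

    return count($pages);
}

// Usage with a throwaway in-memory database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE pages (url TEXT PRIMARY KEY, body TEXT)');

$saved = saveScrapedPages($db, [
    'http://example.com/a' => '<html>A</html>',
    'http://example.com/b' => '<html>B</html>',
]);
```

Batching this way does not reduce the amount of data stored, only the number of separate commits the database has to perform.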
     
    lephron, Mar 2, 2008