Desperately need help with extracting data from one site and providing it to another

Discussion in 'PHP' started by okhawaja, May 24, 2011.

  1. #1
    Hi, I have spent the entire week, at least 7 hours each day, trying to learn how to extract calendar data, such as the events on a specific date, from one site. I am really confused, as I am a fairly new scripting programmer. I know basic PHP up to file reading, but now I have to learn more, and I am really frustrated because I don't know what to do. Apparently screen scraping won't work because if the calendar is updated the whole thing breaks, not that I fully know how screen scraping works. I don't know how to parse HTML data, etc. Any help is really appreciated. I'm stuck.
     
    okhawaja, May 24, 2011 IP
  2. sarahk

    sarahk iTamer Staff

    #2
    Ask the site to provide an RSS feed of events and import that.
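
    If the site does agree to publish a feed, importing it could look roughly like the sketch below. It assumes a plain RSS 2.0 feed in which each event is an <item> with <title>, <link> and <pubDate>; the function name parse_event_feed, the field mapping and the sample feed are made up for illustration and are not taken from any real site.

```php
<?php
// Hypothetical helper: turn an RSS 2.0 document into a list of events.
// Assumes each event is an <item> with <title>, <link> and <pubDate>.
function parse_event_feed(string $xml): array
{
    // Collect XML errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel)) {
        return []; // not well-formed XML, or not an RSS 2.0 feed
    }

    $events = [];
    foreach ($feed->channel->item as $item) {
        $timestamp = strtotime((string) $item->pubDate);
        $events[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
            // Normalise to Y-m-d (UTC) so events can be filtered by day.
            'date'  => $timestamp === false ? null : gmdate('Y-m-d', $timestamp),
        ];
    }
    return $events;
}

// Made-up sample feed; in practice the XML would come from the site's feed URL.
$sample = '<rss version="2.0"><channel><title>Events</title>'
    . '<item><title>Open day</title><link>http://example.com/e/1</link>'
    . '<pubDate>Tue, 24 May 2011 10:00:00 +0000</pubDate></item>'
    . '</channel></rss>';

print_r(parse_event_feed($sample));
```

    Because a feed is structured data, this keeps working when the calendar page's HTML layout changes, which is the main problem with scraping.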
     
    sarahk, May 25, 2011 IP
  3. ntomsheck

    ntomsheck Peon

    #3
    If the site has weak to moderate protection against automated requests, you can use cURL.
    On moderately protected sites, you'll have to do a little referrer spoofing. Honestly, it would be best to get the site owner's permission.
    Without referrer spoofing, a simple script on their end can see that the request did not come from one of their own pages and just block it.
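
    As a rough sketch of the cURL approach, assuming you have the owner's permission: the helper name fetch_page, the user-agent string and the example URLs are invented for illustration, while the curl_* functions and CURLOPT_* constants are the ones from PHP's cURL extension.

```php
<?php
// Hypothetical helper: fetch a page with cURL, optionally sending a Referer header.
// Returns the response body, or null on a transport error or an HTTP status >= 400.
function fetch_page(string $url, string $referer = ''): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds
        CURLOPT_USERAGENT      => 'EventImporter/0.1', // made-up identifier
    ]);
    if ($referer !== '') {
        // Sets the Referer request header that the remote script may check.
        curl_setopt($ch, CURLOPT_REFERER, $referer);
    }

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return ($body === false || $status >= 400) ? null : $body;
}

// Example call (placeholder URLs, not run here):
// $html = fetch_page('http://example.com/calendar', 'http://example.com/');
```

    Returning null on failure lets the calling code skip an update instead of overwriting good data with an error page.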
     
    ntomsheck, May 25, 2011 IP
  4. Mak3MyDay

    Mak3MyDay Peon

    #4
    Mak3MyDay, May 25, 2011 IP
  5. niks00789

    niks00789 Well-Known Member

    #5
    I guess you can get the content of a particular page using cURL.

    Then analyze the source code of that page.

    Even if the calendar is updated, the layout and format of the page will probably stay the same.

    Look for a recurring pattern in the markup around the data you want.

    Then you can use preg_match to extract exactly that portion of the page.

    By filtering that data further, I guess you will be able to achieve what you intend.

    The important things to look into are: cURL, preg_match and preg_match_all.

    Then you can run loops and gather data from every page of the site.
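
    To make the pattern-matching step concrete, here is a minimal sketch. The markup it expects (an <li class="event"> holding a <span class="date"> in YYYY-MM-DD form, followed by the title) is purely an assumption, as are the function names extract_events and events_on; the real pattern has to be written against the actual page source.

```php
<?php
// Hypothetical helper: pull events out of HTML whose layout is assumed to be
//   <li class="event"><span class="date">YYYY-MM-DD</span> Title</li>
function extract_events(string $html): array
{
    $pattern = '~<li class="event">\s*<span class="date">(\d{4}-\d{2}-\d{2})</span>\s*(.*?)\s*</li>~s';
    // PREG_SET_ORDER gives one array per matched event: [full match, date, title].
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $events = [];
    foreach ($matches as $m) {
        $events[] = [
            'date'  => $m[1],
            // Drop any inline tags and decode entities such as &amp;.
            'title' => html_entity_decode(strip_tags($m[2])),
        ];
    }
    return $events;
}

// Hypothetical helper: keep only the events that fall on one date.
function events_on(array $events, string $date): array
{
    return array_values(array_filter($events, function ($event) use ($date) {
        return $event['date'] === $date;
    }));
}

// Made-up sample markup standing in for a fetched calendar page.
$html = '<ul>
<li class="event"><span class="date">2011-05-24</span> Board meeting</li>
<li class="event"><span class="date">2011-05-26</span> Open day &amp; fair</li>
</ul>';

print_r(events_on(extract_events($html), '2011-05-26'));
```

    A pattern like this only holds while the markup keeps that shape, so it is worth checking that the number of matches is not zero before trusting the result.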

    Hope it helps. :)
     
    niks00789, May 26, 2011 IP