Fetch Data from another website

Discussion in 'PHP' started by milestonesweb, Jul 28, 2010.

  1. #1
    There is a school locator script at https://www.ocps.net/parents/pages/FindaSchool.aspx
    When you enter an address, it returns the schools located in that locality.

    For example:

    Use these details and submit the form

    Street Number : 3902
    Street Name : Bobolink
    Street Type : Lane
    City : Orlando


    It returns a row with three schools: an elementary, a middle and a high school.
    Clicking the "more" button for a school opens that school's details page.
    For each of the three schools I need the school name, which for this address would be:

    Audubon Elementary
    Glenridge Middle
    Winter Park High

    Please suggest some ways to achieve this functionality


    Thanks
     
    milestonesweb, Jul 28, 2010 IP
  2. Deacalion

    #2
    Use cURL to submit the form and retrieve the page (look at CURLOPT_POST and CURLOPT_POSTFIELDS).
    Once you have the page, use a regular expression to scrape out the school names.
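
    The two steps above could be sketched roughly as follows. This is only an illustration: post_form() and scrape_all() are hypothetical helper names, and the sample markup and the pattern are made up, since the real result page has to be inspected to find what actually surrounds the school names.

```php
<?php
// Submit a form with cURL and return the response body, or null on failure.
function post_form(string $url, array $fields): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // URL-encodes keys and values
        CURLOPT_RETURNTRANSFER => true,                      // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : $html;
}

// Return capture group 1 of every match of $pattern in $html,
// with tags stripped, entities decoded and whitespace trimmed.
function scrape_all(string $html, string $pattern): array
{
    preg_match_all($pattern, $html, $m);
    return array_map(
        fn($s) => trim(html_entity_decode(strip_tags($s))),
        $m[1]
    );
}

// Hypothetical result markup, used only to exercise the regex.
$sample = '<td class="school">Glenridge Middle</td>'
        . '<td class="school"> Winter Park High </td>';

$names = scrape_all($sample, '~<td class="school">(.*?)</td>~s');
print_r($names);
```

    In real use, the string returned by post_form() would be passed to scrape_all() in place of $sample, with a pattern written against the page's actual HTML.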
     
    Deacalion, Jul 28, 2010 IP
  3. milestonesweb

    #3
    I used cURL to submit the form and scrape the result for step 1.

    But in the second step, which is clicking the "more" button to see the school names, cURL gives an error.

    Here is my code:

    $url="https://www.ocps.net/parents/pages/findaschool.aspx";
    //$logFileName = "ocfl_import_school_".date('Y-m-d').".log";
    //$logFileHandle = fopen($logFileName, 'a');
    echo $_SERVER['HTTP_USER_AGENT'];
    $fields = array(
    '__SPSCEditMenu' => 'true',
    'MSOWebPartPage_PostbackSource' => '',
    'MSOTlPn_SelectedWpId' => '',
    'MSOTlPn_View' => '0',
    'MSOTlPn_ShowSettings' => 'False',
    'MSOGallery_SelectedLibrary' => '',
    'MSOGallery_FilterString' => '',
    'MSOTlPn_Button' => 'none',
    '__EVENTTARGET' => 'ctl00$m$g_5e6ff926_878b_4831_ae5a_37603a021d6e$gridview1$ctl02$CR_ELEM',
    '__EVENTARGUMENT' => '',
    '__REQUESTDIGEST' =>'0x0494115FE2800AB046FFA276752A8BCBACE244D0FC6E1AE75FC9567ABF3FAE0EB9462B6141C2EC6EFCE01B97DCE32A21B57BEAC400DCE9C2D15CE3FA746D8422,27 Jul 2010 12:23:10 -0000',
    'MSOAuthoringConsole_FormContext' => '',
    'MSOAC_EditDuringWorkflow' => '',
    'MSOSPWebPartManager_DisplayModeName' => 'Browse',
    'MSOWebPartPage_Shared' => '',
    'MSOLayout_LayoutChanges' => '',
    'MSOLayout_InDesignMode' => '',
    'MSOSPWebPartManager_OldDisplayModeName' => 'Browse',
    'MSOSPWebPartManager_StartWebPartEditingName' => 'false',
    'ctl00$m$g_5e6ff926_878b_4831_ae5a_37603a021d6e$gridview1$ctl02$CR_ELEM' => 'more',
    '__EVENTVALIDATION' =>'/wEWCQLYg4XOCQLw9Z3WCwLO/Y/kBQLZ1b79AwKc2spsAoWBoqcFAqq6wIsCAoOEl/0OAv318ZIMas32fAcmrAMZA+eOERUqECA+YaQ',
    'WPQ3streetNum' =>'',
    'WPQ3streetName' =>'',
    'WPQ3st_type' => 'all',
    'WPQ3City' => 'orlando',
    '__VIEWSTATE' => '/wEPDwUBMA9kFgJmD2QWAgIBD2QWBAIBD2QWAgIHD2QWAmYPZBYCAgEPFgIeE1ByZXZpb3VzQ29udHJvbE1vZGULKYgBTWljcm9zb2Z0LlNoYXJlUG9pbnQuV2ViQ29udHJvbHMuU1BDb250cm9sTW9kZSwgTWljcm9zb2Z0LlNoYXJlUG9pbnQsIFZlcnNpb249MTIuMC4wLjAsIEN1bHR1cmU9bmV1dHJhbCwgUHVibGljS2V5VG9rZW49NzFlOWJjZTExMWU5NDI5YwFkAgMPZBYIAgIPZBYEBSZnXzdhMWJjMGVlX2Q4Y2VfNDUyMF85YTU0X2ViZGU3MjI2NjM1Mg8PFhAeBVRpdGxlBRdFbnRlciBZb3VyIEFkZHJlc3MgSGVyZR4LRGVzY3JpcHRpb24FN1VzZSB0byBjb25uZWN0IHNpbXBsZSBmb3JtIGNvbnRyb2xzIHRvIG90aGVyIFdlYiBQYXJ0cy4eCkNocm9tZVR5cGUCAh4HQ2xpY2tlZGceCURpcmVjdGlvbgsqKlN5c3RlbS5XZWIuVUkuV2ViQ29udHJvbHMuQ29udGVudERpcmVjdGlvbgAeBVdpZHRoHB4GSGVpZ2h0HB4EXyFTQgKAgwhkZAUmZ181ZTZmZjkyNl84NzhiXzQ4MzFfYWU1YV8zNzYwM2EwMjFkNmUPZBYEZg8PZA8PFCsABRYGHgROYW1lBQZzdHJlZXQeDERlZmF1bHRWYWx1ZQUKJUJPQk9MSU5LJR4OUGFyYW1ldGVyVmFsdWVkFgYfCQUIdHlwZWFkZHIfCgUCTE4fC2QWBh8JBQRjaXR5HwoFB09STEFORE8fC2QWBh8JBQtmcm9tYWRkcmVzcx8KBQQzOTAyHwtkFgYfCQUJdG9hZGRyZXNzHwoFBDM5MDIfC2QUKwEFAgMCAwIDAgMCA2RkAgIPPCsADQEADxYEHgtfIURhdGFCb3VuZGceC18hSXRlbUNvdW50AgFkFgJmD2QWBgIBD2QWFGYPDxYCHgRUZXh0BQUwMzYwMGRkAgEPDxYCHw4FBTA0MDk5ZGQCAg8PFgIfDgUBIGRkAgMPDxYCHw4FBiZuYnNwO2RkAgQPDxYCHw4FCEJPQk9MSU5LZGQCBQ8PFgIfDgUCTE5kZAIGDw8WAh8OBQdPUkxBTkRPZGQCBw9kFgJmDw8WAh4PQ29tbWFuZEFyZ3VtZW50BUc/c2Nob29sbnVtYmVyPTA1MzEmTnVtYmVyPTM5MDImU3RyZWV0PUJPQk9MSU5LJlR5cGVBZGRyPUxOJkNpdHk9T1JMQU5ET2RkAggPZBYCZg8PFgIfDwVHP3NjaG9vbG51bWJlcj0wNTcxJk51bWJlcj0zOTAyJlN0cmVldD1CT0JPTElOSyZUeXBlQWRkcj1MTiZDaXR5PU9STEFORE9kZAIJD2QWAmYPDxYCHw8FRz9zY2hvb2xudW1iZXI9MTQxMSZOdW1iZXI9MzkwMiZTdHJlZXQ9Qk9CT0xJTksmVHlwZUFkZHI9TE4mQ2l0eT1PUkxBTkRPZGQCAg8PFgIeB1Zpc2libGVoZGQCAw8PFgIfEGhkZAIID2QWAgIBD2QWAmYPD2QWAh4FY2xhc3MFGG1zLXNidGFibGUgbXMtc2J0YWJsZS1leGQCCg9kFgICAQ9kFgQCAQ9kFgICAQ8WAh8QaBYCZg9kFgQCAg9kFgYCAQ8WAh8QaGQCAw8WAh8QaGQCBQ8WAh8QaGQCAw8PFgIeCUFjY2Vzc0tleQUBL2RkAgMPZBYCAgEPDxYCHxBoZBYEAgEPDxYCHxBoZGQCAw8PFgIfEGhkFgICAQ8PFgIfEGdkFgQCAQ8PFgIfEGhkFhwCAQ8PFgIfEGhkZAIDDxYCHxBoZAIFDw8WAh8QaGRkAgcPFgIfEGhkAgkPDxYCHxBoZGQCCw8PFgIfEGhkZAINDw8WAh8QaGRkAg8PDxYEHgdFbmFibGVkaB
8QaGRkAhEPDxYCHxBoZGQCEw8PFgQfE2gfEGhkZAIVDw8WAh8QaGRkAhcPFgIfEGhkAhkPFgIfEGhkAhsPDxYCHxBnZGQCAw8PFgIfEGdkFgYCAQ8PFgIfEGdkZAIDDw8WAh8QZ2RkAgUPDxYCHxBnZGQCMg9kFgICAQ9kFgJmDw8WAh8QaGRkGAIFOGN0bDAwJG0kZ181ZTZmZjkyNl84NzhiXzQ4MzFfYWU1YV8zNzYwM2EwMjFkNmUkZ3JpZHZpZXcxDzwrAAoBCAIBZAUVY3RsMDAkUXVpY2tMYXVuY2hNZW51Dw9kBQdTY2hvb2xzZOGvfu3GsFV2Wbtmxm+7ozp0VnMG');

    // Build the URL-encoded POST body. http_build_query() percent-encodes
    // characters such as '+', '/', '=' and '$' that occur in the field names
    // and in the Base64 __VIEWSTATE / __EVENTVALIDATION values.
    $fields_string = http_build_query($fields);

    echo $fields_string;

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true); // boolean flag, not a field count
    curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Certificate verification is disabled; acceptable for a quick test only.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    $result = curl_exec($ch);
    if ($result === false) {
        // Report the cURL error instead of echoing an empty result
        echo 'cURL error: ' . curl_error($ch);
    }

    curl_close($ch);
    echo $result;


    Please have a look at it and suggest what the issue might be.
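
    One thing that may be worth checking in a request like this is how the POST body is encoded. Values such as __VIEWSTATE are Base64 text and can contain '+', '/' and '='; in a form-encoded body an unescaped '+' is read back as a space. The sketch below compares plain concatenation with http_build_query(). The token and the shortened control id are made up, and PHP's parse_str() only stands in for whatever decoding the server really does.

```php
<?php
// Illustrative values only: Base64 text can contain '+', '/' and '='.
$fields = [
    '__EVENTTARGET' => 'ctl00$grid$ctl02$CR_ELEM', // shortened, hypothetical control id
    '__VIEWSTATE'   => '/wEPDw+UB/MA9k==',          // made-up token
];

// Naive concatenation sends every character as-is.
$raw = '';
foreach ($fields as $key => $value) {
    $raw .= ($raw === '' ? '' : '&') . $key . '=' . $value;
}

// http_build_query() percent-encodes both keys and values.
$encoded = http_build_query($fields);

// Decode each body the way a form-decoding server would.
parse_str($raw, $fromRaw);
parse_str($encoded, $fromEncoded);

// The raw body loses the '+'; the encoded body round-trips intact.
var_dump($fromRaw['__VIEWSTATE'] === $fields['__VIEWSTATE']);
var_dump($fromEncoded['__VIEWSTATE'] === $fields['__VIEWSTATE']);
```

    If the first comparison comes out false and the second true, the token is being altered in transit by the unencoded body, which by itself could be enough for the server to reject the postback.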

    Thanks
     
    milestonesweb, Jul 29, 2010 IP
  4. Kaizoku

    #4
    Looks like encrypted session state. You will need to fetch the page first as raw HTML, then parse out the VIEWSTATE, and possibly some other fields, because they change on each request and cannot be hard-coded.
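
    A minimal sketch of that idea, assuming the tokens sit in ordinary hidden input elements: read every hidden field out of the freshly fetched page, then add the postback target on top. hidden_fields() and build_postback() are hypothetical helper names, and the page fragment and control id are invented for the example. The regex approach is deliberately simple; a DOM parser would be more robust against unusual markup.

```php
<?php
// Collect name => value for every <input type="hidden"> found in $html.
function hidden_fields(string $html): array
{
    $fields = [];
    preg_match_all('~<input\b[^>]*>~i', $html, $tags);
    foreach ($tags[0] as $tag) {
        // Skip inputs that are not hidden or have no name attribute
        if (!preg_match('~\btype\s*=\s*["\']hidden["\']~i', $tag)) {
            continue;
        }
        if (!preg_match('~\bname\s*=\s*["\']([^"\']*)["\']~i', $tag, $name)) {
            continue;
        }
        // A missing value attribute is treated as an empty string
        $value = preg_match('~\bvalue\s*=\s*"([^"]*)"~i', $tag, $v) ? $v[1] : '';
        $fields[html_entity_decode($name[1], ENT_QUOTES)] = html_entity_decode($value, ENT_QUOTES);
    }
    return $fields;
}

// Build the postback fields for one link from a freshly fetched page.
function build_postback(string $html, string $eventTarget): array
{
    $fields = hidden_fields($html);
    $fields['__EVENTTARGET'] = $eventTarget;
    $fields['__EVENTARGUMENT'] = '';
    return $fields;
}

// Hypothetical page fragment with made-up token values.
$page = '<form>'
      . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDw+abc=" />'
      . '<input type="hidden" name="__EVENTVALIDATION" value="/wEWCQ==" />'
      . '<input type="text" name="WPQ3City" value="orlando" />'
      . '</form>';

$post = build_postback($page, 'ctl00$m$grid$ctl02$CR_ELEM');
print_r($post);
```

    The resulting array would then be URL-encoded and sent as the POST body of the second request. If the site also relies on cookies, sharing a cookie file between the two requests with CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE may be needed as well.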
     
    Kaizoku, Jul 29, 2010 IP
  5. milestonesweb

    #5
    Yes, I downloaded the HTML source and then created the cURL field values from it. When I submit the form from the HTML page it works, but the cURL request does not submit successfully.

    I have another question:
    Do you have any idea how to use a regular expression for the cURL URL parameters?
     
    milestonesweb, Jul 29, 2010 IP