Hello, I need some help. I am creating an educational website that will hold information about universities. I would like to create a script that visits each university's website and fetches the admission date from a specified page. Could someone please guide me on how I can achieve this? Thanks
You need a good developer to build such a script. You can find free scripts, but I wouldn't count on them.
If I were to do this I would use something like PHP's file_get_contents to read the content of the web page. Then I would do something like stristr() to get any content that is after phrases such as "Admission Date" , "Start Date", etc. Then once you have everything past "Admission Date" it's a matter of checking the rest of the string for the next available date. You'll have to do some logic to grab the date, and you may not know if you're going to get a date like 03-30-2012 or March 30th, 2012 so you'll have to do something to account for all of the variations in dates out there.
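A minimal sketch of the idea above, not a finished scraper. The function name find_date_after_label, the two date patterns, and the placeholder URL are my own assumptions for illustration; real pages will need more patterns than these two.

```php
<?php
// Hypothetical helper: returns the first date-like string that appears
// after any of the given labels, or null if nothing is found.
function find_date_after_label(string $html, array $labels): ?string
{
    // Pad tags with a space before stripping them so that text from
    // adjacent cells does not run together ("date" . "03-30-2012").
    $text = html_entity_decode(strip_tags(str_replace('<', ' <', $html)));

    // Two illustrative shapes only: 03-30-2012 (or 03/30/2012)
    // and March 30th, 2012.
    $patterns = [
        '/\b\d{1,2}[-\/]\d{1,2}[-\/]\d{2,4}\b/',
        '/\b[A-Za-z]{3,9}\.?\s+\d{1,2}(?:st|nd|rd|th)?,?\s+\d{4}\b/',
    ];

    foreach ($labels as $label) {
        // stristr() returns everything from the first case-insensitive
        // occurrence of the label onward, or false if it is absent.
        $rest = stristr($text, $label);
        if ($rest === false) {
            continue;
        }
        // Keep whichever pattern matches closest to the label.
        $best = null;
        $bestPos = PHP_INT_MAX;
        foreach ($patterns as $pattern) {
            if (preg_match($pattern, $rest, $m, PREG_OFFSET_CAPTURE) && $m[0][1] < $bestPos) {
                $bestPos = $m[0][1];
                $best = $m[0][0];
            }
        }
        if ($best !== null) {
            return $best;
        }
    }
    return null;
}

// Usage (the URL is a placeholder):
// $html = file_get_contents('https://university.example/admissions');
// if ($html !== false) {
//     echo find_date_after_label($html, ['Admission Date', 'Start Date']);
// }
```

The function returns the date exactly as it is written on the page; turning it into one canonical format is a separate step, because a string like 03-30-2012 is ambiguous without knowing the site's convention.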
Try using the simple_html_dom parser class. I'm using it for a lot of data scraping projects, and it's very easy to use.
I do a lot of scraping and I would suggest the following. 1) use curl to get the page 2) check the html and do a simple preg_match on the date and a little bit of code before and after the date 3) if it can't find the date send yourself an email so you can check if the website layout changed. I would keep it really simple because you will have to custom code the script for every website as every website is structured differently. If you are smart then you can create a small system that lets you reuse as much of the code as possible.
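The three steps above could be sketched roughly like this. The function names, the per-site array layout ('url' and 'pattern' keys), the example pattern, and the addresses are all assumptions made for illustration; each pattern is expected to put the date in capture group 1.

```php
<?php
// Step 1: fetch the page with curl. Returns null on any failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 20,   // seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Step 2: one regex per site, with a bit of the surrounding markup as
// context. Capture group 1 must be the date itself.
function extract_date(string $html, string $pattern): ?string
{
    return preg_match($pattern, $html, $m) ? trim($m[1]) : null;
}

// Step 3: process one site and call $notify when no date is found.
// $fetch and $notify are passed in so the same code is reused per site.
function check_site(array $site, callable $fetch, callable $notify): ?string
{
    $html = $fetch($site['url']);
    $date = $html === null ? null : extract_date($html, $site['pattern']);
    if ($date === null) {
        $notify("No admission date found at {$site['url']}; the layout may have changed.");
    }
    return $date;
}

// Usage (URL, pattern and email address are placeholders):
// $sites = [
//     ['url' => 'https://university.example/admissions',
//      'pattern' => '~<span class="deadline">\s*([^<]+)</span>~i'],
// ];
// foreach ($sites as $site) {
//     $date = check_site($site, 'fetch_page', function ($msg) {
//         mail('you@example.com', 'Scraper alert', $msg);
//     });
// }
```

Keeping the site-specific part down to one array entry is one way to get the reuse mentioned above: adding a university means adding a URL and a pattern, not a new script.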
You can use curl to get the contents directly from their websites, and you can then store the contents in a MySQL table to fetch the data later on your website.
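For the storage part, a small sketch using PDO. The table admission_dates, its columns, and the connection details are hypothetical; the table is assumed to have university as its primary key so that REPLACE INTO overwrites the previous value.

```php
<?php
// Assumed table (hypothetical):
//   CREATE TABLE admission_dates (
//       university     VARCHAR(191) PRIMARY KEY,
//       admission_date VARCHAR(64),
//       checked_at     VARCHAR(32)
//   );

// Store the latest scraped date for a university.
function save_admission_date(PDO $db, string $university, string $date): void
{
    // REPLACE INTO removes any existing row with the same primary key
    // and inserts the new one.
    $stmt = $db->prepare(
        'REPLACE INTO admission_dates (university, admission_date, checked_at) VALUES (?, ?, ?)'
    );
    $stmt->execute([$university, $date, date('Y-m-d H:i:s')]);
}

// Read the stored date back when rendering the website.
function load_admission_date(PDO $db, string $university): ?string
{
    $stmt = $db->prepare('SELECT admission_date FROM admission_dates WHERE university = ?');
    $stmt->execute([$university]);
    $value = $stmt->fetchColumn();
    return $value === false ? null : (string) $value;
}

// Usage (DSN and credentials are placeholders):
// $db = new PDO('mysql:host=localhost;dbname=edu;charset=utf8mb4', 'user', 'secret',
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// save_admission_date($db, 'Example University', 'July 20, 2010');
// echo load_admission_date($db, 'Example University');
```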
OK guys, here is the approach I ended up using. I used Simple HTML DOM to fetch the plain text from the page. Then I cut the text at a word that appears just before the date. After that, I pulled out the required date using preg_match. Here's the code:

    include('../simple_html_dom.php');

    $url = "";               // link
    $text_check = "";        // Added on
    $format = 0;             // dd mm yy
    $pattern = "";           // date pattern check
    $display_date = "none";

    // form data
    if (isset($_POST['submit'])) {
        $url = $_POST['url'];
        $text_check = $_POST['text'];
        $format = (int) $_POST['format'];
        // end form data

        // http://www.ilmkidunya.com/admission_notices/admission-in-fsc-icom-ics-9356.aspx
        // http://www.jobz.pk/opf-girls-college-islamabad-admissions-_admissions-95.html
        $plain_text = file_get_html($url)->plaintext;
        //echo $plain_text;

        // Keep only the text from the marker word onward.
        $after_pruning = strstr($plain_text, $text_check);

        if ($format == 1) {
            $pattern = "/([a-z]{1,10}|[A-Z]{1,10})+[\s]+[0-9]{1,2}+[,\s]+[0-9]{1,4}/i"; // September 01, 2003
        } elseif ($format == 2) {
            $pattern = "/[0-9]{1,2}+[\s]+[a-zA-Z0-9]{3}+[\s]+[0-9]{1,4}/i"; // 05 Mar 2000
        }

        // Only match when a format was chosen and the marker text was found.
        if ($pattern !== "" && $after_pruning !== false && preg_match($pattern, $after_pruning, $res)) {
            $display_date = $res[0];
        }
    }

Can you please tell me if there is a better approach than this? Also, I have a small problem: if the website has two dates side by side, how do I fetch the second date? My approach works for a date that has some alphabetic text next to it. But what if the dates look like this: july 20, 2010 july 10, 2012 Thanks
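For the case of two dates side by side, one possible option is preg_match_all, which collects every match instead of stopping at the first. This is only a sketch: the function name find_all_dates and the single "Month DD, YYYY" pattern are assumptions for illustration, and the sample text just reuses the two dates from the question.

```php
<?php
// Returns every "Month DD, YYYY" style date in $text, in order of appearance.
function find_all_dates(string $text): array
{
    $pattern = '/\b[A-Za-z]{3,9}\s+\d{1,2},?\s+\d{4}\b/';
    preg_match_all($pattern, $text, $matches);
    // $matches[0] holds the full text of each match.
    return $matches[0];
}

$dates = find_all_dates('Admissions open july 20, 2010 july 10, 2012 for all programs');

// $dates[0] is the first date; $dates[1] is the second, if there is one.
$second = $dates[1] ?? null;
```

Because the matches come back in page order, picking the second date is just a matter of indexing, with the null fallback covering pages that show only one date.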