Hi, I'm working on a new site, and I need to verify that a URL given by the user points to a web page and not a PDF, MP3 or some other format. This is because I have to work with the given URL, and I would not like to parse stupid files that newbie users might enter. Thanks, adolix
Look for a common HTML tag like <head>, <html>, <body>, <p>, <div>, <h1> etc. You can also check the extension of the file.
That won't work, not all HTML pages have HTML markup. The only way I see is to download the file and then check the file type. Peace,
That won't work, not all HTML pages have extensions that would indicate the filetype. And doing it manually would be no-go on bulk indexing. I'd confidently say 99% of HTML pages contain HTML mark-up so it's a safe bet.
The first step is to make sure the submitted URL does not end in the extension of a PDF, MP3 or similar file. This lets you tell the user up front that links must be to ordinary web pages and not PDFs, music, video, graphics and similar files. After that you need to do as azizny suggests: download the file and check its type.
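A minimal sketch of that up-front extension check, in case it helps. The function name has_blocked_extension() and the list of extensions are made up for illustration, and a URL with no extension (or a misleading one) still has to be checked some other way.

```php
<?php
// Hypothetical helper: returns true when the URL path ends in an extension
// we already know is not a web page. The list is illustrative, not complete.
function has_blocked_extension(string $url): bool
{
    $blocked = ['pdf', 'mp3', 'mp4', 'avi', 'zip', 'jpg', 'jpeg', 'png', 'gif'];

    // Only look at the path, so "?file=a.pdf" in the query string is ignored.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path (e.g. "http://example.com") or unparsable URL
    }

    // pathinfo() returns an empty string when there is no extension.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $blocked, true);
}
```

A false result only means "not obviously a binary file", so it is a cheap first filter and not a final answer.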
Downloading is exactly what I do NOT want to do, because if the link is to a 15 MB PDF, the user will wait a long time and then just be given an error. Any other ideas? Thanks
Clancy has the best order: 1. Check extension 2. file_get_contents() (Or fopen and read in just N KB) and check for <html>
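Step 2 could look roughly like the sketch below, assuming allow_url_fopen is enabled. The names looks_like_html() and fetch_prefix(), the 8 KB default and the list of tags are assumptions for illustration; a page that carries none of these tags in its first few KB would be rejected.

```php
<?php
// Hypothetical helper: does this chunk of bytes look like the start of an HTML page?
function looks_like_html(string $chunk): bool
{
    // PDFs start with "%PDF-"; MP3 files commonly start with an "ID3" tag.
    if (strncmp($chunk, '%PDF-', 5) === 0 || strncmp($chunk, 'ID3', 3) === 0) {
        return false;
    }
    // Case-insensitive search for a doctype or a few common tags.
    return preg_match('/<!doctype\s+html|<html[\s>]|<head[\s>]|<body[\s>]/i', $chunk) === 1;
}

// Hypothetical helper: reads at most $maxBytes from the URL instead of the whole file.
// Returns null when the URL cannot be opened or read.
function fetch_prefix(string $url, int $maxBytes = 8192): ?string
{
    $handle = @fopen($url, 'rb');
    if ($handle === false) {
        return null;
    }
    // stream_get_contents() stops at $maxBytes or at the end of the stream.
    $chunk = stream_get_contents($handle, $maxBytes);
    fclose($handle);

    return $chunk === false ? null : $chunk;
}
```

Usage would be something like: $chunk = fetch_prefix($url); and then accept the URL only if $chunk !== null && looks_like_html($chunk).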
You can use the cURL library. It lets you fetch only the headers for your request (to the page you want to verify); you can set this option with curl_setopt(). If this isn't clear, write back and I will describe it in more detail. By the way, PHP 5 has cURL bugfixed (Announcement here)
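A rough sketch of the headers-only idea: CURLOPT_NOBODY makes cURL send a HEAD request, and curl_getinfo() with CURLINFO_CONTENT_TYPE returns the Content-Type the server reported. The function names, the timeout and the redirect limit are assumptions, and some servers answer HEAD requests badly or report a wrong type, so treat the result as a hint.

```php
<?php
// Hypothetical helper: decide from a Content-Type header value whether it is HTML.
function is_html_content_type(?string $contentType): bool
{
    if ($contentType === null || $contentType === '') {
        return false;
    }
    // Drop parameters such as "; charset=UTF-8" and normalise case.
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    return $mime === 'text/html' || $mime === 'application/xhtml+xml';
}

// Hypothetical helper: sends a HEAD request and returns the Content-Type,
// or null when the request fails or the server sends no Content-Type.
function head_content_type(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD: headers only, no body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // report the final page after redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $ok = curl_exec($ch);
    $type = $ok === false ? null : curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

    return is_string($type) ? $type : null;
}
```

With this, is_html_content_type(head_content_type($url)) rejects a 15 MB PDF after transferring only a few hundred bytes of headers.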
crazyden, I am already using cURL, because I need to search for a certain string in the file. Right now I am doing this:

function parseit($url)
{
    set_time_limit(0);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 100);
    curl_setopt($ch, CURLOPT_URL, $url);
    $buffer = curl_exec($ch);
    return $buffer;
}

But I would like to see if the file really is a normal web page, and not a PDF/MP3 etc., without downloading the entire file, which can be 15 MB. I am looking forward to your help. Thanks, adolix
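One way to keep using cURL without pulling the whole 15 MB is a write callback that aborts the transfer once a size cap is reached. This is only a sketch: append_capped(), fetch_capped() and the 64 KB default are made-up names and values, and a string that appears past the cap will not be found.

```php
<?php
// Hypothetical helper: append $data to $buffer while enforcing a size cap.
// Returns the number of bytes actually kept. cURL aborts the transfer when
// a write callback returns fewer bytes than it was handed.
function append_capped(string &$buffer, string $data, int $maxBytes): int
{
    $room = max(0, $maxBytes - strlen($buffer));
    $take = min(strlen($data), $room);
    $buffer .= substr($data, 0, $take);
    return $take;
}

// Hypothetical helper: downloads at most $maxBytes of the response body.
function fetch_capped(string $url, int $maxBytes = 65536): string
{
    $buffer = '';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    curl_setopt(
        $ch,
        CURLOPT_WRITEFUNCTION,
        function ($ch, string $data) use (&$buffer, $maxBytes): int {
            return append_capped($buffer, $data, $maxBytes);
        }
    );

    // curl_exec() returns false with a write error once the cap is hit;
    // that is expected here, so the return value is deliberately ignored.
    curl_exec($ch);

    return $buffer;
}
```

The returned prefix can then be searched with strpos() just like the full $buffer in parseit().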