I'm loading a webpage with fopen() and fread(); how do I strip tags before loading?

Discussion in 'PHP' started by free-designer, Jul 8, 2010.

  1. #1
    Hey...
    I think the title explains what I need. I'm using this function to get the title from a URL passed to it.

    The function does the following:
    1- Loads the first 7500 chars
    2- Then gets the title tag with a regexp
    --------------------------------------

    
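    function page_title($url) {
        // Open the URL for reading; @ hides the warning so failure is handled below.
        $fp = @fopen($url, "r");
        if (!$fp) {
            return "Couldn't get the title of: $url";
        }

        // Read only the first 7500 bytes instead of the whole page.
        $str = fread($fp, 7500);
        fclose($fp);

        // Capture the text between <title> and </title> (case-insensitive).
        // A separate $matches array is used instead of reusing $fp.
        $res = preg_match("|<\s*title\s*>([^<]+)<\s*/\s*title\s*>|i", (string) $str, $matches);

        if (!$res) {
            return "Couldn't get the title of: $url";
        }
        return $matches[1];
    }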
    
    echo page_title("http://www.google.com");
    
    
    PHP:
    I could use file_get_contents() instead of fopen() and fread(), but I don't need to download the whole page. The title is near the top of the markup, so I only read the first 7500 chars.

    The problem is that vBulletin 3, for example, puts the title tag at the end of the <head>, and that <head> contains so many tags that the title falls outside the first 7500 chars, so I couldn't get it.

    I want to strip the <script>, <style> and <link> tags.

    Any ideas?
     
    free-designer, Jul 8, 2010 IP
  2. danx10

    #2
    Erm, you might as well just use file_get_contents(). It would save you trouble in the future, since not all websites have the title tag within the first 7500 characters.

    
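    function page_title($url) {
        // Download the whole page; file_get_contents() returns false on failure.
        $content = @file_get_contents($url);

        // The s modifier lets the title span several lines.
        if ($content !== false && preg_match('~<title>(.+?)</title>~is', $content, $a)) {
            return $a[1];
        }

        return "Couldn't get the title of: $url";
    }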
    PHP:
    Or to answer your question, you can use the strip_tags() function.
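
A note on the strip_tags() suggestion: strip_tags() removes the tags themselves but keeps the text between them, so the JavaScript and CSS inside <script> and <style> would remain (and the <title> tag would be removed too). A rough sketch of removing whole blocks with preg_replace() is below. The function name strip_head_noise() is made up for this example, and it only works on markup that has already been read, so a block cut off at the end of the 7500 chars would not be removed.

```php
<?php
// Hypothetical helper (not from the thread): remove <script> and <style>
// blocks together with their contents, plus standalone <link> tags.
function strip_head_noise($html) {
    // \1 refers back to whichever tag name (script or style) was opened.
    $html = preg_replace('~<(script|style)\b[^>]*>.*?</\1\s*>~is', '', $html);
    // <link> has no closing tag, so only the tag itself is removed.
    $html = preg_replace('~<link\b[^>]*>~i', '', $html);
    return $html;
}

$head = '<head><style>b{color:red}</style><script src="a.js"></script>'
      . '<link rel="stylesheet" href="a.css"><title>My Forum</title></head>';

echo strip_head_noise($head); // prints: <head><title>My Forum</title></head>
```

Regular expressions are not a full HTML parser, so treat this as an approximation that is good enough for trimming a <head> before looking for the title.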
     
    danx10, Jul 8, 2010 IP
  3. free-designer

    #3
    First: I can use file_get_contents(), but it's slow because it has to download the entire page before looking for the title. Some sites have a lot of content, so people would have to wait for the whole download just to get the title. Not cool.

    So I think fetching only the top part would be better, because the title is near the top.

    That's why I used the code I posted before, but the problem is that it only gets the first 7500 chars, and some sites have the title much further down :mad:, so I think stripping tags is a good idea.

    All I need is to strip the <style> and <script> tags before it reads the first 7500 chars. Do you think that is possible?
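
Tags cannot be stripped before the bytes have been downloaded, because the remote server sends the markup as-is. One possible alternative, shown here only as a sketch, is to keep reading small chunks and stop as soon as the closing title tag appears, so a short <head> costs little and a long one still works. The names title_from_stream() and page_title_chunked(), and the 65536-byte cap and 2048-byte chunk size, are assumptions made for this example.

```php
<?php
// Hypothetical helper: read from an already opened stream until a complete
// <title>...</title> is in the buffer or $maxBytes have been read.
function title_from_stream($fp, $maxBytes = 65536, $chunkSize = 2048) {
    $buffer = '';
    while (!feof($fp) && strlen($buffer) < $maxBytes) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing more to read
        }
        $buffer .= $chunk;
        // Search the whole buffer so a title split across two chunks is found.
        if (preg_match('~<title\b[^>]*>(.*?)</title\s*>~is', $buffer, $m)) {
            return trim($m[1]);
        }
    }
    return null; // no title within the limit
}

// Hypothetical wrapper: open the URL, look for the title, close the stream.
function page_title_chunked($url) {
    $fp = @fopen($url, 'r');
    if ($fp === false) {
        return null;
    }
    $title = title_from_stream($fp);
    fclose($fp);
    return $title;
}

// Demo on an in-memory stream whose <head> is far longer than 7500 chars.
$demo = fopen('php://memory', 'w+');
fwrite($demo, '<html><head>' . str_repeat('<script>var x = 1;</script>', 500)
    . '<title>Deep Title</title></head>');
rewind($demo);
echo title_from_stream($demo); // prints: Deep Title
fclose($demo);
```

With a real URL the call would be page_title_chunked("http://www.google.com"). Reading stops at the first match, so the rest of the page is never requested from the stream.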
     
    free-designer, Jul 8, 2010 IP