Get text only with cURL?

Discussion in 'PHP' started by technojuice, Aug 9, 2008.

  1. #1
    Hey!

    Is there any way to get only the text of an HTML document I open with cURL? Something like the outer text rather than the inner HTML!

    Any help appreciated!
     
    technojuice, Aug 9, 2008 IP
  2. nico_swd

    nico_swd Prominent Member

    Messages:
    4,153
    Likes Received:
    344
    Best Answers:
    18
    Trophy Points:
    375
    #2
    Outertext? Can you be a bit more specific?

    Maybe strip_tags() is what you're looking for?
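
    A quick sketch of what strip_tags() does, using made-up sample markup (the $html string is only an illustration, not taken from any real page). Note that it removes the tags themselves but keeps the text between them:

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<p>Hello <b>world</b></p><script>alert("hi");</script>';

// strip_tags() removes the tags but keeps the text that sat between them,
// including the body of the <script> element.
$plain = strip_tags($html);

echo $plain, PHP_EOL;
```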
     
    nico_swd, Aug 9, 2008 IP
  3. technojuice

    technojuice Peon

    Messages:
    207
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Will CURLOPT_TRANSFERTEXT help? How do I use it?

    All I want is to get the text of the HTML document!

    strip_tags() does not remove the JavaScript code inside the script tags!
     
    technojuice, Aug 9, 2008 IP
  4. nico_swd

    #4
    
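    // $text holds the HTML fetched with cURL
    $text = 'Some HTML';
    
    // Work only on the contents of <body>
    if (preg_match('~<body[^>]*>(.*?)</body>~si', $text, $body))
    {
    	// Strip the naughty stuff: collapse line breaks, then drop script,
    	// style, object and embed elements together with their contents
    	$text = preg_replace(
    		array('~[\r\n]+~', '~<(script|style|object|embed)[^>]*>(?:.*?)</\1>~si'),
    		array(' ', ''),
    		$body[1]
    	);
    	// Strip the rest
    	$text = strip_tags($text);
    
    	echo $text;
    }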
    
    PHP:
     
    nico_swd, Aug 9, 2008 IP
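
To tie the thread together, here is a minimal sketch of the whole flow: fetch the page with cURL, then reduce it to text. As far as the PHP manual describes it, CURLOPT_TRANSFERTEXT only switches FTP transfers to ASCII mode, so it will not strip markup; the stripping has to happen after the download. The function names fetch_html() and html_to_text(), the sample markup and the example.com URL are assumptions made up for this illustration. Regex-based stripping is a rough approach and may trip over unusual or malformed markup.

```php
<?php
// Hypothetical helper: download a page and return its body as a string.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $html;
}

// Hypothetical helper: reduce an HTML document to its visible text.
function html_to_text(string $html): string
{
    // Work on the contents of <body> when there is one.
    if (preg_match('~<body\b[^>]*>(.*)</body>~si', $html, $m)) {
        $html = $m[1];
    }
    // Drop elements whose contents are not readable text.
    $html = preg_replace('~<(script|style|object|embed)\b[^>]*>.*?</\1\s*>~si', ' ', $html) ?? $html;
    // Put a space before each tag so neighbouring blocks do not run together,
    // then remove the remaining tags.
    $text = strip_tags(str_replace('<', ' <', $html));
    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('~\s+~', ' ', $text) ?? $text);
}

// Made-up sample document, so the sketch runs without a network connection.
$sample = '<html><head><title>T</title></head><body>'
        . '<p>Hello &amp; welcome</p><script>alert("x");</script><p>Bye</p>'
        . '</body></html>';

echo html_to_text($sample), PHP_EOL;

// With a real page (placeholder URL):
// echo html_to_text(fetch_html('https://example.com/'));
```

Keeping the download and the text extraction in separate functions means the stripping step can be tried on a plain string first, before any request is made.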