Stripping Links with PHP?

smashedpumpkins Peon

#1

I'm using a WordPress plugin to grab articles related to my website content. The plugin copies each article verbatim and posts it to my blog. However, with many articles, it's copying links that I do not want. I'd like to strip all links but leave the rest of the formatting intact. I don't think it should be horribly hard to do, but I can't figure it out myself. If it's a simple line or two, can anyone help me out? I believe it should be added between lines 121 and 135. (These lines define the content that is displayed. The next 20 lines after this code are a second option that removes all formatting.) However, I only want to remove links. I really appreciate any help you can offer!


		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div[@id='KonaBody']//p"); 

		for ($i = 0;  $i < $paras->length; $i++ ) {  //$paras->length

			$para = $paras->item($i);
			$paragraph = $para->textContent;
			
			if ($paragraph != '') {
					if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$paragraph = ma_translate($paragraph);}
			
				$content .= $paragraph . ' ';
				$content .= "<br/><br/>";
			}
		}

PHP:

For a better understanding, here's the entire file's code.


<?php

function ma_articlepost($keyword,$cat,$num,$which) {
   global $wpdb, $ma_dbtable;
   
	// Debug
   	debug_log('- EZA');	 
	$keyword2 = $keyword;	
	$keyword = str_replace( " ","+",$keyword );	
	$keyword = urlencode($keyword);
	
	  $blist[] = "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)";
      $blist[] = "Mozilla/5.0 (compatible; Konqueror/3.92; Microsoft Windows) KHTML/3.92.0 (like Gecko)";
      $blist[] = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; Media Center PC 5.0; .NET CLR 1.1.4322; Windows-Media-Player/10.00.00.3990; InfoPath.2";
      $blist[] = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; Dealio Deskball 3.0)";
      $blist[] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; NeosBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
      $ua = $blist[array_rand($blist)];	
	
	$source = get_option('ma_eza_source');
	
   // SOOPERARTICLES	
   if($source == "sooperarticles") {   
   
	$startat = $num;
	
	if ($startat == 0) {
		$startpage = 1;
		$sk = 1;
	} else {
		$xz = $startat / 15;
		$startpage = ceil($xz);
		$sk = $startat - ( $startpage -1 ) * 15;
	}
	$l = $startpage;
	$sk = $sk -1;
 
	$search_url = "http://www.sooperarticles.com/search/?t=titles&s=$keyword&p=$l";
	// make the cURL request to $search_url
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_USERAGENT, 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6');
	curl_setopt($ch, CURLOPT_URL,$search_url);
	curl_setopt($ch, CURLOPT_FAILONERROR, true);
	curl_setopt($ch, CURLOPT_AUTOREFERER, true);	
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
	curl_setopt($ch, CURLOPT_TIMEOUT, 45);
	$html= curl_exec($ch);
	if (!$html) {
		echo "<br />cURL error number:" .curl_errno($ch);
		echo "<br />cURL error:" . curl_error($ch);
		exit;
	}
	curl_close($ch); 

	// parse the html into a DOMDocument  

		$dom = new DOMDocument();
		@$dom->loadHTML($html);

	// Grab Product Links  

		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div/h3/a");

		$para = $paras->item($sk);
		if($para == '' || $para == null) {
			echo '<div class="updated"><p>No articles found!</p></div>';
			return "nothing";
		} else {		
		$target_url = $para->getAttribute('href');

 	// make the cURL request to $search_url
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_USERAGENT, $ua);
	curl_setopt($ch, CURLOPT_URL,$target_url);
	curl_setopt($ch, CURLOPT_FAILONERROR, true);
	curl_setopt($ch, CURLOPT_AUTOREFERER, true);	
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
	curl_setopt($ch, CURLOPT_TIMEOUT, 45);
	$html= curl_exec($ch);
	if (!$html) {
		echo "<br />cURL error number:" .curl_errno($ch);
		echo "<br />cURL error:" . curl_error($ch);
		exit;
	}
	curl_close($ch);
	
	// parse the html into a DOMDocument  

		$dom = new DOMDocument();
		@$dom->loadHTML($html);
		

	// Grab Article Title 

		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div/h1");
		
		$para = $paras->item(0);
		$title = $para->textContent;		
		$title2 = $title;
		
		if (function_exists('ma_translate') && get_option('ma_trans_title') == 1 && get_option('ma_trans_article') == 1) {$title = ma_translate($title2);}			

		
	// Check X
		
		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div[@id='KonaBody']/div[@class='arightside']"); 
		
		$para = $paras->item(0);

		if($para != "" && $para != null) {
			return false;
		}
		
 	// Grab Article	
	
	if (get_option('ma_eza_grabmethod')=='old') {
		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div[@id='KonaBody']//p"); 

		for ($i = 0;  $i < $paras->length; $i++ ) {  //$paras->length

			$para = $paras->item($i);
			$paragraph = $para->textContent;
			
			if ($paragraph != '') {
					if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$paragraph = ma_translate($paragraph);}
			
				$content .= $paragraph . ' ';
				$content .= "<br/><br/>";
			}
		}		
	} elseif (get_option('ma_eza_grabmethod')=='new') {
		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div[@id='KonaBody']"); 
		$para = $paras->item(0);		
		$string = $dom->saveXml($para);	
		$tags = array('div','iframe','script');
		$string = ma_strip_selected_tags($string, $tags);	
		$string = str_replace("]]>", "", $string);
		$string = str_replace("<![CDATA[", "", $string);
		
		if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$string = ma_translate($string);}

		$content .= $string . ' ';
	}
	
	// Grab Ressource Box	

		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div[@class='author-signature']");
		$para = $paras->item(0);		
		$ressourcetext = $dom->saveXml($para);	
		if (function_exists('ma_translate') && get_option('ma_trans_articlebox') == 1) {$ressourcetext = ma_translate($ressourcetext);}
		
		if ($ressourcetext != '') {
		$authorbox = "<div style=\"margin:5px;padding:5px;border:1px solid #c1c1c1;font-size: 10px;\">" . $ressourcetext . "</div>";	
		}	
 
 	}
	}  
   
   // ARTICLESBASE
   if($source == "articlesbase") {
   
   /* Select Proxy
	$proxy ="";
	$burl = get_bloginfo('url');;
	$arr=@file("$burl/wp-content/plugins/WPRobot/modules/proxies.txt");
	if($arr) {
		$noprox = count($arr) - 1;
		$rprox = rand(0,$noprox);
		list($proxy,$proxytype,$proxyuser)=explode("|",$arr[$rprox]);
	}
	*/

		$page = $num / 15;
		$page = (string) $page; 
		$page = explode(".", $page);	
		$page=(int)$page[0];	
		$page++;	
	
		if($page == 0) {$page = 1;}
		$prep = floor($num / 15);
		$numb = $num - $prep * 15;

	
	/*
	$numb = $num;
	$num = $num / 15;
	$num = (string) $num; 
	$num = explode(".", $num);
	$page=(int)$num[0];	
	$page++;				
	$cnum=(int)$num[1]; 
	$l = $page;
	$sk = $cnum;*/
	
	$lang = get_option('ma_eza_lang');
	if($lang == "en") {
		$search_url = "http://www.articlesbase.com/find-articles.php?q=$keyword&page=$page";
	} elseif($lang == "fr") {
		$search_url = "http://fr.articlesbase.com/find-articles.php?q=$keyword&page=$page";	
	} elseif($lang == "es") {
		$search_url = "http://www.articuloz.com/find-articles.php?q=$keyword&page=$page";
	} elseif($lang == "pg") {
		$search_url = "http://www.artigonal.com/find-articles.php?q=$keyword&page=$page";
	} elseif($lang == "ru") {
		$search_url = "http://www.rusarticles.com/find-articles.php?q=$keyword&page=$page";
	}

	// make the cURL request to $search_url
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_USERAGENT, 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6');
	curl_setopt($ch, CURLOPT_URL,$search_url);
	curl_setopt($ch, CURLOPT_FAILONERROR, true);
	curl_setopt($ch, CURLOPT_AUTOREFERER, true);
	/*
Proxy
	if($proxy != "") {
	curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1); 
	curl_setopt($ch, CURLOPT_PROXY, $proxy);
	if($proxyuser) {curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyuser);}
	if($proxytype == "socks") {curl_setopt ($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);}
	}
*/
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
	curl_setopt($ch, CURLOPT_TIMEOUT, 45);
	$html= curl_exec($ch);
	if (!$html) {
		echo "<br />cURL error number:" .curl_errno($ch);
		echo "<br />cURL error:" . curl_error($ch);
	}
	curl_close($ch); 	
	//$html = file_get_contents($search_url);

	// parse the html into a DOMDocument  

		$dom = new DOMDocument();
		@$dom->loadHTML($html);

	// Grab Product Links  

		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div//h3/a");

		$para = $paras->item($numb);
		if($para == '' || $para == null) {
			//echo '<div class="updated"><p>No articles found!</p></div>';
			return "nothing";
		} else {		
	if($lang == "en") {
		$target_url = $para->getAttribute('href'); // $target_url = "http://www.articlesbase.com" . $para->getAttribute('href');
	} elseif($lang == "fr") {
		$target_url = $para->getAttribute('href'); // $target_url = "http://fr.articlesbase.com" . $para->getAttribute('href');	
	} elseif($lang == "es") {
		$target_url = $para->getAttribute('href'); // $target_url = "http://www.articuloz.com" . $para->getAttribute('href');	
	} elseif($lang == "pg") {
		$target_url = $para->getAttribute('href'); // $target_url = "http://www.artigonal.com" . $para->getAttribute('href');	
	} elseif($lang == "ru") {
		$target_url = $para->getAttribute('href'); // $target_url = "http://www.rusarticles.com" . $para->getAttribute('href');	
	}		

	// make the cURL request to $search_url
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_USERAGENT, $ua);
	curl_setopt($ch, CURLOPT_URL,$target_url);
	curl_setopt($ch, CURLOPT_FAILONERROR, true);
	curl_setopt($ch, CURLOPT_AUTOREFERER, true);
	/*
	 Proxy
	if($proxy != "") {
	curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1); 
	curl_setopt($ch, CURLOPT_PROXY, $proxy);
	if($proxyuser) {curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyuser);}
	if($proxytype == "socks") {curl_setopt ($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);}
	}	
	*/
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
	curl_setopt($ch, CURLOPT_TIMEOUT, 45);
	$html= curl_exec($ch);
	if (!$html) {
		echo "<br />cURL error number:" .curl_errno($ch);
		echo "<br />cURL error:" . curl_error($ch);
		exit;
	}
	curl_close($ch);

	// parse the html into a DOMDocument  

		$dom = new DOMDocument();
		@$dom->loadHTML($html);
		

	// Grab Article Title 

		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div/h1");
		
		$para = $paras->item(0);
		$title = $para->textContent;		
		$title2 = $title;
		
		if (function_exists('ma_translate') && get_option('ma_trans_title') == 1 && get_option('ma_trans_article') == 1) {$title = ma_translate($title2);}			

						
	// Grab Article	
	
	if (get_option('ma_eza_grabmethod')=='old') {
		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div[@class='article_cnt KonaBody']//p"); 

		for ($i = 0;  $i < $paras->length; $i++ ) {  //$paras->length

			$para = $paras->item($i);
			$paragraph = $para->textContent;
			
			if ($paragraph != '') {
					if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$paragraph = ma_translate($paragraph);}
			
				$content .= $paragraph . ' ';
				$content .= "<br/><br/>";
			}
		}		
	} elseif (get_option('ma_eza_grabmethod')=='new') {
		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div[@class='article_cnt KonaBody']"); 
		$para = $paras->item(0);		
		$string = $dom->saveXml($para);	

		$string = strip_tags($string,'<p><strong><b><a><br>');
		$string = str_replace('<div class="KonaBody">', "", $string);	
		$string = str_replace("</div>", "", $string);			
		if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$string = ma_translate($string);}

		$content .= $string . ' ';
	}
	
	// Grab Ressource Box	

		$xpath = new DOMXPath($dom);
		$parax = $xpath->query("//div[@class='author_details']/p");
		//$para = $paras->item(0);		
		//$ressourcetext = $dom->saveXml($para);	
		
		for ($i = 0;  $i < $parax->length; $i++ ) {  //$paras->length
			$parac = $parax->item($i);
			$ressourcetext .= $dom->saveXml($parac);	
		}			

		if (function_exists('ma_translate') && get_option('ma_trans_articlebox') == 1) {$ressourcetext = ma_translate($ressourcetext);}
		
		if ($ressourcetext != '') {
		$authorbox = "<div style=\"margin:5px;padding:5px;border:1px solid #c1c1c1;font-size: 10px;\">" . $ressourcetext . "</div>";	
		}	

}		

}
		$textc = $content;
		//$textc=substr_replace($textc, "...<!--more-->", 100, 0);
		
		//$textc = htmlspecialchars($textc, ENT_QUOTES);
		if($lang == "es") {
			//$textc = utf8_decode($textc);	
		}
		$authorbox = utf8_decode($authorbox);
		$title = utf8_decode($title);
		$content = get_option( 'ma_eza_template'); //'{thumbnail}{description}{link}');;	
						// Clickbank		
						$pos = strpos($content, "{clickbank}");		
						if ($pos === false) {
						} else {
							$cbad = ma_getclickbank($keyword,"no");
							if($cbad[4] != "") {
								$content = str_replace("{clickbank}", $cbad[4], $content);							
							} else {	
								$content = str_replace("{clickbank}", "", $content);								
							}							
						}				
						// Youtube			
						$pos = strpos($content, "{video}");		
						if ($pos === false) {
						} else {
							$vid = ma_getvideo($keyword2,1,0);
							if($vid[8] != "") {
								$content = str_replace("{video}", $vid[8], $content);							
							} else {	
								$content = str_replace("{video}", "", $content);								
							}							
						}		
						// Flickr
						preg_match('#\{image(.*)\}#iU', $content, $matches);
						if ($matches[0] == false) {
						} else {
							if($matches[1] != false ) {$imgkeyword = substr($matches[1], 1);} else {$imgkeyword = $keyword;}
					
							$img = ma_getimage($imgkeyword,1,0);
							if($img[4] != "" && $img[4] != "i") {
								$image = '<img style="float:left;margin: 0 20px 10px 0;" src="'.$img[4].'" width="'.get_option("ma_fl_twidth").'" />';
								$content = str_replace("{date}", $img[1], $content);
								$content = str_replace("{owner}", $img[2] , $content);
								$content = str_replace("{largeimage}", $img[6], $content);
								$content = str_replace($matches[0], $image, $content);
								$fllink = 'http://www.flickr.com/photos/'.$img[7].'/'.$img[8];
								$content = str_replace("{imageurl}", $fllink, $content);
							} else {
								$content = str_replace("{date}", "", $content);
								$content = str_replace("{owner}", "", $content);
								$content = str_replace("{largeimage}","", $content);								
								$content = str_replace($matches[0], "", $content);
								$content = str_replace("{imageurl}", "", $content);								
							}
						}	
						// eBay
						preg_match('#\{auction(.*)\}#iU', $content, $matches);
						if ($matches[0] == false) {
						} else {
							if($matches[1] != false ) {$aucnum = substr($matches[1], 1);} else {$aucnum = 1;}
							$content = str_replace($matches[0], '[eba kw="'.$keyword2.'" num="'.$aucnum.'" ebcat=""]', $content);						
						}	
		$content = str_replace("{article}", $textc, $content);			
		$content = str_replace("{authorbox}", $authorbox, $content);	
		$content = str_replace("{keyword}", $keyword2, $content);	
		$content = str_replace("{url}", $target_url, $content);	
		
		$insert = ma_insertpost($content,$title,$cat);			
		if ($insert == false) {return false;} else {return true;} //ma_post($which);		

}

function ma_eza_options() {
?>
		<table width="100%" cellspacing="2" cellpadding="5" class="editform"> 
			<tr valign="top"> 
				<td width="30%" scope="row">Article Source:</td> 
				<td>
				<select name="ma_eza_source" id="ma_eza_source">
					<option value="articlesbase" <?php if (get_option('ma_eza_source')=='articlesbase') {echo 'selected';} ?>>Articlesbase.com</option>
					<option value="sooperarticles" <?php if (get_option('ma_eza_source')=='sooperarticles') {echo 'selected';} ?>>Sooperarticles.com</option>
				</select>
				</td> 
			</tr>			
			<tr valign="top"> 
				<td width="30%" scope="row">Article Formatting Method:</td> 
				<td>
				<select name="ma_eza_grabmethod" id="ma_eza_grabmethod">
					<option value="new" <?php if (get_option('ma_eza_grabmethod')=='new') {echo 'selected';} ?>>Leave Formatting Intact</option>
					<option value="old" <?php if (get_option('ma_eza_grabmethod')=='old') {echo 'selected';} ?>>Replace Formatting</option>
				</select> <a href="http://wprobot.net/documentation/#34"><b>?</b></a>
				</td> 
			</tr>	
			<tr valign="top"> 
				<td width="30%" scope="row">Article Language:</td> 
				<td>
				<select name="ma_eza_lang" id="ma_eza_lang">
					<option value="en" <?php if(get_option('ma_eza_lang')=="en"){_e('selected');}?>>English</option>
					<option value="fr" <?php if(get_option('ma_eza_lang')=="fr"){_e('selected');}?>>French</option>
					<option value="es" <?php if(get_option('ma_eza_lang')=="es"){_e('selected');}?>>Spanish</option>
					<option value="pg" <?php if(get_option('ma_eza_lang')=="pg"){_e('selected');}?>>Portuguese</option>
					<option value="ru" <?php if(get_option('ma_eza_lang')=="ru"){_e('selected');}?>>Russian</option>
				</select>
			</td> 
			</tr>				
			<tr valign="top"> 
				<td width="30%" scope="row">Post Template:</td> 
				<td>			
			<textarea name="ma_eza_template" rows="2" cols="30"><?php echo get_option('ma_eza_template');?></textarea>	
				</td> 
			</tr>					
		</table>
<?php
}
?>

PHP:

smashedpumpkins, Mar 27, 2010 IP

Alex Roxon Active Member

#2

I guess you could try preg_replace. Something like:
$Content = preg_replace( '/\<a href\=[\"\\\'].*?\<\/a\>/i', '', $Content);
PHP:

Alex Roxon, Mar 27, 2010 IP

hugsbunny Peon

#3

I think you can use strip_tags and pass it the list of tags you want to keep, leaving out <a>. I don't remember the exact syntax, but I think it will work.
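
A minimal sketch of the strip_tags idea, assuming the article HTML is already in a string. strip_tags() takes the tags to keep as its second argument, so leaving <a> out of that list drops the link tags but keeps the text inside them. The helper name strip_links_keep_formatting and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical helper name; only strip_tags() itself is a built-in PHP function.
function strip_links_keep_formatting(string $html): string
{
    // Tags listed here survive. <a> is deliberately left out, so anchors are
    // removed while the text between them is kept.
    return strip_tags($html, '<p><strong><b><br>');
}

$html = '<p>Read <a href="http://example.com">this <b>guide</b></a> first.</p>';
echo strip_links_keep_formatting($html), "\n";
```

The Articlesbase branch of the posted file already calls strip_tags($string,'<p><strong><b><a><br>'), so removing <a> from that list may be all that branch needs.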

hugsbunny, Mar 27, 2010 IP

JAY6390 Peon

#4

$content = preg_replac(%'</?a\b[^>]*>%', '', $content);
PHP:
Not read your code, but that is how you do it. Just replace $content with your content variable

JAY6390, Mar 27, 2010 IP

smashedpumpkins Peon

#5

Great, I really appreciate it! I'll try it out and report back. Thanks

smashedpumpkins, Mar 27, 2010 IP

danx10 Peon

#6

JAY6390 said: ↑
$content = preg_replac(%'</?a\b[^>]*>%', '', $content);
PHP:
Not read your code, but that is how you do it. Just replace $content with your content variable
Click to expand...
$content = preg_replace('%</?a\b[^>]*>%', '', $content);
PHP:
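
For anyone following along, here is a small sketch of what that corrected pattern does, assuming the HTML is in a plain string. It only removes the opening and closing anchor tags, so the link text stays. The wrapper name remove_anchor_tags and the sample markup are hypothetical.

```php
<?php
// Hypothetical wrapper around the pattern posted above.
function remove_anchor_tags(string $html): string
{
    // Matches "<a ...>" and "</a>" only. The \b keeps it from touching
    // other tags that merely start with "a", such as <abbr>.
    return preg_replace('%</?a\b[^>]*>%', '', $html);
}

$html = '<p>See <a href="http://example.com" rel="nofollow">the docs</a> and <abbr title="x">PHP</abbr>.</p>';
echo remove_anchor_tags($html), "\n";
```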

danx10, Mar 27, 2010 IP

smashedpumpkins Peon

#7

I went ahead and tried all three suggested snippets, but none worked. I tried them on both the content variable and the paragraph variable. The links always remained intact. Looking at my example, can you see why this might not work? I wish I could be of more help. I really appreciate the help you're all offering.

smashedpumpkins, Mar 28, 2010 IP

smashedpumpkins Peon

#8

rguy84 said:

Do they know you are doing this? As in, do you have permission to do so?

You will need to use regex functions to find a URL (i.e. <a href), get the words between > and </a>, strip the link tags and re-insert the words.
Click to expand...

They're taken from an open and free article database.

I've tried using the following with no success.
$content = preg_replace('%</?a\b[^>]*>%', '', $content);
PHP:
I noticed the script has a similar option for other features. I've tried to copy it over, but I can't figure it out. I've used the below code with both paragraph and content, but the links still exist! Any ideas based on this code?
$paragraph = ma_strip_selected_tags($paragraph, array('a','iframe','script'));
PHP:
function ma_strip_selected_tags($text, $tags = array()) {
	// Tags may be passed as one array or as separate arguments.
	$args = func_get_args();
	$text = array_shift($args);
	$tags = func_num_args() > 2 ? array_diff($args,array($text)) : (array)$tags;
	foreach ($tags as $tag){
		// Replace each <tag ...>inner</tag> pair with just its inner content.
		while(preg_match('/<'.$tag.'(|\W[^>]*)>(.*)<\/'. $tag .'>/iusU', $text, $found)){
			$text = str_replace($found[0],$found[2],$text);
		}
	}
	// Finally drop any self-closing forms such as <tag ... />.
	return preg_replace('/(<('.join('|',$tags).')(|\W.*)\/>)/iusU', '', $text);
}
PHP:
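
Since the plugin already parses the page with DOMDocument and DOMXPath, another option is to unwrap the links in the DOM before the markup is serialized. This is only a sketch under that assumption: the helper name unwrap_links and the sample HTML are invented here, and it has not been tried inside the plugin itself.

```php
<?php
// Hypothetical helper: unwraps every <a> inside $root, keeping its children.
function unwrap_links(DOMDocument $dom, DOMNode $root): void
{
    $xpath = new DOMXPath($dom);
    // Copy the matches into an array first so editing the tree cannot disturb the loop.
    foreach (iterator_to_array($xpath->query('.//a', $root)) as $a) {
        while ($a->firstChild) {
            // insertBefore moves the child out of the <a> and places it in front of it.
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        // The now-empty <a> can be dropped.
        $a->parentNode->removeChild($a);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<div id="KonaBody"><p>Visit <a href="http://example.com"><b>our</b> site</a> today.</p></div>');
$body = (new DOMXPath($dom))->query("//div[@id='KonaBody']")->item(0);

unwrap_links($dom, $body);
echo $dom->saveXML($body), "\n";
```

In the posted file this would presumably run on $para just before the $dom->saveXml($para) call in the "new" branch, but that placement is a guess.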

smashedpumpkins, Mar 28, 2010 IP

danx10 Peon

#9

Try:

$content = preg_replace('~<(a.*)href=(?:"|\')(.*?)(?:"|\')(.*)</a>~i', '', $content);

PHP:

Last edited: Mar 28, 2010

danx10, Mar 28, 2010 IP

JAY6390 Peon

#10

The regex I gave works perfectly well. The content must not be correct

JAY6390, Mar 28, 2010 IP

smashedpumpkins Peon

#11

As you can see here I used the preg_replace code. (I've tried every example you've all given me) Paragraph has all of the content in the variable. I don't understand why it's not working either. It's becoming a real pain. I appreciate the help.
		$xpath = new DOMXPath($dom);
		$paras = $xpath->query("//div[@id='KonaBody']//p"); 

		for ($i = 0; $i < $paras->length; $i++ ) { //$paras->length

			$para = $paras->item($i);
			$paragraph = $para->textContent;
			$paragraph = preg_replace('~<(a.*)href=(?:"|\')(.*?)(?:"|\')(.*)</a>~i', '', $paragraph);
			
			if ($paragraph != '') {
					if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$paragraph = ma_translate($paragraph);}
				$content .= $paragraph . ' ';
				$content .= " ";
			}
		}
PHP:
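
One possible reason the pattern seems to do nothing in this loop: textContent returns the paragraph with all tags already removed, so there is no <a ...> left for a regex to match. The markup only survives in the branch that uses saveXml. A small sketch with invented sample HTML shows the difference. Whether this is what happened here is an assumption, since the links that survived could also have been plain-text URLs or come from the other branch.

```php
<?php
$html = '<div id="KonaBody"><p>Visit <a href="http://example.com">our site</a> today.</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$para = $xpath->query("//div[@id='KonaBody']//p")->item(0);

// textContent: the tags are already gone, so a tag-matching regex finds nothing.
$asText = $para->textContent;

// saveXML: the markup is preserved, so this is the string a regex can act on.
$asMarkup = $dom->saveXML($para);
$stripped = preg_replace('%</?a\b[^>]*>%i', '', $asMarkup);

echo $asText, "\n";
echo $asMarkup, "\n";
echo $stripped, "\n";
```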

smashedpumpkins, Mar 28, 2010 IP

danx10 Peon

#12

Reply with the HTML source that contains the links you want replaced. You can get the source with:
highlight_string($content);
PHP:
and then copying the output.

danx10, Mar 29, 2010 IP

smashedpumpkins Peon

#13

danx10 said: ↑
Reply with the HTML source that contains the links you want replaced. You can get the source with:
highlight_string($content);
PHP:
and then copying the output.
Click to expand...
I've finally had some success! I've probably spent over 20 hours on this issue. I found two different spots where I have to place the code; there's a duplicate location for another article source. Anyhow, it's working! I did find one article that snuck by with some links. I checked out the HTML code and here it is. I'm using the PHP code below from danx10. Would I need different code to get rid of a link like this?

Here's the HTML code for the link.
<a onclick="javascript:pageTracker._trackPageview('/outgoing/article_exit_link');" rel="nofollow" href="http://www.datingonlinesingles.blogspot.com">click here to join the best free dating site..</a>
Code (markup):
Here's the PHP code I'm using.
$content = preg_replace('~<(a.*)href=(?:"|\')(.*?)(?:"|\')(.*)</a>~i', '', $content);
PHP:
Oh, and can you possibly help with one other thing? I've found that many articles have links that aren't active, so instead of clicking a link the user has to copy and paste it. How can I use preg_replace to search for, let's say, www?

EDIT: After looking at the code I'm assuming you made it more detailed by adding in href? I'm going to try your previous example as well to see if it'll clear them all. EDIT: Didn't work. Same issue as above.
$content = preg_replace('%</?a\b[^>]*>%', '', $content);
PHP:

Last edited: Mar 29, 2010

smashedpumpkins, Mar 29, 2010 IP

danx10 Peon

#14

This would work:
$content = preg_replace('~<(a.*)href=(?:"|\')(.*?)(?:"|\')(.*)</a>~i', '', $content);
PHP:
But it could easily be shortened to:
$content = preg_replace('~<(a.*)href=(.*?)</a>~i', '', $content);
PHP:
Proof that it works:
<?php

//input
$content = <<<eof
Test text containing urls..
<a onclick="javascript:pageTracker._trackPageview('/outgoing/article_exit_link');" rel="nofollow" href="http://www.datingonlinesingles.blogspot.com">click here to join the best free dating site..</a>

<a href="http://digitalpoint.com">Test Url</a>
eof;


$content = preg_replace('~<(a.*)href=(.*?)</a>~i', '', $content);

//output - Test text containing urls..
echo $content;

?>
PHP:
smashedpumpkins said: ↑
Oh, and can you possibly help with one other thing? I've found that many articles have links that aren't active, so instead of clicking a link the user has to copy and paste it. How can I use preg_replace to search for, let's say, www?
Click to expand...

Can you elaborate? Give me an example of an inactive link and what you'd want it to look like (removed, or replaced/formatted in a specific way?).

danx10, Mar 29, 2010 IP

smashedpumpkins Peon

#15

It works! I can now remove all links! I found a better spot to place it and it worked. WOW finally! Now I have 2 questions.

I'd like to remove links without A HREF tags. How can I go about removing these links? For example, the link below does not use A HREF tags. Therefore it won't be removed from the article.
http://www.google.com
Code (markup):
This link has A HREF tags and will be removed.
[url]http://www.google.com[/url]
Code (markup):
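
For the first question, a rough sketch of removing plain-text URLs, assuming a "bare" URL is anything that starts with http://, https:// or www. and runs up to the next space, quote or tag. The helper name remove_bare_urls is hypothetical, and the pattern is not a full URL grammar; for instance it will also swallow a full stop that directly follows a URL.

```php
<?php
// Hypothetical helper; the pattern is a rough sketch, not a full URL grammar.
function remove_bare_urls(string $text): string
{
    // The lookbehind skips URLs that directly follow =, a quote, / or a word
    // character, which leaves attribute values such as href="..." alone.
    $text = preg_replace('~(?<![="\x27/\w])(?:https?://|www\.)[^\s<"\x27]+~i', '', $text);

    // Collapse the double spaces left where a URL was removed.
    return preg_replace('~ {2,}~', ' ', $text);
}

echo remove_bare_urls('Visit http://www.google.com for details.'), "\n";
```

It is probably safest to run this after the anchor tags have been stripped, because a URL used as the visible text of a link would otherwise be removed and leave an empty <a> behind.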
Second, is there a way to remove the link tags but keep the words? For example, instead of the link below, it would just say Google.
Google

smashedpumpkins, Mar 29, 2010 IP

K.Meier Well-Known Member

#16

That would be the explode function you are looking for. Look at the examples and see if you can come up with something. Shouldn't be that hard.

K.Meier, Mar 29, 2010 IP

danx10 Peon

#17

@smashedpumpkins

You can use the following regex:
$content = preg_replace('~<(a.*)href=(.*?)>(.*)</a>~i', '$3', $content);
PHP:
This should remove all a href links, and if they contain link text, the link text will remain but the link will be removed. Furthermore, non a href links won't be touched (therefore they will not be affected, so you don't need a regex for that).

Example:
<?php

//input
$content = <<<eof


http://www.google.com

<!--a href tag containing link text example...-->

<a href="http://www.google.com" target="_blank">Google</a>
eof;


$content = preg_replace('~<(a.*)href=(.*?)>(.*)</a>~i', '$3', $content);

//output - http://www.google.com Google
echo $content;

?>
PHP:

danx10, Mar 30, 2010 IP

stOK Active Member

#18

danx10, please stop suggesting crap. Better spend some time doing homework and study why your code works only on few examples.

JAY6390 already gave an almost perfect solution, except the regexp should be case-insensitive.
$content = preg_replace('%</?a\b[^>]*>%i', '', $content);
Code (markup):
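
To make the difference concrete, here is a small comparison of the pattern from post #17 and the tag-only pattern, assuming two links sit on the same line (the sample markup is invented). The greedy (a.*) in the first pattern runs on to the last href= on the line, so everything before the second link's text is swallowed.

```php
<?php
$line = '<p><a href="http://one.example">One</a> and <a href="http://two.example">Two</a></p>';

// Pattern from post #17: (a.*) is greedy, so one match spans both links
// and only the text of the last link is kept.
$greedy = preg_replace('~<(a.*)href=(.*?)>(.*)</a>~i', '$3', $line);

// Tag-only pattern: each <a ...> and </a> is removed on its own.
$tagOnly = preg_replace('%</?a\b[^>]*>%i', '', $line);

echo $greedy, "\n";   // <p>Two</p>
echo $tagOnly, "\n";  // <p>One and Two</p>
```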

stOK, Mar 30, 2010 IP

JAY6390 Peon

#19

In fairness, the regex shouldn't need to be case insensitive; any valid HTML would have lowercase <a> tags (which is why I didn't put the insensitive flag on it to begin with).

JAY6390, Mar 30, 2010 IP

stOK Active Member

#20

JAY6390 said: ↑

In fairness, the regex shouldn't need to be case insensitive; any valid HTML would have lowercase <a> tags (which is why I didn't put the insensitive flag on it to begin with).
Click to expand...

Wrong. You might mean XHTML. HTML element names are case-insensitive.
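
A quick illustration, assuming a scraped article that uses uppercase tags (the sample string is made up): without the i flag the pattern leaves such links untouched.

```php
<?php
$html = 'Go to <A HREF="http://example.com">Example</A> now.';

// Without the i flag the uppercase tags do not match, so nothing changes.
$sensitive = preg_replace('%</?a\b[^>]*>%', '', $html);

// With the i flag both <A ...> and </A> are removed.
$insensitive = preg_replace('%</?a\b[^>]*>%i', '', $html);

echo $sensitive, "\n";
echo $insensitive, "\n";
```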

stOK, Mar 30, 2010 IP
