I'm using a WordPress plugin to grab articles related to my website content. The plugin copies each article identically and posts it to my blog. However, with many articles, it's copying links that I do not want. I'd like to strip all links but leave the rest of the formatting intact. I don't think it should be horribly hard to do, but I can't figure it out myself. If it's a simple line or two, can anyone help me out? I believe it should be added between lines 121 and 135. (These lines define the content that is displayed. The next 20 lines after this code are a second option that removes all formatting, but I only want to remove links.) I really appreciate any help you can offer!

$xpath = new DOMXPath($dom);
$paras = $xpath->query("//div[@id='KonaBody']//p");

for ($i = 0; $i < $paras->length; $i++ ) { //$paras->length
    $para = $paras->item($i);
    $paragraph = $para->textContent;
    if ($paragraph != '') {
        if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$paragraph = ma_translate($paragraph);}
        $content .= $paragraph . ' ';
        $content .= "<br/><br/>";
    }
}

For a better understanding, here's the entire file's code:
<?php
function ma_articlepost($keyword,$cat,$num,$which) {
    global $wpdb, $ma_dbtable;

    // Debug
    debug_log('- EZA');

    $keyword2 = $keyword;
    $keyword = str_replace( " ","+",$keyword );
    $keyword = urlencode($keyword);

    $blist[] = "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)";
    $blist[] = "Mozilla/5.0 (compatible; Konqueror/3.92; Microsoft Windows) KHTML/3.92.0 (like Gecko)";
    $blist[] = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; Media Center PC 5.0; .NET CLR 1.1.4322; Windows-Media-Player/10.00.00.3990; InfoPath.2";
    $blist[] = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; Dealio Deskball 3.0)";
    $blist[] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; NeosBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
    $ua = $blist[array_rand($blist)];

    $source = get_option('ma_eza_source');

    // SOOPERARTICLES
    if($source == "sooperarticles") {
        $startat = $num;
        if ($startat == 0) {
            $startpage = 1;
            $sk = 1;
        } else {
            $xz = $startat / 15;
            $startpage = ceil($xz);
            $sk = $startat - ( $startpage -1 ) * 15;
        }
        $l = $startpage;
        $sk = $sk -1;

        $search_url = "http://www.sooperarticles.com/search/?t=titles&s=$keyword&p=$l";

        // make the cURL request to $search_url
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6');
        curl_setopt($ch, CURLOPT_URL,$search_url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 45);
        $html= curl_exec($ch);
        if (!$html) {
            echo "<br />cURL error number:" .curl_errno($ch);
            echo "<br />cURL error:" . curl_error($ch);
            exit;
        }
        curl_close($ch);

        // parse the html into a DOMDocument
        $dom = new DOMDocument();
        @$dom->loadHTML($html);

        // Grab Product Links
        $xpath = new DOMXPath($dom);
        $paras = $xpath->query("//div/h3/a");
        $para = $paras->item($sk);

        if($para == '' | $para == null) {
            echo '<div class="updated"><p>No articles found!</p></div>';
            return "nothing";
            break;
        } else {
            $target_url = $para->getAttribute('href');

            // make the cURL request to $search_url
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_USERAGENT, $ua);
            curl_setopt($ch, CURLOPT_URL,$target_url);
            curl_setopt($ch, CURLOPT_FAILONERROR, true);
            curl_setopt($ch, CURLOPT_AUTOREFERER, true);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 45);
            $html= curl_exec($ch);
            if (!$html) {
                echo "<br />cURL error number:" .curl_errno($ch);
                echo "<br />cURL error:" . curl_error($ch);
                exit;
            }
            curl_close($ch);

            // parse the html into a DOMDocument
            $dom = new DOMDocument();
            @$dom->loadHTML($html);

            // Grab Article Title
            $xpath = new DOMXPath($dom);
            $paras = $xpath->query("//div/h1");
            $para = $paras->item(0);
            $title = $para->textContent;
            $title2 = $title;
            if (function_exists('ma_translate') && get_option('ma_trans_title') == 1 && get_option('ma_trans_article') == 1) {$title = ma_translate($title2);}

            // Check X
            $xpath = new DOMXPath($dom);
            $paras = $xpath->query("//div[@id='KonaBody']/div[@class='arightside']");
            $para = $paras->item(0);
            if($para != "" && $para != null) {
                return false;
                break;
            }

            // Grab Article
            if (get_option('ma_eza_grabmethod')=='old') {
                $xpath = new DOMXPath($dom);
                $paras = $xpath->query("//div[@id='KonaBody']//p");

                for ($i = 0; $i < $paras->length; $i++ ) { //$paras->length
                    $para = $paras->item($i);
                    $paragraph = $para->textContent;
                    if ($paragraph != '') {
                        if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$paragraph = ma_translate($paragraph);}
                        $content .= $paragraph . ' ';
                        $content .= "<br/><br/>";
                    }
                }
            } elseif (get_option('ma_eza_grabmethod')=='new') {
                $xpath = new DOMXPath($dom);
                $paras = $xpath->query("//div[@id='KonaBody']");
                $para = $paras->item(0);
                $string = $dom->saveXml($para);

                $tags = array('div','iframe','script');
                $string = ma_strip_selected_tags($string, $tags);
                $string = str_replace("]]>", "", $string);
                $string = str_replace("<![CDATA[", "", $string);

                if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$string = ma_translate($string);}
                $content .= $string . ' ';
            }

            // Grab Ressource Box
            $xpath = new DOMXPath($dom);
            $paras = $xpath->query("//div[@class='author-signature']");
            $para = $paras->item(0);
            $ressourcetext = $dom->saveXml($para);
            if (function_exists('ma_translate') && get_option('ma_trans_articlebox') == 1) {$ressourcetext = ma_translate($ressourcetext);}

            if ($ressourcetext != '') {
                $authorbox = "<div style=\"margin:5px;padding:5px;border:1px solid #c1c1c1;font-size: 10px;\">" . $ressourcetext . "</div>";
            }
        }
    }

    // ARTICLESBASE
    if($source == "articlesbase") {

        /* Select Proxy
        $proxy ="";
        $burl = get_bloginfo('url');;
        $arr=@file("$burl/wp-content/plugins/WPRobot/modules/proxies.txt");
        if($arr) {
            $noprox = count($arr) - 1;
            $rprox = rand(0,$noprox);
            list($proxy,$proxytype,$proxyuser)=explode("|",$arr[$rprox]);
        }
        */

        $page = $num / 15;
        $page = (string) $page;
        $page = explode(".", $page);
        $page=(int)$page[0];
        $page++;

        if($page == 0) {$page = 1;}
        $prep = floor($num / 15);
        $numb = $num - $prep * 15;

        /*
        $numb = $num;
        $num = $num / 15;
        $num = (string) $num;
        $num = explode(".", $num);
        $page=(int)$num[0];
        $page++;
        $cnum=(int)$num[1];
        $l = $page;
        $sk = $cnum;*/

        $lang = get_option('ma_eza_lang');
        if($lang == "en") {
            $search_url = "http://www.articlesbase.com/find-articles.php?q=$keyword&page=$page";
        } elseif($lang == "fr") {
            $search_url = "http://fr.articlesbase.com/find-articles.php?q=$keyword&page=$page";
        } elseif($lang == "es") {
            $search_url = "http://www.articuloz.com/find-articles.php?q=$keyword&page=$page";
        } elseif($lang == "pg") {
            $search_url = "http://www.artigonal.com/find-articles.php?q=$keyword&page=$page";
        } elseif($lang == "ru") {
            $search_url = "http://www.rusarticles.com/find-articles.php?q=$keyword&page=$page";
        }

        // make the cURL request to $search_url
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6');
        curl_setopt($ch, CURLOPT_URL,$search_url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        /* Proxy
        if($proxy != "") {
            curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
            curl_setopt($ch, CURLOPT_PROXY, $proxy);
            if($proxyuser) {curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyuser);}
            if($proxytype == "socks") {curl_setopt ($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);}
        }
        */
        curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 45);
        $html= curl_exec($ch);
        if (!$html) {
            echo "<br />cURL error number:" .curl_errno($ch);
            echo "<br />cURL error:" . curl_error($ch);
        }
        curl_close($ch);
        //$html = file_get_contents($search_url);

        // parse the html into a DOMDocument
        $dom = new DOMDocument();
        @$dom->loadHTML($html);

        // Grab Product Links
        $xpath = new DOMXPath($dom);
        $paras = $xpath->query("//div//h3/a");
        $para = $paras->item($numb);

        if($para == '' | $para == null) {
            //echo '<div class="updated"><p>No articles found!</p></div>';
            return "nothing";
            break;
        } else {
            if($lang == "en") {
                $target_url = $para->getAttribute('href');
                // $target_url = "http://www.articlesbase.com" . $para->getAttribute('href');
            } elseif($lang == "fr") {
                $target_url = $para->getAttribute('href');
                // $target_url = "http://fr.articlesbase.com" . $para->getAttribute('href');
            } elseif($lang == "es") {
                $target_url = $para->getAttribute('href');
                // $target_url = "http://www.articuloz.com" . $para->getAttribute('href');
            } elseif($lang == "pg") {
                $target_url = $para->getAttribute('href');
                // $target_url = "http://www.artigonal.com" . $para->getAttribute('href');
            } elseif($lang == "ru") {
                $target_url = $para->getAttribute('href');
                // $target_url = "http://www.rusarticles.com" . $para->getAttribute('href');
            }

            // make the cURL request to $search_url
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_USERAGENT, $ua);
            curl_setopt($ch, CURLOPT_URL,$target_url);
            curl_setopt($ch, CURLOPT_FAILONERROR, true);
            curl_setopt($ch, CURLOPT_AUTOREFERER, true);
            /* Proxy
            if($proxy != "") {
                curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
                curl_setopt($ch, CURLOPT_PROXY, $proxy);
                if($proxyuser) {curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyuser);}
                if($proxytype == "socks") {curl_setopt ($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);}
            }
            */
            curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 45);
            $html= curl_exec($ch);
            if (!$html) {
                echo "<br />cURL error number:" .curl_errno($ch);
                echo "<br />cURL error:" . curl_error($ch);
                exit;
            }
            curl_close($ch);

            // parse the html into a DOMDocument
            $dom = new DOMDocument();
            @$dom->loadHTML($html);

            // Grab Article Title
            $xpath = new DOMXPath($dom);
            $paras = $xpath->query("//div/h1");
            $para = $paras->item(0);
            $title = $para->textContent;
            $title2 = $title;
            if (function_exists('ma_translate') && get_option('ma_trans_title') == 1 && get_option('ma_trans_article') == 1) {$title = ma_translate($title2);}

            // Grab Article
            if (get_option('ma_eza_grabmethod')=='old') {
                $xpath = new DOMXPath($dom);
                $paras = $xpath->query("//div[@class='article_cnt KonaBody']//p");

                for ($i = 0; $i < $paras->length; $i++ ) { //$paras->length
                    $para = $paras->item($i);
                    $paragraph = $para->textContent;
                    if ($paragraph != '') {
                        if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$paragraph = ma_translate($paragraph);}
                        $content .= $paragraph . ' ';
                        $content .= "<br/><br/>";
                    }
                }
            } elseif (get_option('ma_eza_grabmethod')=='new') {
                $xpath = new DOMXPath($dom);
                $paras = $xpath->query("//div[@class='article_cnt KonaBody']");
                $para = $paras->item(0);
                $string = $dom->saveXml($para);

                $string = strip_tags($string,'<p><strong><b><a><br>');
                $string = str_replace('<div class="KonaBody">', "", $string);
                $string = str_replace("</div>", "", $string);

                if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$string = ma_translate($string);}
                $content .= $string . ' ';
            }

            // Grab Ressource Box
            $xpath = new DOMXPath($dom);
            $parax = $xpath->query("//div[@class='author_details']/p");
            //$para = $paras->item(0);
            //$ressourcetext = $dom->saveXml($para);

            for ($i = 0; $i < $parax->length; $i++ ) { //$paras->length
                $parac = $parax->item($i);
                $ressourcetext .= $dom->saveXml($parac);
            }

            if (function_exists('ma_translate') && get_option('ma_trans_articlebox') == 1) {$ressourcetext = ma_translate($ressourcetext);}

            if ($ressourcetext != '') {
                $authorbox = "<div style=\"margin:5px;padding:5px;border:1px solid #c1c1c1;font-size: 10px;\">" . $ressourcetext . "</div>";
            }
        }
    }

    $textc = $content;
    //$textc=substr_replace($textc, "...<!--more-->", 100, 0);
    //$textc = htmlspecialchars($textc, ENT_QUOTES);
    if($lang == "es") {
        //$textc = utf8_decode($textc);
    }
    $authorbox = utf8_decode($authorbox);
    $title = utf8_decode($title);

    $content = get_option( 'ma_eza_template'); //'{thumbnail}{description}{link}');;

    // Clickbank
    $pos = strpos($content, "{clickbank}");
    if ($pos === false) {
    } else {
        $cbad = ma_getclickbank($keyword,"no");
        if($cbad[4] != "") {
            $content = str_replace("{clickbank}", $cbad[4], $content);
        } else {
            $content = str_replace("{clickbank}", "", $content);
        }
    }

    // Youtube
    $pos = strpos($content, "{video}");
    if ($pos === false) {
    } else {
        $vid = ma_getvideo($keyword2,1,0);
        if($vid[8] != "") {
            $content = str_replace("{video}", $vid[8], $content);
        } else {
            $content = str_replace("{video}", "", $content);
        }
    }

    // Flickr
    preg_match('#\{image(.*)\}#iU', $content, $matches);
    if ($matches[0] == false) {
    } else {
        if($matches[1] != false ) {$imgkeyword = substr($matches[1], 1);} else {$imgkeyword = $keyword;}
        $img = ma_getimage($imgkeyword,1,0);
        if($img[4] != "" && $img[4] != "i") {
            $image = '<img style="float:left;margin: 0 20px 10px 0;" src="'.$img[4].'" width="'.get_option("ma_fl_twidth").'" />';
            $content = str_replace("{date}", $img[1], $content);
            $content = str_replace("{owner}", $img[2] , $content);
            $content = str_replace("{largeimage}", $img[6], $content);
            $content = str_replace($matches[0], $image, $content);
            $fllink = 'http://www.flickr.com/photos/'.$img[7].'/'.$img[8];
            $content = str_replace("{imageurl}", $fllink, $content);
        } else {
            $content = str_replace("{date}", "", $content);
            $content = str_replace("{owner}", "", $content);
            $content = str_replace("{largeimage}","", $content);
            $content = str_replace($matches[0], "", $content);
            $content = str_replace("{imageurl}", "", $content);
        }
    }

    // eBay
    preg_match('#\{auction(.*)\}#iU', $content, $matches);
    if ($matches[0] == false) {
    } else {
        if($matches[1] != false ) {$aucnum = substr($matches[1], 1);} else {$aucnum = 1;}
        $content = str_replace($matches[0], '[eba kw="'.$keyword2.'" num="'.$aucnum.'" ebcat=""]', $content);
    }

    $content = str_replace("{article}", $textc, $content);
    $content = str_replace("{authorbox}", $authorbox, $content);
    $content = str_replace("{keyword}", $keyword2, $content);
    $content = str_replace("{url}", $target_url, $content);

    $insert = ma_insertpost($content,$title,$cat);
    if ($insert == false) {return false;} else {return true;}
    //ma_post($which);
}

function ma_eza_options() {
?>
    <table width="100%" cellspacing="2" cellpadding="5" class="editform">
        <tr valign="top">
            <td width="30%" scope="row">Article Source:</td>
            <td>
            <select name="ma_eza_source" id="ma_eza_source">
                <option value="articlesbase" <?php if (get_option('ma_eza_source')=='articlesbase') {echo 'selected';} ?>>Articlesbase.com</option>
                <option value="sooperarticles" <?php if (get_option('ma_eza_source')=='sooperarticles') {echo 'selected';} ?>>Sooperarticles.com</option>
            </select>
            </td>
        </tr>
        <tr valign="top">
            <td width="30%" scope="row">Article Formatting Method:</td>
            <td>
            <select name="ma_eza_grabmethod" id="ma_eza_grabmethod">
                <option value="new" <?php if (get_option('ma_eza_grabmethod')=='new') {echo 'selected';} ?>>Leave Formatting Intact</option>
                <option value="old" <?php if (get_option('ma_eza_grabmethod')=='old') {echo 'selected';} ?>>Replace Formatting</option>
            </select> <a href="http://wprobot.net/documentation/#34"><b>?</b></a>
            </td>
        </tr>
        <tr valign="top">
            <td width="30%" scope="row">Article Language:</td>
            <td>
            <select name="ma_eza_lang" id="ma_eza_lang">
                <option value="en" <?php if(get_option('ma_eza_lang')=="en"){_e('selected');}?>>English</option>
                <option value="fr" <?php if(get_option('ma_eza_lang')=="fr"){_e('selected');}?>>French</option>
                <option value="es" <?php if(get_option('ma_eza_lang')=="es"){_e('selected');}?>>Spanish</option>
                <option value="pg" <?php if(get_option('ma_eza_lang')=="pg"){_e('selected');}?>>Portuguese</option>
                <option value="ru" <?php if(get_option('ma_eza_lang')=="ru"){_e('selected');}?>>Russian</option>
            </select>
            </td>
        </tr>
        <tr valign="top">
            <td width="30%" scope="row">Post Template:</td>
            <td>
            <textarea name="ma_eza_template" rows="2" cols="30"><?php echo get_option('ma_eza_template');?></textarea>
            </td>
        </tr>
    </table>
<?php
}
?>
I guess you could try preg_replace. Something like:
$content = preg_replace('/\<a href\=[\"\\\'].*?\<\/a\>/i', '', $content);
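A quick standalone check of this pattern, using made-up sample strings: it removes a link (tag and text) when `href` is the first attribute, but because it looks for the literal text `<a href=`, a link with any other attribute first (for example `rel="nofollow"`) is left alone.

```php
<?php
// The pattern suggested above, escaping kept as posted.
$pattern = '/\<a href\=[\"\\\'].*?\<\/a\>/i';

// Hypothetical sample inputs.
$plain   = 'Intro <a href="http://example.com">Example</a> outro';
$relLink = 'Intro <a rel="nofollow" href="http://example.com">Example</a> outro';

// href is the first attribute: the whole link, text included, is removed.
$plainResult = preg_replace($pattern, '', $plain);

// "rel" sits between "<a" and "href", so the pattern never matches.
$relResult = preg_replace($pattern, '', $relLink);

echo $plainResult, "\n"; // Intro  outro
echo $relResult, "\n";   // unchanged
```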
I think you can use strip_tags and specify the <a> tag. I don't remember the syntax, but I think it will work.
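For reference, the second argument of `strip_tags()` is a list of tags to keep, not tags to remove, and the text inside removed tags survives. So dropping only `<a>` means listing every other tag you want to keep. A small sketch on a made-up paragraph; the allowlist is illustrative and simply mirrors the one in the articlesbase branch of the posted plugin, minus `<a>`.

```php
<?php
// Hypothetical article paragraph.
$html = '<p>Read <a href="http://example.com"><strong>this guide</strong></a> first.</p>';

// <a> is not in the allowlist, so its tags go but "this guide" stays.
$withoutLinks = strip_tags($html, '<p><strong><b><br>');

// With <a> in the allowlist (as in the plugin), the link is kept as-is.
$withLinks = strip_tags($html, '<p><strong><b><a><br>');

echo $withoutLinks, "\n";
echo $withLinks, "\n";
```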
$content = preg_replace('%</?a\b[^>]*>%', '', $content);
I haven't read your code, but that is how you do it. Just replace $content with your content variable.
I went ahead and tried all three suggestions, but none worked. I tried them on the content variable and the paragraph variable, and the links always remained intact. Looking at my example, can you see why this might not work? I wish I could be of more help. I really appreciate the help you're all offering.
They're taken from an open and free article database. I've tried using the following with no success:
$content = preg_replace('%</?a\b[^>]*>%', '', $content);

I noticed the script has a similar option for other features. I've tried to copy it over, but I can't figure it out. I've used the code below with both $paragraph and $content, but the links still exist! Any ideas based on this code?
$paragraph = ma_strip_selected_tags($paragraph, array('a','iframe','script'));

function ma_strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args,array($text)) : (array)$tags;
    foreach ($tags as $tag){
        while(preg_match('/<'.$tag.'(|\W[^>]*)>(.*)<\/'. $tag .'>/iusU', $text, $found)){
            $text = str_replace($found[0],$found[2],$text);
        }
    }
    return preg_replace('/(<('.join('|',$tags).')(|\W.*)\/>)/iusU', '', $text);
}
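To see what the plugin's own helper does once `'a'` is in the list, here is the function as posted (re-indented and commented), run on a made-up snippet that still contains markup. It replaces each `<a ...>...</a>` with its inner content, so link text and nested tags like `<strong>` remain. In the sooperarticles 'new' branch of the posted file, `$string` already goes through this helper with `array('div','iframe','script')`; adding `'a'` there looks like one plausible place to try it, though this sketch was not run inside WordPress.

```php
<?php
// ma_strip_selected_tags() as posted above, re-indented and commented.
function ma_strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    // Tags may be passed as one array or as extra string arguments.
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        // Replace each <tag ...>inner</tag> with just "inner" until none are left.
        while (preg_match('/<' . $tag . '(|\W[^>]*)>(.*)<\/' . $tag . '>/iusU', $text, $found)) {
            $text = str_replace($found[0], $found[2], $text);
        }
    }
    // Finally drop self-closing forms such as <a ... />.
    return preg_replace('/(<(' . join('|', $tags) . ')(|\W.*)\/>)/iusU', '', $text);
}

// Hypothetical article markup that still contains its tags.
$markup = '<p>Visit <a href="http://one.example">one</a> and <a href="http://two.example"><strong>two</strong></a>.</p>';

$fromMarkup = ma_strip_selected_tags($markup, array('div', 'iframe', 'script', 'a'));
echo $fromMarkup, "\n";
```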
As you can see here, I used the preg_replace code. (I've tried every example you've all given me.) The $paragraph variable holds all of the content. I don't understand why it's not working either. It's becoming a real pain. I appreciate the help.

$xpath = new DOMXPath($dom);
$paras = $xpath->query("//div[@id='KonaBody']//p");

for ($i = 0; $i < $paras->length; $i++ ) { //$paras->length
    $para = $paras->item($i);
    $paragraph = $para->textContent;
    $paragraph = preg_replace('~<(a.*)href=(?:"|\')(.*?)(?:"|\')(.*)</a>~i', '', $paragraph);
    if ($paragraph != '') {
        if (function_exists('ma_translate') && get_option('ma_trans_article') == 1) {$paragraph = ma_translate($paragraph);}
        $content .= $paragraph . ' ';
        $content .= "<br/><br/>";
    }
}
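A likely explanation for why nothing changes here: `$para->textContent` returns only the text of the paragraph, with every tag (and the `href` URL inside it) already removed, so there is no `<a ...>` left in `$paragraph` for any pattern to match. The plugin's 'new' grab method instead serializes the element with `saveXml()`, which keeps the tags. A standalone sketch on a made-up paragraph:

```php
<?php
$dom = new DOMDocument();
// Hypothetical article body; @ mirrors the plugin's suppression of parser warnings.
@$dom->loadHTML('<div id="KonaBody"><p>See <a href="http://example.com">this site</a> now.</p></div>');

$xpath = new DOMXPath($dom);
$para = $xpath->query("//div[@id='KonaBody']//p")->item(0);

// What the 'old' grab method stores: plain text only, no tags at all.
$text = $para->textContent;

// What the 'new' grab method works with: the element serialized with its tags.
$markup = $dom->saveXml($para);

// The same link-stripping pattern applied to both.
$pattern = '%</?a\b[^>]*>%i';
$textAfter = preg_replace($pattern, '', $text);
$markupAfter = preg_replace($pattern, '', $markup);

echo $text, "\n";
echo $markupAfter, "\n";
```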
Reply with the HTML source that contains the links you want replaced. You can get the source with:
highlight_string($content);
and then copy the output.
I've finally had some success! I've probably spent over 20 hours on this issue. I found two different spots where I have to place the code; there's a duplicate location for another article source. Anyway, it's working! I did find one article that snuck by with some links. I checked out the HTML code, and here it is. I'm using the PHP code below from danx10. Would I need different code to get rid of a link like this?

Here's the HTML code for the link:
<a onclick="javascript:pageTracker._trackPageview('/outgoing/article_exit_link');" rel="nofollow" href="http://www.datingonlinesingles.blogspot.com"><strong>click here to join the best free dating site.</strong>.</a>

Here's the PHP code I'm using:
$content = preg_replace('~<(a.*)href=(?:"|\')(.*?)(?:"|\')(.*)</a>~i', '', $content);

Oh, and can you possibly help with one other thing? I've found that many articles have links that aren't active, so instead of a clickable link, the user has to copy and paste it. How can I use preg_replace to search for, let's say, www?

EDIT: After looking at the code, I'm assuming you made it more detailed by adding in href? I'm going to try your previous example as well to see if it'll clear them all.

EDIT: Didn't work. Same issue as above.
$content = preg_replace('%</?a\b[^>]*>%', '', $content);
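Running both patterns from this post against the exact link above (wrapped in a made-up sentence) hints at why the second one seemed not to work: `%</?a\b[^>]*>%` removes only the opening and closing `<a>` tags, so the anchor text stays behind as bold text, while the `href` pattern removes the element and its text together. This is a guess at what "same issue" means; which result is preferable depends on whether the visible text should survive.

```php
<?php
// The link quoted above, inside a hypothetical sentence.
$link = '<a onclick="javascript:pageTracker._trackPageview(\'/outgoing/article_exit_link\');" rel="nofollow" href="http://www.datingonlinesingles.blogspot.com"><strong>click here to join the best free dating site.</strong>.</a>';
$html = 'Intro. ' . $link . ' Outro.';

// Pattern 1 from this post: removes the whole element, text included.
$removed = preg_replace('~<(a.*)href=(?:"|\')(.*?)(?:"|\')(.*)</a>~i', '', $html);

// Pattern 2 from this post: removes only the <a ...> and </a> tags.
$unwrapped = preg_replace('%</?a\b[^>]*>%', '', $html);

echo $removed, "\n";
echo $unwrapped, "\n";
```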
This would work:
$content = preg_replace('~<(a.*)href=(?:"|\')(.*?)(?:"|\')(.*)</a>~i', '', $content);

But it could easily be shortened too:
$content = preg_replace('~<(a.*)href=(.*?)</a>~i', '', $content);

Proof that it works:
<?php
//input
$content = <<<eof
Test text containing urls..
<a onclick="javascript:pageTracker._trackPageview('/outgoing/article_exit_link');" rel="nofollow" href="http://www.datingonlinesingles.blogspot.com"><strong>click here to join the best free dating site.</strong>.</a>
<a href="http://digitalpoint.com">Test Url</a>
eof;

$content = preg_replace('~<(a.*)href=(.*?)</a>~i', '', $content);

//output - Test text containing urls..
echo $content;
?>

Can you elaborate? Give me an example of an inactive link and what you'd want it to look like (removed, or replaced/formatted in a specific way?).
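One caveat before relying on the shortened pattern: `(a.*)` is greedy, so when two links sit on the same line it stretches from the first `<a` to the last `</a>` and removes the ordinary text in between. The proof above works because each link is on its own line (`.` does not match a newline without the `s` modifier). A quick check on a made-up line, next to a bounded variant that is an illustration, not taken from the thread:

```php
<?php
// Hypothetical line with two links and text between them.
$line = 'Keep this <a href="http://one.example">one</a> and this <a href="http://two.example">two</a> too.';

// The shortened pattern from the post above: one match spans both links.
$greedy = preg_replace('~<(a.*)href=(.*?)</a>~i', '', $line);

// Bounded variant: [^>]* stays inside one tag, and the lazy .*? stops
// at the first </a>, so each link is removed separately.
$bounded = preg_replace('~<a\b[^>]*>.*?</a>~is', '', $line);

echo $greedy, "\n";
echo $bounded, "\n";
```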
It works! I can now remove all links! I found a better spot to place it, and it worked. Wow, finally! Now I have two questions.

First, I'd like to remove links that don't use <a href> tags. How can I go about removing these? For example, the link below does not use <a href> tags, so it won't be removed from the article:
http://www.google.com

This link has <a href> tags and will be removed:
<a href="http://www.google.com">http://www.google.com</a>

Second, is there a way to remove the link tags but keep the words? For example, instead of a link, the text below would just say Google.
Google
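Plain-text URLs are not tags, so none of the tag patterns in this thread will touch them. A minimal sketch for removing them, assuming URLs start with `http://`, `https://` or `www.`; the function name `remove_bare_urls` is hypothetical. The lookbehind skips URLs that directly follow a quote, `=` or `/`, which keeps the pattern out of `href="..."` attributes, but it will also take a sentence-ending period with it, so treat it as a starting point.

```php
<?php
/**
 * Hypothetical helper: remove plain-text URLs (http://, https:// or www.).
 * The lookbehind skips URLs right after a quote, "=" or "/", so URLs inside
 * attributes such as href="..." are left alone. A URL at the end of a
 * sentence takes the trailing period with it.
 */
function remove_bare_urls($text)
{
    return preg_replace('~(?<![\'"=/])\b(?:https?://|www\.)[^\s<>\'"]+~i', '', $text);
}

$text = 'Visit http://www.google.com or www.example.org/page today.';
$html = '<a href="http://www.google.com">Google</a>';

$cleanText = remove_bare_urls($text);
$cleanHtml = remove_bare_urls($html);

echo $cleanText, "\n";
echo $cleanHtml, "\n";
```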
That would be the explode function you are looking for. Look at the examples and see if you can come up with something. Shouldn't be that hard.
@smashedpumpkins You can use the following regex:
$content = preg_replace('~<(a.*)href=(.*?)>(.*)</a>~i', '$3', $content);

This should remove all <a href> links, and if they contain link text, the link text will remain but the link will be removed. Furthermore, non-<a href> links won't be touched (therefore they will not be affected, so you don't need a regex for that).

Example:
<?php
//input
$content = <<<eof
<!--non a href tag example...-->
http://www.google.com
<!--a href tag containing link text example...-->
<a href="http://www.google.com" target="_blank">Google</a>
eof;

$content = preg_replace('~<(a.*)href=(.*?)>(.*)</a>~i', '$3', $content);

//output - http://www.google.com Google
echo $content;
?>
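The greedy `(a.*)` and `(.*)` in this pattern have the same side effect as the shortened one when two links share a line: the match spans both links, and only the last link's text is put back. A sketch of a bounded variant (illustrative, not from the thread) that captures each link's text lazily and keeps it via `$1`, checked on a made-up line:

```php
<?php
// Hypothetical line with two links.
$line = 'Try <a href="http://one.example">one</a> or <a href="http://two.example" target="_blank">two</a>.';

// Pattern from the post above: one match covers both links, $3 is only "two".
$greedy = preg_replace('~<(a.*)href=(.*?)>(.*)</a>~i', '$3', $line);

// Bounded variant: one opening tag at a time, text up to the first </a> kept as $1.
$kept = preg_replace('~<a\b[^>]*>(.*?)</a>~is', '$1', $line);

echo $greedy, "\n";
echo $kept, "\n";
```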
danx10, please stop suggesting crap. Better to spend some time doing homework and studying why your code works only on a few examples. JAY6390 already gave an almost perfect solution, except the regexp should be case-insensitive (note the i modifier at the end):
$content = preg_replace('%</?a\b[^>]*>%i', '', $content);
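A small check of the point about the `i` modifier, on a made-up line that mixes upper- and lowercase tags: without `i`, the uppercase `<A HREF>` pair survives.

```php
<?php
// Hypothetical line with an uppercase and a lowercase link.
$html = 'Old <A HREF="http://one.example">one</A> and new <a href="http://two.example">two</a>.';

// Case-sensitive: only the lowercase <a> tags are removed.
$sensitive = preg_replace('%</?a\b[^>]*>%', '', $html);

// With the i modifier: both pairs are removed, link text kept.
$insensitive = preg_replace('%</?a\b[^>]*>%i', '', $html);

echo $sensitive, "\n";
echo $insensitive, "\n";
```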
In fairness, the regex shouldn't need to be case-insensitive; any valid HTML would have lowercase <a> tags (which is why I didn't put the insensitive flag on it to begin with).
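For what it's worth, HTML tag names are case-insensitive, so `<A HREF=...>` is valid HTML (XHTML is what requires lowercase), and scraped articles are not guaranteed to be valid either way. An alternative that sidesteps case, attribute order and line breaks is to let PHP's DOM extension parse the fragment and unwrap each `<a>` element, keeping its children. A sketch, assuming the article HTML is in a string and is ASCII or Latin-1 (libxml's HTML parser treats input without a charset declaration as ISO-8859-1); `unwrap_links` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: remove every <a> element but keep its children,
 * so link text and inner formatting such as <strong> survive.
 * Assumes ASCII or Latin-1 input (no charset handling here).
 */
function unwrap_links($html)
{
    $dom = new DOMDocument();
    // Wrap the fragment in a known container; @ hides warnings about sloppy markup.
    @$dom->loadHTML('<div id="unwrap-root">' . $html . '</div>');

    $xpath = new DOMXPath($dom);
    $root = $xpath->query("//div[@id='unwrap-root']")->item(0);

    // The parser lowercases tag names, so <A HREF=...> is found here too.
    // Copy the live node list first because the loop changes the tree.
    $links = iterator_to_array($dom->getElementsByTagName('a'));
    foreach ($links as $link) {
        // Move each child in front of the link, then remove the empty link.
        while ($link->firstChild) {
            $link->parentNode->insertBefore($link->firstChild, $link);
        }
        $link->parentNode->removeChild($link);
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo unwrap_links('<p>Read <A HREF="http://example.com"><strong>this</strong></A> now.</p>'), "\n";
```

Since the parser, not a pattern, decides what counts as a link, the same call handles uppercase tags, extra attributes such as `onclick`, and links split across lines.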