Can somebody explain the following regexp to me?

/\bCreditor[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i

I am trying to debug somebody else's code:

$glossary_title = "Creditor"; // just a hardcoded example
$glossary_search = '/\b'.$glossary_title.'[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';
$glossary_replace = '<a....>$0</a>';
$content_temp = preg_replace($glossary_search, $glossary_replace, $content);

The problem I am having is that it also matches Creditors and Creditor's and wraps them in <a></a> tags, where it should only match Creditor / creditor (strictly). I'm also not sure whether the regex will work with terms that contain spaces or apostrophes ('), which it ideally should. Any help would be greatly appreciated. Thanks
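To make the symptom concrete, here is a minimal sketch of what the pattern from the question matches, next to one stricter alternative. The sample text and the stricter pattern (which treats an apostrophe as part of the word) are assumptions for illustration only, not part of the plugin.

```php
<?php
// Sample text (made up for this sketch).
$text = "Creditor Creditors Creditor's";

// Pattern from the question: the lazy [A-Za-z]*? grows until \b can match,
// so a trailing "s" is swallowed, and "'" counts as a word boundary.
$original = '/\bCreditor[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';
preg_match_all($original, $text, $m);
$originalMatches = $m[0];

// Stricter variant (assumption): no word character or apostrophe
// may touch the term on either side.
$strict = "/(?<![\\w'])Creditor(?![\\w'])/i";
preg_match_all($strict, $text, $m);
$strictMatches = $m[0];

print_r($originalMatches); // Creditor, Creditors, Creditor
print_r($strictMatches);   // Creditor
```

The third original match is the "Creditor" part of "Creditor's", which is why the possessive ends up wrapped in a link as well.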
Could you give an example of the content you want changed and the same content after you have changed it? (in the way you want it to work)
Sure, well, it is just a testing sample, so don't read too much into it

Would become

It is basically searching a lot of text/HTML, and if any words are in the glossary they get highlighted (via a link). Case-insensitive too. The code I provided is from the WordPress plugin that does this, but it doesn't work in some scenarios, e.g. words with ' in them.
Add to that scenario: If an instance of a word (which is a word that is in the glossary) is already previously linked then it should be left alone.
Something like this should work. I've made it so each keyword can have a different link, which makes sense because Google only uses the anchor text it finds in the first link on a page. It also gives a little more freedom. You can add as many more as you want.

PHP:
<?php
// your content
$content = <<<END
yada yada yada <a href="already-linked">creditor</a> yada yada yada creditor yada yada yada
yada yada yada yada yada yada yada yada yada yada yada yada yada
yada yada yada yada yada yada yada yada yada yada yada yada yada
Creditor's Petition yada yada yada yada yada yada yada yada yada
yada yada yada yada yada yada yada yada yada
yada yada yada yada yada yada yada yada yada Creditor.
yada yada and this following one won't get highlighted cos it is part of another word
creditors yada yada yada yada yada yada yada
END;

// define each word to be matched with its corresponding link
// start with the longest words first, ie: 'Creditors Pension', then 'Creditor'
$patterns[] = "Creditor's Petition";
$links[] = 'http://www.first.co.uk/';

$patterns[] = "creditors";
$links[] = 'http://www.second.co.uk/';

$patterns[] = "creditor";
$links[] = 'http://www.third.co.uk/';

// as many words as you want....

// build patterns
foreach ($patterns as $k => $v)
    $patterns[$k] = '/[^>]('.str_replace(' ', '\s', addslashes($v)).')[^<]/i';
foreach ($links as $k => $v)
    $links[$k] = ' <a href="'.$v.'">$1</a> ';

// execute replace
$contentWithLinks = preg_replace($patterns, $links, $content);

// output new content
echo $contentWithLinks;
?>
Wow. Thanks. Now I've got to work out how to morph that into the existing code, which I can't get my head around:

PHP:
function red_glossary_parse_content($content){
    global $terms_done;
    //Run the glossary parser
    $glossaryPageID = get_option('red_glossaryID');
    if (((!is_page() && get_option('red_glossaryOnlySingle') == 0) OR
         (!is_page() && get_option('red_glossaryOnlySingle') == 1 && is_single()) OR
         (is_page() && get_option('red_glossaryOnPages') == 1))){
        $glossary_index = get_children(array(
            'post_type' => 'glossary',
            'post_status' => 'publish',
        ));
        usort($glossary_index,'sortByLength');
        if ($glossary_index){
            $timestamp = time();
            foreach($glossary_index as $glossary_item){
                $timestamp++;
                $glossary_title = $glossary_item->post_title;
                $glossary_search = '/\b'.$glossary_title.'[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';
                $glossary_replace = '<a'.$timestamp.'>$0</a'.$timestamp.'>';
                $content_temp = preg_replace($glossary_search, $glossary_replace, $content);
                $content_temp = rtrim($content_temp);
                $link_search = '/<a'.$timestamp.'>('.$glossary_item->post_title.'[A-Za-z]*?)<\/a'.$timestamp.'>/i';
                if (get_option('red_glossaryTooltip') == 1) {
                    $link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '" onmouseover="tooltip.show(\'' . addslashes($glossary_item->post_content) . '\');" onmouseout="tooltip.hide();">$1</a>';
                }
                else {
                    $link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '">$1</a>';
                }
                if (!in_array($glossary_title,$terms_done)) {
                    $content_temp_before = $content_temp;
                    $content_temp = preg_replace($link_search, $link_replace, $content_temp,1);
                    if ($content_temp_before != $content_temp) $terms_done[] = $glossary_title;
                    $content = $content_temp;
                }
            }
        }
    }
    return $content;
}

Where: $glossary_title ($glossary_item->post_title) = patterns
I just tried your code in a standalone file, and one problem I found is what happens when a word, say creditors, isn't in the glossary, eg:

PHP:
//$patterns[] = "creditors";
//$links[] = 'http://www.second.co.uk/';

The word Creditors then gets linked as Creditor, which is only a partial match but is in the glossary:

PHP:
$patterns[] = "creditor";
$links[] = 'http://www.third.co.uk/';

Of course this is a stupid example as they are both the same word, but it shows the problem.
lol. Yeah, that would be the regular expression - after the keyword I match any character that isn't '<'. So it doesn't match already existing links, but in the case of 'creditors' the letter 's' falls into the category of 'not being <' - so it matches. Change the line to this and it should work ok (untested):

PHP:
foreach ($patterns as $k => $v)
    $patterns[$k] = '/[^>]('.str_replace(' ', '\s', addslashes($v)).')[^<\w]/i';

As for integrating it into WordPress - I have no idea why that code is going about it the way it is (timestamps?!). I could have a look tomorrow.
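One side effect of matching the neighbouring characters is that the replacement overwrites them. A minimal sketch of the difference, using a made-up sample string and link target; the lookaround variant is an assumption for illustration, not something proposed in the thread:

```php
<?php
// Made-up sample text and link target.
$text = 'A creditor, or two creditors.';

// Pattern style from the post above: the characters on both sides of
// the keyword are part of the match, so the replacement overwrites them.
$consuming = preg_replace('/[^>](creditor)[^<\w]/i', ' <a href="#">$1</a> ', $text);

// Lookaround variant (assumption): same "not next to > or <" heuristic,
// but the neighbouring characters are only inspected, never replaced.
$preserving = preg_replace('/(?<![>\w])(creditor)(?![<\w])/i', '<a href="#">$1</a>', $text);

echo $consuming, "\n";  // the comma after "creditor" is gone
echo $preserving, "\n"; // A <a href="#">creditor</a>, or two creditors.
```

Because lookarounds do not need a character to be present, this variant can also match a keyword at the very start or end of the content, which the consuming pattern cannot.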
Thanks, yeah, that seems to work. I can wait till tomorrow if you don't mind helping. I just can't get my mind around the way the original developer did it, with timestamps and all.
@Deacalion

PHP:
str_replace(' ', '\s', addslashes($v))

Wouldn't the following be more reliable?

PHP:
preg_quote($v)
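A short sketch of the difference, assuming a hypothetical glossary term that contains regex metacharacters. Passing the delimiter as the second argument to preg_quote() also escapes any "/" in the term.

```php
<?php
// Hypothetical glossary term containing regex metacharacters.
$term = "Creditor's Petition (UK)";

// addslashes() only escapes quotes, backslashes and NUL bytes,
// so "(" and ")" would still be read as a regex group.
$viaAddslashes = addslashes($term);

// preg_quote() escapes regex metacharacters; the second argument
// also escapes the delimiter used by the pattern ("/" here).
$quoted = preg_quote($term, '/');

// Allow any run of whitespace where the term has a space.
$pattern = '/' . str_replace(' ', '\s+', $quoted) . '/i';

$found = preg_match($pattern, "the creditor's  petition (uk) was filed");
var_dump($found); // int(1)
```

Note that preg_quote() does not handle whitespace, so the space-to-\s substitution is still needed afterwards.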
I installed it and had a little look. It seems when you try to do this within WordPress you're faced with a few problems. Somewhere along the line WordPress runs the post content through something like htmlentities, because all single quotes are converted to &#8217; - this is why the regex didn't work. Put this somewhere near the top of the function:

PHP:
$content = str_replace('&#8217;', "'", $content);

A couple more changes:

PHP:
$glossary_search = '/[^>]('.preg_replace('/\s+/', '\s', $glossary_title).')[^<\w]/i';
$glossary_replace = ' <a'.$timestamp.'>$1</a'.$timestamp.'> ';

It should almost be working then - you just need to order the glossary items by title length before you run through the loop.
Thanks for that, I have already done the ordering part, longest glossary word to shortest. It doesn't pick up creditors/creditor's as being the same as creditor, which would be ideal. I'm also getting some weird output: HTML code shows up in the page, which I think comes from adding glossary links to sub-terms. For example, Creditor Petition gets a link for Creditor Petition and another for the sub-term Creditor, so the <a> tags end up nested. I think that is the reason. PS: thanks for your efforts and time.
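One possible way to let a glossary entry also cover its plural and possessive is to append an optional suffix to the term. This is only a sketch: the suffix list is an assumption that covers a plain "s" plural and "'s", and it expects straight apostrophes in the content.

```php
<?php
// Hypothetical glossary term; the suffix list is an assumption and only
// covers a plain "s" plural and the "'s" possessive.
$term = 'creditor';
$suffix = "(?:'s|s)?";

// Lookarounds keep the match from starting or ending inside a longer word.
$pattern = '/(?<![>\w])(' . preg_quote($term, '/') . $suffix . ")(?![<\\w'])/i";

$text = "A creditor, two creditors, the creditor's claim, and creditorship.";
preg_match_all($pattern, $text, $m);

print_r($m[1]); // creditor, creditors, creditor's
```

Irregular plurals (for example "bankruptcy" and "bankruptcies") would need their own glossary entries or a per-term suffix.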
Another eg: Voluntary bankruptcy is picking up both glossary entries, voluntary bankruptcy and bankruptcy, so the <a></a> tags are nested. Ideally it should only pick up voluntary bankruptcy.
Another example: the words Involuntary bankruptcy are being picked up and highlighted with three definitions (it should only pick up the first):

Involuntary bankruptcy
voluntary bankruptcy
bankruptcy
Actually the problem (or a different problem) is if the word in the glossary description is also in the glossary. Say the description is: The act of putting somebody into bankruptcy The word bankruptcy is in the glossary, so the <a href tooltip gets detected as having a glossary word in it, and then problems occur. I PMd you an example page.
Yeah, it changes the content - then loops over itself and changes the content it's just added. Trickier than it first appeared
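One way to avoid re-scanning inserted links is to replace every term in a single pass, with one combined pattern and the longest terms first. A minimal sketch, assuming a hypothetical term-to-URL array rather than the plugin's real data:

```php
<?php
// Hypothetical glossary: term => URL (names and URLs are made up).
$glossary = [
    'bankruptcy'             => '/glossary/bankruptcy',
    'voluntary bankruptcy'   => '/glossary/voluntary-bankruptcy',
    'involuntary bankruptcy' => '/glossary/involuntary-bankruptcy',
];

// Longest terms first, so the alternation prefers the most specific one.
uksort($glossary, function ($a, $b) {
    return strlen($b) - strlen($a);
});

$alternatives = array_map(function ($term) {
    return str_replace(' ', '\s+', preg_quote($term, '/'));
}, array_keys($glossary));

// One pattern, one pass: inserted links are never scanned again.
$pattern = '/(?<![>\w])(' . implode('|', $alternatives) . ')(?![<\w])/i';

$content = 'Involuntary bankruptcy differs from voluntary bankruptcy; bankruptcy is the general term.';

$linked = preg_replace_callback($pattern, function ($m) use ($glossary) {
    // Normalise case and whitespace to look the term up again.
    $key = strtolower(preg_replace('/\s+/', ' ', $m[1]));
    return '<a href="' . $glossary[$key] . '">' . $m[1] . '</a>';
}, $content);

echo $linked;
```

Each phrase gets exactly one link here, because the regex engine moves past a match before looking for the next one. It does not by itself protect glossary words that already sit inside attributes of links in the original content.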
I solved that problem, I think, by adding (?=([^"]*"[^"]*")*[^"]*$) (which was in the original code), making:

PHP:
$glossary_search = '/[^>]('.preg_replace('/\s+/', '\s', $glossary_title).')[^<\w](?=([^"]*"[^"]*")*[^"]*$)/i';

I have no idea what that does, well, only a little idea. I still have the problem where creditors and creditor's aren't picked up by creditor (the version of the word in the glossary).
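On what that lookahead does: it only succeeds when the text from the match to the end of the string contains an even number of double quotes, which in practice means the match is not inside a double-quoted attribute value such as title="...". A minimal sketch with a made-up HTML sample:

```php
<?php
// The lookahead from the plugin: from the match to the end of the
// string there must be an even number of double quotes.
$outsideQuotes = '(?=([^"]*"[^"]*")*[^"]*$)';
$pattern = '/\bbankruptcy\b' . $outsideQuotes . '/i';

// Made-up sample: the first "bankruptcy" sits inside a title attribute.
$html = '<a title="The act of putting somebody into bankruptcy">petition</a> leads to bankruptcy';

preg_match_all($pattern, $html, $m, PREG_OFFSET_CAPTURE);

// Only the occurrence in plain text is reported.
echo count($m[0]), "\n"; // 1
```

The check relies on quotes being balanced, so a stray " in the body text or single-quoted attributes will throw it off. It also re-reads the rest of the content for every candidate match, which can get slow on long posts.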