preg_replace regex help

Discussion in 'PHP' started by Lucky Bastard, Jul 16, 2010.

  1. #1
    Can somebody explain the following regexp to me?

    /\bCreditor[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i

    I am trying to debug somebody else's code:

    $glossary_title = "Creditor"; //just a hardcoded example

    $glossary_search = '/\b'.$glossary_title.'[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';

    $glossary_replace = '<a....>$0</a>';

    $content_temp = preg_replace($glossary_search, $glossary_replace, $content);

    The problem is that the pattern above also matches, and wraps in <a></a> tags, variants such as Creditors and Creditor's, when it should match only Creditor or creditor (strictly).
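To make the behaviour concrete, here is a minimal sketch of why the pattern picks up the variants. The sample text and the strict variant are illustrative assumptions, not part of the plugin; the lookahead is left out for brevity. The `[A-Za-z]*?` before the closing `\b` lets the match grow over trailing letters, and `\b` also sits between `r` and `'`, so the `Creditor` in `Creditor's` matches too.

```php
<?php
// Hypothetical sample text, not from the original post.
$text = "A creditor met two Creditors about the Creditor's claim.";

// Pattern as posted (lookahead omitted): trailing letters are allowed.
$loose = '/\bCreditor[A-Za-z]*?\b/i';

// Strict variant (an assumption about what "strictly" should mean):
// no trailing letters, and not followed by an apostrophe plus a word character.
$strict = "/\\bCreditor\\b(?!'\\w)/i";

preg_match_all($loose, $text, $m1);
preg_match_all($strict, $text, $m2);

print_r($m1[0]); // creditor, Creditors, Creditor (the last one from "Creditor's")
print_r($m2[0]); // creditor
```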

    I'm also not sure whether the regex above will work with terms that contain spaces or apostrophes ('), which it ideally should.

    Any help would be greatly appreciated. Thanks
     
    Lucky Bastard, Jul 16, 2010 IP
  2. Deacalion

    Deacalion Peon

    Messages:
    438
    Likes Received:
    11
    Best Answers:
    0
    Trophy Points:
    0
    #2
    Could you give an example of the content you want changed and the same content after you have changed it? (in the way you want it to work)
     
    Deacalion, Jul 16, 2010 IP
  3. Lucky Bastard

    Lucky Bastard Peon

    Messages:
    406
    Likes Received:
    10
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Sure, well, it is just a testing sample, so don't read too much into it :)

    Would become
    It is basically searching a lot of text/html, and if any words are in the glossary they get highlighted (via link).

    Case insensitive too.

    The code I provided is from the WordPress plugin that does this, but it doesn't work in some scenarios, e.g. words with ' in them.
     
    Lucky Bastard, Jul 16, 2010 IP
  4. Lucky Bastard

    #4
    Add to that scenario:
    If an instance of a word (which is a word that is in the glossary) is already previously linked then it should be left alone.
     
    Lucky Bastard, Jul 16, 2010 IP
  5. Deacalion

    #5
    Something like this should work. I've made it so each keyword can have a different link; this makes sense because Google only uses the anchor text it finds in the first link on a page. It also gives a little more freedom.
    You can add as many more keywords as you want.

    
    <?php
    // your content
    $content = <<<END
    yada yada yada <a href="already-linked">creditor</a> yada yada yada creditor yada yada
    yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada
    yada yada yada yada yada yada yada yada yada Creditor's Petition yada yada yada yada yada
    yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada
    yada yada yada yada Creditor. yada yada and this following one won't get highlighted cos
    it is part of another word creditors yada yada yada yada yada yada yada
    END;
    
    // define each word to be matched with its corresponding link
    // start with the longest terms first, ie: "Creditor's Petition", then 'creditor'
    $patterns[] = "Creditor's Petition";
    $links[]    = 'http://www.first.co.uk/';
    
    $patterns[] = "creditors";
    $links[]    = 'http://www.second.co.uk/';
    
    $patterns[] = "creditor";
    $links[]    = 'http://www.third.co.uk/';
    // as many words as you want....
    
    // build patterns
    foreach ($patterns as $k => $v) $patterns[$k] = '/[^>]('.str_replace(' ', '\s', addslashes($v)).')[^<]/i';
    foreach ($links as $k => $v) $links[$k] = ' <a href="'.$v.'">$1</a> '; 
    
    // execute replace
    $contentWithLinks = preg_replace($patterns, $links, $content);
    
    // output new content
    echo $contentWithLinks;
    ?>
    
    PHP:
     
    Deacalion, Jul 16, 2010 IP
  6. Lucky Bastard

    #6
    Wow. Thanks
    Now I've got to work out how to morph that into the existing code, which I can't get my head around :)
    
    function red_glossary_parse_content($content){
    	global $terms_done;
    	//Run the glossary parser
    	
    	$glossaryPageID = get_option('red_glossaryID');
    	if (((!is_page() && get_option('red_glossaryOnlySingle') == 0) OR
    	(!is_page() && get_option('red_glossaryOnlySingle') == 1 && is_single()) OR
    	(is_page() && get_option('red_glossaryOnPages') == 1))){
    		$glossary_index = get_children(array(
    											'post_type'		=> 'glossary',
    											'post_status'	=> 'publish',
    											));
    		usort($glossary_index,'sortByLength');
    		 
    		if ($glossary_index){
    			$timestamp = time();
    			foreach($glossary_index as $glossary_item){
    				$timestamp++;
    				
    				$glossary_title = $glossary_item->post_title;
    				
    				$glossary_search = '/\b'.$glossary_title.'[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';
    				
    				$glossary_replace = '<a'.$timestamp.'>$0</a'.$timestamp.'>';
    				$content_temp = preg_replace($glossary_search, $glossary_replace, $content);
    				$content_temp = rtrim($content_temp);
    
    					$link_search = '/<a'.$timestamp.'>('.$glossary_item->post_title.'[A-Za-z]*?)<\/a'.$timestamp.'>/i';
    					if (get_option('red_glossaryTooltip') == 1) {
    						$link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '" onmouseover="tooltip.show(\'' . addslashes($glossary_item->post_content) . '\');" onmouseout="tooltip.hide();">$1</a>';
    					}
    					else {
    						$link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '">$1</a>';
    					}
    					
    					if (!in_array($glossary_title,$terms_done)) {
    						
    						$content_temp_before = $content_temp;
    						$content_temp = preg_replace($link_search, $link_replace, $content_temp,1);
    						if ($content_temp_before != $content_temp) $terms_done[] = $glossary_title;
    						$content = $content_temp;
    					}
    					
    					
    					
    					
    			}
    		}
    	}
    	
    	return $content;
    }
    
    PHP:
    Where:
    $glossary_title ($glossary_item->post_title) = patterns
     
    Lucky Bastard, Jul 16, 2010 IP
  7. Lucky Bastard

    #7
    I just tried your code in a standalone file, and one problem I found is when the content has a word, say creditors, that isn't in the glossary:
    eg:
    
    //$patterns[] = "creditors";
    //$links[]    = 'http://www.second.co.uk/';
    
    PHP:
    The word Creditors would still get linked as Creditor, which is only a partial match but is in the glossary:
    
    $patterns[] = "creditor";
    $links[]    = 'http://www.third.co.uk/';
    
    PHP:
    Of course this is a stupid example as they are both the same word, but just to show an example. :)
     
    Lucky Bastard, Jul 16, 2010 IP
  8. Deacalion

    #8
    lol. Yeah, that would be the regular expression - I match any character that isn't '<'. So it doesn't match already existing links, but in the case of 'creditors' the letter 's' falls into the category of 'not being <' - so it matches.

    Change the line to this and it should work ok (untested):
    
    foreach ($patterns as $k => $v) $patterns[$k] = '/[^>]('.str_replace(' ', '\s', addslashes($v)).')[^<\w]/i';   
    
    PHP:
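A side effect worth noting: `[^>]` and `[^<\w]` each consume one character, so whatever sits next to the word (a comma, a full stop) is swallowed by the replacement, and a term at the very start of the content cannot match at all. A minimal sketch comparing this with a lookaround variant, assuming hypothetical sample content and link target; the lookbehind additionally rejects a word character on the left:

```php
<?php
// Hypothetical sample content, not taken from the thread.
$text = 'one creditor, two creditors, <a href="#">creditor</a>';

// Pattern from the post: the characters on either side are part of the match,
// so the replacement has to put something back (here: spaces).
$consuming = preg_replace('/[^>](creditor)[^<\w]/i', ' <a href="#g">$1</a> ', $text);

// Lookaround variant: similar checks, but nothing outside the word is consumed.
$lookaround = preg_replace('/(?<![>\w])(creditor)(?![<\w])/i', '<a href="#g">$1</a>', $text);

echo $consuming, "\n";  // the comma after the first "creditor" is lost
echo $lookaround, "\n"; // the comma is kept; "creditors" and the existing link are untouched
```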
    As for integrating it into WordPress - I have no idea why that code is going about it the way it is (timestamps?!). I could have a look tomorrow. :)
     
    Last edited: Jul 16, 2010
    Deacalion, Jul 16, 2010 IP
  9. Lucky Bastard

    #9
    Thanks, yeah that seems to work :)
    I can wait till tomorrow if you don't mind helping. I just can't get my mind around the way the original developer did it, with timestamps and all.
     
    Lucky Bastard, Jul 16, 2010 IP
  10. danx10

    danx10 Peon

    Messages:
    1,179
    Likes Received:
    44
    Best Answers:
    2
    Trophy Points:
    0
    #10
    @Deacalion

    str_replace(' ', '\s', addslashes($v))
    PHP:
    Wouldn't the following be more reliable?

    preg_quote($v, '/')
    PHP:
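A minimal sketch of the difference, assuming a hypothetical glossary term that contains regex metacharacters. `addslashes` only escapes quotes, backslashes and NUL bytes, so parentheses and dots keep their regex meaning; `preg_quote` escapes them, and its second argument also escapes the delimiter. Spaces are not touched by `preg_quote`, so the whitespace substitution can still be applied afterwards.

```php
<?php
// Hypothetical term with regex metacharacters (not from the thread).
$term = "Creditor's Petition (C.P.)";

$viaAddslashes = '/' . str_replace(' ', '\s', addslashes($term)) . '/i';
$viaPregQuote  = '/' . str_replace(' ', '\s+', preg_quote($term, '/')) . '/i';

// With addslashes, "(C.P.)" is a group with two wildcards, so the literal term fails
// while an unrelated string matches.
var_dump(preg_match($viaAddslashes, "Creditor's Petition (C.P.)")); // int(0)
var_dump(preg_match($viaAddslashes, "creditor's petition CxPy"));   // int(1)

// With preg_quote, only the literal term matches.
var_dump(preg_match($viaPregQuote, "Creditor's Petition (C.P.)"));  // int(1)
var_dump(preg_match($viaPregQuote, "creditor's petition CxPy"));    // int(0)
```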
     
    danx10, Jul 17, 2010 IP
  11. Lucky Bastard

    #11
    Any update here?
     
    Lucky Bastard, Jul 18, 2010 IP
  12. Deacalion

    #12
    What plugin is this? so I can take a look.
     
    Deacalion, Jul 18, 2010 IP
  13. Lucky Bastard

    #13
    Hi

    A slightly modified version of the WordPress Plugin: TooltipGlossary
     
    Lucky Bastard, Jul 18, 2010 IP
  14. Deacalion

    #14
    I installed it and had a little look. It seems that when you try to do this within WordPress you're faced with a few problems :).
    Somewhere along the line WordPress runs the post content through a text filter (probably wptexturize rather than htmlentities), because all single quotes are converted to &#8217; - this is why the regex didn't work.

    Put this somewhere near the top of the function:
    
        $content = str_replace('&#8217;', "'", $content);
    
    PHP:
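An alternative that leaves the content untouched is to let the pattern accept either form of the apostrophe. A minimal sketch, assuming a hypothetical helper `term_to_pattern` (not part of the plugin) and lookarounds in place of the consuming character classes:

```php
<?php
// Hypothetical helper, not part of the plugin: build a pattern for one glossary term
// that accepts a straight apostrophe, the &#8217; entity, or the curly quote itself.
function term_to_pattern(string $term): string
{
    $quoted = preg_quote($term, '/');
    // Any apostrophe in the term may appear in the content in one of three forms.
    $quoted = str_replace("'", "(?:'|&#8217;|\u{2019})", $quoted);
    // Allow any run of whitespace between the words of the term.
    $quoted = str_replace(' ', '\s+', $quoted);
    return '/(?<![>\w])(' . $quoted . ')(?![<\w])/i';
}

$pattern = term_to_pattern("Creditor's Petition");

var_dump(preg_match($pattern, 'The creditor&#8217;s petition was filed.')); // int(1)
var_dump(preg_match($pattern, "The creditor's petition was filed."));       // int(1)
```

Replacing the entity in `$content`, as above, is simpler but also changes the typography of the page that gets displayed.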
    Couple more changes:
    $glossary_search = '/[^>]('.preg_replace('/\s+/', '\s', $glossary_title).')[^<\w]/i';
    $glossary_replace = ' <a'.$timestamp.'>$1</a'.$timestamp.'> ';
    
    PHP:
    Should almost be working then - you just need to order the glossary items by title length before you run through the loop. :)
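A minimal sketch of the ordering step, using plain strings as hypothetical titles. In the plugin the items are post objects, so the comparator would compare `$a->post_title` and `$b->post_title` instead; the plugin's own `sortByLength` is not shown in this thread, so this is only an assumption about what it should do.

```php
<?php
// Hypothetical titles standing in for the glossary items.
$titles = array('bankruptcy', 'Involuntary bankruptcy', 'voluntary bankruptcy');

// Longest title first, so multi-word terms are tried before the words they contain.
usort($titles, function ($a, $b) {
    return strlen($b) - strlen($a);
});

print_r($titles); // Involuntary bankruptcy, voluntary bankruptcy, bankruptcy
```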
     
    Deacalion, Jul 18, 2010 IP
  15. Lucky Bastard

    #15
    Thanks for that, I have already done the ordering part, longest glossary word to shortest.
    It doesn't pick up creditors/creditor's as being the same as creditor, which would be ideal.

    I'm also getting some weird output with this: raw HTML code shows up in the page, which I think comes from glossary links being added to sub-terms, like:
    Creditor Petition is getting a glossary link for both
    Creditor Petition and the sub-term Creditor. So nested <a> tags.
    I think that is the reason.

    ps. Thanks for your efforts and time :)
     
    Last edited: Jul 18, 2010
    Lucky Bastard, Jul 18, 2010 IP
  16. Lucky Bastard

    #16
    Another example:
    Voluntary bankruptcy is picking up both of these glossary entries:
    voluntary bankruptcy
    and bankruptcy
    So nested <a></a>
    Where ideally it should only be picking up:
    voluntary bankruptcy
     
    Lucky Bastard, Jul 18, 2010 IP
  17. Lucky Bastard

    #17
    Another example :)
    The words: Involuntary bankruptcy
    are being picked up and highlighted with three definitions (it should only pick up the first):
    Involuntary bankruptcy
    voluntary bankruptcy
    bankruptcy

    :)
     
    Lucky Bastard, Jul 18, 2010 IP
  18. Lucky Bastard

    #18
    Actually the problem (or a different problem) occurs when a word in a glossary description is itself in the glossary. Say the description is:
    The act of putting somebody into bankruptcy
    The word bankruptcy is in the glossary, so the tooltip text inside the <a> tag gets detected as containing a glossary word, and then problems occur.
    I PMd you an example page.
     
    Lucky Bastard, Jul 18, 2010 IP
  19. Deacalion

    #19
    Yeah, it changes the content - then loops over itself and changes the content it's just added. Trickier than it first appeared :p
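One way around the re-scanning is to do a single pass: split the content into text and markup, then run one combined pattern over the text pieces only, so inserted links and tooltip text are never searched again. A minimal sketch under stated assumptions: the function name `link_glossary_terms`, the `$glossary` array and its URLs are hypothetical, and splitting HTML with a regex is only a rough approximation of a real parser.

```php
<?php
// Hypothetical glossary: lowercase term => URL. Not taken from the plugin.
$glossary = array(
    'involuntary bankruptcy' => '/glossary/involuntary-bankruptcy',
    'voluntary bankruptcy'   => '/glossary/voluntary-bankruptcy',
    'bankruptcy'             => '/glossary/bankruptcy',
);

function link_glossary_terms(string $html, array $glossary): string
{
    // Longest terms first, so the longer term wins when two start at the same position.
    $terms = array_keys($glossary);
    usort($terms, function ($a, $b) { return strlen($b) - strlen($a); });
    $alternatives = array_map(function ($t) { return preg_quote($t, '/'); }, $terms);
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/i';

    // Split into text and markup; existing links and other tags land at odd indexes.
    $parts = preg_split('/(<a\b[^>]*>.*?<\/a>|<[^>]+>)/is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // markup or an existing link: leave untouched
        }
        // Each text piece is scanned once; the generated markup is never re-scanned.
        $parts[$i] = preg_replace_callback($pattern, function ($m) use ($glossary) {
            $url = $glossary[strtolower($m[0])];
            return '<a class="glossaryLink" href="' . $url . '">' . $m[0] . '</a>';
        }, $part);
    }

    return implode('', $parts);
}

$out = link_glossary_terms('Involuntary bankruptcy differs from <a href="/x">bankruptcy</a>.', $glossary);
echo $out, "\n";
```

Because every term is handled in one pass, "Involuntary bankruptcy" gets exactly one link and the already linked "bankruptcy" is left alone.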
     
    Deacalion, Jul 18, 2010 IP
  20. Lucky Bastard

    #20
    I solved that problem, I think, by adding (?=([^"]*"[^"]*")*[^"]*$) (which was in the original code), making:
    $glossary_search = '/[^>]('.preg_replace('/\s+/', '\s', $glossary_title).')[^<\w](?=([^"]*"[^"]*")*[^"]*$)/i';
    PHP:
    I have no idea what that does, well, only a little idea :)
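For what it's worth, that lookahead asserts that the rest of the string, from the match to the end, contains an even number of double quotes. In other words, the match is not sitting inside a "..." attribute value such as a title or tooltip. A minimal sketch with hypothetical sample HTML:

```php
<?php
// Hypothetical sample: one term in plain text, one inside a title attribute.
$html = 'A creditor and <a href="/p" title="About the creditor">a link</a>.';

// (?= ... $) looks ahead to the end of the string:
//   ([^"]*"[^"]*")*  consumes the remaining double quotes in pairs
//   [^"]*$           then only quote-free text may remain
$pattern = '/\bcreditor\b(?=([^"]*"[^"]*")*[^"]*$)/i';

preg_match_all($pattern, $html, $m, PREG_OFFSET_CAPTURE);

print_r($m[0]); // only the first "creditor" (offset 2); the one inside title="..." is skipped
```

The check breaks down if the visible text itself contains an unpaired double quote, and it rescans the rest of the content for every candidate match, so it can get slow on long posts.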

    I still have the problem where creditors and creditor's aren't picked up by creditor (the version of the word in the glossary).
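One possible approach is to allow an optional plural or possessive ending after the term. A minimal sketch, assuming lookarounds instead of the consuming character classes and a hypothetical sample sentence; it only covers the regular "s" and "'s" endings, not irregular plurals such as bankruptcies.

```php
<?php
$term = 'creditor'; // glossary term as stored

// Optional 's or s after the term, then no further word character or '<'.
$pattern = "/(?<![>\\w])(" . preg_quote($term, '/') . "(?:'s|s)?)(?![<\\w])/i";

// Hypothetical sample sentence.
$text = "The creditor, the Creditors and the creditor's agent, but not creditorship.";

preg_match_all($pattern, $text, $m);

print_r($m[1]); // creditor, Creditors, creditor's
```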
     
    Lucky Bastard, Jul 18, 2010 IP