preg_replace regex help

Lucky Bastard Peon

Messages:: 406

Likes Received:: 10

Best Answers:: 0

Trophy Points:: 0

#1

Can somebody explain the following regexp to me?

/\bCreditor[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i

I am trying to debug somebody else's code:

$glossary_title = "Creditor"; //just a hardcoded example

$glossary_search = '/\b'.$glossary_title.'[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';

$glossary_replace = '<a....>$0</a>';

$content_temp = preg_replace($glossary_search, $glossary_replace, $content);

The problem is that it also matches Creditors and Creditor's and wraps them in <a></a> tags, when it should only match Creditor or creditor (strictly).

I'm also not sure whether the above regex works with terms that contain spaces or apostrophes ('), which it ideally should.

Any help would be greatly appreciated. Thanks
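
As a rough reading of that pattern (a sketch, not the plugin author's own explanation): \bCreditor anchors the term at a word boundary, [A-Za-z]*?\b lazily extends the match to the end of the current word (which is why Creditors is matched in full), and the lookahead (?=([^"]*"[^"]*")*[^"]*$) only succeeds when an even number of double quotes remains before the end of the string, i.e. when the match is not inside a double-quoted attribute value. The sample string below is made up for illustration.

```php
<?php
// The pattern from the question, with the term hardcoded.
$pattern = '/\bCreditor[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';

// Invented sample: two occurrences in plain text, one inside a title attribute.
$html = 'A creditor, the Creditors, and <a title="creditor info">a link</a>.';

preg_match_all($pattern, $html, $m);

// $m[0] is ['creditor', 'Creditors']; the occurrence inside title="..." is
// skipped because an odd number of double quotes follows it.
print_r($m[0]);
```

Dropping [A-Za-z]*? leaves \bCreditor\b, which no longer swallows the trailing s. Note that \b also matches before an apostrophe, so the Creditor in Creditor's would still be wrapped.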

Lucky Bastard, Jul 16, 2010 IP

Deacalion Peon

Messages:: 438

Likes Received:: 11

Best Answers:: 0

Trophy Points:: 0

#2

Could you give an example of the content you want changed and the same content after you have changed it? (in the way you want it to work)

Deacalion, Jul 16, 2010 IP

Lucky Bastard Peon

#3

Sure, well, it is just a testing sample, so don't read too much into it

yada yada yada yada yada yada creditor yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada Creditor's Petition yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada Creditor. yada yada and this following one won't get highlighted cos it is part of another word creditoryada yada yada yada yada yada yada

Would become

yada yada yada yada yada yada <a href="somelink">creditor</a> yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada <a href="somelink">Creditor's Petition</a> yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada <a href="somelink">Creditor</a>. yada yada and this following one won't get highlighted cos it is part of another word creditoryada yada yada yada yada yada yada

It is basically searching a lot of text/html, and if any words are in the glossary they get highlighted (via link).

Case-insensitive too.

The code I provided is from the WordPress plugin that does this, but it doesn't work in some scenarios, e.g. for words with ' in them.

Lucky Bastard, Jul 16, 2010 IP

Lucky Bastard Peon

#4

Add to that scenario:

yada yada yada <a href="already-linked">creditor</a> yada yada yada creditor yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada Creditor's Petition yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada Creditor. yada yada and this following one won't get highlighted cos it is part of another word creditoryada yada yada yada yada yada yada

yada yada yada <a href="already-linked">creditor</a> yada yada yada <a href="somelink">creditor</a> yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada <a href="somelink">Creditor's Petition</a> yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada <a href="somelink">Creditor</a>. yada yada and this following one won't get highlighted cos it is part of another word creditoryada yada yada yada yada yada yada

If an instance of a glossary word is already linked, it should be left alone.

Lucky Bastard, Jul 16, 2010 IP

Deacalion Peon

#5

Something like this should work. I've made it so each keyword can have its own link; this makes sense because Google only uses the anchor text it finds in the first link on a page. It also gives a little more freedom.
You can add as many more as you want.


<?php
// your content
$content = <<<END
yada yada yada <a href="already-linked">creditor</a> yada yada yada creditor yada yada
yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada
yada yada yada yada yada yada yada yada yada Creditor's Petition yada yada yada yada yada
yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada
yada yada yada yada Creditor. yada yada and this following one won't get highlighted cos
it is part of another word creditors yada yada yada yada yada yada yada
END;

// define each word to be matched with its corresponding link
// start with the longest terms first, i.e. "Creditor's Petition", then 'creditor'
$patterns[] = "Creditor's Petition";
$links[]    = 'http://www.first.co.uk/';

$patterns[] = "creditors";
$links[]    = 'http://www.second.co.uk/';

$patterns[] = "creditor";
$links[]    = 'http://www.third.co.uk/';
// as many words as you want....

// build patterns
foreach ($patterns as $k => $v) $patterns[$k] = '/[^>]('.str_replace(' ', '\s', addslashes($v)).')[^<]/i';
foreach ($links as $k => $v) $links[$k] = ' <a href="'.$v.'">$1</a> '; 

// execute replace
$contentWithLinks = preg_replace($patterns, $links, $content);

// output new content
echo $contentWithLinks;
?>


Deacalion, Jul 16, 2010 IP

Lucky Bastard Peon

#6

Wow, thanks.
Now I have to work out how to merge that into the existing code, which I can't get my head around:


function red_glossary_parse_content($content){
	global $terms_done;
	//Run the glossary parser
	
	$glossaryPageID = get_option('red_glossaryID');
	if (((!is_page() && get_option('red_glossaryOnlySingle') == 0) OR
	(!is_page() && get_option('red_glossaryOnlySingle') == 1 && is_single()) OR
	(is_page() && get_option('red_glossaryOnPages') == 1))){
		$glossary_index = get_children(array(
											'post_type'		=> 'glossary',
											'post_status'	=> 'publish',
											));
		usort($glossary_index,'sortByLength');
		 
		if ($glossary_index){
			$timestamp = time();
			foreach($glossary_index as $glossary_item){
				$timestamp++;
				
				$glossary_title = $glossary_item->post_title;
				
				$glossary_search = '/\b'.$glossary_title.'[A-Za-z]*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';
				
				$glossary_replace = '<a'.$timestamp.'>$0</a'.$timestamp.'>';
				$content_temp = preg_replace($glossary_search, $glossary_replace, $content);
				$content_temp = rtrim($content_temp);

					$link_search = '/<a'.$timestamp.'>('.$glossary_item->post_title.'[A-Za-z]*?)<\/a'.$timestamp.'>/i';
					if (get_option('red_glossaryTooltip') == 1) {
						$link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '" onmouseover="tooltip.show(\'' . addslashes($glossary_item->post_content) . '\');" onmouseout="tooltip.hide();">$1</a>';
					}
					else {
						$link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '">$1</a>';
					}
					
					if (!in_array($glossary_title,$terms_done)) {
						
						$content_temp_before = $content_temp;
						$content_temp = preg_replace($link_search, $link_replace, $content_temp,1);
						if ($content_temp_before != $content_temp) $terms_done[] = $glossary_title;
						$content = $content_temp;
					}
					
					
					
					
			}
		}
	}
	
	return $content;
}


Here $glossary_title ($glossary_item->post_title) plays the role of your $patterns entries.

Lucky Bastard, Jul 16, 2010 IP

Lucky Bastard Peon

#7

I just tried your code in a standalone file and found one problem. Say the text contains a word such as creditors, and that word isn't in the glossary, e.g.:
//$patterns[] = "creditors";
//$links[]    = 'http://www.second.co.uk/';
The word creditors would still get linked, because it partially matches creditor, which is in the glossary:
$patterns[] = "creditor";
$links[]    = 'http://www.third.co.uk/';
Of course this is a silly example, as they are basically the same word, but it shows the problem.

Lucky Bastard, Jul 16, 2010 IP

Deacalion Peon

#8

lol. Yeah, that's down to the regular expression: after the term I match any character that isn't '<', so that already existing links are skipped. In 'creditors', the letter 's' counts as 'not <', so it matches.

Change the line to this and it should work ok (untested):
foreach ($patterns as $k => $v) $patterns[$k] = '/[^>]('.str_replace(' ', '\s', addslashes($v)).')[^<\w]/i';   
As for integrating it into WordPress - I have no idea why that code is going about it the way it is (timestamps?!). I could have a look tomorrow.
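
One side effect of [^>] and [^<\w] is that each consumes a real character, which the replacement then overwrites with a space, and a term at the very start or end of the content cannot match at all. A possible variation, sketched here under the assumption that the term is plain text, uses lookarounds, which check the neighbouring characters without consuming them. The link target #g and the sample text are placeholders.

```php
<?php
$term = 'creditor';

// (?<![\w>]) : not preceded by a word character or '>'
// (?![\w<])  : not followed by a word character or '<'
$pattern = '/(?<![\w>])(' . preg_quote($term, '/') . ')(?![\w<])/i';

$text = 'Creditor owes. <a href="x">creditor</a> and creditors, but (creditor).';

// Only the first and last occurrences are linked; punctuation is preserved.
$out = preg_replace($pattern, '<a href="#g">$1</a>', $text);

echo $out, "\n";
```

This only skips a term that sits directly between > and <; a term in the middle of a longer link text would still be matched.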

Last edited: Jul 16, 2010

Deacalion, Jul 16, 2010 IP

Lucky Bastard Peon

#9

Thanks, yeah, that seems to work.
I can wait till tomorrow if you don't mind helping. I just can't get my mind around the way the original developer did it, with timestamps and all.

Lucky Bastard, Jul 16, 2010 IP

danx10 Peon

Messages:: 1,179

Likes Received:: 44

Best Answers:: 2

Trophy Points:: 0

#10

@Deacalion
str_replace(' ', '\s', addslashes($v))
Wouldn't the following be more reliable?
preg_quote($v)
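
For comparison, a small sketch of the difference: addslashes() only escapes quotes, backslashes and NUL bytes, while preg_quote() escapes regex metacharacters and, with a second argument, the pattern delimiter as well. preg_quote() leaves spaces alone, so the whitespace substitution still has to be applied afterwards. The term with parentheses is an invented example.

```php
<?php
$term = "Creditor's Petition (UK)";

// addslashes() escapes the apostrophe but leaves ( and ) as regex grouping.
$slashed = addslashes($term);      // Creditor\'s Petition (UK)

// preg_quote() escapes the parentheses; '/' is the delimiter used below.
$quoted = preg_quote($term, '/');  // Creditor's Petition \(UK\)

// Allow any run of whitespace between the words of the term.
$pattern = '/' . str_replace(' ', '\s+', $quoted) . '/i';

$text  = "See the creditor's  petition (uk) form.";
$found = preg_match($pattern, $text, $m);

var_dump($found, $m[0]);
```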

danx10, Jul 17, 2010 IP

Lucky Bastard Peon

#11

Any update here?

Lucky Bastard, Jul 18, 2010 IP

Deacalion Peon

#12

Which plugin is this, so I can take a look?

Deacalion, Jul 18, 2010 IP

Lucky Bastard Peon

#13

Hi

A slightly modified version of the WordPress Plugin: TooltipGlossary

Lucky Bastard, Jul 18, 2010 IP

Deacalion Peon

#14

I installed it and had a little look. It seems that when you try to do this within WordPress you're faced with a few problems.
Somewhere along the line WordPress runs the post content through some entity conversion (probably its wptexturize filter rather than htmlentities), because all single quotes are converted to &#8217; - this is why the regex didn't work.

Put this somewhere near the top of the function:
    $content = str_replace('&#8217;', "'", $content);
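
Depending on which filters have already run, the apostrophe may also show up as the named entity or as the literal curly character rather than the numeric entity. A slightly broader sketch, where normalize_apostrophes is a hypothetical helper name and the list of variants is an assumption:

```php
<?php
// Hypothetical helper: map the common curly-apostrophe forms to a plain '.
function normalize_apostrophes($content) {
    $variants = array(
        '&#8217;',       // numeric entity
        '&rsquo;',       // named entity
        "\xE2\x80\x99",  // literal U+2019 encoded as UTF-8
    );
    return str_replace($variants, "'", $content);
}

$in  = "Creditor&#8217;s and creditor\xE2\x80\x99s";
$out = normalize_apostrophes($in);

echo $out, "\n";
```

This also changes how the apostrophes are displayed, so it may be preferable to normalise only a copy of the content that is used for matching.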
Couple more changes:
$glossary_search = '/[^>]('.preg_replace('/\s+/', '\s', $glossary_title).')[^<\w]/i';
$glossary_replace = ' <a'.$timestamp.'>$1</a'.$timestamp.'> ';
Should almost be working then - you just need to order the glossary items by title length before you run through the loop.
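
The plugin code earlier in the thread already calls usort($glossary_index, 'sortByLength'), but the comparator itself is not shown. A minimal sketch of what such a comparator could look like, assuming the longest titles should come first; sortByLengthDesc and the sample objects are made up for illustration:

```php
<?php
// Hypothetical stand-ins for the plugin's glossary posts.
$glossary_index = array(
    (object) array('post_title' => 'bankruptcy'),
    (object) array('post_title' => "Creditor's Petition"),
    (object) array('post_title' => 'creditor'),
);

// Assumed comparator: longer titles sort before shorter ones.
function sortByLengthDesc($a, $b) {
    return strlen($b->post_title) - strlen($a->post_title);
}

usort($glossary_index, 'sortByLengthDesc');

$titles = array();
foreach ($glossary_index as $item) {
    $titles[] = $item->post_title;
}

print_r($titles);
```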

Deacalion, Jul 18, 2010 IP

Lucky Bastard Peon

#15

Thanks for that. I have already done the ordering part, longest glossary term to shortest.
It doesn't treat creditors/creditor's as the same as creditor, which would be ideal.

I'm also getting some weird output: raw HTML shows up in the page, which I think comes from glossary links being added to sub-terms. For example,
Creditor Petition gets a glossary link for
Creditor Petition and another for the sub-term Creditor, so the <a> tags end up nested.
I think that is the reason.

PS: Thanks for your efforts and time.

Last edited: Jul 18, 2010

Lucky Bastard, Jul 18, 2010 IP

Lucky Bastard Peon

#16

Another example:
Voluntary bankruptcy is picking up two glossary entries:
voluntary bankruptcy
and bankruptcy
so the <a></a> tags are nested,
where ideally it should only pick up:
voluntary bankruptcy

Lucky Bastard, Jul 18, 2010 IP

Lucky Bastard Peon

#17

Another example
The words: Involuntary bankruptcy
are being picked up and highlighted with three definitions (it should only pick up the first):
Involuntary bankruptcy
voluntary bankruptcy
bankruptcy

Lucky Bastard, Jul 18, 2010 IP

Lucky Bastard Peon

#18

Actually the problem (or a different problem) occurs when a word in a glossary description is itself in the glossary. Say the description is:
The act of putting somebody into bankruptcy
The word bankruptcy is in the glossary, so the tooltip text inside the <a ...> tag is detected as containing a glossary word, and then problems occur.
I PM'd you an example page.

Lucky Bastard, Jul 18, 2010 IP

Deacalion Peon

#19

Yeah, it changes the content, then on a later pass through the loop it changes the content it has just added. Trickier than it first appeared.
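
One way around that, sketched here as an assumption rather than as what the plugin does, is to link all terms in a single pass: combine the terms into one alternation, longest first, and build each link in a preg_replace_callback() callback. Because the replaced text is never scanned again, a link that has just been added cannot be matched by a shorter term. The glossary array and URLs are invented for the example.

```php
<?php
// Hypothetical glossary: lowercase term => URL.
$glossary = array(
    'bankruptcy'             => '/glossary/bankruptcy',
    'voluntary bankruptcy'   => '/glossary/voluntary-bankruptcy',
    'involuntary bankruptcy' => '/glossary/involuntary-bankruptcy',
);

// Longest terms first, so the alternation prefers the most specific phrase.
uksort($glossary, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// Quote each term and allow any run of whitespace between its words.
$alternatives = array();
foreach (array_keys($glossary) as $term) {
    $alternatives[] = str_replace(' ', '\s+', preg_quote($term, '/'));
}
$pattern = '/(?<!\w)(?:' . implode('|', $alternatives) . ')(?!\w)/i';

$text = 'Involuntary bankruptcy differs from voluntary bankruptcy; bankruptcy is the general term.';

$out = preg_replace_callback($pattern, function ($m) use ($glossary) {
    // Normalise whitespace and case to find the glossary key again.
    $key = strtolower(preg_replace('/\s+/', ' ', $m[0]));
    return '<a href="' . $glossary[$key] . '">' . $m[0] . '</a>';
}, $text);

echo $out, "\n";
```

This sketch does not protect links or attribute values that are already present in the content; those would still need a separate check such as the quote-counting lookahead.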

Deacalion, Jul 18, 2010 IP

Lucky Bastard Peon

#20

I think I solved that problem by adding back (?=([^"]*"[^"]*")*[^"]*$), which was in the original code, making:
$glossary_search = '/[^>]('.preg_replace('/\s+/', '\s', $glossary_title).')[^<\w](?=([^"]*"[^"]*")*[^"]*$)/i';
I only have a vague idea of what that does.

I still have the problem where creditors and creditor's aren't picked up by creditor (the form of the word in the glossary).
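
A possible way to cover those forms, sketched under the assumption that apostrophes have already been normalised to a plain ' and that only the English endings s and 's matter, is an optional suffix after the quoted term. The sample text is invented.

```php
<?php
$term = 'creditor';

// Optional possessive or plural ending; "'s" is tried before "s".
$suffix = "(?:'s|s)?";

$pattern = '/(?<![\w>])(' . preg_quote($term, '/') . $suffix . ')(?![\w<])/i';

$text = "creditor, creditors, Creditor's and creditorship";

preg_match_all($pattern, $text, $m);

// $m[1] is ['creditor', 'creditors', "Creditor's"]; creditorship is skipped
// because a word character still follows the term and its optional ending.
print_r($m[1]);
```

Irregular plurals would not be covered by this; they would need their own glossary entries or a longer suffix list.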

Lucky Bastard, Jul 18, 2010 IP
