preg_replace to strip noise words

grutland Active Member

#1

Hi,

Having some issues with stripping noise words.
I won't give you the full regular expression I'm using but this is a shortened version which includes a few noise words:
/\s(?:a|about|after|all|be|because|in|of|the)\s/i
PHP:
I'm just replacing each match with a space so that the next word can still be tested.

But I'm getting some strange results. The text I'm testing on is "property located in the middle of nature." and I'm getting this back: "property located the middle nature."
Anyone know why the "the" isn't being stripped?

Another problem I face is what happens if the noise word is the first or last word in a string.
There won't be a space on both sides of the word, but I need to test for a space on both sides, otherwise it will start stripping out parts of other words.

Any ideas?

grutland, Apr 14, 2010 IP

Sergey Popov Peon

#2

Interesting issue. It looks like, in your example, the regular expression first replaces the word "in" together with the spaces on both sides of it. After that, "the" is treated as having no space on its left, so it no longer matches your search pattern.
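A small sketch of that behaviour, using the shortened pattern and the test sentence from the first post (the variable names are only for illustration). preg_replace scans from left to right and does not re-read characters it has already consumed, so the space after "in" cannot also serve as the space before "the".

```php
<?php
// Shortened pattern and test sentence from post #1.
$pattern = '/\s(?:a|about|after|all|be|because|in|of|the)\s/i';
$text    = 'property located in the middle of nature.';

// Each match consumes the space on both sides of the noise word.
// After " in " is matched, scanning resumes at "the", which no longer
// has an unconsumed space in front of it, so it is skipped.
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);                 // the matches are " in " and " of "

$result = preg_replace($pattern, ' ', $text);
echo $result, "\n";                   // property located the middle nature.
```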

I suggest that you use an array of search patterns instead of one string. Like this:
  $content='property located in the middle of nature';

  $search = array(
    // one pattern per noise word; each is applied to the whole string in turn
    "/\s(a)\s/i",
    "/\s(about)\s/i",
    "/\s(after)\s/i",
    "/\s(all)\s/i",
    "/\s(be)\s/i",
    "/\s(because)\s/i",
    "/\s(in)\s/i",
    "/\s(of)\s/i",
    "/\s(the)\s/i"
  );

  echo preg_replace($search," ",$content);
PHP:
I tried it and it worked fine, resulting in this:
property located middle nature
Code (markup):

Sergey Popov, Apr 14, 2010 IP

Kaimi Peon

#3

Try this:


/(?<=\s)(?:a|about|after|all|be|because|in|of|the)\s/i

PHP:
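A quick sketch of how that pattern might be applied. The lookbehind (?<=\s) only checks that a space precedes the word without consuming it, so consecutive noise words each still see one. The empty replacement string is an assumption, since the post does not say what to replace with.

```php
<?php
$pattern = '/(?<=\s)(?:a|about|after|all|be|because|in|of|the)\s/i';
$text    = 'property located in the middle of nature.';

// The match is the noise word plus its trailing space; the leading space
// is only asserted, so it stays in place for the next word.
$result = preg_replace($pattern, '', $text);
echo $result, "\n";   // property located middle nature.
```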

Kaimi, Apr 14, 2010 IP

grutland Active Member

#4

Hi Kaimi, that works fine.
Don't suppose you'd like to explain why that works?
Also, what happens if the noise word is at the end of a sentence?

Just for the sake of an example, say you had this sentence: "property located in the middle of nature the"
The same question applies when the word is followed by a full stop ".".

grutland, Apr 14, 2010 IP

Kaimi Peon

#5

grutland said: "Don't suppose you'd like to explain why that works?"

Read "Positive and Negative Lookbehind" at http://www.regular-expressions.info/lookaround.html

grutland said: "Also, what happens if the noise word is at the end of a sentence?"

Use \b instead of \s
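<?php
$str = "the property located in the middle of nature the";
// \b is already zero-width, so it needs no lookbehind wrapper.
// The spaces around each removed word are left in the output.
echo preg_replace('/\b(?:a|about|after|all|be|because|in|of|the)\b/i', '', $str);
?>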
PHP:
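With \b and an empty replacement, the spaces around each removed word stay behind, so the output contains runs of spaces. One possible follow-up, sketched below, builds the pattern from a word list and then tidies the whitespace. The function name strip_noise_words and its $noiseWords parameter are made up for this example and are not from the thread.

```php
<?php
// Hypothetical helper: strips whole-word noise words, then cleans up spacing.
function strip_noise_words($text, array $noiseWords)
{
    // Escape each word so it is safe to embed in the pattern.
    $quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $noiseWords);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    $stripped = preg_replace($pattern, '', $text);

    // Collapse leftover runs of whitespace into single spaces.
    $stripped = preg_replace('/\s+/', ' ', $stripped);
    // Remove a space stranded in front of punctuation, e.g. "nature ."
    $stripped = preg_replace('/\s+([.,;:!?])/', '$1', $stripped);

    return trim($stripped);
}

$noise = array('a', 'about', 'after', 'all', 'be', 'because', 'in', 'of', 'the');

echo strip_noise_words('the property located in the middle of nature the', $noise), "\n";
// property located middle nature
echo strip_noise_words('the property located in the middle of nature the.', $noise), "\n";
// property located middle nature.
```

Keeping the words in an array also makes the full noise-word list easier to maintain than one long hand-written alternation.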

Kaimi, Apr 14, 2010 IP

MyVodaFone Well-Known Member

#6

You could also try running
array_unique
PHP:
on your string before preg_replace; doing so will strip out any duplicated words like "the".
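array_unique works on arrays rather than strings, so the text would have to be split into words first. A rough sketch, assuming the words are separated by single spaces. Note that it drops every repeated word, not only noise words, and it keeps the first occurrence of each, so it changes the text in a different way from the regex approach.

```php
<?php
$text = 'the property located in the middle of nature the';

// array_unique needs an array, so split the string into words first.
$words  = explode(' ', $text);
$unique = array_unique($words);     // keeps the first occurrence of each word

$joined = implode(' ', $unique);
echo $joined, "\n";                 // the property located in middle of nature
```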

MyVodaFone, Apr 14, 2010 IP

grutland Active Member

#7

But would that work with a string?
What Kaimi has suggested is working fine and I've now got the desired effect.

Cheers for the help.

grutland, Apr 14, 2010 IP

nunewnew Peon

#8

"Don't suppose you'd like to explain why that works?" Read "Positive and Negative Lookbehind" at http://www.regular-expressions.info/lookaround.html

"Also, what happens if the noise word is at the end of a sentence?" Use \b instead of \s
$str = "Property in the heart of nature";
echo preg_replace('/(?…', '', $str);
?>
Code (markup):
If I have any good ideas I will post them. Thanks very much.

nunewnew, Apr 29, 2010 IP
