Hi, I'm having some issues with stripping noise words. I won't give you the full regular expression I'm using, but this is a shortened version which includes a few noise words: /\s(?:a|about|after|all|be|because|in|of|the)\s/i I'm just replacing each match with a space so that it can test for the next word. But I'm getting some strange results. The text I'm testing on is "property located in the middle of nature." and I'm getting this returned: "property located the middle nature." Anyone know why the "the" isn't being stripped? Also, another problem I face is what happens if the noise word is the first or last word in a string. There won't be a space on both sides of the word, but I need to test for a space on both sides, otherwise it will start stripping out parts of other words. Any ideas?
Interesting issue. It looks like, in your example, the regular expression first replaces the word "in" together with its surrounding spaces, and after that "the" is treated as having no space on its left (thus not matching your search pattern), because that space was already consumed by the previous match. I suggest that you use an array of search patterns instead of one string, so that each pattern is applied in turn to the result of the previous one. Like this:
$content = 'property located in the middle of nature';
$search = array(
    "/\s(a)\s/i", "/\s(about)\s/i", "/\s(after)\s/i", "/\s(all)\s/i", "/\s(be)\s/i",
    "/\s(because)\s/i", "/\s(in)\s/i", "/\s(of)\s/i", "/\s(the)\s/i"
);
echo preg_replace($search, " ", $content);
I tried it and it worked fine, resulting in this: property located middle nature
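To make the "consumed space" explanation concrete, here is a minimal sketch that reproduces the original result and compares it with a lookahead variant. The variable names are made up for illustration, and the lookahead pattern is one possible alternative to the array approach, not something tested against the full noise word list.

```php
<?php
$content = 'property located in the middle of nature.';
$noise   = 'a|about|after|all|be|because|in|of|the';

// Original approach: the match " in " swallows the space that " the " needs,
// so scanning resumes at "the middle" with no leading space left to match.
$consuming = preg_replace('/\s(?:' . $noise . ')\s/i', ' ', $content);

// Lookahead variant (an assumption, not from the thread): the trailing space
// is only checked, not consumed, so it can start the next match.
$lookahead = preg_replace('/\s(?:' . $noise . ')(?=\s)/i', '', $content);

echo $consuming, PHP_EOL; // property located the middle nature.
echo $lookahead, PHP_EOL; // property located middle nature.
```

Note that the array approach can still miss a word when the same noise word appears twice in a row (for example "the the"), since each individual pattern has the same consumed-space problem within its own pass.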
Hi Kaimi, that works fine. I don't suppose you'd like to explain why that works? Also, what happens if the noise word is at the end of a sentence? Just for the sake of an example, say you had this sentence: "property located in the middle of nature the", possibly followed by a full stop "."?
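Read "Positive and Negative Lookbehind" at http://www.regular-expressions.info/lookaround.html and use \b instead of \s:
<?php
$str = "the property located in the middle of nature the";
// \b is already zero-width, so (?<=\b) behaves the same as a plain \b.
// Replacing with '' leaves the surrounding spaces in place.
echo preg_replace('/(?<=\b)(?:a|about|after|all|be|because|in|of|the)\b/i', '', $str);
?>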
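Since the \b version leaves stray spaces behind, a cleanup pass is probably wanted. The following is a rough sketch, assuming a hypothetical helper called strip_noise_words (not a built-in and not from this thread) that strips the words, collapses the leftover whitespace, and tidies any space left before punctuation.

```php
<?php
// Hypothetical helper for illustration only.
function strip_noise_words(string $text, array $noiseWords): string
{
    // Quote each word so any regex metacharacters are matched literally.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $noiseWords);

    // \b matches at word boundaries, including the start and end of the string.
    $pattern  = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    $stripped = preg_replace($pattern, '', $text);

    // Collapse the runs of whitespace left where words were removed.
    $collapsed = preg_replace('/\s+/', ' ', $stripped);

    // Remove whitespace that now sits directly before punctuation.
    $tidied = preg_replace('/\s+([.,;:!?])/', '$1', $collapsed);

    return trim($tidied);
}

$noise = array('a', 'about', 'after', 'all', 'be', 'because', 'in', 'of', 'the');
echo strip_noise_words('the property located in the middle of nature the.', $noise), PHP_EOL;
// property located middle nature.
```

One caveat: \b also treats hyphens and apostrophes as boundaries, so "in-house" would lose its "in". Whether that matters depends on the text being cleaned.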
You could also try running array_unique on your string before preg_replace; doing so will strip out any duplicated words like "the".
But would that work with a string? What Kaimi has suggested is working fine and I've now got the desired effect. Cheers for the help.
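On the array_unique question: it takes an array, not a string, so the string would have to be split into words first. A quick sketch, assuming words are separated by single spaces (the sample sentence is made up for illustration):

```php
<?php
$str = 'the property located in the middle of the nature';

// array_unique() works on arrays, so split the string into words first.
$words  = explode(' ', $str);

// Keeps the first occurrence of each word and drops later repeats.
$unique = array_unique($words);

$result = implode(' ', $unique);
echo $result, PHP_EOL; // the property located in middle of nature
```

As the output shows, this only removes repeats: the first "the" survives, and any legitimately repeated non-noise word would be dropped too. It does not replace the noise word pattern.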