Hello, I have this script:

$string = "The The the Hello Truck Hello The the Fantastic bear";
$pattern = "/\b([\w'-]+)(\s+\\1)+/i";
$replacement = "$1";
print preg_replace($pattern, $replacement, $string);

This prints: 'The Hello Truck Hello The Fantastic bear'

The problem is that only the repeated 'The' is removed; the second 'Hello' is still there. Any suggestions, please?
The second Hello is not a duplicate as far as your pattern is concerned: the regex only matches a word that is immediately followed by itself, and the two Hellos are separated by Truck. You can't script this very easily, because it takes some human thought to realize that Hello is not wanted in more than one place, and then the script needs to know which one to remove. I don't even know which of the two you would remove; they should both go somewhere else. You can tell it to remove every Hello, or to keep only the first Hello. But when you then give it 'The Goodbye Truck Goodbye The Fantastic bear', it won't know about Goodbye and you have the same problem again. And sometimes it's fine to have the same word twice in a sentence. This is not done with a simple regex; it's done with a huge set of rules and code, in other words a grammar/syntax checker.
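To see the "adjacent only" behaviour for yourself, here is a small sketch that reuses the pattern from the question on two made-up test strings (the strings and variable names are mine, just for illustration):

```php
<?php
// Same pattern as in the question: a word followed one or more times by itself.
$pattern = "/\b([\w'-]+)(\s+\\1)+/i";

// Adjacent repeats are collapsed to the first occurrence.
$adjacent = preg_replace($pattern, '$1', "Hello Hello hello Truck");

// Repeats separated by another word are not matched, so nothing changes.
$separated = preg_replace($pattern, '$1', "Hello Truck Hello");

echo $adjacent, "\n";   // Hello Truck
echo $separated, "\n";  // Hello Truck Hello
```

So the regex is doing exactly what it was written to do; removing the second Hello needs a different rule.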
If you only want unique words to be kept, it's not that hard. Explode your string, loop over the pieces, and check whether each word already exists in a temporary array; if not, add it to the temporary array and append it to the output string. Something like this (not tested):

$string = "The The the Hello Truck Hello The the Fantastic bear";
$tstring = '';
$tarray = array(); // words seen so far, keyed by their lowercase form
foreach (explode(" ", $string) as $v) {
    $key = strtolower($v); // compare case-insensitively, like the /i flag in your regex
    if (!isset($tarray[$key])) {
        $tarray[$key] = true;
        $tstring .= $v . ' ';
    }
}
echo trim($tstring);
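The same idea can also be written with PHP's array functions. This is only a sketch: the function name unique_words is made up for illustration, it assumes PHP 7 or later for the type declarations, and strtolower() only folds ASCII letters, so non-ASCII text would need something like mb_strtolower() instead.

```php
<?php
// Hypothetical helper (name chosen for illustration): keep the first
// occurrence of each word, comparing case-insensitively.
function unique_words(string $text): string
{
    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // array_unique() keeps the key of the first equal element, so these
    // keys point at the first occurrence of each lowercased word.
    $firstSeen = array_unique(array_map('strtolower', $words));

    // Pick the original-cased words at those positions, in input order.
    return implode(' ', array_intersect_key($words, $firstSeen));
}

echo unique_words("The The the Hello Truck Hello The the Fantastic bear"), "\n";
// The Hello Truck Fantastic bear
```

Splitting with preg_split() instead of explode(" ", ...) means double spaces or tabs between words don't produce empty entries.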