I'm not sure how to do this with PHP. I have a text file that I need to fix. I would like to replace every single <br> in the file with nothing (""), but keep any back-to-back runs such as <br><br> or <br><br><br>. How can I do this? For example:

<br>A girl went to the market <br>to fetch a loaf of bread.<br><br> She was happy that <br>she had enough money to get the bread.<br><br> When walking home from the <br>store she started to sing a song to herself<br><br><br> After getting home she seen that she already<br> had a loaf of bread in her bread box<br><br> She felt a little silly when she found out<br><br>

I would do it by hand, but I have 900,000 lines of this stuff. That is why I would like to automate it.

I was thinking I could do a simple search and replace in a text editor: find every <br><br><br> and replace it with a high-ASCII character, then find every <br><br> and replace it with a different high-ASCII character. After that, replace all the remaining single <br> with nothing, and finally replace each placeholder character with the <br><br><br> or <br><br> it stands for.

How would you go about fixing this problem?
The way I've done this before may be the long way round, but it worked well, and it is pretty much what you describe at the end of your post. Use str_replace like this, where $data_line contains the line of text you want to sort out:

$data_line = str_replace("<br><br><br><br><br>", "#5BR#", $data_line);
$data_line = str_replace("<br><br><br><br>", "#4BR#", $data_line);
$data_line = str_replace("<br><br><br>", "#3BR#", $data_line);
$data_line = str_replace("<br><br>", "#2BR#", $data_line);

The #2BR# etc. is just a token that stands in for the run; it needs to be something that will not appear in your text. Make sure you work from the highest repeat count down to 2, otherwise a longer run gets split by a shorter pattern first. For example, replacing <br><br> first would turn <br><br><br> into #2BR#<br>, and the leftover <br> would then be deleted.

Then, to lose the single <br>:

$data_line = str_replace("<br>", "", $data_line);

Then, to restore the multiple BRs:

$data_line = str_replace("#5BR#", "<br><br><br><br><br>", $data_line);
$data_line = str_replace("#4BR#", "<br><br><br><br>", $data_line);
$data_line = str_replace("#3BR#", "<br><br><br>", $data_line);
$data_line = str_replace("#2BR#", "<br><br>", $data_line);

Hope that helps,
Si
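The token approach needs one pair of lines per run length, so a run of six or more <br> would need further lines. Here is a minimal sketch of an alternative that treats every run the same way, assuming the tags are always written exactly as <br> with nothing (not even a space) between consecutive ones. The function name strip_single_br is made up for illustration.

```php
<?php
// Hypothetical helper: drop lone <br> tags, keep runs of two or more.
function strip_single_br(string $text): string
{
    // Match each maximal run of consecutive <br> tags (case-insensitive).
    return preg_replace_callback(
        '/(?:<br>)+/i',
        function (array $m): string {
            // A run of exactly one tag is 4 characters long; remove it.
            // Any longer run is returned unchanged.
            return strlen($m[0]) === 4 ? '' : $m[0];
        },
        $text
    );
}

echo strip_single_br("one<br>two<br><br>three<br><br><br><br><br><br>four");
// prints: onetwo<br><br>three<br><br><br><br><br><br>four
```

Because the whole run is matched at once, no placeholder tokens are needed and there is no ordering to get wrong.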
A regular expression can do this in one pass. The lookbehind and lookahead make sure the <br> has no other <br> directly before or after it:

$str = "<br>A girl went to the market <br>to fetch a loaf of bread.<br><br> She was happy that <br>she had enough money to get the bread.<br><br> When walking home from the <br>store she started to sing a song to herself<br><br><br> After getting home she seen that she already<br> had a loaf of bread in her bread box<br><br> She felt a little silly when she found out<br><br>";
$pattern = "/(?<!<br>)<br>(?!<br>)/i";
echo preg_replace($pattern, "", $str);

shows:

A girl went to the market to fetch a loaf of bread.<br><br> She was happy that she had enough money to get the bread.<br><br> When walking home from the store she started to sing a song to herself<br><br><br> After getting home she seen that she already had a loaf of bread in her bread box<br><br> She felt a little silly when she found out<br><br>

The leading <br> is removed as well, because it is also a single one.
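For a 900,000-line file it may be easier to stream the text than to load it all at once. The following is only a sketch under two assumptions: a run of <br> tags never continues across a line break, and the output goes to a separate file. The function name strip_single_br_file and the file names in the usage note are hypothetical.

```php
<?php
// Hypothetical helper: copy $inPath to $outPath, removing lone <br> tags.
// Assumes a run of <br> tags never spans a line break.
function strip_single_br_file(string $inPath, string $outPath): void
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open input or output file');
    }
    // Read one line at a time so the whole file is never held in memory.
    while (($line = fgets($in)) !== false) {
        // Lookbehind and lookahead: match a <br> with no <br> on either side.
        fwrite($out, preg_replace('/(?<!<br>)<br>(?!<br>)/i', '', $line));
    }
    fclose($in);
    fclose($out);
}
```

It would be called as strip_single_br_file('input.txt', 'output.txt'). Writing to a second file keeps the original untouched, so the result can be checked before anything is overwritten.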