Hi, I am using cfhttp to grab content from some websites, and I would like to extract the contents of the h1 and title tags of each page. I looked into using a regex, but a simple pattern that looks for a literal <h1> breaks as soon as a site adds attributes to the tag, e.g. <h1 class="h1class">content</h1>. I have seen plenty of PHP scripts that can do something like this but found nothing in ColdFusion. What is the best way to go about this? Does anyone know of any scripts or sample code? Any help would be most appreciated. Thanks
Sounds like you've just got the wrong regex. IIRC you can do it with back references. I can't remember the syntax offhand. Try asking in a regex forum. I'm sure somebody there would know.
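H1Value = ReReplaceNoCase(myTextString, "(.*?)(<h1[^>]*>)(.*?)(</h1>)(.*)", "\3", "ALL") The [^>]* skips over any attributes inside the opening tag, so \3 holds only the heading content. Note that if the page has no h1 at all, the pattern does not match and you get the whole string back unchanged. If you need to find multiple headings, run the regex using ReFind in a loop. CFLib.org has ReFindAll available, which may help. You can likely write or find a much better regex than what I put up above. Try RegEx Coach to help debug the regex.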
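For anyone who wants something to start from, here is a minimal sketch of the same idea using REMatchNoCase, which returns every match as an array. It assumes ColdFusion 8 or later (for REMatchNoCase and cfloop over an array). The variable names and the hard-coded sample HTML are made up for illustration; in practice pageHtml would be the cfhttp.fileContent from your request. Regex-based extraction like this is still brittle against malformed or nested markup, so treat it as a starting point rather than a full HTML parser.

```cfm
<!--- Hypothetical sample input; in practice use cfhttp.fileContent --->
<cfset pageHtml = '<html><head><title>My Page</title></head><body><h1 class="h1class">content</h1></body></html>'>

<!--- Grab each complete <h1>...</h1> element, whatever attributes the opening tag carries --->
<cfset h1Elements = REMatchNoCase("<h1[^>]*>.*?</h1>", pageHtml)>

<cfset h1Values = ArrayNew(1)>
<cfloop array="#h1Elements#" index="element">
    <!--- Strip the opening and closing tags so only the inner content remains --->
    <cfset ArrayAppend(h1Values, Trim(REReplaceNoCase(element, "^<h1[^>]*>|</h1>$", "", "ALL")))>
</cfloop>

<!--- Same idea for the title; only the first match is used --->
<cfset titleElements = REMatchNoCase("<title[^>]*>.*?</title>", pageHtml)>
<cfset pageTitle = "">
<cfif ArrayLen(titleElements)>
    <cfset pageTitle = Trim(REReplaceNoCase(titleElements[1], "^<title[^>]*>|</title>$", "", "ALL"))>
</cfif>

<!--- Show what was found --->
<cfoutput>#pageTitle#: #ArrayToList(h1Values)#</cfoutput>
```

Matching the whole element first and stripping the tags afterwards keeps each pattern short, and it handles pages with several h1 tags without a manual ReFind loop.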