Hi all, I've been working on this for days but I can't solve it. This is a web scraping project in C# (.NET). I want to extract the phrase below using a regular expression:

Health Niche Blogs

It appears in this HTML:

<TD class=info width="80%" noWrap><STRONG>Author:</STRONG> Health Niche Blogs | <STRONG>Published:</STRONG> Sep 04, 2009<BR><STRONG>License:</STRONG> FREE | <SPAN style="WHITE-SPACE: normal"><STRONG>O/S:</STRONG> Windows NT/2000/XP/2003/Vista </SPAN></TD>

Here's my regex, but I think there's something wrong with it:

(?<=<STRONG>Author.*(?=<STRONG>)

Thank you.
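Try this:

Author:<\/STRONG> ([^|]+) \| <STRONG

The author name ends up in capture group 1. Note that the | between the two fields has to be escaped as \|, otherwise it acts as alternation instead of matching a literal pipe. I tested this regex on http://rubular.com/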
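In case it helps, here is a minimal sketch of how a pattern like this could be applied from C#, assuming the HTML is already loaded into a string. The variable names and the shortened sample string are only for illustration, and the pattern is a slight variation: \s* absorbs the spaces around the name, and the lazy [^|<]+? stops before the literal pipe.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Shortened sample of the HTML from the question (illustrative only).
        string html = "<TD class=info width=\"80%\" noWrap><STRONG>Author:</STRONG> "
                    + "Health Niche Blogs | <STRONG>Published:</STRONG> Sep 04, 2009<BR>";

        // Group 1 captures the text between "Author:</STRONG>" and the literal "|".
        // The pipe is escaped so it is not treated as alternation.
        Match m = Regex.Match(
            html,
            @"<STRONG>Author:</STRONG>\s*([^|<]+?)\s*\|",
            RegexOptions.IgnoreCase);

        if (m.Success)
        {
            Console.WriteLine(m.Groups[1].Value); // prints "Health Niche Blogs"
        }
        else
        {
            Console.WriteLine("No author found.");
        }
    }
}
```

RegexOptions.IgnoreCase is there only because scraped markup often mixes upper and lower case tags; drop it if the page is consistent.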
I agree. TagSoup is another great one, and I use it a lot. For example, the following Haskell code scrapes a page and gets all the links that have rar as an extension (parseTags and TagOpen come from Text.HTML.TagSoup, takeExtension from System.FilePath, and txt is the page source):

[ rar
| TagOpen "a" atts <- parseTags txt
, ("href", rar) <- atts
, takeExtension rar == ".rar"
]