I have just created this VB ASP.NET code for handling HTML source code. It retrieves HTML elements, and the attributes of those elements, from a remote (or local) web page. It gets a little bit messy, but it should work most of the time (if the HTML is poorly coded it might play up). I have coded an example in Page_Load to demonstrate its use. Here is the code.

First, the imports:

    Imports System.Net
    Imports System.IO

Then the rest:

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        'An example of how to use the code:
        'outputs the actual link, anchor text and HREF attribute of every link on www.google.com
        Dim i As Integer = 0
        Dim google As HTMLDoc
        Dim alink As HTMLElement
        google.Source = GetWebPage("http://www.google.com")
        alink = google.getElementByTagName("a")
        For i = 0 To alink.count - 1
            Response.Write("Actual Link: " & alink.Item(i).outerHTML & "<br>")
            Response.Write("Anchor Text: " & alink.Item(i).innerHTML & "<br>")
            Response.Write("HREF: " & alink.Item(i).getAttributeValue("href") & "<br><br>")
        Next
    End Sub

    Function GetWebPage(ByVal strURI As String) As String
        Dim r As WebResponse
        r = WebRequest.Create(New Uri(strURI)).GetResponse()
        Dim sr As New StreamReader(r.GetResponseStream())
        'ReadToEnd reads the whole stream in one call, so no loop is needed
        GetWebPage = sr.ReadToEnd()
        sr.Close()
        r.Close()
    End Function

    Structure HTMLDoc
        Dim Source As String
        Function getElementByTagName(ByVal tagname As String) As HTMLElement
            Dim p1, p2, p3, p4 As Integer
            Dim c As Integer
            c = 0
            p1 = 0
            tagname = LCase(tagname)
            With getElementByTagName
                Do
                    'find the next opening tag (case-insensitive)
                    p1 = InStr(p1 + 1, LCase(Source), "<" & tagname)
                    If p1 = 0 Then Exit Do
                    p2 = InStr(p1, Source, ">")
                    'unterminated tag: stop instead of passing a bad index to Mid
                    If p2 = 0 Then Exit Do
                    ReDim Preserve .Item(c)
                    If Mid(Source, p2 - 1, 1) = "/" Then
                        'self-closing tag, so there is no inner HTML
                        .Item(c).innerHTML = ""
                        .Item(c).outerHTML = Mid(Source, p1, (p2 + 1) - p1)
                    Else
                        p3 = InStr(p2, LCase(Source), "</" & tagname)
                        'no closing tag found: stop instead of passing a negative length to Mid
                        If p3 = 0 Then Exit Do
                        p4 = p3 + Len(tagname) + 3
                        .Item(c).innerHTML = Mid(Source, p2 + 1, p3 - (p2 + 1))
                        .Item(c).outerHTML = Mid(Source, p1, p4 - p1)
                    End If
                    c = c + 1
                Loop Until p1 = 0 Or p2 = 0
                .count = c
            End With
        End Function
    End Structure

    Structure HTMLElement
        Dim count As Integer
        Dim Item() As HTMLElementItem
    End Structure

    Structure HTMLElementItem
        Dim outerHTML As String
        Dim innerHTML As String
        Function getAttributeValue(ByVal attr As String) As String
            Dim p1, p2, p3 As Integer
            Dim i As Integer = 0
            Dim formats(2) As String
            Dim endchars(3) As String
            attr = LCase(attr)
            'the three ways a value can be written: attr="x", attr='x', attr=x
            formats(0) = attr & "=" & Chr(34)
            formats(1) = attr & "='"
            formats(2) = attr & "="
            'the character that ends the value for each format (an unquoted value ends at a space or ">")
            endchars(0) = Chr(34)
            endchars(1) = "'"
            endchars(2) = Chr(32)
            endchars(3) = ">"
            For i = 0 To 2
                p1 = InStr(LCase(outerHTML), formats(i))
                If p1 > 0 Then
                    p2 = InStr(p1 + Len(formats(i)), outerHTML, endchars(i))
                    If i = 2 Then
                        'unquoted value: use ">" if it comes before the next space, or if there is no space
                        p3 = InStr(p1 + Len(formats(i)), outerHTML, endchars(3))
                        If p3 > 0 And (p3 < p2 Or p2 = 0) Then p2 = p3
                    End If
                    If p2 > 0 Then
                        getAttributeValue = Mid(outerHTML, p1 + Len(formats(i)), p2 - (p1 + Len(formats(i))))
                        Exit For
                    End If
                End If
            Next
        End Function
    End Structure

I know it's ugly, but it seems to work all right so far. I had been looking for something like this for a while and couldn't find it, so hopefully it is useful to someone else as well.
Nice thing to try and play with. Looks helpful, thanks. It is simple enough that an automatic conversion to C# should work, using a converter such as http://www.developerfusion.com/tools/convert/vb-to-csharp/ or http://www.dotnetspider.com/convert/Vb-To-Csharp.aspx
Yeah, I've implemented it and ended up just writing separate functions for different tags most of the time. It's a little bit buggy, but the idea is there (it works great for <a> tags, though).
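For anyone wondering what a separate function for one tag might look like, here is a rough sketch for <img> tags. It is not the code from this thread: GetImageSources and GetQuotedAttribute are made-up names, it assumes the src value is wrapped in single or double quotes, and it would also match an attribute such as data-src, so treat it as a starting point only.

```vbnet
Imports System
Imports System.Collections.Generic

Module TagHelpers
    ' Hypothetical helper: returns the src value of every <img> tag in source.
    Function GetImageSources(ByVal source As String) As List(Of String)
        Dim results As New List(Of String)
        ' Case-insensitive search, so <IMG ...> is found as well
        Dim pos As Integer = source.IndexOf("<img", StringComparison.OrdinalIgnoreCase)
        Do While pos >= 0
            Dim tagEnd As Integer = source.IndexOf(">"c, pos)
            If tagEnd < 0 Then Exit Do ' unterminated tag: give up
            Dim tag As String = source.Substring(pos, tagEnd - pos + 1)
            Dim value As String = GetQuotedAttribute(tag, "src")
            If value IsNot Nothing Then results.Add(value)
            pos = source.IndexOf("<img", tagEnd, StringComparison.OrdinalIgnoreCase)
        Loop
        Return results
    End Function

    ' Hypothetical helper: returns the quoted value of an attribute inside
    ' one tag, or Nothing if it is missing or not quoted.
    Function GetQuotedAttribute(ByVal tag As String, ByVal name As String) As String
        Dim key As String = name & "="
        Dim p As Integer = tag.IndexOf(key, StringComparison.OrdinalIgnoreCase)
        If p < 0 Then Return Nothing
        Dim start As Integer = p + key.Length
        If start >= tag.Length Then Return Nothing
        Dim quote As Char = tag(start)
        If quote <> """"c AndAlso quote <> "'"c Then Return Nothing
        Dim finish As Integer = tag.IndexOf(quote, start + 1)
        If finish < 0 Then Return Nothing
        Return tag.Substring(start + 1, finish - start - 1)
    End Function

    Sub Main()
        Dim html As String = "<p><IMG src=""a.png""><img alt='x' src='b.gif' /></p>"
        For Each src As String In GetImageSources(html)
            Console.WriteLine(src)
        Next
    End Sub
End Module
```

Because the function only deals with one tag and one attribute, it can skip the closing-tag search entirely, which is where most of the trouble with badly formed HTML tends to come from.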