
#1 Santas

How do I filter through downloaded source for particular information?

Posted 15 December 2012 - 07:07 PM

So far I have written a program that downloads a webpage's source. Now I would like to search through this data and pick out particular information: whenever text appears between <b> and </b>, I want it added as a list item. My code so far is below:


    Public Class NameList
        Dim thread As System.Threading.Thread
        Dim sourcecode As String

        ' Downloads the page and stores its HTML in sourcecode.
        Sub GetSource()
            Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create("http://www.websiteidliketouse.co.uk"), System.Net.HttpWebRequest)

            ' Using blocks close the response and the reader when we are done.
            Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
                Using sr As New System.IO.StreamReader(response.GetResponseStream())
                    sourcecode = sr.ReadToEnd()
                End Using
            End Using
        End Sub

        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            ' Download on a background thread so the form stays responsive.
            thread = New System.Threading.Thread(AddressOf GetSource)
            thread.Start()
        End Sub

        Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click

        End Sub
    End Class

#2 lar3ry

Posted 15 December 2012 - 09:53 PM

Santas, on 15 December 2012 - 08:07 PM, said:

How do I filter through downloaded source for particular information?

So far I have written a program that downloads a webpage's source. Now I would like to search through this data and pick out particular information: whenever text appears between <b> and </b>, I want it added as a list item.

Well, you could search the sourcecode variable for the "<b>" and "</b>" strings with .Contains or .IndexOf, but that is a rather painful way to do it when there is already an easy way: an HtmlDocument.
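For comparison, the string-search route might look roughly like the sketch below. ExtractBoldText and the BoldTextScanner module are made-up names for illustration, and the sketch assumes bare <b> tags only: it will not match tags with attributes (such as <b class="x">) and it does not understand nesting, which is a large part of why this route is painful.

```vbnet
Imports System
Imports System.Collections.Generic

Module BoldTextScanner
    ' Hypothetical helper (not from the original post): returns the raw text
    ' found between each <b> ... </b> pair in html, using plain string search.
    Public Function ExtractBoldText(ByVal html As String) As List(Of String)
        Dim results As New List(Of String)
        If String.IsNullOrEmpty(html) Then Return results

        Const openTag As String = "<b>"
        Const closeTag As String = "</b>"
        Dim pos As Integer = 0

        Do
            ' Find the next opening tag, ignoring case so <B> matches too.
            Dim startIdx As Integer = html.IndexOf(openTag, pos, StringComparison.OrdinalIgnoreCase)
            If startIdx < 0 Then Exit Do
            startIdx += openTag.Length

            ' Find the closing tag that follows it.
            Dim endIdx As Integer = html.IndexOf(closeTag, startIdx, StringComparison.OrdinalIgnoreCase)
            If endIdx < 0 Then Exit Do

            results.Add(html.Substring(startIdx, endIdx - startIdx))
            pos = endIdx + closeTag.Length
        Loop

        Return results
    End Function
End Module
```

Calling ExtractBoldText(sourcecode) after the download has finished would give a list of strings that can then be added to whatever control is in use.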

You could use a WebBrowser control, or you can convert your string to an HtmlDocument and work on that.

So, you will need a variable (up there by sourcecode)... Dim htmlsrc As HtmlDocument

And a Function to convert the string to an HtmlDocument. (It uses a WebBrowser control, but only briefly, and you never have to see it on your form.)

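    Public Function HtmlToDoc(ByVal htmlstr As String) As HtmlDocument
        ' Create a temporary WebBrowser that is never added to the form.
        Dim w As WebBrowser = New WebBrowser
        w.DocumentText = htmlstr
        ' Pump messages until the browser has finished loading the text.
        Do
            Application.DoEvents()
        Loop While w.ReadyState <> WebBrowserReadyState.Complete
        HtmlToDoc = w.Document
        w.Dispose()
    End Function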


You will have to call it. I've placed it in the Button2.Click handler, followed by the code to extract all the InnerText from <b></b> tags. I leave it to you to add the text to whatever control you want. See the Output window for the Debug.Print results.

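    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        ' Nothing to parse until the download started by Button1 has finished.
        If String.IsNullOrEmpty(sourcecode) Then Return
        htmlsrc = HtmlToDoc(sourcecode)
        ' Print the text inside every <b> element to the Output window.
        For Each element As HtmlElement In htmlsrc.GetElementsByTagName("b")
            Debug.Print(element.InnerText)
        Next
    End Sub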
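As one possible way to get the text into a list instead of the Output window, here is a minimal sketch that fills a ListBox. The helper name FillListFromBoldTags and the module name are hypothetical, and a ListBox on the form (called ListBox1 in the usage note) is an assumption, since the original post does not say which control is used.

```vbnet
Imports System.Windows.Forms

Module BoldTextListHelper
    ' Hypothetical helper: adds the InnerText of every <b> element in doc
    ' to the given ListBox, skipping empty entries.
    Public Sub FillListFromBoldTags(ByVal doc As HtmlDocument, ByVal target As ListBox)
        If doc Is Nothing OrElse target Is Nothing Then Return

        ' Suspend repainting while the items are replaced.
        target.BeginUpdate()
        Try
            target.Items.Clear()
            For Each element As HtmlElement In doc.GetElementsByTagName("b")
                Dim text As String = element.InnerText
                If Not String.IsNullOrWhiteSpace(text) Then
                    target.Items.Add(text.Trim())
                End If
            Next
        Finally
            target.EndUpdate()
        End Try
    End Sub
End Module
```

In the Button2.Click handler, the For Each loop could then be replaced with a call such as FillListFromBoldTags(htmlsrc, ListBox1), where ListBox1 is the assumed control name.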


#3 Santas

Posted 16 December 2012 - 09:58 AM

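Is this the quickest method of downloading a webpage's source with VB.NET?

    Public Class NameList
        Dim thread As System.Threading.Thread
        Dim sourcecode As String

        ' Downloads the page and stores its HTML in sourcecode.
        ' WebsiteAddress is assumed to be a String declared elsewhere in the class.
        Sub GetSource()
            Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(WebsiteAddress), System.Net.HttpWebRequest)

            ' Using blocks close the response and the reader when we are done.
            Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
                Using sr As New System.IO.StreamReader(response.GetResponseStream())
                    sourcecode = sr.ReadToEnd()
                End Using
            End Using
        End Sub

        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            ' Download on a background thread so the form stays responsive.
            thread = New System.Threading.Thread(AddressOf GetSource)
            thread.Start()
        End Sub

    End Class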

#4 lar3ry

Posted 16 December 2012 - 10:03 PM

I've seen this code several times, each time identical, but the interesting thing is that I have never been able to get it running, or at least running properly. I always get this message: "A first chance exception of type 'System.IO.IOException' occurred in mscorlib.dll". Even though I get the exception, the code continues to run and gets the page, but it makes me doubt the elapsed time I get, which is far longer than what the code below produces. Most of that time may be eaten by handling the exception.

However, I can give you another method that will allow you to test the time difference.

First, above your Class statement, insert Imports System.Diagnostics

Then, just below the declaration for sourcecode, place:
    Dim sourceHtml As HtmlDocument
    Dim sw As New Stopwatch


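Add another button, and drop this code in. (The handlers assume the new button is named Button2 and that the form has a WebBrowser control named WebBrowser1; rename them to match your form.)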
    Private Sub Button2_Click(sender As System.Object, e As System.EventArgs) Handles Button2.Click
        sw.Reset()
        sw.Start()
        ' Navigate returns immediately; the page itself loads asynchronously.
        WebBrowser1.Navigate("http://192.168.100.1/?page=basicStatus")
        ' Time taken just to start the navigation.
        Debug.Print(sw.Elapsed.ToString)
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' Raised once the document has finished loading.
        sourceHtml = WebBrowser1.Document
        sourcecode = WebBrowser1.DocumentText
        ' Time from the button click to the finished load.
        Debug.Print(sw.Elapsed.ToString)
    End Sub


You can add the sw.Reset, .Start, and .Elapsed to your other button code to get a comparison.
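A minimal sketch of what that comparison could look like for the request-based download is below. TimedDownload and the DownloadTimer module are hypothetical names, and the sketch uses the general WebRequest type rather than HttpWebRequest as an assumption, so that the same code can also be pointed at a local file:// address.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Net

Module DownloadTimer
    ' Hypothetical helper: downloads url, returns the content as a string,
    ' and reports how long the request and the read took via elapsed.
    Public Function TimedDownload(ByVal url As String, ByRef elapsed As TimeSpan) As String
        Dim sw As Stopwatch = Stopwatch.StartNew()

        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Dim content As String = reader.ReadToEnd()
                sw.Stop()
                elapsed = sw.Elapsed
                Return content
            End Using
        End Using
    End Function
End Module
```

Calling it as sourcecode = TimedDownload(WebsiteAddress, t) and then Debug.Print(t.ToString) gives a figure to set beside the one printed in DocumentCompleted. Running each method several times against the same target should give a fairer picture than a single run.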

I used the web page built in to my satellite modem as a target URL, to eliminate network delays. If you have any peripherals with built-in web pages, try those. If not, you might want to try a local HTML file on disk.

One other advantage to this method is that the result is an HtmlDocument (or a String, depending on which line you use in DocumentCompleted), which means that you can relatively easily find elements by Tag Name, and qualify them with attributes, to single out the InnerText of whatever elements you are interested in.

I'd be very interested to know which is faster, and of course, why I get that exception.
