Background Processing

36 Replies - 8183 Views - Last Post: 20 August 2011 - 02:22 AM

#1 kevin_911

Posted 15 August 2011 - 11:39 AM

Guys, I need a bit of advice on the above!

Basically, I have a web crawler app that uses a WebBrowser control to navigate to a URL. Once the page has loaded, I use the line below to grab the HTML, and then extract the plain text with HTML Agility Pack (HAP). Once this is done, I click a button to save the URL and the plain text to the database (SQL Server).

tb_HTML.Text = WebBrowser1.Document.Body.OuterHtml


Now my question is: how can I carry out the above as a background process? Say I have a form where I enter all the URLs I need to extract plain text from; the app would then download the HTML, extract the text using HAP, and automatically save the result to the database.

How is this possible? Any ideas?

This post has been edited by kevin_911: 15 August 2011 - 11:40 AM



Replies To: Background Processing

#2 modi123_1

Posted 15 August 2011 - 11:50 AM

Sure - background workers, threads, or tasks would all solve this. If you are using .NET 4.0, I would suggest tasks, since they make it easy to run multiple load-balanced instances and are easy for folks to understand.

Basically, you would make a class that does the operation you describe (pass a URL to the constructor, and a 'run' method would go out to that URL and do whatever you need done, including saving it to the db). You would create new instances of the class, attach each one to a task, and let the task run. Pretty straightforward.

http://www.dreaminco...8-tasks-net-40/

#3 kevin_911

Posted 15 August 2011 - 02:11 PM

Thanks modi123_1 for your help! Could you give me a small demonstration of what you mean in the second paragraph, please?

The class below does what I described in my first post:

Imports System.Data.SqlClient
Imports System.Text
Imports HtmlAgilityPack

Public Class Form1
    Inherits System.Windows.Forms.Form
 
    Public cs As New SqlConnection("Data Source=Connection;Initial Catalog=Crawl;Integrated Security=True")
    Public ds As New DataSet
    Public ds2 As New DataSet

    Public Shared Function HtmlToText(ByVal html As String) As String
       
    End Function

    Private Shared Sub ConvertToText(ByVal node As HtmlNode, ByVal sb As StringBuilder)
        
    End Sub

    Private Shared Sub ConvertChildren(ByVal parent As HtmlNode, ByVal sb As StringBuilder)
        
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        WebBrowser1.Navigate(tb_URL.Text)
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        tb_Parse.Text = HtmlToText(tb_HTML.Text)
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        tb_HTML.Text = WebBrowser1.document.Body.OuterHtml
    End Sub

    Private Sub WebBrowser1_Navigated(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserNavigatedEventArgs) Handles WebBrowser1.Navigated
        tb_URL.Text = WebBrowser1.Url.ToString
    End Sub

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        WebBrowser1.Navigate(My.Settings.homepage)
      
    End Sub

    Private Sub btnSave_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSave.Click

        Dim da As New SqlDataAdapter("SELECT * FROM search_result", cs)
        ' The command builder generates the INSERT command that da.Update needs for the new row.
        Dim cmb As New SqlCommandBuilder(da)
        Dim dateToDisplay As DateTime = DateTime.Now

        da.Fill(ds, "search_result")
        Dim dsNewRow As DataRow = ds.Tables("search_result").NewRow
        dsNewRow.Item("URL") = tb_URL.Text
        dsNewRow.Item("page_content") = tb_Parse.Text
        dsNewRow.Item("date_created") = dateToDisplay.ToString()
        ds.Tables("search_result").Rows.Add(dsNewRow)
        da.Update(ds, "search_result")

        cs.Close()
    End Sub
End Class


This post has been edited by kevin_911: 15 August 2011 - 02:37 PM

#4 modi123_1

Posted 15 August 2011 - 02:20 PM

Read up over at MSDN for a clear example:
http://msdn.microsof...y/dd537609.aspx

#5 kevin_911

Posted 15 August 2011 - 02:40 PM

That is a pretty informative link. I am going to look at it in detail.
I know you covered this in your previous posts, but could you tell me a bit more about how you would lay out the tasks? What I am looking for is to enter the URL in a textbox and then start the background search with a click of a button.

This post has been edited by kevin_911: 15 August 2011 - 02:41 PM

#6 modi123_1

Posted 15 August 2011 - 02:55 PM

The tasks are just small classes you create for repetitive execution (well, that's at least how that last link approaches them). Make a class with a constructor that takes in a string (your url) and saves it to a private variable in the class. Then whatever method you use to start the processing (copied in from wherever it is now into your class) chugs away on that string.

'--- --- --- ---
'--- main --- ---
'--- --- --- ---
'-- requires: Imports System.Threading.Tasks

    Private colTask As New List(Of Task)  '-- in case you want to be able to reference your tasks as they go
    Private tasktest As foo = Nothing     '-- class instance you'll create a few times

    '-- inside a method, e.g. a button click handler:
        Dim urls() As String = {"urlA", "urlB", "urlC", "urlD"} '-- collection of urls
        For Each temp As String In urls '-- loop through the collection
            tasktest = New foo(temp) '-- create a new instance of the class for each url
            '-- start the task on a thread-pool thread and also add it to the collection
            colTask.Add(Task.Factory.StartNew(AddressOf tasktest.Start))
        Next

'-- do whatever to determine when the tasks are done so you can fold the data back into the main thread...
'-- this is where the list of tasks comes in handy, since you can still get a handle to each task.



'-- -----------------------------

Public Class foo
    Private _sVal As String = String.Empty

    Public Sub New(ByVal url As String)
        _sVal = url
    End Sub

    Public Sub Start()
        '-- processing code here
    End Sub
End Class
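
One possible way to handle the "determine when the tasks are done" step is Task.Factory.ContinueWhenAll, which runs a callback once every task in an array has finished. The console sketch below makes several assumptions: PageJob and its Result property are hypothetical stand-ins for the foo class above, and the "processing" is a plain string operation rather than a real download.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading.Tasks

' Hypothetical stand-in for the foo class in the post above.
Public Class PageJob
    Private ReadOnly _url As String
    Public Property Result As String = String.Empty

    Public Sub New(ByVal url As String)
        _url = url
    End Sub

    Public Sub Start()
        ' Placeholder for the real work (download, parse, save).
        Result = "processed " & _url
    End Sub
End Class

Module Demo
    Sub Main()
        Dim urls() As String = {"urlA", "urlB", "urlC", "urlD"}
        Dim jobs As New List(Of PageJob)
        Dim tasks As New List(Of Task)

        For Each u As String In urls
            ' Declared inside the loop, so each task is bound to its own instance.
            Dim job As New PageJob(u)
            jobs.Add(job)
            tasks.Add(Task.Factory.StartNew(AddressOf job.Start))
        Next

        ' The callback runs once every task in the array has finished.
        Dim summary As Task = Task.Factory.ContinueWhenAll(tasks.ToArray(),
            Sub(finished As Task())
                For Each j As PageJob In jobs
                    Console.WriteLine(j.Result)
                Next
            End Sub)

        ' Blocking is fine in a console demo; avoid it on a form's UI thread.
        summary.Wait()
    End Sub
End Module
```

In a Windows Forms app, the continuation would typically update controls, so it would need to run on the UI thread; the ContinueWhenAll overload that accepts a TaskScheduler, combined with TaskScheduler.FromCurrentSynchronizationContext(), is one way to arrange that.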


This post has been edited by modi123_1: 15 August 2011 - 02:55 PM

#7 kevin_911

Posted 16 August 2011 - 07:57 AM

Thanks modi123_1 for your reply!
I chose to use the BackgroundWorker component, as it seems easy enough to work with and I am still a beginner :sweatdrop:

I created a test app using the BackgroundWorker (following a tutorial on here), shown below. It seems to work well and stores the data in the database.

Anyhow, using the same technique, how can I take requests from, say, 4 textboxes, perform the tasks shown in the code, and store the results in the database?

Please advise.

Imports System.Data.SqlClient
Imports System.Text
Imports HtmlAgilityPack

Public Class Form1

    Dim m_count As Integer = 0
    Dim sourcecode As String
    Public cs As New SqlConnection("Data Source=KJSINGH-PC;Initial Catalog=ForumCrawl;Integrated Security=True")
    Public ds As New DataSet

    Private Sub btn_Start_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btn_Start.Click
        m_count = 30
        Me.btn_Start.Enabled = False
        Me.btn_stp.Enabled = True
        BackgroundWorker1.RunWorkerAsync()
    End Sub

    Private Sub btn_stp_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btn_stp.Click
        ' Is the BackgroundWorker doing some work?
        If BackgroundWorker1.IsBusy Then
            ' If it supports cancellation, cancel it.
            If BackgroundWorker1.WorkerSupportsCancellation Then
                ' Tell the BackgroundWorker to stop working.
                BackgroundWorker1.CancelAsync()
            End If
        End If
        ' Enable the Start button
        Me.btn_Start.Enabled = True
        ' Disable the Stop button
        Me.btn_stp.Enabled = False
    End Sub

    ' The delegate
    Delegate Sub SetLabelText_Delegate(ByVal [Label] As Label, ByVal [text] As String)

    ' The delegates subroutine.
    Private Sub SetLabelText_ThreadSafe(ByVal [Label] As Label, ByVal [text] As String)
        ' InvokeRequired compares the thread ID of the calling thread to the thread ID of the creating thread.
        ' If these threads are different, it returns True.
        If [Label].InvokeRequired Then
            Dim MyDelegate As New SetLabelText_Delegate(AddressOf SetLabelText_ThreadSafe)
            Me.Invoke(MyDelegate, New Object() {[Label], [text]})
        Else
            [Label].Text = [text]
        End If
    End Sub

    Public Shared Function HtmlToText(ByVal html As String) As String
    
    End Function

    Private Shared Sub ConvertToText(ByVal node As HtmlNode, ByVal sb As StringBuilder)
       
    End Sub

    Private Shared Sub ConvertChildren(ByVal parent As HtmlNode, ByVal sb As StringBuilder)
        For Each node As HtmlNode In parent.ChildNodes
            ConvertToText(node, sb)
        Next
    End Sub


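    Private Sub BackgroundWorker1_DoWork(ByVal sender As System.Object, _
                                         ByVal e As System.ComponentModel.DoWorkEventArgs) Handles BackgroundWorker1.DoWork

        ' Note: TextBox1.Text is read here on the worker thread. Passing the URL in through
        ' RunWorkerAsync(argument) and reading e.Argument would avoid touching the control off the UI thread.
        Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(TextBox1.Text), System.Net.HttpWebRequest)

        ' Using blocks release the response and the reader even if reading fails.
        Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
            Using sr As New System.IO.StreamReader(response.GetResponseStream())
                sourcecode = sr.ReadToEnd()
            End Using
        End Using

    End Sub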

    Private Sub BackgroundWorker1_ProgressChanged(ByVal sender As System.Object, ByVal e As System.ComponentModel.ProgressChangedEventArgs) Handles BackgroundWorker1.ProgressChanged
        ' Update the progress bar
        Me.ProgressBar1.Value = e.ProgressPercentage

    End Sub


    Private Sub BackgroundWorker1_RunWorkerCompleted(ByVal sender As System.Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles BackgroundWorker1.RunWorkerCompleted
        If e.Cancelled Then
            Me.Lbl_Status.Text = "Cancelled"
        Else
            'TextBox2.Text = sourcecode
            Dim da As New SqlDataAdapter("SELECT * FROM test", cs)
            Dim dsNewRow As DataRow
            Dim cmb As SqlCommandBuilder = New SqlCommandBuilder(da)
            cmb.GetUpdateCommand()
            ' Dim dateToDisplay As DateTime = DateTime.Now
            'Dim format As String = "MMM ddd d HH:mm yyyy"
            'Dim date1 As Date

            da.Fill(ds, "test")
            dsNewRow = ds.Tables("test").NewRow
            dsNewRow.Item("URL") = TextBox1.Text()
            dsNewRow.Item("page_content") = HtmlToText(sourcecode)
            ds.Tables("test").Rows.Add(dsNewRow)
            da.UpdateCommand = cmb.GetUpdateCommand
            da.Update(ds, "test")
            Me.Lbl_Status.Text = "Completed"
        End If
    End Sub
End Class

#8 modi123_1

Posted 16 August 2011 - 08:09 AM

Wouldn't you just declare more background worker objects and turn them loose?
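
A rough sketch of what "more background worker objects" could look like: one BackgroundWorker is created in code per URL, the URL is handed over through RunWorkerAsync(argument), read back in DoWork via e.Argument, and the outcome is returned through e.Result. The URLs, the module name, and the placeholder "fetch" step are assumptions for illustration. This is a console program, so a CountdownEvent keeps the process alive until every worker reports back; a form would not need it.

```vbnet
Imports System
Imports System.ComponentModel
Imports System.Threading

Module WorkerDemo
    ' Counts outstanding workers so the console program knows when to exit.
    Private pending As CountdownEvent

    Sub Main()
        ' Hypothetical URLs; in the form these would be read from the text boxes
        ' on the UI thread before the workers are started.
        Dim urls() As String = {"urlA", "urlB", "urlC", "urlD"}
        pending = New CountdownEvent(urls.Length)

        For Each u As String In urls
            Dim worker As New BackgroundWorker()
            AddHandler worker.DoWork, AddressOf Worker_DoWork
            AddHandler worker.RunWorkerCompleted, AddressOf Worker_Completed
            worker.RunWorkerAsync(u)   ' the URL travels in as the argument
        Next

        pending.Wait()   ' console-only: wait until all workers have finished
    End Sub

    Private Sub Worker_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs)
        Dim url As String = CStr(e.Argument)
        ' Placeholder for the real download and parse step.
        e.Result = "fetched " & url
    End Sub

    Private Sub Worker_Completed(ByVal sender As Object, ByVal e As RunWorkerCompletedEventArgs)
        ' e.Error must be checked before e.Result; reading Result after a failure rethrows.
        If e.Error IsNot Nothing Then
            Console.WriteLine("failed: " & e.Error.Message)
        Else
            Console.WriteLine(CStr(e.Result))
        End If
        pending.Signal()
    End Sub
End Module
```

Because each worker receives its URL as an argument, DoWork never has to touch a control, and in a form the RunWorkerCompleted handler runs on the UI thread, which makes it a reasonable place for status updates.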

#9 kevin_911

Posted 16 August 2011 - 08:26 AM

Hmm how stupid of me :D

Lastly, would you recommend using the BackgroundWorker, tasks, or something else?

Edit: Also, is there a certain limit on using BackgroundWorkers?

This post has been edited by kevin_911: 16 August 2011 - 08:28 AM

#10 modi123_1

Posted 16 August 2011 - 08:37 AM

Personally, I am a fan of tasks. The load balancing and more fine-grained control baked into tasks are the deciding factor for me.

#11 CodingSup3rnatur@l-360

Posted 16 August 2011 - 08:40 AM

I personally would recommend tasks, particularly as you are talking about using multiple background workers concurrently. Tasks are the way asynchronous programming is going in .NET. They offer greater control, and a more explicit demonstration of your intentions when performing multiple operations concurrently.

You don't always need the extra control tasks provide, though. Just be aware that tasks offer certain advantages (as described in modi123_1's link) when you need them, and make an informed decision based on those advantages, your coding situation, and what you feel comfortable with. There is nothing at all inherently wrong with the BackgroundWorker class :)

You should still make sure you learn tasks though ;)


The number of background workers you can run at once is really only limited by the number of threads available in the thread pool (a limitation that also affects Tasks). If no threads are free, additional work waits in the pool's queue until one becomes available (EDIT: Rereading your post, I don't think that was what you meant actually, but anyway... :oops:)
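
For anyone curious about the pool size in question, the following small console sketch prints the thread pool's limits using ThreadPool.GetMaxThreads and ThreadPool.GetAvailableThreads. The module name is an assumption, and the numbers vary by machine and runtime version, so no expected output is shown. Work submitted when no pool thread is free is queued rather than rejected.

```vbnet
Imports System
Imports System.Threading

Module PoolInfo
    Sub Main()
        Dim workerThreads As Integer
        Dim ioThreads As Integer

        ' Upper bound on pool threads for this process.
        ThreadPool.GetMaxThreads(workerThreads, ioThreads)
        Console.WriteLine("Max worker threads: " & workerThreads)

        ' How many of those are free right now.
        ThreadPool.GetAvailableThreads(workerThreads, ioThreads)
        Console.WriteLine("Currently available: " & workerThreads)
    End Sub
End Module
```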

This post has been edited by CodingSup3rnatur@l-360: 16 August 2011 - 08:48 AM

#12 kevin_911

Posted 16 August 2011 - 08:43 AM

Hmm, so tasks it is! It's just that I can't get my head around tasks, even though modi has given me an example. The confusion is mostly around how I collect the URLs from, say, four textboxes and then do the rest?!?

I will have another go at tasks :)

#13 modi123_1

Posted 16 August 2011 - 08:48 AM

Okay... your main program creates the tasks, right? It takes the text from the text boxes and shoves that data into the task class you created. In my example, each time a task is created the constructor requires a string (read: your url). Just manually create four tasks that each take a different URL.

#14 kevin_911

Posted 16 August 2011 - 03:00 PM

Sorry for the late reply!
Okay going step by step :)

Quote

Just manually create four tasks that all input a different URL.


Is this correct? Am I doing it right? :sweatdrop:

   Sub main()
        Dim myTask0 As New Task(Sub()
                                    Dim url As String
                                    url = TextBox1.Text
                                End Sub)
        myTask0.Start()

        Dim myTask1 As New Task(Sub()
                                    Dim url As String
                                    url = TextBox2.Text
                                End Sub)
    End Sub


This post has been edited by kevin_911: 16 August 2011 - 03:02 PM

#15 CodingSup3rnatur@l-360

Posted 16 August 2011 - 03:13 PM

You're accessing the textboxes from within the task (and therefore from a non-GUI thread). That will cause problems (it is very likely to throw a cross-thread exception when run).

You need to have another look at post number 6 :)
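
A minimal sketch of one way around the cross-thread problem: read the text into plain String variables first, on the thread that owns the controls, and let the task lambdas capture only those strings. The console program below uses literal strings where the form would read TextBox1.Text and TextBox2.Text, and ProcessUrl is a hypothetical placeholder for the real download, parse, and save work.

```vbnet
Imports System
Imports System.Threading.Tasks

Module CaptureDemo
    ' Hypothetical worker routine; stands in for download + parse + save.
    Private Sub ProcessUrl(ByVal url As String)
        Console.WriteLine("processing " & url)
    End Sub

    Sub Main()
        ' In the form these would be: Dim url1 As String = TextBox1.Text, and so on,
        ' read on the UI thread (for example inside a button click handler).
        Dim url1 As String = "urlA"
        Dim url2 As String = "urlB"

        ' The lambdas capture only the strings, never the controls.
        Dim myTask0 As Task = Task.Factory.StartNew(Sub() ProcessUrl(url1))
        Dim myTask1 As Task = Task.Factory.StartNew(Sub() ProcessUrl(url2))

        ' Console-only wait; a form should not block its UI thread like this.
        Task.WaitAll(myTask0, myTask1)
    End Sub
End Module
```

Task.Factory.StartNew both creates and starts the task, which also avoids the situation in post #14 where myTask1 is constructed but Start() is never called on it.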

This post has been edited by CodingSup3rnatur@l-360: 16 August 2011 - 03:13 PM
