Faster way to execute my loop


46 Replies - 1548 Views - Last Post: 28 June 2013 - 10:32 AM

#16 dbasnett

  • D.I.C Addict

Reputation: 109
  • Posts: 603
  • Joined: 01-October 08

Posted 25 June 2013 - 08:56 AM

I posted a link that shows how to read the file as bytes, skipping the conversion, in the post with the test. The example reads two files as bytes.

#17 C.Andrews

  • D.I.C Head

Reputation: 14
  • Posts: 169
  • Joined: 18-October 12

Posted 25 June 2013 - 09:08 AM

dbasnett, on 25 June 2013 - 03:56 PM, said:

I posted a link that shows how to read the file as bytes, skipping the conversion, in the post with the test. The example reads two files as bytes.



See? I knew the problem was me! I missed that link earlier. I'll integrate that technique and give it a shot. Thanks for the pointer.

#18 lar3ry

  • Coding Geezer

Reputation: 310
  • Posts: 1,290
  • Joined: 12-September 12

Posted 25 June 2013 - 11:04 PM

C.Andrews, on 24 June 2013 - 03:01 PM, said:

Here is a test file I'm using with just 50 records in it instead of the full 5,151. The first set of numbers is the animal ID, the second set is the genome. Enjoy!

I did some testing with your supplied text file. Here's the code I used. I think I did the comparisons the way you want, with every line compared against every other line. I read the file in as bytes, placing each record in a List(Of Byte()), and compared from index 10 to the end of each Byte array. There are three superfluous bytes at the end of each array (CR, LF, 0), but none of these will create a 2,0 or 0,2 detection. I put the Debug.Print() statements in there so I could send you the results. They add a little to the time, but I wanted you to be able to check the result for each pair and the total, to make sure I've understood and coded the problem correctly, and to show you the cumulative time taken.

First the code:

Option Strict On
Imports System.IO

Public Class Form1
    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        Dim count As Integer = 0
        Dim total As Integer
        Dim swatch As New Stopwatch
        Dim animal As New List(Of Byte())

        ' Read 50 fixed-length records of 33,382 bytes each.
        Using fs As FileStream = File.OpenRead("smalltestdata.txt")
            For i = 0 To 49
                Dim tst(33382) As Byte ' upper bound 33382, so one spare trailing 0 byte
                fs.Read(tst, 0, 33382)
                animal.Add(tst)
            Next
        End Using
        swatch.Start()

        ' Compare every animal with every later animal, skipping the first 10 bytes.
        For x = 0 To 48
            For y = x + 1 To 49
                For i = 10 To animal(0).Length - 1
                    ' "0" is 48 and "2" is 50, so the Xor is 2 for a 0/2 pair.
                    If (animal(x)(i) Xor animal(y)(i)) = 2 Then
                        count += 1
                    End If
                Next
                Debug.Print(x.ToString & " " & y.ToString & " " & count.ToString & " " & swatch.Elapsed.ToString)
                total += count
                count = 0
            Next
        Next
        Debug.Print(CStr(total) & " " & swatch.Elapsed.ToString)
    End Sub
End Class



And the results:
Attached File: SmallTestDataResults.txt (34.22K)
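
For anyone wondering why the Xor test above works: the ASCII codes for "0", "1" and "2" are 48, 49 and 50, so two digit bytes Xor to 2 only when one is "0" and the other is "2". Here is a minimal sketch that prints every combination. The module and function names are made up for illustration, and it assumes the genome only ever contains 0, 1 and 2 (a "1" against a "3" would also Xor to 2).

```vbnet
Option Strict On
Imports System
Imports System.Text

Module XorDemo
    ' True when the two bytes Xor to 2, which for the digits 0, 1, 2
    ' happens only for a "0"/"2" pair (48 Xor 50).
    Function IsZeroTwoPair(a As Byte, b As Byte) As Boolean
        Return (a Xor b) = 2
    End Function

    Sub Main()
        Dim digits As Byte() = Encoding.ASCII.GetBytes("012")
        ' Print the Xor of every ordered pair of digit bytes.
        For Each a As Byte In digits
            For Each b As Byte In digits
                Console.WriteLine("{0} Xor {1} = {2}", ChrW(a), ChrW(b), a Xor b)
            Next
        Next
    End Sub
End Module
```

Identical bytes always Xor to 0, which is why the trailing CR, LF and 0 bytes never add to the count.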

#19 C.Andrews

Posted 26 June 2013 - 06:49 AM

lar3ry, on 26 June 2013 - 06:04 AM, said:

I did some testing with your supplied text file. Here's the code I used. I think I did the comparisons as you want, every line compared with every other line.


I looked at your results file and you did get the correct answer for all of your comparisons. I'm right in the middle of another project right this second, but when I'm done I'll run your code against mine and see how the speed of execution compares.

Thanks for looking in to this, I do appreciate it.

#20 C.Andrews

Posted 26 June 2013 - 07:59 AM

Interestingly enough, I'm getting slower results using your byte-to-byte comparison method than I get comparing character literals. Here are the results I got for the comparisons using both methods:

        LoopTimer.Start()
        For i As Integer = 0 To AnimalCount - 2
            For j = (i + 1) To AnimalCount - 1
                For k As Integer = 0 To GeneCount - 1
                    If (aryGenomeBytes(i)(k) Xor aryGenomeBytes(j)(k)) = 2 Then intDiffCount += 1
                Next
                aryResults.Add(intDiffCount.ToString())
                intDiffCount = 0
            Next
        Next
        LoopTimer.Stop()




Byte-to-Byte Results:
Segment 01 calculation time: 00:00:00.019
Segment 02 calculation time: 00:00:00.019
Segment 03 calculation time: 00:00:00.016
Segment 04 calculation time: 00:00:00.016
Segment 05 calculation time: 00:00:00.019
Segment 06 calculation time: 00:00:00.019
Segment 07 calculation time: 00:00:00.018
Segment 08 calculation time: 00:00:00.015

        LoopTimer.Start()
        For i As Integer = 0 To AnimalCount - 2
            strCurrent = aryGenome(i)
            For j = (i + 1) To AnimalCount - 1
                strContrast = aryGenome(j)
                For k As Integer = 0 To GeneCount - 1
                    If (strCurrent(k) = "2"c And strContrast(k) = "0"c) Or (strCurrent(k) = "0"c And strContrast(k) = "2"c) Then intDiffCount += 1
                Next
                aryResults.Add(intDiffCount.ToString())
                intDiffCount = 0
            Next
        Next



Character Literal Results:
Segment 01 calculation time: 00:00:00.014
Segment 02 calculation time: 00:00:00.015
Segment 03 calculation time: 00:00:00.014
Segment 04 calculation time: 00:00:00.014
Segment 05 calculation time: 00:00:00.015
Segment 06 calculation time: 00:00:00.015
Segment 07 calculation time: 00:00:00.014
Segment 08 calculation time: 00:00:00.014

The reason you're seeing 8 segments is that I'm splitting the 33,370-character genome into 8 chunks of approximately equal length and counting the differences in each segment in parallel. The overall completion time for all 8 segments equals the duration of the longest segment in the run.
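
Roughly, the segment idea looks like the sketch below. This is not my actual project code: CountBySegment and the three-animal sample are made up for illustration, and it assumes every genome has the same length and that there is at least one animal. Each segment gets its own row of per-pair counts, so no two threads write to the same slot, and the rows are added together at the end.

```vbnet
Option Strict On
Imports System
Imports System.Text
Imports System.Threading.Tasks

Module SegmentDemo
    ' Counts 0/2 differences for every pair of genomes, splitting the gene
    ' range into segmentCount slices that are processed in parallel.
    ' Returns one total per pair, ordered (0,1), (0,2), ..., (n-2,n-1).
    Function CountBySegment(genomes As Byte()(), segmentCount As Integer) As Integer()
        Dim animalCount As Integer = genomes.Length
        Dim geneCount As Integer = genomes(0).Length
        Dim pairCount As Integer = animalCount * (animalCount - 1) \ 2
        Dim sliceSize As Integer = (geneCount + segmentCount - 1) \ segmentCount
        Dim perSegment As Integer()() = New Integer(segmentCount - 1)() {}

        Parallel.For(0, segmentCount,
                     Sub(s As Integer)
                         Dim counts(pairCount - 1) As Integer
                         Dim first As Integer = s * sliceSize
                         Dim last As Integer = Math.Min(first + sliceSize, geneCount) - 1
                         Dim p As Integer = 0
                         For i As Integer = 0 To animalCount - 2
                             Dim current As Byte() = genomes(i)
                             For j As Integer = i + 1 To animalCount - 1
                                 Dim contrast As Byte() = genomes(j)
                                 For k As Integer = first To last
                                     If (current(k) Xor contrast(k)) = 2 Then counts(p) += 1
                                 Next
                                 p += 1
                             Next
                         Next
                         perSegment(s) = counts ' each segment writes only its own row
                     End Sub)

        ' Add the per-segment counts together for each pair.
        Dim totals(pairCount - 1) As Integer
        For Each segCounts As Integer() In perSegment
            For q As Integer = 0 To pairCount - 1
                totals(q) += segCounts(q)
            Next
        Next
        Return totals
    End Function

    Sub Main()
        Dim genomes As Byte()() = {
            Encoding.ASCII.GetBytes("000111222"),
            Encoding.ASCII.GetBytes("012012012"),
            Encoding.ASCII.GetBytes("210210210")}
        For Each t As Integer In CountBySegment(genomes, 4)
            Console.WriteLine(t)
        Next
    End Sub
End Module
```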

Maybe the drop in performance is due to your If statement looking up the value of aryGenomeBytes(i)(k) every iteration instead of using a variable value previously assigned the way my loop does? That's all I can think of.

Edit: Also tried this...

        Dim GeneCount As Integer = aryGenome(0).Length
        LoopTimer.Start()
        For i As Integer = 0 To AnimalCount - 2
            strCurrent = aryGenomeBytes(i)
            For j = (i + 1) To AnimalCount - 1
                strContrast = aryGenomeBytes(j)
                For k As Integer = 0 To GeneCount - 1
                    If (strCurrent(k) = two AndAlso strContrast(k) = zero) OrElse (strCurrent(k) = zero AndAlso strContrast(k) = two) Then intDiffCount += 1
                Next
                aryResults.Add(intDiffCount)
                intDiffCount = 0
            Next
        Next
        LoopTimer.Stop()



Bytes with variable assignment results:
Segment 01 calculation time: 00:00:00.018
Segment 02 calculation time: 00:00:00.018
Segment 03 calculation time: 00:00:00.019
Segment 04 calculation time: 00:00:00.019
Segment 05 calculation time: 00:00:00.018
Segment 06 calculation time: 00:00:00.018
Segment 07 calculation time: 00:00:00.019
Segment 08 calculation time: 00:00:00.018
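
Since these timings are only a few milliseconds apart, a single run may not separate the methods reliably. Below is a minimal sketch of a repeatable comparison, assuming a warm-up pass plus best-of-several timing is an acceptable way to measure. BestOf, CountXor, CountChars and RandomGenome are hypothetical names, and the genomes are random made-up data rather than the real file.

```vbnet
Option Strict On
Imports System
Imports System.Diagnostics
Imports System.Text

Module TimingDemo
    ' Runs work once untimed (so JIT compilation is excluded), then
    ' returns the fastest of the timed runs.
    Function BestOf(runs As Integer, work As Action) As TimeSpan
        work()
        Dim best As TimeSpan = TimeSpan.MaxValue
        For r As Integer = 1 To runs
            Dim sw As Stopwatch = Stopwatch.StartNew()
            work()
            sw.Stop()
            If sw.Elapsed < best Then best = sw.Elapsed
        Next
        Return best
    End Function

    Function CountXor(a As Byte(), b As Byte()) As Integer
        Dim n As Integer = 0
        For k As Integer = 0 To a.Length - 1
            If (a(k) Xor b(k)) = 2 Then n += 1
        Next
        Return n
    End Function

    Function CountChars(a As String, b As String) As Integer
        Dim n As Integer = 0
        For k As Integer = 0 To a.Length - 1
            If (a(k) = "2"c AndAlso b(k) = "0"c) OrElse (a(k) = "0"c AndAlso b(k) = "2"c) Then n += 1
        Next
        Return n
    End Function

    ' Builds a made-up genome of 0/1/2 characters; not real data.
    Function RandomGenome(rng As Random, length As Integer) As String
        Dim sb As New StringBuilder(length)
        For k As Integer = 1 To length
            sb.Append(ChrW(AscW("0"c) + rng.Next(3)))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim rng As New Random(1)
        Dim s1 As String = RandomGenome(rng, 33370)
        Dim s2 As String = RandomGenome(rng, 33370)
        Dim b1 As Byte() = Encoding.ASCII.GetBytes(s1)
        Dim b2 As Byte() = Encoding.ASCII.GetBytes(s2)
        Dim sink As Integer = 0 ' keeps the counts in use

        ' 1,000 comparisons per timed run, best of 5 runs.
        Dim tBytes As TimeSpan = BestOf(5, Sub()
                                               For r As Integer = 1 To 1000
                                                   sink += CountXor(b1, b2)
                                               Next
                                           End Sub)
        Dim tChars As TimeSpan = BestOf(5, Sub()
                                               For r As Integer = 1 To 1000
                                                   sink += CountChars(s1, s2)
                                               Next
                                           End Sub)
        Console.WriteLine("Byte Xor:   {0}", tBytes)
        Console.WriteLine("Char tests: {0}", tChars)
        Console.WriteLine(sink)
    End Sub
End Module
```

Running it in a Release build outside the debugger would be the fairer comparison; no particular result is implied here.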

This post has been edited by C.Andrews: 26 June 2013 - 08:33 AM


#21 dbasnett

Posted 26 June 2013 - 10:46 AM

Very confusing... First, there is no reason that I can think of that a byte compare would be slower than a string compare. When I look at the code I see

strCurrent = aryGenome(i)
and later on
strContrast = aryGenome(j)
and then further along

strCurrent(k) And strContrast(k)

which makes me wonder how all of these are dim'ed. Do you have Option Strict On?

I haven't understood what you are trying to do from the beginning; maybe a short example that doesn't assume that I know genomes would help.

This post has been edited by dbasnett: 26 June 2013 - 10:49 AM


#22 lar3ry

Posted 26 June 2013 - 11:53 AM

dbasnett, on 26 June 2013 - 11:46 AM, said:

I haven't understood what you are trying to do from the beginning, maybe a short example that doesn't assume that I know genomes would help.

Using a small set of sample data...
'   Animal_1  000111222
'   Animal_2  012012012
'   Animal_3  210210210

'   Compare   With        Results(1=match) count
'   ----------------------------------------------
'   Animal_1  Animal_2    001000100        2
'   Animal_2  Animal_3    101101101        6
'   Animal_3  Animal_4 .... and so on, up to end of array with both indices.
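
The same sample can be run through the byte comparison with a short sketch like this. The module and function names are made up for illustration, and it assumes the genomes contain only 0, 1 and 2 and are all the same length. It also covers the Animal_1 / Animal_3 pair that the table above leaves out.

```vbnet
Option Strict On
Imports System
Imports System.Collections.Generic
Imports System.Text

Module SampleDemo
    ' Counts positions where one genome has "0" and the other has "2".
    Function CountDiffs(a As Byte(), b As Byte()) As Integer
        Dim count As Integer = 0
        For i As Integer = 0 To Math.Min(a.Length, b.Length) - 1
            If (a(i) Xor b(i)) = 2 Then count += 1
        Next
        Return count
    End Function

    Sub Main()
        Dim genomes As String() = {"000111222", "012012012", "210210210"}
        Dim animals As New List(Of Byte())
        For Each g As String In genomes
            animals.Add(Encoding.ASCII.GetBytes(g))
        Next

        ' Compare every animal with every later animal.
        For x As Integer = 0 To animals.Count - 2
            For y As Integer = x + 1 To animals.Count - 1
                Console.WriteLine("Animal_{0} vs Animal_{1}: {2}", x + 1, y + 1, CountDiffs(animals(x), animals(y)))
            Next
        Next
    End Sub
End Module
```

For this sample the three pairs come out as 2 (Animal_1 vs Animal_2), 2 (Animal_1 vs Animal_3) and 6 (Animal_2 vs Animal_3).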



#23 C.Andrews

Posted 26 June 2013 - 12:19 PM

dbasnett, on 26 June 2013 - 05:46 PM, said:

Very confusing... First, there is no reason that I can think of that a byte compare would be slower than a string compare. When I look at the code I see

strCurrent = aryGenome(i)
and later on
strContrast = aryGenome(j)
and then further along

strCurrent(k) And strContrast(k)

which makes me wonder how all of these are dim'ed. Do you have Option Strict On?

I haven't understood what you are trying to do from the beginning; maybe a short example that doesn't assume that I know genomes would help.


I'm not at work right this moment so I don't have my code handy; I'll post my whole subroutine when I get back. To answer your questions:

Option Strict is on
strCurrent & strContrast are declared as Byte(), and get their values from a List(Of Byte()) generated when the file is read in.

#24 dbasnett

Posted 26 June 2013 - 01:49 PM

lar3ry, on 26 June 2013 - 01:53 PM, said:

Using a small set of sample data...


I think I get it. In the sample data you posted were there two animals?

#25 lar3ry

Posted 26 June 2013 - 03:53 PM

dbasnett, on 26 June 2013 - 02:49 PM, said:

I think I get it. In the sample data you posted were there two animals?

Nope. In the sample I posted, there are three animals. The comparison needs to compare every animal with every other animal.

The logic of the loop is in my last code posting, which was working on 50 animals...

        For x = 0 To 48
            For y = x + 1 To 49
                For i = 10 To animal(0).Length - 1
                    If (animal(x)(i) Xor animal(y)(i)) = 2 Then
                        count += 1
                    End If
                Next
            Next
        Next


animal(x) is compared with animal(y). Since the outer loop runs from the first animal to the second-last, and the inner loop runs from animal x+1 to the last, every pair is compared exactly once. Apparently my algorithm works, since I got the right answers. It'll be interesting to see how it fares against the direct compare methods, especially in multi-threading mode.

I'd try multi-threading, but have never used it in VB.Net, and am not sure how to do it.
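
For reference, one way it might be done is with Parallel.For from System.Threading.Tasks on the outer loop. This is only a sketch under assumptions: CompareAll and the jagged result layout are invented for illustration, the sample is the three-animal one from earlier in the thread, and whether it is actually faster on the real data is untested here.

```vbnet
Option Strict On
Imports System
Imports System.Text
Imports System.Threading.Tasks

Module ParallelDemo
    ' Counts positions where one genome has "0" and the other has "2".
    Function CountDiffs(a As Byte(), b As Byte()) As Integer
        Dim count As Integer = 0
        For i As Integer = 0 To a.Length - 1
            If (a(i) Xor b(i)) = 2 Then count += 1
        Next
        Return count
    End Function

    ' results(x)(y - x - 1) holds the count for the pair (x, y), where y > x.
    Function CompareAll(animals As Byte()()) As Integer()()
        If animals.Length < 2 Then Return New Integer()() {}
        Dim results As Integer()() = New Integer(animals.Length - 2)() {}
        ' The outer loop runs in parallel; the inner loops stay sequential.
        Parallel.For(0, animals.Length - 1,
                     Sub(x As Integer)
                         Dim row(animals.Length - x - 2) As Integer
                         For y As Integer = x + 1 To animals.Length - 1
                             row(y - x - 1) = CountDiffs(animals(x), animals(y))
                         Next
                         results(x) = row ' each x writes only its own slot
                     End Sub)
        Return results
    End Function

    Sub Main()
        Dim animals As Byte()() = {
            Encoding.ASCII.GetBytes("000111222"),
            Encoding.ASCII.GetBytes("012012012"),
            Encoding.ASCII.GetBytes("210210210")}
        Dim results As Integer()() = CompareAll(animals)
        For x As Integer = 0 To results.Length - 1
            For k As Integer = 0 To results(x).Length - 1
                Console.WriteLine("{0} v {1} = {2}", x, x + k + 1, results(x)(k))
            Next
        Next
    End Sub
End Module
```

Because each value of x fills its own row, no locking is needed, and the results print in a fixed order even though the rows are computed in parallel.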

#26 AdamSpeight2008

  • MrCupOfT

Reputation: 2257
  • Posts: 9,445
  • Joined: 29-May 08

Posted 26 June 2013 - 04:35 PM

Have a try of this one. Note ordering of the result may change between runs.

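Imports System.Linq
Imports System.Threading.Tasks

Module Module1

    Sub Main()
        Dim animals = {"01211230002",
                       "01201011120",
                       "01202320000"}

        ' Compare every pair (x, y) with y < x. Pairs run in parallel, so output order varies.
        Parallel.For(1, animals.Length,
                     Sub(x)
                         Parallel.For(0, x,
                                      Sub(y)
                                          ' Pair up the characters and flag each 0/2 or 2/0 position.
                                          Dim flags = animals(x).AsParallel.Zip(animals(y).AsParallel,
                                                          Function(a, b) (a = "0"c AndAlso b = "2"c) OrElse (a = "2"c AndAlso b = "0"c))
                                          Console.WriteLine("{0}v{1} = {2} ", x, y, flags.Count(Function(c) c))
                                      End Sub)
                     End Sub)
    End Sub

End Module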




#27 C.Andrews

Posted 26 June 2013 - 06:17 PM

AdamSpeight2008, on 26 June 2013 - 11:35 PM, said:

Have a try of this one. Note ordering of the result may change between runs.

I tried almost exactly that same thing before and the performance is terrible. I tried parallelizing the inner loops, outer loop, and every combination thereof but the performance was always really poor so I gave up on it. Tomorrow morning I'll post my entire project in a .zip and you can all laugh hysterically at the parallel processing kludge that I'm using.

#28 lar3ry

Posted 26 June 2013 - 07:49 PM

C.Andrews, on 26 June 2013 - 07:17 PM, said:

I tried almost exactly that same thing before and the performance is terrible. I tried parallelizing the inner loops, outer loop, and every combination thereof but the performance was always really poor so I gave up on it. Tomorrow morning I'll post my entire project in a .zip and you can all laugh hysterically at the parallel processing kludge that I'm using.

Did you try my Xor suggestion in your inner loops? It should save three if statements and one boolean operation per comparison. I don't know how my processor speed compares with yours, but I thought I was getting pretty good times. Should be even better dividing the task up for multi-threading, no?

#29 C.Andrews

Posted 26 June 2013 - 08:32 PM

lar3ry, on 27 June 2013 - 02:49 AM, said:

Did you try my Xor suggestion in your inner loops? It should save three if statements and one boolean operation per comparison. I don't know how my processor speed compares with yours, but I thought I was getting pretty good times. Should be even better dividing the task up for multi-threading, no?


I posted the results where I incorporated your Xor method above; I was getting slightly longer times using that comparison vs. the character literal comparison I had been using. I'll post the whole project tomorrow and you can use it to compare apples to apples if you like.

#30 lar3ry

Posted 26 June 2013 - 08:56 PM

C.Andrews, on 26 June 2013 - 09:32 PM, said:

I posted the results where I incorporated your Xor method above, I was getting slightly longer times using that comparison vs. the character literal comparison I had been using. I'll post the whole project tomorrow and you can use it to compare apples to apples if you like.

Oh! My bad. Didn't notice the Xor code in that post. It's strange all right. Even stranger, your use of AndAlso and OrElse makes it slower too.