Currently, I'm reading in the entire sequence of allele IDs for each bird as one long string of ~33,400 characters. Next, I look at the values for two birds at a particular locus, and if one is a 2 and the other is a 0, I add 1 to a counter. No other combination of 0, 1, or 2 gets counted.
I'm accomplishing that by looping through the 33,400-character strings character by character and evaluating each locus for every possible pair of birds in my data file. The loop looks like this:
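To make the counting rule concrete, here is a minimal sketch for a single pair of birds. The `CountOpposing` function and the two short sample strings are made up for illustration and are not part of my real code; the real strings are ~33,400 characters long.

```vbnet
Imports System

Module OpposingCountExample

    ' Hypothetical helper (illustration only): counts the loci where one
    ' bird has "2" and the other has "0". Assumes both strings have the
    ' same length and contain only the characters 0, 1, and 2.
    Function CountOpposing(a As String, b As String) As Integer
        Dim count As Integer = 0
        For k As Integer = 0 To a.Length - 1
            Dim isOpposing As Boolean =
                (a(k) = "2"c AndAlso b(k) = "0"c) OrElse
                (a(k) = "0"c AndAlso b(k) = "2"c)
            If isOpposing Then count += 1
        Next
        Return count
    End Function

    Sub Main()
        ' Loci 0, 1, and 3 are 0/2 or 2/0 pairs; locus 2 (1 vs 1) is not counted.
        Console.WriteLine(CountOpposing("0212", "2010")) ' prints 3
    End Sub

End Module
```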
    For i As Integer = 0 To AnimalCount - 2
        For j = (i + 1) To AnimalCount - 1
            For k As Integer = 0 To GeneCount - 1
                strCurrent = aryGenome(i)
                strContrast = aryGenome(j)
                If Math.Abs(CInt(AscW(strCurrent(k))) - CInt(AscW(strContrast(k)))) > 1 Then intDiffCount += 1
            Next
            aryResults.Add(intDiffCount.ToString())
            intDiffCount = 0
        Next
    Next
To speed up the process, I'm splitting the data file into 8 pieces and running the above loop in 8 different tasks, then adding up the results at the end. I can process one of our data files with 5,151 records (13,263,825 pairwise comparisons) in about 48 minutes (13 minutes on my gaming PC at home), but I'm thinking there's probably plenty of opportunity to improve on that time.
When I asked about a related problem before, there were a lot of suggestions revolving around my data types and how I'm representing the data, but I found that it didn't seem to matter how I represented the data; I couldn't get any better performance doing bit-wise compares or anything else. It could be that I just did it wrong or misunderstood the suggestion.
Anyway, that's my question: how can I get the loop above to execute faster?