10 Replies - 1788 Views - Last Post: 21 July 2010 - 04:16 AM Rate Topic: -----

#1 cbutler00  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 12
  • Joined: 21-April 10

Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 05:17 AM

I was wondering what would be the fastest way to loop a set of items (Outlook items specifically) and then create objects from the item attributes?
Is This A Good Question/Topic? 0
  • +

Replies To: Fastest way to loop a set of data and create objects

#2 eclipsed4utoo  Icon User is offline

  • Not Your Ordinary Programmer
  • member icon

Reputation: 1524
  • View blog
  • Posts: 5,958
  • Joined: 21-March 08

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 06:48 AM

a for loop?
Was This Post Helpful? 0
  • +
  • -

#3 Frinavale  Icon User is offline

  • D.I.C Addict
  • member icon

Reputation: 203
  • View blog
  • Posts: 776
  • Joined: 03-June 10

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 07:01 AM

Well if you have to loop through all of the items then I don't think it's going to really matter how your loop works. You could use a for loop, a while loop...

Maybe if you split the collection in half recursively and start the object creation/population in separate threads...

Actually! This might be a good chance to check out Parallel Programming :)

Ok, maybe I'm getting a bit too excited about looping!

If your a student/beginner then a for loop or while loop will do fine.

-Frinny

This post has been edited by Frinavale: 20 July 2010 - 07:02 AM

Was This Post Helpful? 0
  • +
  • -

#4 Momerath  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 1010
  • View blog
  • Posts: 2,444
  • Joined: 04-October 09

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 08:22 AM

*
POPULAR

In general you use a foreach to iterate over a collection, for loop to perform an action a set number of times, a while loop to do something zero or more times and a do-while loop to do something one or more times. But lets take a look at an example.

I created a simple list of integers (List<int> myList = new List<int>()) then iterated over it using three different methods:

for (int i = 0; i < myList.Count; i++) {
    MyMethod(myList[i]);
}

foreach (int i in myList) {
    MyMethod(i);
}

int a = 0;
while (a < myList.Count) {
    MyMethod(myList[a]);
    a++;
}


The code for the first one results in this:
            for (int i = 0; i < myList.Count; i++) {
00000070  xor         edx,edx 
00000072  mov         dword ptr [ebp-44h],edx 
00000075  nop 
00000076  jmp         00000096 
00000078  nop 
                MyMethod(myList[i]);
00000079  mov         edx,dword ptr [ebp-44h] 
0000007c  mov         ecx,dword ptr [ebp-40h] 
0000007f  cmp         dword ptr [ecx],ecx 
00000081  call        67601070 
00000086  mov         dword ptr [ebp-6Ch],eax 
00000089  mov         ecx,dword ptr [ebp-6Ch] 
0000008c  call        FFF8A320 
00000091  nop 
            }
00000092  nop 
            for (int i = 0; i < myList.Count; i++) {
00000093  inc         dword ptr [ebp-44h] 
00000096  mov         eax,dword ptr [ebp-44h] 
00000099  mov         dword ptr [ebp-64h],eax 
0000009c  mov         ecx,dword ptr [ebp-40h] 
0000009f  cmp         dword ptr [ecx],ecx 
000000a1  call        66EAC3C0 
000000a6  mov         dword ptr [ebp-68h],eax 
000000a9  mov         eax,dword ptr [ebp-64h] 
000000ac  cmp         eax,dword ptr [ebp-68h] 
000000af  setl        al 
000000b2  movzx       eax,al 
000000b5  mov         dword ptr [ebp-4Ch],eax 
000000b8  cmp         dword ptr [ebp-4Ch],0 
000000bc  jne         00000078 
From this we can see that it sets up the loop variable (the first 5 lines) then jumps down to the 2nd line of the main body (00000096). It does this to skip the incrementing of the loop variable this time through. The rest of the main body compares the loop variable to the limit (myList.Count) and if it is less than jumps to the call to MyMethod(compare is at 000000bc). MyMethod has to dereference the index to get the actual object and then call MyMethod(). There are a total of 24 instructions (not counting nop).

The second results in:
            foreach (int i in myList) {
000000be  nop 
000000bf  lea         edx,[ebp-7Ch] 
000000c2  mov         ecx,dword ptr [ebp-40h] 
000000c5  cmp         dword ptr [ecx],ecx 
000000c7  call        67601570 
000000cc  lea         edi,[ebp-5Ch] 
000000cf  lea         esi,[ebp-7Ch] 
000000d2  movs        dword ptr es:[edi],dword ptr [esi] 
000000d3  movs        dword ptr es:[edi],dword ptr [esi] 
000000d4  movs        dword ptr es:[edi],dword ptr [esi] 
000000d5  movs        dword ptr es:[edi],dword ptr [esi] 
000000d6  nop 
000000d7  jmp         000000FB 
000000d9  lea         ecx,[ebp-5Ch] 
000000dc  call        66EAFDD0 
000000e1  mov         dword ptr [ebp+FFFFFF7Ch],eax 
000000e7  mov         eax,dword ptr [ebp+FFFFFF7Ch] 
000000ed  mov         dword ptr [ebp-44h],eax 
000000f0  nop 
                MyMethod(i);
000000f1  mov         ecx,dword ptr [ebp-44h] 
000000f4  call        FFF8A320 
000000f9  nop 
            }
000000fa  nop 
            foreach (int i in myList) {
000000fb  lea         ecx,[ebp-5Ch] 
000000fe  call        66E90C80 
00000103  mov         dword ptr [ebp-80h],eax 
00000106  movzx       eax,byte ptr [ebp-80h] 
0000010a  mov         dword ptr [ebp-4Ch],eax 
0000010d  cmp         dword ptr [ebp-4Ch],0 
00000111  jne         000000D9 
00000113  nop 
00000114  mov         dword ptr [ebp-20h],0 
0000011b  mov         dword ptr [ebp-1Ch],0FCh 
00000122  push        34257Eh 
00000127  jmp         00000129 
00000129  lea         ecx,[ebp-5Ch] 
0000012c  call        66EAFDE0 
00000131  nop 
00000132  pop         eax 
00000133  jmp         eax 
00000135  nop 
This one is different because it has to get an enumerator rather than using an index. The code in the first block up to the nop is the initializer, after that we again jump down to the 'test' section (line 14, the jmp 000000FB). It then makes a call to the IEnumerable code (line 29) to see if there is a next one to get, and if so jumps back in the code block to the area that does the actual retrieving of the object (Lines 15-19) which then falls through into the MyMethod call and back into the check for more objects. Notice that there is code after the loop is done, and this is to remove the IEnumerable object reference that had to be created. There are a total of 33 instructions.

Last we have:
            int a = 0;
00000136  xor         edx,edx 
00000138  mov         dword ptr [ebp-48h],edx 
0000013b  nop 
0000013c  jmp         0000016B 
0000013e  mov         dword ptr [ebp-1Ch],0 
00000145  jmp         00000135 
            while (a < myList.Count) {
00000147  nop 
                MyMethod(myList[a]);
00000148  mov         edx,dword ptr [ebp-48h] 
0000014b  mov         ecx,dword ptr [ebp-40h] 
0000014e  cmp         dword ptr [ecx],ecx 
00000150  call        67601070 
00000155  mov         dword ptr [ebp+FFFFFF70h],eax 
0000015b  mov         ecx,dword ptr [ebp+FFFFFF70h] 
00000161  call        FFF8A320 
00000166  nop 
                a++;
00000167  inc         dword ptr [ebp-48h] 
            }
0000016a  nop 
            while (a < myList.Count) {
0000016b  mov         eax,dword ptr [ebp-48h] 
0000016e  mov         dword ptr [ebp+FFFFFF78h],eax 
00000174  mov         ecx,dword ptr [ebp-40h] 
00000177  cmp         dword ptr [ecx],ecx 
00000179  call        66EAC3C0 
0000017e  mov         dword ptr [ebp+FFFFFF74h],eax 
00000184  mov         eax,dword ptr [ebp+FFFFFF78h] 
0000018a  cmp         eax,dword ptr [ebp+FFFFFF74h] 
00000190  setl        al 
00000193  movzx       eax,al 
00000196  mov         dword ptr [ebp-4Ch],eax 
00000199  cmp         dword ptr [ebp-4Ch],0 
0000019d  jne         00000147 
This one is similar to the for loop. We start with initializing the counter and jumping into the main body of the code (Lines 2-5). It then makes some calls to get the value of Count and then compares it to our index to see if we are done. If not we jump back in the code to make the call to MyMethod. What I find interesting about this is lines 6-7 which can't be reached in this code. Why are they there? There are a total of 24 instructions, which isn't a big surprise as for and while loops are logically the same.

Based on instruction length, it looks like foreach would be the slowest to execute. That said, to be accurate we'd have to determine how long each instruction takes to execute and the difference in execution times of the IEnumerable calls (Is it faster to call the Count/Index methods or the IsThereANext/GetNext methods of IEnumerable).

My conclusion: Use the one that logically makes sense. The difference in execution times will only matter if you have 10s of millions of objects to deal with.

P.S. One thing to note as that each time through the for or while loops it had to make a call to determine the Count. If you assign this to a variable before the loops you can save the execution of 6 instructions each time you loop. Woot! Optimization!

This post has been edited by Momerath: 20 July 2010 - 08:26 AM

Was This Post Helpful? 5
  • +
  • -

#5 modi123_1  Icon User is online

  • Suitor #2
  • member icon



Reputation: 9049
  • View blog
  • Posts: 33,972
  • Joined: 12-June 08

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 08:29 AM

Aw.. I vote the least often used 'do-while'. It's neglected.
Was This Post Helpful? 0
  • +
  • -

#6 cbutler00  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 12
  • Joined: 21-April 10

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 05:24 PM

Thanks to everyone for taking the time to contribute. Momerath your explanation was what I was looking for. Currently what I am doing works but because I have a very large data set it is very slow and anything I could do to optimize the operation would be worth looking into. That said, Frinavale your suggestion may be an option but I'm not sure how much faster it would make it.

Basically I am going through a Folder structure and on each folder I iterate a list of items, check for sub folders and if the count is greater than zero, create those sub folders and iterate through each folder's list of items and on and on. Obviously the larger the Folder structure the longer it takes. Anything that could potentially shave off some time I think would be worth it.
Was This Post Helpful? 0
  • +
  • -

#7 JackOfAllTrades  Icon User is offline

  • Saucy!
  • member icon

Reputation: 6049
  • View blog
  • Posts: 23,475
  • Joined: 23-August 08

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 05:44 PM

I believe the best thing to do would be to profile your application to find the actual bottleneck, so you don't waste time on things that don't make a big difference.
Was This Post Helpful? 0
  • +
  • -

#8 Momerath  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 1010
  • View blog
  • Posts: 2,444
  • Joined: 04-October 09

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 06:32 PM

View Postcbutler00, on 20 July 2010 - 03:24 PM, said:

Basically I am going through a Folder structure and on each folder I iterate a list of items, check for sub folders and if the count is greater than zero, create those sub folders and iterate through each folder's list of items and on and on. Obviously the larger the Folder structure the longer it takes. Anything that could potentially shave off some time I think would be worth it.
By 'create those folders' do you mean on disk? If so, then I suspect that your program is slow because disk access is slow (compared to processor speed).
Was This Post Helpful? 1
  • +
  • -

#9 cbutler00  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 12
  • Joined: 21-April 10

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 08:16 PM

No not on disk.

What I am actually doing is going through an Outlook MAPIFolder structure and iterating all mail items in each folder. For each mail item I was creating a 'MailMessage' object, an 'Attachment' object and a 'Sender' object in memory which have all the attributes these objects would have in Outlook itself.

My mailbox is about 1.17GB and creating those objects takes a significant amount of time (several minutes). I have since changed my code to only compute the count and size of each folder's mail items and attachments and this takes significantly less time and memory.

My end goal is a fully configurable Outlook Add-in to backup mail items and / or attachments that you can filter by sender, date, mail or attachment attributes. If it were simply a complete backup utility then I would not need any of this information, but its not, so I do.

I have written an add in that will backup all the mail items in a mailbox and it has worked with over 20,000 mail items producing over 6GB of *.msg files. That operation took several hours and did not check and filters, it just iterated through the mailbox saving every mail item. My concern is when someone sets filters to only save mail items from certain people, before or after a certain date for different folders, mail items that have x number attachments etc, etc...

The bottleneck then is how do I get this information quickly and then once filters are set, how do I check which mail items or attachments to process quickly?

I understand I have gone off topic significantly but if anyone can offer any advice it would be greatly appreciated.
Was This Post Helpful? 0
  • +
  • -

#10 Momerath  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 1010
  • View blog
  • Posts: 2,444
  • Joined: 04-October 09

Re: Fastest way to loop a set of data and create objects

Posted 20 July 2010 - 10:08 PM

Disk reads are like disk writes: slow. Your program is going to be IO bound, and optimizing your code for loops isn't going to speed it up much. For filtering if you just provide options to filter on the message headers, that will speed up your searching (as opposed to filtering on message bodies). Otherwise there isn't much you can do.

You might want to consider using a memory caching to hold the Outlook files and implement a method to keep the cache full of new stuff (different thread would help). As long as you are only doing forward reads this shouldn't be that hard.
Was This Post Helpful? 0
  • +
  • -

#11 cbutler00  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 12
  • Joined: 21-April 10

Re: Fastest way to loop a set of data and create objects

Posted 21 July 2010 - 04:16 AM

Thanks again Momerath, but I think that might be just a bit over my skill level... that or I'm being too lazy.

I think I came up with a usable solution; I am not creating the 'MailMessage' objects or 'Attachment' objects anymore but keeping the Senders. I am still calculating all the stats per folder I need without a fair amount of overhead. Its much quicker and does the same job effectively.

I decided to do this since I will need to perform all the checks for process options, filters and saving options as I would have before so creating those objects really wasn't buying me anything.

Thanks again for your help everyone!
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1