**MASM : 1024-bit Arithmetic Functions : Part II**

**Introduction**

The ability to provide arithmetic on integers larger than the CPU register width has always provided me with endless hours of fun in assembly. Being a kind and generous person, I have decided to share that enjoyment with you all, by providing a set of three tutorials on addition, subtraction, multiplication and division on binary numbers that are

This second tutorial concerns itself with 512-bit unsigned multiplication providing a 1024-bit result.

**Theory**

Using Booths Algorithm, for the equation A x B = C where A and B are binary numbers, we take B (the multiplier) and for each iteration of the 512 bits (as this is the size of the multiplier), if the least significant bit is 1, then add the multiplicand to a partial result, then right-shift the multiplier 1 bit and left-shift the multiplicand 1 bit otherwise right-shift the multiplier 1 bit and left-shift the multiplicand 1 bit.

**1024-bit Shift Left**

This is the 1024-bit shift left function, which performs an in-situ left shift on the 32 words at the address passed as a parameter. We use the

**shl**machine code instruction which shifts the least significant word left by 1 bit moving binary 0 into the 'hole' left in the least significant bit position and the bit shifted out of the most significant bit is moved into the 'carry' bit in the condition codes register. The remaining 31 iterations of the function use the

**rcl**machine code instruction which shifts the current word left by 1 bit moving the carry flag into the 'hole' left in the least significant bit position and the bit shifted out of the most significant bit is moved into the 'carry' bit in the condition codes register.

Shl1024 proc num1:dword push ecx mov esi, num1 mov ecx, 32 shl dword ptr [esi+ecx*4-4], 1 pushf dec ecx Shl1024_loop: popf rcl dword ptr [esi+ecx*4-4], 1 pushf dec ecx jnz Shl1024_loop popf pop ecx ret Shl1024 endp

**1024-bit Shift Right**

This is the 1024-bit shift right function, which performs an in-situ right shift on the 32 words at the address passed as a parameter. We use the

**shr**machine code instruction which shifts the most significant word right by 1 bit moving binary 0 into the 'hole' left in the most significant bit position and the bit shifted out of the least significant bit is moved into the 'carry' bit in the condition codes register. The remaining 31 iterations of the function use the

**rcr**machine code instruction which shifts the current word right by 1 bit moving the carry flag into the 'hole' left in the most significant bit position and the bit shifted out of the least significant bit is moved into the 'carry' bit in the condition codes register.

Shr1024 proc num1:dword push ecx mov esi, num1 mov ecx, 32 shr dword ptr [esi], 1 pushf dec ecx Shr1024_loop: add esi, 4 popf rcr dword ptr [esi], 1 pushf dec ecx jnz Shr1024_loop popf pop ecx ret Shr1024 endp

**512-bit Multiplication Producing 1024-bit Result**

The following function performs the multiplication itself. The multiplicand (

**num1**) which is 512-bits long is moved into the least significant 512 bits of a 1024-bit area (

**num1_shift**), and the upper 512 bits set to zero. Similarly, the multiplier (

**num2**) which is 512-bits long is moved into the least significant 512 bits of a 1024-bit area (

**num2_shift**), and the upper 512 bits set to zero. The partial result (partial) is set to zero.

The actual multiplication uses the shift right function provided above to determine the value of the least significant bit of the multiplier, then based on that either adds the multiplicand to the partial result or not. The multiplicand is then shifted left. When the loop has been iterated 512 times, the 1024-bit partial result (

**partial**) is copied to the address provided by the caller through the

**result**parameter.

Mul1024 proc num1:dword, num2:dword, result:dword local num1_shift[32]:DWORD, partial[32]:DWORD local num2_shift[32]:DWORD xor eax, eax lea edi, partial mov ecx, 32 rep stosd lea edi, num1_shift mov ecx, 16 rep stosd mov esi, num1 mov ecx, 16 rep movsd lea edi, num2_shift mov ecx, 16 rep stosd mov esi, num2 mov ecx, 16 rep movsd mov ecx, 512 clc pushf Mul1024_loop: popf invoke Shr1024, ADDR num2_shift jnc Mul1024_continue invoke Add1024, ADDR num1_shift, ADDR partial, ADDR partial Mul1024_continue: invoke Shl1024, ADDR num1_shift pushf dec ecx jnz Mul1024_loop popf lea esi, partial mov edi, result mov ecx, 32 rep movsd ret Mul1024 endp

Part III of this tutorial may be found here

This post has been edited by **Martyn.Rae**: 31 March 2011 - 01:30 PM