Page 1 of 1

MASM : 1024-bit Arithmetic Functions : Part II

#1 Martyn.Rae  Icon User is offline

  • The programming dinosaur
  • member icon

Reputation: 540
  • View blog
  • Posts: 1,406
  • Joined: 22-August 09

Posted 28 March 2011 - 09:46 PM

MASM : 1024-bit Arithmetic Functions : Part II

Introduction

The ability to provide arithmetic on integers larger than the CPU register width has always provided me with endless hours of fun in assembly. Being a kind and generous person, I have decided to share that enjoyment with you all, by providing a set of three tutorials on addition, subtraction, multiplication and division on binary numbers that are very stupidly large!

This second tutorial concerns itself with 512-bit unsigned multiplication providing a 1024-bit result.

Theory

Using Booths Algorithm, for the equation A x B = C where A and B are binary numbers, we take B (the multiplier) and for each iteration of the 512 bits (as this is the size of the multiplier), if the least significant bit is 1, then add the multiplicand to a partial result, then right-shift the multiplier 1 bit and left-shift the multiplicand 1 bit otherwise right-shift the multiplier 1 bit and left-shift the multiplicand 1 bit.

1024-bit Shift Left

This is the 1024-bit shift left function, which performs an in-situ left shift on the 32 words at the address passed as a parameter. We use the shl machine code instruction which shifts the least significant word left by 1 bit moving binary 0 into the 'hole' left in the least significant bit position and the bit shifted out of the most significant bit is moved into the 'carry' bit in the condition codes register. The remaining 31 iterations of the function use the rcl machine code instruction which shifts the current word left by 1 bit moving the carry flag into the 'hole' left in the least significant bit position and the bit shifted out of the most significant bit is moved into the 'carry' bit in the condition codes register.

Shl1024				proc		num1:dword
					
					push		ecx
					mov			esi, num1
					mov			ecx, 32
					shl			dword ptr [esi+ecx*4-4], 1
					pushf
					dec			ecx
Shl1024_loop:		popf
					rcl			dword ptr [esi+ecx*4-4], 1
					pushf
					dec			ecx
					jnz			Shl1024_loop
					popf
					pop			ecx
					ret

Shl1024				endp



1024-bit Shift Right

This is the 1024-bit shift right function, which performs an in-situ right shift on the 32 words at the address passed as a parameter. We use the shr machine code instruction which shifts the most significant word right by 1 bit moving binary 0 into the 'hole' left in the most significant bit position and the bit shifted out of the least significant bit is moved into the 'carry' bit in the condition codes register. The remaining 31 iterations of the function use the rcr machine code instruction which shifts the current word right by 1 bit moving the carry flag into the 'hole' left in the most significant bit position and the bit shifted out of the least significant bit is moved into the 'carry' bit in the condition codes register.


Shr1024				proc		num1:dword

					push		ecx
					mov			esi, num1
					mov			ecx, 32
					shr			dword ptr [esi], 1
					pushf
					dec			ecx
Shr1024_loop:		add			esi, 4
					popf
					rcr			dword ptr [esi], 1
					pushf
					dec			ecx
					jnz			Shr1024_loop
					popf
					pop			ecx
					ret

Shr1024				endp




512-bit Multiplication Producing 1024-bit Result

The following function performs the multiplication itself. The multiplicand (num1) which is 512-bits long is moved into the least significant 512 bits of a 1024-bit area (num1_shift), and the upper 512 bits set to zero. Similarly, the multiplier (num2) which is 512-bits long is moved into the least significant 512 bits of a 1024-bit area (num2_shift), and the upper 512 bits set to zero. The partial result (partial) is set to zero.

The actual multiplication uses the shift right function provided above to determine the value of the least significant bit of the multiplier, then based on that either adds the multiplicand to the partial result or not. The multiplicand is then shifted left. When the loop has been iterated 512 times, the 1024-bit partial result (partial) is copied to the address provided by the caller through the result parameter.

Mul1024				proc		num1:dword, num2:dword, result:dword
					local		num1_shift[32]:DWORD, partial[32]:DWORD
					local		num2_shift[32]:DWORD
					xor			eax, eax
					lea			edi, partial
					mov			ecx, 32
					rep			stosd
					lea			edi, num1_shift
					mov			ecx, 16
					rep			stosd
					mov			esi, num1
					mov			ecx, 16
					rep			movsd
					lea			edi, num2_shift
					mov			ecx, 16
					rep			stosd
					mov			esi, num2
					mov			ecx, 16
					rep			movsd
					mov			ecx, 512
					clc
					pushf
Mul1024_loop:		popf
					invoke		Shr1024, ADDR num2_shift
					jnc			Mul1024_continue
					invoke		Add1024, ADDR num1_shift, ADDR partial, ADDR partial
Mul1024_continue:	invoke		Shl1024, ADDR num1_shift
					pushf
					dec			ecx
					jnz			Mul1024_loop
					popf
					lea			esi, partial
					mov			edi, result
					mov			ecx, 32
					rep			movsd
					ret
Mul1024				endp



Part III of this tutorial may be found here

This post has been edited by Martyn.Rae: 31 March 2011 - 01:30 PM


Is This A Good Question/Topic? 0
  • +

Page 1 of 1