## 1 Replies - 3954 Views - Last Post: 02 April 2011 - 05:11 PM

### #1 TopdeK Reputation: 1
• Posts: 3
• Joined: 02-April 11

# MIPS64 Optimization of Code

Posted 02 April 2011 - 05:06 PM

Hi all,

I was wondering if you could take a look at some code for me and see if I've done enough to optimize the code. Firstly, just to explain, my assignment was to translate a C++ program used to find the legendre symbol for two values X and P into a working MIPS64 code. Then after translating the C++ code, optimize the MIPS64 code. I have translated the code to the best of my ability and then I optimized it too. I have it down to 278 cycles using the values X: 40902 and P: 84143. My biggest issues seem to be Structural Stalls (25) and Branch Taken Stalls (25). I'm using a simulator (winmips64) provided by my lecturer and its got some limitations. I've provided the Instruction Set for the simulator below. Finally, the latency for my pipelines is as follows: Addition - 3, Multiplication - 4 and Division - 10. Can anyone spot a simple rescheduling of code to reduce it from 278. If not, I'm happy to submit it at 278. If rumor is true, it should get me about 87% of the marks. But I'm aiming for top 5 - 10%.

Thanks in advance. No worries if nothing can be done. I'm just looking for an outside perspective.

NOTE: For t = t % x, I used the algorithm: "t - [t / x] * x"

MIPS
```.data

x:	.word 40902
p:	.word 84143

.text
ld r1, x(r0)		; load x
ld r2, p(r0)		; load p

daddi r15, r0, 1	; value of 1 for check with while loop
dadd r3, r0, r0		; variable m set to zero

main:	daddi r30, r2, 0	; set R30 to p

beq r30, r15, end	; if equal to 1, finish ... [while(p > 1)]

for:	andi r16, r1, 1		; x % 2
andi r5, r2, 7		; p8 = p % 8
bnez r16, if		; if x is odd, branch.
daddi r4, r4, 1		; increment k by 1
dsrl r1, r1, 1		; divide x by 2
j for			; loop around

if:	andi r17, r4, 1		; k % 2
dmul r18, r5, r5	; p8 * p8
beqz r17, skip		; if k is even, branch
nop			; dmul is not ready with R18. RAW stall
daddi r18, r18, -1	; (p8 * p8) - 1
dsrl r27, r18, 3	; [(p8 * p8) - 1] / 8

skip:	ddiv r19, r2, r1	; p / x ... since t=p and t=t%x, optimize as p%x
daddi r5, r5, -1	; p8 - 1
andi r23, r1, 3		; x % 4
daddi r29, r23, -1	; [x % 4] - 1
dmul r24, r29, r5	; (p8 - 1) * [(x % 4) - 1]
daddi r31, r0, -1	; -1 for negative output at end, rescheduled to reduce stalls
dadd r4, r0, r0		; variable k
dsrl r25, r24, 2	; divide by 4

dmul r20, r19, r1	; (p / x) * x
andi r3, r3, 1		; m % 2
dadd r2, r0, r1		; p = x
dsub r6, r30, r20	; p - [(p / x) * x] ... Assign to t
dadd r1, r0, r6		; x = t

j main			; jump back to check condition of while loop

end:	bnez r3, minus
j exit

exit:	halt

;IMPORTANT REGISTERS
;r1 = x
;r2 = p
;r3 = m
;r4 = k
;r5 = p8
;r6 = t
;r10 = legendre symbol

```

C++
```int legendre(int x, int p)
{
int m, k, p8, t;

m = 0;
while(p > 1)
{
for(k=0; x%2==0; k++)
x/=2;

p8 = p%8;
if(k%2==1)
m+=(p8*p8-1)/8;

m+=((p8-1)*(x%4-1))/4;
t=p;
t%=x;
p=x;
x=t;
m%2;
}
if(m==0) return 1;
else return -1;
}

```

I.SET
```The following assembler directives are supported

.data                 - start of data segment
.text                 - start of code segment
.code                 - start of code segment (same as .text)
.space  <n>           - leave n empty bytes
.asciiz <s>           - enters zero terminated ascii string
.ascii  <s>           - enter ascii string
.align  <n>           - align to n-byte boundary
.word   <n1>,<n2>..   - enters word(s) of data (64-bits)
.byte   <n1>,<n2>..   - enter bytes
.word32 <n1>,<n2>..   - enters 32 bit number(s)
.word16 <n1>,<n2>..   - enters 16 bit number(s)
.double <n1>,<n2>..   - enters floating-point number(s)

where <n> denotes a number like 24, <s> denotes a string like "fred", and
<n1>,<n2>.. denotes numbers seperated by commas. The integer registers can
be referred to as r0-r31, or R0-R31, or \$0-\$31 or using standard MIPS
pseudo-names, like \$zero for r0, \$t0 for r8 etc. Note that the size of an
immediate is limited to 16-bits. The maximum size of an immediate register
shift is 5 bits (so a shift by greater than 31 is illegal).

Floating point registers can be referred to as f0-f31, or F0-F31

The following instructions are supported

sb      - store byte
lhu     - load 16-bit half word unsigned
sh      - store 16-bit half-word
lwu     - load 32-bit word unsigned
sw      - store 32-bit word
sd      - store 64-bit double-word
s.d     - store 64-bit floating-point
halt    - stops the program

andi    - logical and immediate
ori     - logical or immediate
xori    - exclusive or immediate
lui     - load upper half of register immediate

slti    - set if less than or equal immediate
sltiu   - set if less than or equal immediate unsigned

beq     - branch if pair of registers are equal
bne     - branch if pair of registers are not equal
beqz    - branch if register is equal to zero
bnez    - branch if register is not equal to zero

dsll    - shift left logical
dsrl    - shift right logical
dsra    - shift right arithmetic
dsllv   - shift left logical by variable amount
dsrlv   - shift right logical by variable amount
dsrav   - shift right arithmetic by variable amount
movz    - move if register equals zero
movn    - move if register not equal to zero
nop     - no operation
and     - logical and
or      - logical or
xor     - logical xor
slt     - set if less than
sltu    - set if less than unsigned
dsub    - subtract integers
dsubu   - subtract integers unsigned
dmul    - signed integer multiplication
dmulu   - unsigned integer multiplication
ddiv    - signed integer division
ddivu   - unsigned integer division

sub.d   - subtract floating-point
mul.d   - multiply floating-point
div.d   - divide floating-point
mov.d   - move floating-point
cvt.d.l - convert 64-bit integer to a double floating-point format
cvt.l.d - convert double floating-point to a 64-bit integer format
c.lt.d  - set FP flag if less than
c.le.d  - set FP flag if less than or equal to
c.eq.d  - set FP flag if equal to
bc1f    - branch to address if FP flag is FALSE
bc1t    - branch to address if FP flag is TRUE
mtc1    - move data from integer register to floating-point register
mfc1    - move data from floating-point register to integer register

Memory Mapped I/O area

Addresses of CONTROL and DATA registers

CONTROL: .word32 0x10000
DATA:    .word32 0x10008

Set CONTROL = 1, Set DATA to Unsigned Integer to be output
Set CONTROL = 2, Set DATA to Signed Integer to be output
Set CONTROL = 3, Set DATA to Floating Point to be output
Set CONTROL = 4, Set DATA to address of string to be output
Set CONTROL = 5, Set DATA+5 to x coordinate, DATA+4 to y coordinate, and DATA to RGB colour to be output
Set CONTROL = 6, Clears the terminal screen
Set CONTROL = 7, Clears the graphics screen
Set CONTROL = 8, read the DATA (either an integer or a floating-point) from the keyboard
Set CONTROL = 9, read one byte from DATA, no character echo.

```

Is This A Good Question/Topic? 0

## Replies To: MIPS64 Optimization of Code

### #2 macosxnerd101 • • Games, Graphs, and Auctions
•     Reputation: 12742
• Posts: 45,926
• Joined: 27-December 08

## Re: MIPS64 Optimization of Code

Posted 02 April 2011 - 05:11 PM

Moved to Assembly.

Page 1 of 1

 .related ul { list-style-type: circle; font-size: 12px; font-weight: bold; } .related li { margin-bottom: 5px; background-position: left 7px !important; margin-left: -35px; } .related h2 { font-size: 18px; font-weight: bold; } .related a { color: blue; }