Prathamesh Kulkarni via llvm-dev
2020-Mar-31 16:33 UTC
[llvm-dev] [ARM] Register pressure with -mthumb forces register reload before each call
Hi,
Compiling the attached test case, which is a reduced version of
uECC_shared_secret from the tinycrypt library [1], with
--target=arm-linux-gnueabi -march=armv6-m -Oz -S
results in the register holding the function's address being reloaded
before every blx call:
ldr r3, .LCPI0_0
blx r3
mov r0, r6
mov r1, r5
mov r2, r4
ldr r3, .LCPI0_0
blx r3
ldr r3, .LCPI0_0
mov r0, r6
mov r1, r5
mov r2, r4
blx r3
.LCPI0_0:
.long foo
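For reference, here is a back-of-the-envelope size comparison. It assumes the usual Thumb-1 encodings: bl is 4 bytes, blx with a register operand and a pc-relative ldr are 2 bytes each, and the literal-pool word is 4 bytes. The helper names are made up for illustration. The mov instructions are left out since they are common to all variants, and any extra push/pop needed to free a call-preserved register is not counted.

```python
# Rough Thumb-1 code-size model for n calls to the same function.
# Assumed encodings: bl = 4 bytes, blx Rm = 2 bytes,
# ldr Rt, [pc, #imm] = 2 bytes, literal-pool word = 4 bytes.
BL = 4
BLX = 2
LDR_PCREL = 2
LITERAL = 4


def direct_calls(n):
    """n direct calls: bl foo, n times."""
    return n * BL


def indirect_one_load(n):
    """Address loaded once and kept in a register across all calls."""
    return LDR_PCREL + LITERAL + n * BLX


def indirect_reload_each(n):
    """Address reloaded before every call (what the output above does)."""
    return LITERAL + n * (LDR_PCREL + BLX)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, direct_calls(n), indirect_one_load(n), indirect_reload_each(n))
```

Under these assumptions the reload-before-every-call sequence is always 4 bytes larger than plain bl, so once the address cannot stay in a register the indirect form no longer saves anything.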
From the regalloc dump (attached), AFAIU, what seems to happen in the
greedy allocator is that all virt regs %0 to %3 are live across the first
two calls to foo. Thus %0, %1 and %2 get assigned r6, r5 and r4
respectively, and %3, which holds foo's address, has no register left.
Since its live range has the lowest weight, it cannot evict any existing
interval, and gets split. Eventually we have the following allocation:
[%0 -> $r6] tGPR
[%1 -> $r5] tGPR
[%2 -> $r4] tGPR
[%6 -> $r3] tGPR
[%11 -> $r3] tGPR
[%16 -> $r3] tGPR
[%17 -> $r3] tGPR
where %6, %11, %16 and %17 are all derived from %3.
And since r3 is a call-clobbered register, the compiler is forced to
reload foo's address each time before blx.
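The pressure problem can be reproduced with a toy model. This is only a sketch and not LLVM's actual algorithm: it assumes the tGPR allocation order r0..r6 printed in the dump, assumes that only r4..r6 of those survive a call, and uses a made-up helper name.

```python
# Toy model of the situation in the dump; not LLVM's actual algorithm.
# Assumptions: tGPR allocation order is r0..r6 (as printed in the dump),
# and the call's regmask preserves only r4..r6 out of those.
ALLOCATION_ORDER = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
CALL_PRESERVED = {"r4", "r5", "r6"}


def assign_live_across_calls(vregs):
    """Give each vreg the first free register that survives a call.

    vregs: virtual registers in the order the allocator visits them.
    Returns a dict mapping vreg -> physical register, or None if no
    call-preserved register is left.
    """
    free = [r for r in ALLOCATION_ORDER if r in CALL_PRESERVED]
    assignment = {}
    for v in vregs:
        assignment[v] = free.pop(0) if free else None
    return assignment


if __name__ == "__main__":
    # Order taken from the selectOrSplit lines in the dump: %2, %1, %0, %3.
    print(assign_live_across_calls(["%2", "%1", "%0", "%3"]))
```

With four values live across the calls and three call-preserved registers, the last one visited (%3 here) is left without a register that survives the call.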
To fix this, I thought of the following approaches:
(a) For the Thumb-1 ISA, disable the heuristic in
ARMTargetLowering::LowerCall that prefers an indirect call when there are
at least 3 calls to the same function in a basic block.
(b) In ARMTargetLowering::LowerCall, add another constraint for Thumb-1,
such as the number of arguments, as a proxy for register pressure, but
that's bound to trip up other cases.
(c) Give higher allocation priority to the virt reg used for indirect
calls? However, if that results in spilling some other register, it would
defeat the purpose of saving code size. I suppose ideally we want to
trigger the indirect-call heuristic only when we know beforehand that it
will not result in spilling. But I am not sure if it's possible to
estimate that during isel?
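Approach (b) could be sketched roughly as follows. This is a hypothetical decision function: the names, the threshold, and the "arguments plus one" proxy are illustrative assumptions and do not mirror the code in LowerCall.

```python
# Hypothetical decision function for approach (b); names and the
# threshold are illustrative only and do not mirror LLVM's source.
CALL_PRESERVED_LOW_REGS = 3  # r4-r6 in this test case; an assumption in general


def prefer_indirect_call(num_calls_in_block, opt_min_size, is_thumb1, num_args):
    """Return True if calling through a register looks profitable."""
    if not opt_min_size or num_calls_in_block < 3:
        return False
    if is_thumb1:
        # Crude proxy for register pressure: assume the arguments stay live
        # across the calls, so they and the function address must all fit
        # in the call-preserved low registers.
        return num_args + 1 <= CALL_PRESERVED_LOW_REGS
    return True
```

For the reduced test case (3 calls, 3 arguments, Thumb-1, -Oz) this returns False, so direct calls would be used. The proxy is crude, since arguments are not always live across the calls, which is why it would likely misfire in other cases.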
I would be grateful for suggestions on how to proceed further.
[1] https://github.com/intel/tinycrypt/blob/master/lib/source/ecc_dh.c#L139
Thanks,
Prathamesh
-------------- next part --------------
PreferIndirect: 1
PreferIndirect: 1
PreferIndirect: 1
Computing live-in reg-units in ABI blocks.
0B %bb.0 R0#0 R1#0 R2#0
Created 3 new intervals.
********** INTERVALS **********
R0 [0B,48r:0)[96r,144r:3)[192r,240r:2)[288r,336r:1) 0 at 0B-phi 1 at 288r 2 at
192r 3 at 96r
R1 [0B,32r:0)[112r,144r:3)[208r,240r:2)[304r,336r:1) 0 at 0B-phi 1 at 304r 2 at
208r 3 at 112r
R2 [0B,16r:0)[128r,144r:3)[224r,240r:2)[320r,336r:1) 0 at 0B-phi 1 at 320r 2 at
224r 3 at 128r
%0 [48r,288r:0) 0 at 48r weight:0.000000e+00
%1 [32r,304r:0) 0 at 32r weight:0.000000e+00
%2 [16r,320r:0) 0 at 16r weight:0.000000e+00
%3 [80r,336r:0) 0 at 80r weight:0.000000e+00
RegMasks: 144r 240r 336r
********** MACHINEINSTRS **********
# Machine code for function uECC_shared_secret: NoPHIs, TracksLiveness
Constant Pool:
cp#0: @foo, align=4
Function Live Ins: $r0 in %0, $r1 in %1, $r2 in %2
0B bb.0.entry:
liveins: $r0, $r1, $r2
16B %2:tgpr = COPY $r2
32B %1:tgpr = COPY $r1
48B %0:tgpr = COPY $r0
64B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
80B %3:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
96B $r0 = COPY %0:tgpr
112B $r1 = COPY %1:tgpr
128B $r2 = COPY %2:tgpr
144B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
160B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
176B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
192B $r0 = COPY %0:tgpr
208B $r1 = COPY %1:tgpr
224B $r2 = COPY %2:tgpr
240B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
256B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
272B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
288B $r0 = COPY %0:tgpr
304B $r1 = COPY %1:tgpr
320B $r2 = COPY %2:tgpr
336B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
352B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
368B tBX_RET 14, $noreg
# End machine code for function uECC_shared_secret.
********** SIMPLE REGISTER COALESCING **********
********** Function: uECC_shared_secret
********** JOINING INTERVALS ***********
entry:
16B %2:tgpr = COPY $r2
Considering merging %2 with $r2
Can only merge into reserved registers.
32B %1:tgpr = COPY $r1
Considering merging %1 with $r1
Can only merge into reserved registers.
48B %0:tgpr = COPY $r0
Considering merging %0 with $r0
Can only merge into reserved registers.
96B $r0 = COPY %0:tgpr
Considering merging %0 with $r0
Can only merge into reserved registers.
112B $r1 = COPY %1:tgpr
Considering merging %1 with $r1
Can only merge into reserved registers.
128B $r2 = COPY %2:tgpr
Considering merging %2 with $r2
Can only merge into reserved registers.
192B $r0 = COPY %0:tgpr
Considering merging %0 with $r0
Can only merge into reserved registers.
208B $r1 = COPY %1:tgpr
Considering merging %1 with $r1
Can only merge into reserved registers.
224B $r2 = COPY %2:tgpr
Considering merging %2 with $r2
Can only merge into reserved registers.
288B $r0 = COPY %0:tgpr
Considering merging %0 with $r0
Can only merge into reserved registers.
304B $r1 = COPY %1:tgpr
Considering merging %1 with $r1
Can only merge into reserved registers.
320B $r2 = COPY %2:tgpr
Considering merging %2 with $r2
Can only merge into reserved registers.
96B $r0 = COPY %0:tgpr
Considering merging %0 with $r0
Can only merge into reserved registers.
112B $r1 = COPY %1:tgpr
Considering merging %1 with $r1
Can only merge into reserved registers.
128B $r2 = COPY %2:tgpr
Considering merging %2 with $r2
Can only merge into reserved registers.
192B $r0 = COPY %0:tgpr
Considering merging %0 with $r0
Can only merge into reserved registers.
208B $r1 = COPY %1:tgpr
Considering merging %1 with $r1
Can only merge into reserved registers.
224B $r2 = COPY %2:tgpr
Considering merging %2 with $r2
Can only merge into reserved registers.
288B $r0 = COPY %0:tgpr
Considering merging %0 with $r0
Can only merge into reserved registers.
304B $r1 = COPY %1:tgpr
Considering merging %1 with $r1
Can only merge into reserved registers.
320B $r2 = COPY %2:tgpr
Considering merging %2 with $r2
Can only merge into reserved registers.
Trying to inflate 0 regs.
********** INTERVALS **********
R0 [0B,48r:0)[96r,144r:3)[192r,240r:2)[288r,336r:1) 0 at 0B-phi 1 at 288r 2 at
192r 3 at 96r
R1 [0B,32r:0)[112r,144r:3)[208r,240r:2)[304r,336r:1) 0 at 0B-phi 1 at 304r 2 at
208r 3 at 112r
R2 [0B,16r:0)[128r,144r:3)[224r,240r:2)[320r,336r:1) 0 at 0B-phi 1 at 320r 2 at
224r 3 at 128r
%0 [48r,288r:0) 0 at 48r weight:0.000000e+00
%1 [32r,304r:0) 0 at 32r weight:0.000000e+00
%2 [16r,320r:0) 0 at 16r weight:0.000000e+00
%3 [80r,336r:0) 0 at 80r weight:0.000000e+00
RegMasks: 144r 240r 336r
********** MACHINEINSTRS **********
# Machine code for function uECC_shared_secret: NoPHIs, TracksLiveness
Constant Pool:
cp#0: @foo, align=4
Function Live Ins: $r0 in %0, $r1 in %1, $r2 in %2
0B bb.0.entry:
liveins: $r0, $r1, $r2
16B %2:tgpr = COPY $r2
32B %1:tgpr = COPY $r1
48B %0:tgpr = COPY $r0
64B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
80B %3:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
96B $r0 = COPY %0:tgpr
112B $r1 = COPY %1:tgpr
128B $r2 = COPY %2:tgpr
144B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
160B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
176B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
192B $r0 = COPY %0:tgpr
208B $r1 = COPY %1:tgpr
224B $r2 = COPY %2:tgpr
240B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
256B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
272B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
288B $r0 = COPY %0:tgpr
304B $r1 = COPY %1:tgpr
320B $r2 = COPY %2:tgpr
336B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
352B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
368B tBX_RET 14, $noreg
# End machine code for function uECC_shared_secret.
********** GREEDY REGISTER ALLOCATION **********
********** Function: uECC_shared_secret
********** INTERVALS **********
R0 [0B,48r:0)[96r,144r:3)[192r,240r:2)[288r,336r:1) 0 at 0B-phi 1 at 288r 2 at
192r 3 at 96r
R1 [0B,32r:0)[112r,144r:3)[208r,240r:2)[304r,336r:1) 0 at 0B-phi 1 at 304r 2 at
208r 3 at 112r
R2 [0B,16r:0)[128r,144r:3)[224r,240r:2)[320r,336r:1) 0 at 0B-phi 1 at 320r 2 at
224r 3 at 128r
%0 [48r,288r:0) 0 at 48r weight:6.312500e-03
%1 [32r,304r:0) 0 at 32r weight:6.011905e-03
%2 [16r,320r:0) 0 at 16r weight:5.738636e-03
%3 [80r,336r:0) 0 at 80r weight:3.048780e-03
RegMasks: 144r 240r 336r
********** MACHINEINSTRS **********
# Machine code for function uECC_shared_secret: NoPHIs, TracksLiveness
Constant Pool:
cp#0: @foo, align=4
Function Live Ins: $r0 in %0, $r1 in %1, $r2 in %2
0B bb.0.entry:
liveins: $r0, $r1, $r2
16B %2:tgpr = COPY $r2
32B %1:tgpr = COPY $r1
48B %0:tgpr = COPY $r0
64B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
80B %3:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
96B $r0 = COPY %0:tgpr
112B $r1 = COPY %1:tgpr
128B $r2 = COPY %2:tgpr
144B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
160B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
176B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
192B $r0 = COPY %0:tgpr
208B $r1 = COPY %1:tgpr
224B $r2 = COPY %2:tgpr
240B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
256B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
272B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
288B $r0 = COPY %0:tgpr
304B $r1 = COPY %1:tgpr
320B $r2 = COPY %2:tgpr
336B tBLXr 14, $noreg, %3:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13
$d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19
$s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead
$lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def $sp
352B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
368B tBX_RET 14, $noreg
# End machine code for function uECC_shared_secret.
selectOrSplit tGPR:%2 [16r,320r:0) 0 at 16r weight:5.738636e-03 w=5.738636e-03
AllocationOrder(tGPR) = [ $r0 $r1 $r2 $r3 $r4 $r5 $r6 ]
hints: $r2
Checking interference for %2 [16r,320r:0) 0 at 16r weight:5.738636e-03
$r2: IK_RegMask
$r0: IK_RegMask
$r1: IK_RegMask
$r3: IK_RegMask
$r4: IK_Free
missed hint $r2
assigning %2 to $r4: R4 [16r,320r:0) 0 at 16r
selectOrSplit tGPR:%1 [32r,304r:0) 0 at 32r weight:6.011905e-03 w=6.011905e-03
hints: $r1
Checking interference for %1 [32r,304r:0) 0 at 32r weight:6.011905e-03
$r1: IK_RegMask
$r0: IK_RegMask
$r2: IK_RegMask
$r3: IK_RegMask
$r4: IK_VirtReg
$r5: IK_Free
missed hint $r1
assigning %1 to $r5: R5 [32r,304r:0) 0 at 32r
selectOrSplit tGPR:%0 [48r,288r:0) 0 at 48r weight:6.312500e-03 w=6.312500e-03
hints: $r0
Checking interference for %0 [48r,288r:0) 0 at 48r weight:6.312500e-03
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r3: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_Free
missed hint $r0
assigning %0 to $r6: R6 [48r,288r:0) 0 at 48r
selectOrSplit tGPR:%3 [80r,336r:0) 0 at 80r weight:3.048780e-03 w=3.048780e-03
Checking interference for %3 [80r,336r:0) 0 at 80r weight:3.048780e-03
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r3: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Assign Cascade 0
wait for second round
queuing new interval: %3 [80r,336r:0) 0 at 80r weight:3.048780e-03
selectOrSplit tGPR:%3 [80r,336r:0) 0 at 80r weight:3.048780e-03 w=3.048780e-03
Checking interference for %3 [80r,336r:0) 0 at 80r weight:3.048780e-03
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r3: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Split Cascade 0
Analyze counted 4 instrs in 1 blocks, through 0 blocks.
tryLocalSplit: 80r 144r 240r 336r
3 regmasks in block: 144r:80r-144r 144r:144r-240r 240r:240r-336r
$r0 80r-144r i=INF extend
$r0 144r-240r i=INF extend
$r0 240r-336r i=INF end
$r1 80r-144r i=INF extend
$r1 144r-240r i=INF extend
$r1 240r-336r i=INF end
$r2 80r-144r i=INF extend
$r2 144r-240r i=INF extend
$r2 240r-336r i=INF end
$r3 80r-144r i=INF extend
$r3 144r-240r i=INF extend
$r3 240r-336r i=INF end
$r4 80r-144r i=5.738636e-03 w=6.250000e-03 (best) extend
$r4 80r-240r i=5.738636e-03 w=6.944444e-03 (best) extend
$r4 80r-336r i=5.738636e-03 all
$r5 80r-144r i=6.011905e-03 w=6.250000e-03 extend
$r5 80r-240r i=6.011905e-03 w=6.944444e-03 extend
$r5 80r-336r i=6.011905e-03 all
$r6 80r-144r i=6.312500e-03 w=6.250000e-03 extend
$r6 144r-240r i=6.312500e-03 w=7.575758e-03 (best) extend
$r6 144r-336r i=6.312500e-03 w=6.578947e-03 end
Best local split range: 144r-240r, 1.237968e-03, 2 instrs
enterIntvBefore 144r: valno 0
leaveIntvAfter 240r: valno 0
useIntv [136r;248r): [136r;248r):1
blit [80r,336r:0): [80r;136r)=0(%4):1*%bb.0 [136r;248r)=1(%5):0
[248r;336r)=0(%4):0*%bb.0
rewr %bb.0 80r:0 %4:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from
constant-pool)
rewr %bb.0 144B:1 tBLXr 14, $noreg, %5:tgpr, <regmask $lr $d8 $d9 $d10 $d11
$d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17
$s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
rewr %bb.0 240B:1 tBLXr 14, $noreg, %5:tgpr, <regmask $lr $d8 $d9 $d10 $d11
$d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17
$s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
rewr %bb.0 336B:0 tBLXr 14, $noreg, %4:tgpr, <regmask $lr $d8 $d9 $d10 $d11
$d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17
$s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
rewr %bb.0 136B:0 %5:tgpr = COPY %4:tgpr
rewr %bb.0 248B:1 %4:tgpr = COPY %5:tgpr
Split 2 components: %4 [80r,136r:0)[248r,336r:1) 0 at 80r 1 at 248r
weight:0.000000e+00
Tagging non-progress ranges: %5
queuing new interval: %4 [80r,136r:0) 0 at 80r weight:2.214912e-03
queuing new interval: %5 [136r,248r:0) 0 at 136r weight:3.945312e-03
queuing new interval: %6 [248r,336r:0) 0 at 248r weight:2.069672e-03
selectOrSplit tGPR:%5 [136r,248r:0) 0 at 136r weight:3.945312e-03
w=3.945312e-03
Checking interference for %5 [136r,248r:0) 0 at 136r weight:3.945312e-03
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r3: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Split2 Cascade 0
Analyze counted 4 instrs in 1 blocks, through 0 blocks.
tryLocalSplit: 136r 144r 240r 248r
3 regmasks in block: 144r:136r-144r 144r:144r-240r 240r:240r-248r
$r0 136r-144r i=INF extend
$r0 144r-240r i=INF extend
$r0 240r-248r i=INF end
$r1 136r-144r i=INF extend
$r1 144r-240r i=INF extend
$r1 240r-248r i=INF end
$r2 136r-144r i=INF extend
$r2 144r-240r i=INF extend
$r2 240r-248r i=INF end
$r3 136r-144r i=INF extend
$r3 144r-240r i=INF extend
$r3 240r-248r i=INF end
$r4 136r-144r i=5.738636e-03 w=7.075472e-03 (best) extend
$r4 136r-240r i=5.738636e-03 shrink
$r4 144r-240r i=5.738636e-03 extend
$r4 240r-248r i=5.738636e-03 w=7.075472e-03 (best) end
$r5 136r-144r i=6.011905e-03 w=7.075472e-03 extend
$r5 136r-240r i=6.011905e-03 shrink
$r5 144r-240r i=6.011905e-03 extend
$r5 240r-248r i=6.011905e-03 w=7.075472e-03 end
$r6 136r-144r i=6.312500e-03 w=7.075472e-03 extend
$r6 136r-240r i=6.312500e-03 shrink
$r6 144r-240r i=6.312500e-03 extend
$r6 240r-248r i=6.312500e-03 w=7.075472e-03 end
Best local split range: 240r-248r, 1.310072e-03, 2 instrs
enterIntvBefore 240r: valno 0
leaveIntvAfter 248r: not live
useIntv [232r;256B): [232r;256B):1
blit [136r,248r:0): [136r;232r)=0(%7):0 [232r;248r)=1(%8):0
rewr %bb.0 136r:0 %7:tgpr = COPY %4:tgpr
rewr %bb.0 144B:0 tBLXr 14, $noreg, %7:tgpr, <regmask $lr $d8 $d9 $d10 $d11
$d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17
$s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
rewr %bb.0 240B:1 tBLXr 14, $noreg, %8:tgpr, <regmask $lr $d8 $d9 $d10 $d11
$d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17
$s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
rewr %bb.0 248B:1 %6:tgpr = COPY %8:tgpr
rewr %bb.0 232B:0 %8:tgpr = COPY %7:tgpr
queuing new interval: %7 [136r,232r:0) 0 at 136r weight:3.054435e-03
queuing new interval: %8 [232r,248r:0) 0 at 232r weight:3.641827e-03
selectOrSplit tGPR:%4 [80r,136r:0) 0 at 80r weight:2.214912e-03 w=2.214912e-03
Checking interference for %4 [80r,136r:0) 0 at 80r weight:2.214912e-03
$r0: IK_RegUnit
$r1: IK_RegUnit
$r2: IK_RegUnit
$r3: IK_Free
assigning %4 to $r3: R3 [80r,136r:0) 0 at 80r
selectOrSplit tGPR:%7 [136r,232r:0) 0 at 136r weight:3.054435e-03
w=3.054435e-03
hints: $r3
Checking interference for %7 [136r,232r:0) 0 at 136r weight:3.054435e-03
$r3: IK_RegMask
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Assign Cascade 0
wait for second round
queuing new interval: %7 [136r,232r:0) 0 at 136r weight:3.054435e-03
selectOrSplit tGPR:%8 [232r,248r:0) 0 at 232r weight:3.641827e-03
w=3.641827e-03
Checking interference for %8 [232r,248r:0) 0 at 232r weight:3.641827e-03
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r3: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Assign Cascade 0
wait for second round
queuing new interval: %8 [232r,248r:0) 0 at 232r weight:3.641827e-03
selectOrSplit tGPR:%6 [248r,336r:0) 0 at 248r weight:2.069672e-03
w=2.069672e-03
Checking interference for %6 [248r,336r:0) 0 at 248r weight:2.069672e-03
$r0: IK_RegUnit
$r1: IK_RegUnit
$r2: IK_RegUnit
$r3: IK_Free
assigning %6 to $r3: R3 [248r,336r:0) 0 at 248r
selectOrSplit tGPR:%7 [136r,232r:0) 0 at 136r weight:3.054435e-03
w=3.054435e-03
hints: $r3
Checking interference for %7 [136r,232r:0) 0 at 136r weight:3.054435e-03
$r3: IK_RegMask
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Split Cascade 0
Analyze counted 3 instrs in 1 blocks, through 0 blocks.
tryLocalSplit: 136r 144r 232r
3 regmasks in block: 144r:136r-144r 144r:144r-232r
$r3 136r-144r i=INF extend
$r3 144r-232r i=INF end
$r0 136r-144r i=INF extend
$r0 144r-232r i=INF end
$r1 136r-144r i=INF extend
$r1 144r-232r i=INF end
$r2 136r-144r i=INF extend
$r2 144r-232r i=INF end
$r4 136r-144r i=5.738636e-03 w=7.075472e-03 (best) extend
$r4 136r-232r i=5.738636e-03 all
$r5 136r-144r i=6.011905e-03 w=7.075472e-03 extend
$r5 136r-232r i=6.011905e-03 all
$r6 136r-144r i=6.312500e-03 w=7.075472e-03 extend
$r6 136r-232r i=6.312500e-03 all
Best local split range: 136r-144r, 1.310072e-03, 2 instrs
enterIntvBefore 136r: not live
leaveIntvAfter 144r: valno 0
useIntv [136B;152r): [136B;152r):1
blit [136r,232r:0): [136r;152r)=1(%10):0 [152r;232r)=0(%9):0
rewr %bb.0 136r:1 %10:tgpr = COPY %4:tgpr
rewr %bb.0 144B:1 tBLXr 14, $noreg, %10:tgpr, <regmask $lr $d8 $d9 $d10
$d11 $d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16
$s17 $s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
rewr %bb.0 232B:0 %8:tgpr = COPY %9:tgpr
rewr %bb.0 152B:1 %9:tgpr = COPY %10:tgpr
Tagging non-progress ranges: %10
queuing new interval: %9 [152r,232r:0) 0 at 152r weight:2.104167e-03
queuing new interval: %10 [136r,152r:0) 0 at 136r weight:3.641827e-03
selectOrSplit tGPR:%10 [136r,152r:0) 0 at 136r weight:3.641827e-03
w=3.641827e-03
hints: $r3
Checking interference for %10 [136r,152r:0) 0 at 136r weight:3.641827e-03
$r3: IK_RegMask
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Split2 Cascade 0
Analyze counted 3 instrs in 1 blocks, through 0 blocks.
tryLocalSplit: 136r 144r 152r
3 regmasks in block: 144r:136r-144r 144r:144r-152r
$r3 136r-144r i=INF extend
$r3 144r-152r i=INF end
$r0 136r-144r i=INF extend
$r0 144r-152r i=INF end
$r1 136r-144r i=INF extend
$r1 144r-152r i=INF end
$r2 136r-144r i=INF extend
$r2 144r-152r i=INF end
$r4 136r-144r i=5.738636e-03 extend
$r4 144r-152r i=5.738636e-03 end
$r5 136r-144r i=6.011905e-03 extend
$r5 144r-152r i=6.011905e-03 end
$r6 136r-144r i=6.312500e-03 extend
$r6 144r-152r i=6.312500e-03 end
Inline spilling tGPR:%10 [136r,152r:0) 0 at 136r weight:3.641827e-03
From original %3
also spill snippet %4 [80r,136r:0) 0 at 80r weight:2.214912e-03
also spill snippet %9 [152r,232r:0) 0 at 152r weight:2.104167e-03
remat: 140r %11:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from
constant-pool)
144e tBLXr 14, $noreg, killed %11:tgpr, <regmask $lr $d8 $d9 $d10
$d11 $d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16
$s17 $s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
remat: 228r %12:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from
constant-pool)
232e %8:tgpr = COPY killed %12:tgpr
All defs dead: dead %10:tgpr = COPY %4:tgpr
All defs dead: dead %4:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from
constant-pool)
All defs dead: dead %9:tgpr = COPY %10:tgpr
Remat created 3 dead defs.
Deleting dead def 152r dead %9:tgpr = COPY %10:tgpr
Deleting dead def 80r dead %4:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4
from constant-pool)
unassigning %4 from $r3: R3
Deleting dead def 136r dead %10:tgpr = COPY %4:tgpr
Shrink: %4 EMPTY weight:2.214912e-03
Shrunk: %4 EMPTY weight:2.214912e-03
0 registers to spill after remat.
queuing new interval: %11 [140r,144r:0) 0 at 140r weight:INF
queuing new interval: %12 [228r,232r:0) 0 at 228r weight:INF
selectOrSplit tGPR:%11 [140r,144r:0) 0 at 140r weight:INF w=INF
Checking interference for %11 [140r,144r:0) 0 at 140r weight:INF
$r0: IK_RegUnit
$r1: IK_RegUnit
$r2: IK_RegUnit
$r3: IK_Free
assigning %11 to $r3: R3 [140r,144r:0) 0 at 140r
selectOrSplit tGPR:%12 [228r,232r:0) 0 at 228r weight:INF w=INF
Checking interference for %12 [228r,232r:0) 0 at 228r weight:INF
$r0: IK_RegUnit
$r1: IK_RegUnit
$r2: IK_RegUnit
$r3: IK_Free
assigning %12 to $r3: R3 [228r,232r:0) 0 at 228r
Dropping unused %4 EMPTY weight:2.214912e-03
Dropping unused %9 EMPTY weight:2.104167e-03
selectOrSplit tGPR:%8 [232r,248r:0) 0 at 232r weight:3.641827e-03
w=3.641827e-03
hints: $r3
Checking interference for %8 [232r,248r:0) 0 at 232r weight:3.641827e-03
$r3: IK_RegMask
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Split Cascade 0
Analyze counted 3 instrs in 1 blocks, through 0 blocks.
tryLocalSplit: 232r 240r 248r
3 regmasks in block: 240r:232r-240r 240r:240r-248r
$r3 232r-240r i=INF extend
$r3 240r-248r i=INF end
$r0 232r-240r i=INF extend
$r0 240r-248r i=INF end
$r1 232r-240r i=INF extend
$r1 240r-248r i=INF end
$r2 232r-240r i=INF extend
$r2 240r-248r i=INF end
$r4 232r-240r i=5.738636e-03 w=7.075472e-03 (best) extend
$r4 232r-248r i=5.738636e-03 all
$r5 232r-240r i=6.011905e-03 w=7.075472e-03 extend
$r5 232r-248r i=6.011905e-03 all
$r6 232r-240r i=6.312500e-03 w=7.075472e-03 extend
$r6 232r-248r i=6.312500e-03 all
Best local split range: 232r-240r, 1.310072e-03, 2 instrs
enterIntvBefore 232r: not live
leaveIntvAfter 240r: valno 0
useIntv [232B;244r): [232B;244r):1
blit [232r,248r:0): [232r;244r)=1(%15):0 [244r;248r)=0(%14):0
rewr %bb.0 232r:1 %15:tgpr = COPY %12:tgpr
rewr %bb.0 240B:1 tBLXr 14, $noreg, %15:tgpr, <regmask $lr $d8 $d9 $d10
$d11 $d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16
$s17 $s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
rewr %bb.0 248B:0 %6:tgpr = COPY %14:tgpr
rewr %bb.0 244B:1 %14:tgpr = COPY %15:tgpr
Tagging non-progress ranges: %15
queuing new interval: %14 [244r,248r:0) 0 at 244r weight:INF
queuing new interval: %15 [232r,244r:0) 0 at 232r weight:3.677184e-03
selectOrSplit tGPR:%15 [232r,244r:0) 0 at 232r weight:3.677184e-03
w=3.677184e-03
hints: $r3
Checking interference for %15 [232r,244r:0) 0 at 232r weight:3.677184e-03
$r3: IK_RegMask
$r0: IK_RegMask
$r1: IK_RegMask
$r2: IK_RegMask
$r4: IK_VirtReg
$r5: IK_VirtReg
$r6: IK_VirtReg
RS_Split2 Cascade 0
Analyze counted 3 instrs in 1 blocks, through 0 blocks.
tryLocalSplit: 232r 240r 244r
3 regmasks in block: 240r:232r-240r 240r:240r-244r
$r3 232r-240r i=INF extend
$r3 240r-244r i=INF end
$r0 232r-240r i=INF extend
$r0 240r-244r i=INF end
$r1 232r-240r i=INF extend
$r1 240r-244r i=INF end
$r2 232r-240r i=INF extend
$r2 240r-244r i=INF end
$r4 232r-240r i=5.738636e-03 extend
$r4 240r-244r i=5.738636e-03 end
$r5 232r-240r i=6.011905e-03 extend
$r5 240r-244r i=6.011905e-03 end
$r6 232r-240r i=6.312500e-03 extend
$r6 240r-244r i=6.312500e-03 end
Inline spilling tGPR:%15 [232r,244r:0) 0 at 232r weight:3.677184e-03
From original %3
also spill snippet %12 [228r,232r:0) 0 at 228r weight:INF
also spill snippet %14 [244r,248r:0) 0 at 244r weight:INF
remat: 236r %16:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from
constant-pool)
240e tBLXr 14, $noreg, killed %16:tgpr, <regmask $lr $d8 $d9 $d10
$d11 $d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16
$s17 $s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
remat: 252r %17:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from
constant-pool)
260e %6:tgpr = COPY killed %17:tgpr
All defs dead: dead %15:tgpr = COPY %12:tgpr
All defs dead: dead %12:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from
constant-pool)
All defs dead: dead %14:tgpr = COPY %15:tgpr
Remat created 3 dead defs.
Deleting dead def 244r dead %14:tgpr = COPY %15:tgpr
Deleting dead def 228r dead %12:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4
from constant-pool)
unassigning %12 from $r3: R3
Deleting dead def 232r dead %15:tgpr = COPY %12:tgpr
Shrink: %12 EMPTY weight:INF
Shrunk: %12 EMPTY weight:INF
0 registers to spill after remat.
queuing new interval: %16 [236r,240r:0) 0 at 236r weight:INF
queuing new interval: %17 [252r,260r:0) 0 at 252r weight:INF
selectOrSplit tGPR:%17 [252r,260r:0) 0 at 252r weight:INF w=INF
hints: $r3
Checking interference for %17 [252r,260r:0) 0 at 252r weight:INF
$r3: IK_Free
assigning %17 to $r3: R3 [252r,260r:0) 0 at 252r
Dropping unused %14 EMPTY weight:INF
Dropping unused %12 EMPTY weight:INF
selectOrSplit tGPR:%16 [236r,240r:0) 0 at 236r weight:INF w=INF
Checking interference for %16 [236r,240r:0) 0 at 236r weight:INF
$r0: IK_RegUnit
$r1: IK_RegUnit
$r2: IK_RegUnit
$r3: IK_Free
assigning %16 to $r3: R3 [236r,240r:0) 0 at 236r
Trying to reconcile hints for: %2($r4)
%2($r4) is recolorable.
Trying to reconcile hints for: %1($r5)
%1($r5) is recolorable.
Trying to reconcile hints for: %0($r6)
%0($r6) is recolorable.
********** REWRITE VIRTUAL REGISTERS **********
********** Function: uECC_shared_secret
********** REGISTER MAP **********
[%0 -> $r6] tGPR
[%1 -> $r5] tGPR
[%2 -> $r4] tGPR
[%6 -> $r3] tGPR
[%11 -> $r3] tGPR
[%16 -> $r3] tGPR
[%17 -> $r3] tGPR
0B bb.0.entry:
liveins: $r0, $r1, $r2
16B %2:tgpr = COPY $r2
32B %1:tgpr = COPY $r1
48B %0:tgpr = COPY $r0
64B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
96B $r0 = COPY %0:tgpr
112B $r1 = COPY %1:tgpr
128B $r2 = COPY %2:tgpr
140B %11:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
144B tBLXr 14, $noreg, killed %11:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12
$d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18
$s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def
dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def
$sp
160B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
176B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
192B $r0 = COPY %0:tgpr
208B $r1 = COPY %1:tgpr
224B $r2 = COPY %2:tgpr
236B %16:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
240B tBLXr 14, $noreg, killed %16:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12
$d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18
$s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def
dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def
$sp
252B %17:tgpr = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
260B %6:tgpr = COPY killed %17:tgpr
268B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
272B ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
288B $r0 = COPY killed %0:tgpr
304B $r1 = COPY killed %1:tgpr
320B $r2 = COPY killed %2:tgpr
336B tBLXr 14, $noreg, killed %6:tgpr, <regmask $lr $d8 $d9 $d10 $d11 $d12
$d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18
$s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def
dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit-def
$sp
352B ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
368B tBX_RET 14, $noreg
> renamable $r4 = COPY $r2
> renamable $r5 = COPY $r1
> renamable $r6 = COPY $r0
> ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
> $r0 = COPY renamable $r6
> $r1 = COPY renamable $r5
> $r2 = COPY renamable $r4
> renamable $r3 = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
> tBLXr 14, $noreg, killed renamable $r3, <regmask $lr $d8 $d9 $d10 $d11
$d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17
$s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
> ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
> ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
> $r0 = COPY renamable $r6
> $r1 = COPY renamable $r5
> $r2 = COPY renamable $r4
> renamable $r3 = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
> tBLXr 14, $noreg, killed renamable $r3, <regmask $lr $d8 $d9 $d10 $d11
$d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17
$s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
> renamable $r3 = tLDRpci %const.0, 14, $noreg :: (load 4 from constant-pool)
> renamable $r3 = COPY killed renamable $r3
Identity copy: renamable $r3 = COPY killed renamable $r3
deleted.
> ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
> ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
> $r0 = COPY killed renamable $r6
> $r1 = COPY killed renamable $r5
> $r2 = COPY killed renamable $r4
> tBLXr 14, $noreg, killed renamable $r3, <regmask $lr $d8 $d9 $d10 $d11
$d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17
$s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>,
implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2,
implicit-def $sp
> ADJCALLSTACKUP 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp
> tBX_RET 14, $noreg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reduced.c
Type: text/x-csrc
Size: 329 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200331/5b9500b3/attachment-0001.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reduced.s
Type: application/octet-stream
Size: 1752 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200331/5b9500b3/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reduced.ll
Type: application/octet-stream
Size: 1918 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200331/5b9500b3/attachment-0003.obj>
John Brawn via llvm-dev
2020-Apr-07 18:25 UTC
[llvm-dev] [ARM] Register pressure with -mthumb forces register reload before each call
If I'm understanding what's going on in this test correctly, what's happening is:
* ARMTargetLowering::LowerCall prefers indirect calls when a function is called at least 3 times in minsize
* In Thumb-1 (without -fno-omit-frame-pointer) we have effectively only 3 callee-saved registers (r4-r6)
* The function has three arguments, so those three plus the register we need to hold the function address are more than our callee-saved registers can hold
* Therefore something needs to be spilt
* The function address can be rematerialized, so we spill that and insert an LDR before each call

If we didn't have this spilling happening (e.g. if the function had one less argument) then the code size of using BL vs BLX would be:
* BL: 3*4-byte BL = 12 bytes
* BLX: 3*2-byte BLX + 1*2-byte LDR + 4-byte litpool = 12 bytes
(So maybe even not considering spilling, LowerCall should be adjusted to do this for functions called 4 or more times)

When we have to spill, if we compare spilling the function's address vs spilling an argument:
* BLX with spilt fn: 3*2-byte BLX + 3*2-byte LDR + 4-byte litpool = 16 bytes
* BLX with spilt arg: 3*2-byte BLX + 1*2-byte LDR + 4-byte litpool + 1*2-byte STR + 2*2-byte LDR = 18 bytes
So just changing the spilling heuristic won't work.

The two ways I see of fixing this:
* In LowerCall only prefer an indirect call if the number of integer register arguments is less than the number of callee-saved registers.
* When the load of the function address is spilled, instead of just rematerializing the load, convert the BLX back into BL.

The first of these would be easier, but there will be situations where we need to use fewer than three callee-saved registers (e.g. arguments are loaded from a pointer) and there are situations where we will spill the function address for reasons entirely unrelated to the function arguments (e.g. if we have enough live local variables).

For the second, looking at InlineSpiller.cpp it does have the concept of rematerializing by folding a memory operand into another instruction, so I think we could make use of that to do this. It looks like it would involve adding a foldMemoryOperand function to ARMInstrInfo and then having this fold an LDR into a BLX by turning it into a BL.

John
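The byte counts in John Brawn's reply can be re-derived with a small model. The following is only a sketch under the sizes stated in that reply (4-byte BL, 2-byte BLX/LDR/STR, one 4-byte literal-pool entry); the function and constant names are made up for illustration and do not correspond to anything in LLVM.

```python
# Thumb-1 encoding sizes in bytes, taken from the estimate in the reply.
BL_BYTES = 4        # direct call
BLX_BYTES = 2       # indirect call through a register
LDR_BYTES = 2       # literal-pool load or stack reload
STR_BYTES = 2       # store to a stack slot
LITPOOL_BYTES = 4   # one literal-pool entry holding the callee address


def direct_size(calls):
    """Code size when every call is a plain BL."""
    return calls * BL_BYTES


def indirect_size(calls, addr_loads=1, arg_stores=0, arg_reloads=0):
    """Code size when every call is a BLX through a register.

    addr_loads  -- how many times the callee address is loaded
    arg_stores  -- stores needed to spill an argument
    arg_reloads -- loads needed to reload a spilled argument
    """
    return (calls * BLX_BYTES
            + addr_loads * LDR_BYTES
            + LITPOOL_BYTES
            + arg_stores * STR_BYTES
            + arg_reloads * LDR_BYTES)


print(direct_size(3))                                  # 12: BL, no spilling
print(indirect_size(3))                                # 12: BLX, no spilling
print(indirect_size(3, addr_loads=3))                  # 16: address reloaded per call
print(indirect_size(3, arg_stores=1, arg_reloads=2))   # 18: one argument spilled
print(direct_size(4), indirect_size(4))                # 16 14: BLX wins from 4 calls
```

Under this model the indirect form only starts saving bytes at four calls, and only if the address is loaded once, which matches the parenthetical remark about raising the threshold.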
Prathamesh Kulkarni via llvm-dev
2020-Apr-07 20:07 UTC
[llvm-dev] [ARM] Register pressure with -mthumb forces register reload before each call
On Tue, 31 Mar 2020 at 22:03, Prathamesh Kulkarni <prathamesh.kulkarni at linaro.org> wrote:
>
> Hi,
> Compiling attached test-case, which is reduced version of of
> uECC_shared_secret from tinycrypt library [1], with
> --target=arm-linux-gnueabi -march=armv6-m -Oz -S
> results in reloading of register holding function's address before
> every call to blx:
>
> ldr r3, .LCPI0_0
> blx r3
> mov r0, r6
> mov r1, r5
> mov r2, r4
> ldr r3, .LCPI0_0
> blx r3
> ldr r3, .LCPI0_0
> mov r0, r6
> mov r1, r5
> mov r2, r4
> blx r3
>
> .LCPI0_0:
> .long foo
>
> From dump of regalloc (attached), AFAIU, what seems to happen during
> greedy allocator is, all virt regs %0 to %3 are live across first two
> calls to foo. Thus %0, %1 and %2 get assigned r6, r5 and r4
> respectively, and %3 which holds foo's address doesn't have any
> register left.
> Since it's live-range has least weight, it does not evict any existing interval,
> and gets split. Eventually we have the following allocation:
>
> [%0 -> $r6] tGPR
> [%1 -> $r5] tGPR
> [%2 -> $r4] tGPR
> [%6 -> $r3] tGPR
> [%11 -> $r3] tGPR
> [%16 -> $r3] tGPR
> [%17 -> $r3] tGPR
>
> where %6, %11, %16 and %17 all are derived from %3.
> And since r3 is a call-clobbered register, the compiler is forced to
> reload foo's address
> each time before blx.
>
> To fix this, I thought of following approaches:
> (a) Disable the heuristic to prefer indirect call when there are at
> least 3 calls to
> same function in basic block in ARMTargetLowering::LowerCall for Thumb-1 ISA.
>
> (b) In ARMTargetLowering::LowerCall, put another constraint like
> number of arguments, as a proxy for register pressure for Thumb-1, but
> that's bound to trip another cases.
>
> (c) Give higher priority to allocate vrit reg used for indirect calls
> ? However, if that
> results in spilling of some other register, it would defeat the
> purpose of saving code-size. I suppose ideally we want to trigger the
> heuristic of using indirect call only when we know beforehand that it
> will not result in spilling. But I am not sure if it's possible to
> estimate that during isel ?
>
> I would be grateful for suggestions on how to proceed further.

ping ?

Thanks,
Prathamesh

>
> [1] https://github.com/intel/tinycrypt/blob/master/lib/source/ecc_dh.c#L139
>
> Thanks,
> Prathamesh
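Option (b), in the stricter form suggested in John Brawn's reply (prefer an indirect call only when the number of integer register arguments is less than the number of callee-saved registers), can be sketched as a toy predicate. This is illustrative Python, not LLVM code: `prefer_indirect_call`, its parameters and the constants are hypothetical names, and the default of three callee-saved registers assumes Thumb-1 with the frame pointer omitted (r4-r6), as stated in that reply.

```python
# Hypothetical constants for illustration only.
THUMB1_CALLEE_SAVED = 3   # r4-r6, assuming the frame pointer is omitted
MIN_CALLS_FOR_INDIRECT = 3  # existing minsize threshold described in the thread


def prefer_indirect_call(num_calls, int_reg_args,
                         callee_saved=THUMB1_CALLEE_SAVED,
                         min_calls=MIN_CALLS_FOR_INDIRECT):
    """Return True if the call should be lowered as LDR + BLX.

    The argument count is used purely as a proxy for register pressure:
    each argument kept live across the calls needs a callee-saved
    register, and one more is needed for the callee address.
    """
    if num_calls < min_calls:
        return False
    return int_reg_args < callee_saved


# The reduced test case: three calls, three register arguments.
print(prefer_indirect_call(3, 3))  # False: no register left for the address
# One argument fewer leaves a callee-saved register for the address.
print(prefer_indirect_call(3, 2))  # True
```

As both messages point out, such a proxy is imprecise: arguments need not occupy callee-saved registers, and unrelated live values can still force the address to be spilled.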