On Sep 4, 2010, at 11:21 AM, Borja Ferrer wrote:> I've noticed this pattern happening with other operators aswell, but used xor in this example. As i said before, i tried with different register allocation orders, but it will produce always the same result. GCC is emitting longer code, but since LLVM is so nearer to the optimal code sequence i wanted to reach it.In LLVM, copies are coalesced away as much as possible before registers are allocated, so the allocation order wouldn't affect it. Try looking at the output of -debug-only=regcoalescing to see what is going wrong. -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1929 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/f9ef1c08/attachment.bin>
I've compiled the program with -debug-only=regcoalescing and looked at the output which i've attached in this email. It's pretty advanced so I understand part of it, but i get lost with all the register live ranges. The instructions are: 44 %R15<def> = XORRdRr %R15, %R13<kill> 52 %reg1029<def> = MOVRdRr %R14<kill> 60 %reg1029<def> = XORRdRr %reg1029, %R12<kill> 68 %R14<def> = MOVIRdK 18 76 %R14<def> = XORRdRr %R14, %reg1029<kill> i guess 68 should be assigned a virtual reg instead of 52 leaving reg1029 for later allocation. If there is anymore debug info i can attach please let me know. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100905/5f2964de/attachment.html> -------------- next part -------------- ********** SIMPLE REGISTER COALESCING ********** ********** Function: foo ********** JOINING INTERVALS *********** entry: 4 %reg1027<def> = MOVRdRr %R12<kill> Inspecting R12,inf = [0,6:0) 0 at 0*-(6) and %reg1027,0.000000e+00 = [6,62:0) 0 at 6-(62): Joined. Result = R12,inf = [0,62:0) 0 at 0*-(62) 12 %reg1026<def> = MOVRdRr %R13<kill> Inspecting R13,inf = [0,14:0) 0 at 0*-(14) and %reg1026,0.000000e+00 = [14,46:0) 0 at 14-(46): Joined. Result = R13,inf = [0,46:0) 0 at 0*-(46) 20 %reg1025<def> = MOVRdRr %R14<kill> Inspecting R14,inf = [0,22:0)[94,106:1) 0 at 0*-(22) 1 at 94-(106) and %reg1025,0.000000e+00 = [22,54:0) 0 at 22-(54): Joined. Result = R14,inf = [0,54:0)[94,106:1) 0 at 0*-(54) 1 at 94-(106) 28 %reg1024<def> = MOVRdRr %R15<kill> Inspecting R15,inf = [0,30:0)[86,106:1) 0 at 0*-(30) 1 at 86-(106) and %reg1024,0.000000e+00 = [30,38:0) 0 at 30-(38): Joined. Result = R15,inf = [0,38:0)[86,106:1) 0 at 0*-(38) 1 at 86-(106) 84 %R15<def> = MOVRdRr %reg1028<kill> Inspecting %reg1028,0.000000e+00 = [38,46:1)[46,86:0) 0 at 46-(86) 1 at 38-(46) and R15,inf = [0,38:0)[86,106:1) 0 at 0*-(38) 1 at 86-(106): Joined. Result = R15,inf = [0,46:0)[46,106:1) 0 at 0*-(46) 1 at 46-(106) 92 %R14<def> = MOVRdRr %reg1031<kill> Inspecting %reg1031,0.000000e+00 = [70,78:1)[78,94:0) 0 at 78-(94) 1 at 70-(78) and R14,inf = [0,54:0)[94,106:1) 0 at 0*-(54) 1 at 94-(106): Joined. Result = R14,inf = [0,54:0)[70,78:2)[78,106:1) 0 at 0*-(54) 1 at 78-(106) 2 at 70-(78) 36 %R15<def> = MOVRdRr %R15<kill> Copy already coalesced. 52 %reg1029<def> = MOVRdRr %R14<kill> Inspecting R14,inf = [0,54:0)[70,78:2)[78,106:1) 0 at 0*-(54) 1 at 78-(106) 2 at 70-(78) and %reg1029,0.000000e+00 = [54,62:1)[62,78:0) 0 at 62-(78) 1 at 54-(62): Interference! 52 %reg1029<def> = MOVRdRr %R14<kill> Inspecting R14,inf = [0,54:0)[70,78:2)[78,106:1) 0 at 0*-(54) 1 at 78-(106) 2 at 70-(78) and %reg1029,0.000000e+00 = [54,62:1)[62,78:0) 0 at 62-(78) 1 at 54-(62): Interference! ********** INTERVALS POST JOINING ********** R14,inf = [0,54:0)[70,78:2)[78,106:1) 0 at 0*-(54) 1 at 78-(106) 2 at 70-(78) R12,inf = [0,62:0) 0 at 0*-(62) R15,inf = [0,46:0)[46,106:1) 0 at 0*-(46) 1 at 46-(106) R13,inf = [0,46:0) 0 at 0*-(46) %reg1029,0.000000e+00 = [54,62:1)[62,78:0) 0 at 62-(78) 1 at 54-(62) ********** INTERVALS ********** R14,inf = [0,54:0)[70,78:2)[78,106:1) 0 at 0*-(54) 1 at 78-(106) 2 at 70-(78) R12,inf = [0,62:0) 0 at 0*-(62) R15,inf = [0,46:0)[46,106:1) 0 at 0*-(46) 1 at 46-(106) R13,inf = [0,46:0) 0 at 0*-(46) %reg1029,0.000000e+00 = [54,62:1)[62,78:0) 0 at 62-(78) 1 at 54-(62) ********** MACHINEINSTRS ********** BB#0: # derived from entry 44 %R15<def> = XORRdRr %R15, %R13<kill> 52 %reg1029<def> = MOVRdRr %R14<kill> 60 %reg1029<def> = XORRdRr %reg1029, %R12<kill> 68 %R14<def> = MOVIRdK 18 76 %R14<def> = XORRdRr %R14, %reg1029<kill> 104 RET %R15<imp-use,kill>, %R14<imp-use,kill>
On Sat, Sep 4, 2010 at 1:31 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:> > On Sep 4, 2010, at 11:21 AM, Borja Ferrer wrote: > >> I've noticed this pattern happening with other operators aswell, but used xor in this example. As i said before, i tried with different register allocation orders, but it will produce always the same result. GCC is emitting longer code, but since LLVM is so nearer to the optimal code sequence i wanted to reach it. > > In LLVM, copies are coalesced away as much as possible before registers are allocated, so the allocation order wouldn't affect it. > > Try looking at the output of -debug-only=regcoalescing to see what is going wrong.If you want to take a look at this yourself, the issue is easy to reproduce with Thumb1: $ cat > test.c typedef unsigned long long t; t foo(t a, t b) { t a4 = b^a^18; return a4; } $ clang -cc1 -triple thumbv5-u-u -S -O2 test.c -o - [...] eors r1, r3 mov r3, r0 eors r3, r2 movs r0, #18 eors r0, r3 bx lr [...] -Eli
On Sep 4, 2010, at 5:40 PM, Eli Friedman wrote:> If you want to take a look at this yourself, the issue is easy to > reproduce with Thumb1:Thanks, Eli. Nice catch! This IR: target triple = "thumbv5-u-u" define arm_aapcscc i64 @foo(i64 %a, i64 %b) nounwind readnone { entry: %xor = xor i64 %a, 18 ; <i64> [#uses=1] %xor2 = xor i64 %xor, %b ; <i64> [#uses=1] ret i64 %xor2 } produces these instructions before coalescing: 4L %reg16387<def> = COPY %R3<kill> 12L %reg16386<def> = COPY %R2<kill> 28L %reg16384<def> = COPY %R0<kill> 36L %reg16388<def> = COPY %reg16385<kill> 44L %reg16388<def>, %CPSR<def,dead> = tEOR %reg16388, %reg16387<kill>, pred:14, pred:%reg0 56L %reg16389<def> = COPY %reg16384<kill> 64L %reg16389<def>, %CPSR<def,dead> = tEOR %reg16389, %reg16386<kill>, pred:14, pred:%reg0 76L %reg16390<def>, %CPSR<def,dead> = tMOVi8 18, pred:14, pred:%reg0 88L %reg16391<def> = COPY %reg16390<kill> 96L %reg16391<def>, %CPSR<def,dead> = tEOR %reg16391, %reg16389<kill>, pred:14, pred:%reg0 108L %R0<def> = COPY %reg16391<kill> 116L %R1<def> = COPY %reg16388<kill> 128L tBX_RET %R0<imp-use,kill>, %R1<imp-use,kill> and after: 44L %R1<def>, %CPSR<def,dead> = tEOR %R1, %R3<kill>, pred:14, pred:%reg0 56L %reg16389<def> = COPY %R0<kill> 64L %reg16389<def>, %CPSR<def,dead> = tEOR %reg16389, %R2<kill>, pred:14, pred:%reg0 76L %R0<def>, %CPSR<def,dead> = tMOVi8 18, pred:14, pred:%reg0 96L %R0<def>, %CPSR<def,dead> = tEOR %R0, %reg16389<kill>, pred:14, pred:%reg0 128L tBX_RET %R0<imp-use,kill>, %R1<imp-use,kill> We see, as Borja pointed out, that %R0 from the 108L COPY has been joined with %reg16391 and %reg16390 so it is too late to commute the xor. Passing -disable-physical-join to prevent the %R0 sabotage, we get: 4L %reg16387<def> = COPY %R3<kill>; tGPR:%reg16387 12L %reg16386<def> = COPY %R2<kill>; tGPR:%reg16386 20L %reg16388<def> = COPY %R1<kill>; tGPR:%reg16388 28L %reg16389<def> = COPY %R0<kill>; tGPR:%reg16389 44L %reg16388<def>, %CPSR<def,dead> = tEOR %reg16388, %reg16387<kill>, pred:14, pred:%reg0; tGPR:%reg16388,16387 64L %reg16389<def>, %CPSR<def,dead> = tEOR %reg16389, %reg16386<kill>, pred:14, pred:%reg0; tGPR:%reg16389,16386 76L %reg16391<def>, %CPSR<def,dead> = tMOVi8 18, pred:14, pred:%reg0; tGPR:%reg16391 96L %reg16391<def>, %CPSR<def,dead> = tEOR %reg16391, %reg16389<kill>, pred:14, pred:%reg0; tGPR:%reg16391,16389 108L %R0<def> = COPY %reg16391<kill>; tGPR:%reg16391 116L %R1<def> = COPY %reg16388<kill>; tGPR:%reg16388 128L tBX_RET %R0<imp-use,kill>, %R1<imp-use,kill> It is not easy to see here that the 96L tEOR should be commuted. You would have to notice that the hints for %reg16389 and %reg16391 are clashing. After register allocation with hinting it becomes: %R1<def>, %CPSR<def,dead> = tEOR %R1, %R3<kill>, pred:14, pred:%reg0 %R0<def>, %CPSR<def,dead> = tEOR %R0, %R2<kill>, pred:14, pred:%reg0 %R2<def>, %CPSR<def,dead> = tMOVi8 18, pred:14, pred:%reg0 %R2<def>, %CPSR<def,dead> = tEOR %R2, %R0<kill>, pred:14, pred:%reg0 %R0<def> = COPY %R2<kill> tBX_RET %R0<imp-use,kill>, %R1<imp-use,kill> There are two fundamental deficiencies here: 1. The coalescer is not very good at handling conflicting joins. The examples show that different orders of joining can give different results. The coalescer uses heuristics to pick an order. It doesn't try to find an optimal order. 2. Commuting two-address instructions is not really integrated into the coalescer algorithm. It is more of an afterthought, calling RemoveCopyByCommutingDef when a copy could otherwise not be removed. /jakob -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100905/8bda9d8c/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1929 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100905/8bda9d8c/attachment.bin>
Possibly Parallel Threads
- [LLVMdev] Optimization passes break machine instructions on new backend
- [LLVMdev] Possible missed optimization?
- [LLVMdev] Possible missed optimization?
- [LLVMdev] [LLVMDev] Register Allocation and Kill Flags
- [LLVMdev] [LLVMDev] Register Allocation and copy instructions