On Sep 4, 2010, at 5:40 PM, Eli Friedman wrote:
> If you want to take a look at this yourself, the issue is easy to
> reproduce with Thumb1:
Thanks, Eli. Nice catch!
This IR:
target triple = "thumbv5-u-u"
define arm_aapcscc i64 @foo(i64 %a, i64 %b) nounwind readnone {
entry:
%xor = xor i64 %a, 18 ; <i64> [#uses=1]
%xor2 = xor i64 %xor, %b ; <i64> [#uses=1]
ret i64 %xor2
}
produces these instructions before coalescing:
4L %reg16387<def> = COPY %R3<kill>
12L %reg16386<def> = COPY %R2<kill>
28L %reg16384<def> = COPY %R0<kill>
36L %reg16388<def> = COPY %reg16385<kill>
44L %reg16388<def>, %CPSR<def,dead> = tEOR %reg16388,
%reg16387<kill>, pred:14, pred:%reg0
56L %reg16389<def> = COPY %reg16384<kill>
64L %reg16389<def>, %CPSR<def,dead> = tEOR %reg16389,
%reg16386<kill>, pred:14, pred:%reg0
76L %reg16390<def>, %CPSR<def,dead> = tMOVi8 18, pred:14,
pred:%reg0
88L %reg16391<def> = COPY %reg16390<kill>
96L %reg16391<def>, %CPSR<def,dead> = tEOR %reg16391,
%reg16389<kill>, pred:14, pred:%reg0
108L %R0<def> = COPY %reg16391<kill>
116L %R1<def> = COPY %reg16388<kill>
128L tBX_RET %R0<imp-use,kill>, %R1<imp-use,kill>
and after:
44L %R1<def>, %CPSR<def,dead> = tEOR %R1, %R3<kill>,
pred:14, pred:%reg0
56L %reg16389<def> = COPY %R0<kill>
64L %reg16389<def>, %CPSR<def,dead> = tEOR %reg16389,
%R2<kill>, pred:14, pred:%reg0
76L %R0<def>, %CPSR<def,dead> = tMOVi8 18, pred:14, pred:%reg0
96L %R0<def>, %CPSR<def,dead> = tEOR %R0, %reg16389<kill>,
pred:14, pred:%reg0
128L tBX_RET %R0<imp-use,kill>, %R1<imp-use,kill>
We see, as Borja pointed out, that %R0 from the 108L COPY has been joined with
%reg16391 and %reg16390, so the 76L tMOVi8 now defines %R0 directly and it is
too late to commute the xor at 96L.
Passing -disable-physical-join to prevent the %R0 sabotage, we get:
4L %reg16387<def> = COPY %R3<kill>; tGPR:%reg16387
12L %reg16386<def> = COPY %R2<kill>; tGPR:%reg16386
20L %reg16388<def> = COPY %R1<kill>; tGPR:%reg16388
28L %reg16389<def> = COPY %R0<kill>; tGPR:%reg16389
44L %reg16388<def>, %CPSR<def,dead> = tEOR %reg16388,
%reg16387<kill>, pred:14, pred:%reg0; tGPR:%reg16388,16387
64L %reg16389<def>, %CPSR<def,dead> = tEOR %reg16389,
%reg16386<kill>, pred:14, pred:%reg0; tGPR:%reg16389,16386
76L %reg16391<def>, %CPSR<def,dead> = tMOVi8 18, pred:14,
pred:%reg0; tGPR:%reg16391
96L %reg16391<def>, %CPSR<def,dead> = tEOR %reg16391,
%reg16389<kill>, pred:14, pred:%reg0; tGPR:%reg16391,16389
108L %R0<def> = COPY %reg16391<kill>; tGPR:%reg16391
116L %R1<def> = COPY %reg16388<kill>; tGPR:%reg16388
128L tBX_RET %R0<imp-use,kill>, %R1<imp-use,kill>
It is not easy to see here that the 96L tEOR should be commuted. You would have
to notice that the hints for %reg16389 and %reg16391 are clashing.
After register allocation with hinting it becomes:
%R1<def>, %CPSR<def,dead> = tEOR %R1, %R3<kill>,
pred:14, pred:%reg0
%R0<def>, %CPSR<def,dead> = tEOR %R0, %R2<kill>,
pred:14, pred:%reg0
%R2<def>, %CPSR<def,dead> = tMOVi8 18, pred:14, pred:%reg0
%R2<def>, %CPSR<def,dead> = tEOR %R2, %R0<kill>,
pred:14, pred:%reg0
%R0<def> = COPY %R2<kill>
tBX_RET %R0<imp-use,kill>, %R1<imp-use,kill>
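To make the cost of the missed commute concrete, here is a minimal sketch in Python (not LLVM code) that replays the allocated sequence above next to a hand-commuted variant. The tuple encoding, the `run` helper, and the sample input values are all made up for illustration; only the register names and the constant 18 come from the dumps.

```python
def run(program, regs):
    """Interpret a toy two-address program; return the (R0, R1) result pair."""
    regs = dict(regs)  # work on a copy so the caller's inputs are untouched
    for op, dst, src in program:
        if op == "eor":      # two-address xor: dst = dst ^ src
            regs[dst] ^= regs[src]
        elif op == "movi":   # load immediate: dst = constant
            regs[dst] = src
        elif op == "copy":   # register copy: dst = src
            regs[dst] = regs[src]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs["R0"], regs["R1"]

# Sequence as emitted after register allocation with hinting.
allocated = [
    ("eor", "R1", "R3"),
    ("eor", "R0", "R2"),
    ("movi", "R2", 18),
    ("eor", "R2", "R0"),
    ("copy", "R0", "R2"),
]

# Same sequence with the last xor commuted so that it defines R0 directly.
commuted = [
    ("eor", "R1", "R3"),
    ("eor", "R0", "R2"),
    ("movi", "R2", 18),
    ("eor", "R0", "R2"),
]

# Arbitrary sample inputs (hypothetical values, low/high halves of %a and %b).
args = {"R0": 0x1234, "R1": 0x5, "R2": 0xFF, "R3": 0xA}
```

Both programs return the same pair for any inputs, since xor is commutative, but the commuted one needs no trailing COPY.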
There are two fundamental deficiencies here:
1. The coalescer is not very good at handling conflicting joins. The examples
show that different orders of joining can give different results. The coalescer
uses heuristics to pick an order. It doesn't try to find an optimal order.
2. Commuting two-address instructions is not really integrated into the
coalescer algorithm. It is more of an afterthought, calling
RemoveCopyByCommutingDef when a copy could otherwise not be removed.
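The first point can be illustrated with a toy greedy coalescer. This is a sketch in Python under simplifying assumptions, not the algorithm LLVM uses: registers are plain names, interference is a fixed set of pairs, and copies are visited once in the order given. The `coalesce` function and the names a, b, c, d are hypothetical.

```python
def coalesce(copies, interference):
    """Greedily join copy-related names in the given order.

    Returns the copies that could not be removed because the two sides
    (or something already joined with them) interfere.
    """
    interferes = {frozenset(pair) for pair in interference}
    group = {}  # name -> set of names currently joined with it
    for a, b in copies:
        group.setdefault(a, {a})
        group.setdefault(b, {b})

    remaining = []
    for a, b in copies:
        ga, gb = group[a], group[b]
        if ga is gb:
            continue  # already joined by an earlier step, copy is gone
        if any(frozenset((x, y)) in interferes for x in ga for y in gb):
            remaining.append((a, b))  # conflicting join, copy stays
            continue
        merged = ga | gb
        for name in merged:
            group[name] = merged
    return remaining

# a is live at the same time as c and d, so it cannot share a register with them.
interference = [("a", "c"), ("a", "d")]

# Joining a with b first blocks both other copies.
order1 = [("a", "b"), ("b", "c"), ("b", "d")]
# Joining b with c and d first leaves only one copy behind.
order2 = [("b", "c"), ("b", "d"), ("a", "b")]
```

With these inputs, `coalesce(order1, interference)` leaves two copies and `coalesce(order2, interference)` leaves one, which is the kind of order dependence a heuristic has to guess at.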
/jakob