Hello, while testing trivial functions in my backend i noticed a suboptimal way of assigning regs that had the following pattern, consider the following function: typedef unsigned short t; t foo(t a, t b) { t a4 = b^a^18; return a4; } Argument "a" is passed in R15:R14 and argument "b" is passed in R13:R12, the return value is stored in R15:R14. Producing the following asm code: xor r15, r13 ; xor top part mov r8, r14 xor r8, r12 ; xor bottom part movi r14, 18 xor r14, r8 ; xor bottom part with imm value However it could have produced: xor r15, r13 xor r14, r12 movi r8, 18 xor r14, r8 This saves the first move instruction by using any scratch register when moving the imm value instead of using r14 which is used as an incoming reg and as the return value. Is this a missed optimization from LLVM or did i miss something out? Changing the register allocation order didnt work. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/a6db132b/attachment.html>
On Sep 4, 2010, at 9:54 AM, Borja Ferrer <borja.ferav at gmail.com> wrote:> Producing the following asm code: > xor r15, r13 ; xor top part > mov r8, r14 > xor r8, r12 ; xor bottom part > movi r14, 18 > xor r14, r8 ; xor bottom part with imm value > > However it could have produced: > xor r15, r13 > xor r14, r12 > movi r8, 18 > xor r14, r8Only by commuting the xor operands. Have you marked xor as commutable? See the other targets for how to do that.
Indeed, i've marked it as commutable: let isCommutable = 1, isTwoAddress = 1 in def XORRdRr : FRdRr<0b0010, 0b01, (outs GPR8:$dst), (ins GPR8:$src1, GPR8:$src2), "xor\t$dst, $src2", [(set GPR8:$dst, (xor GPR8:$src1, GPR8:$src2))]>; -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/60166a0b/attachment.html>
I've noticed this pattern happening with other operators aswell, but used xor in this example. As i said before, i tried with different register allocation orders, but it will produce always the same result. GCC is emitting longer code, but since LLVM is so nearer to the optimal code sequence i wanted to reach it. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/ab1c47bd/attachment.html>
Hello> and as the return value. Is this a missed optimization from LLVM or did i > miss something out? > Changing the register allocation order didnt work.What are the patterns for xor / mov ? -- With best regards, Anton Korobeynikov Faculty of Mathematics and Mechanics, Saint Petersburg State University
The pattern for the copy integer to reg is: let isReMaterializable = 1, isAsCheapAsAMove = 1 in def MOVIRdK : FRdK<0b1110, (outs GPR8:$dst), (ins i8imm:$src), "movi\t$dst, $src", [(set GPR8:$dst, imm:$src)]>; The xor pattern is : let isCommutable = 1, isTwoAddress = 1 in def XORRdRr : FRdRr<0b0010, 0b01, (outs GPR8:$dst), (ins GPR8:$src1, GPR8:$src2), "xor\t$dst, $src2", [(set GPR8:$dst, (xor GPR8:$src1, GPR8:$src2))]>; -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100904/965e07e0/attachment.html>
Could it be that an optimization was actually missed? It seems that there is some sort of look-ahead optimization is happening because the constant value was moved into r14, when the optimal choice was not to move r14 and keep the variable into r14. All these R14 with different values is confusing. My mental SSA form looks like: t1 = b_high ^ a_low t2 = b_low ^ a_low t3 = 18 t4 = t2 ^ t3 5: return t1, t4 Now, when the registers are allocated, the t4 attempts to use a register already used by either t2 or t3. However a constant value is in t3. The allocator could attempt to re-use t3 instead of t2 because it doesn't know about t3 being a constant. Thus it inserts a move on t2 and re-uses t3. At the allocator standpoint consider: 1: t1 = b_high ^ a_high ; b_high and a_high are dead and t1 may be any b_high or b_low. 2: t2 = b_low ^ a_low ; b_low and a_low are also dead. t2 may be either. 3: t3 = 18 ; 18 may be any register 4: t4 = t2 ^ t3 ; t4 may reuse t2 or t3. 5: return t1, t4 The optimal value at (1) is of course R14 or a_high's old value. This is a simple choice. The other choice of t4 is a bit harder because it may be t2 or t3. At first glance a large enumerative tree search would be exponential. However, I don't know much of the theory behind reusing registers like this. Thanks, Jeff Kunkel On Sat, Sep 4, 2010 at 12:54 PM, Borja Ferrer <borja.ferav at gmail.com> wrote:> > Hello, while testing trivial functions in my backend i noticed a suboptimal way of assigning regs that had the following pattern, consider the following function: > > typedef unsigned short t; > t foo(t a, t b) > { > t a4 = b^a^18; > return a4; > } > > Argument "a" is passed in R15:R14 and argument "b" is passed in R13:R12, the return value is stored in R15:R14. > > Producing the following asm code: > xor r15, r13 ; xor top part > mov r8, r14 > xor r8, r12 ; xor bottom part > movi r14, 18 > xor r14, r8 ; xor bottom part with imm value > > However it could have produced: > xor r15, r13 > xor r14, r12 > movi r8, 18 > xor r14, r8 > > This saves the first move instruction by using any scratch register when moving the imm value instead of using r14 which is used as an incoming reg and as the return value. Is this a missed optimization from LLVM or did i miss something out? > Changing the register allocation order didnt work. > > Thanks > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >