I have my new port limping enough to compile a very basic function: int foo (int a, int b, int c, int d) { return a + b - c + d; } clang-cc -O2 yields: define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) nounwind readnone { entry: %add = add i32 %b, %a ; <i32> [#uses=1] %sub = sub i32 %add, %c ; <i32> [#uses=1] %add4 = add i32 %sub, %d ; <i32> [#uses=1] ret i32 %add4 } which lowers to this assembler code (note: args arrive in r1..r12, and results are returned in r1..r3.): foo: add r2,r1 ### add r1,r2 is better sub r2,r3 mov r1,r2 ### unnecessary!! add r1,r4 jmp [r30] .end foo The mov insn would be unnecessary if the operand order for the first add were reversed. For this function, GCC does the right thing. Is there some optimizer knob I'm not turning properly? In more complex cases, GCC does poorly with two-address operand choices and so bloats the code with unnecessary register moves. I have high hopes LLVM can do better, so this result for a simple case is bothersome. G
On Apr 16, 2009, at 3:17 PM, Greg McGary wrote:> I have my new port limping enough to compile a very basic function: > > int > foo (int a, int b, int c, int d) > { > return a + b - c + d; > } > > clang-cc -O2 yields: > > define i32 @foo(i32 %a, i32 %b, i32 %c, i32 %d) nounwind readnone { > entry: > %add = add i32 %b, %a ; <i32> [#uses=1] > %sub = sub i32 %add, %c ; <i32> [#uses=1] > %add4 = add i32 %sub, %d ; <i32> [#uses=1] > ret i32 %add4 > } > > which lowers to this assembler code (note: args arrive in r1..r12, and > results are returned in r1..r3.): > > foo: > add r2,r1 ### add r1,r2 is better > sub r2,r3 > mov r1,r2 ### unnecessary!! > add r1,r4 > jmp [r30] > .end foo > > The mov insn would be unnecessary if the operand order for the first > add > were reversed. For this function, GCC does the right thing. > > Is there some optimizer knob I'm not turning properly? In more > complex > cases, GCC does poorly with two-address operand choices and so bloats > the code with unnecessary register moves. I have high hopes LLVM > can do > better, so this result for a simple case is bothersome.Are you marking add as commutable? Are you making mov as a copy instruction? Evan> > G > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Evan Cheng wrote:> On Apr 16, 2009, at 3:17 PM, Greg McGary wrote: > >> Is there some optimizer knob I'm not turning properly? In more complex >> cases, GCC does poorly with two-address operand choices and so bloats >> the code with unnecessary register moves. I have high hopes LLVM >> can do better, so this result for a simple case is bothersome. >> > > Are you marking add as commutable? Are you making mov as a copy > instruction? >How do I mark them? For the commutative property, I observed this definition: def add : SDNode<"ISD::ADD" , SDTIntBinOp , [SDNPCommutative, SDNPAssociative]>; ... and assumed it was sufficient, since I saw no other targets making special arrangements. I see no obvious (to me, anyway 8^) "copy instruction" property. The insn in question is generated by copyRegToReg(), and satisfies the isMoveInstr() predicate. G