For the little C test program where a constant is stored in memory and also used as a parameter: #include <stdint.h> uint64_t val, *p; extern uint64_t xtr( uint64_t); uint64_t caller() { uint64_t x; p = &val; x = 12345123400L; *p = x; return xtr(x); } clang (3.2, 3.3 and svn) generates the following X86 code (at -O3): caller: movq $val, p(%rip) movabsq $12345123400, %rax # imm = 0x2DFD3A248 movq %rax, val(%rip) movabsq $12345123400, %rdi # imm = 0x2DFD3A248 jmp xtr # TAILCALL Notice that the constant value is loaded twice, once into %rax and again into %rdi as the first parameter for the (tail) call. There is no need for the second constant load - the value is already in a register. If the target register was assigned as %rdi there would be no need for the second load instruction at all. gcc (4.6+) generates better code: caller: movabsq $12345123400, %rdi movq $val, p(%rip) movq %rdi, val(%rip) jmp xtr I expected that this optimization would be picked up in a cse, gvn, machine-cse or even peepholing pass. Comments? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130802/055f499e/attachment.html>
Rafael Espíndola
2013-Aug-02 20:37 UTC
[LLVMdev] Missing optimization - constant parameter
> I expected that this optimization would be picked > up in a cse, gvn, machine-cse or even peepholing pass. > > Comments?At the LLVM IR level this is represented as define i64 @caller() #0 { entry: store i64* @val, i64** @p, align 8, !tbaa !0 store i64 12345123400, i64* @val, align 8, !tbaa !3 %call = tail call i64 @xtr(i64 12345123400) #2 ret i64 %call } Which is probably the best representation to have at this relatively high level. At the machine level it looks like it is the register coalescer that is duplicating the constant. It transforms 0B BB#0: derived from LLVM BB %entry 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg1 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2 96B %RDI<def> = COPY %vreg2; GR64:%vreg2 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, %RDI<imp-use,kill> into 0B BB#0: derived from LLVM BB %entry 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg1 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2 96B %RDI<def> = MOV64ri 12345123400 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, %RDI<imp-use,kill> I am not sure why. Maybe this should be delayed until the register allocator, which can split the range if it cannot assign rdi to vreg2? Jakob, should I open a bug? Cheers, Rafael
Jakob Stoklund Olesen
2013-Aug-02 22:48 UTC
[LLVMdev] Missing optimization - constant parameter
On Aug 2, 2013, at 1:37 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:>> I expected that this optimization would be picked >> up in a cse, gvn, machine-cse or even peepholing pass. >> >> Comments? > > > At the LLVM IR level this is represented as > > define i64 @caller() #0 { > entry: > store i64* @val, i64** @p, align 8, !tbaa !0 > store i64 12345123400, i64* @val, align 8, !tbaa !3 > %call = tail call i64 @xtr(i64 12345123400) #2 > ret i64 %call > } > > Which is probably the best representation to have at this relatively high level. > > At the machine level it looks like it is the register coalescer that > is duplicating the constant. It transforms > > 0B BB#0: derived from LLVM BB %entry > 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, > <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 > 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], > %noreg; mem:LD8[GOT] GR64:%vreg1 > 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; > mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0 > 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2 > 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; > mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2 > 96B %RDI<def> = COPY %vreg2; GR64:%vreg2 > 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, > %RDI<imp-use,kill> > > into > > 0B BB#0: derived from LLVM BB %entry > 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, > <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 > 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], > %noreg; mem:LD8[GOT] GR64:%vreg1 > 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; > mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0 > 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2 > 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; > mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2 > 96B %RDI<def> = MOV64ri 12345123400 > 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, > %RDI<imp-use,kill> > > I am not sure why. Maybe this should be delayed until the register > allocator, which can split the range if it cannot assign rdi to vreg2? > > Jakob, should I open a bug?MachineCSE skips cheap instructions on purpose: // Heuristics #1: Don't CSE "cheap" computation if the def is not local or in // an immediate predecessor. We don't want to increase register pressure and // end up causing other computation to be spilled. if (MI->isAsCheapAsAMove()) { MachineBasicBlock *CSBB = CSMI->getParent(); MachineBasicBlock *BB = MI->getParent(); if (CSBB != BB && !CSBB->isSuccessor(BB)) return false; } This code is older than the greedy register allocator. We could delete it if somebody is willing to check for regressions in the test suite. I thought we had a PR about this already, but I can’t find it now. Thanks, /jakob