Jakob Stoklund Olesen
2013-Aug-02 22:48 UTC
[LLVMdev] Missing optimization - constant parameter
On Aug 2, 2013, at 1:37 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:>> I expected that this optimization would be picked >> up in a cse, gvn, machine-cse or even peepholing pass. >> >> Comments? > > > At the LLVM IR level this is represented as > > define i64 @caller() #0 { > entry: > store i64* @val, i64** @p, align 8, !tbaa !0 > store i64 12345123400, i64* @val, align 8, !tbaa !3 > %call = tail call i64 @xtr(i64 12345123400) #2 > ret i64 %call > } > > Which is probably the best representation to have at this relatively high level. > > At the machine level it looks like it is the register coalescer that > is duplicating the constant. It transforms > > 0B BB#0: derived from LLVM BB %entry > 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, > <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 > 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], > %noreg; mem:LD8[GOT] GR64:%vreg1 > 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; > mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0 > 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2 > 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; > mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2 > 96B %RDI<def> = COPY %vreg2; GR64:%vreg2 > 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, > %RDI<imp-use,kill> > > into > > 0B BB#0: derived from LLVM BB %entry > 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, > <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 > 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], > %noreg; mem:LD8[GOT] GR64:%vreg1 > 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; > mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0 > 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2 > 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; > mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2 > 96B %RDI<def> = MOV64ri 12345123400 > 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, > %RDI<imp-use,kill> > > I am not sure why. Maybe this should be delayed until the register > allocator, which can split the range if it cannot assign rdi to vreg2? > > Jakob, should I open a bug?MachineCSE skips cheap instructions on purpose: // Heuristics #1: Don't CSE "cheap" computation if the def is not local or in // an immediate predecessor. We don't want to increase register pressure and // end up causing other computation to be spilled. if (MI->isAsCheapAsAMove()) { MachineBasicBlock *CSBB = CSMI->getParent(); MachineBasicBlock *BB = MI->getParent(); if (CSBB != BB && !CSBB->isSuccessor(BB)) return false; } This code is older than the greedy register allocator. We could delete it if somebody is willing to check for regressions in the test suite. I thought we had a PR about this already, but I can’t find it now. Thanks, /jakob
Are you sure that's it? I commented that block out, rebuilt llvm 3.3, and it still duplicates the constant. My concern is that long constant loads increase code size and if they can be avoided by better targeting it would be a win. My project's application of llvm tends to use a lot of long constants so this can be a significant optimization. I'll do some more debugging now that you have pointed me in the right direction. thanks /maurice On Fri, Aug 2, 2013 at 5:48 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk>wrote:> > On Aug 2, 2013, at 1:37 PM, Rafael Espíndola <rafael.espindola at gmail.com> > wrote: > > >> I expected that this optimization would be picked > >> up in a cse, gvn, machine-cse or even peepholing pass. > >> > >> Comments? > > > > > > At the LLVM IR level this is represented as > > > > define i64 @caller() #0 { > > entry: > > store i64* @val, i64** @p, align 8, !tbaa !0 > > store i64 12345123400, i64* @val, align 8, !tbaa !3 > > %call = tail call i64 @xtr(i64 12345123400) #2 > > ret i64 %call > > } > > > > Which is probably the best representation to have at this relatively > high level. > > > > At the machine level it looks like it is the register coalescer that > > is duplicating the constant. It transforms > > > > 0B BB#0: derived from LLVM BB %entry > > 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, > > <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 > > 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], > > %noreg; mem:LD8[GOT] GR64:%vreg1 > > 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; > > mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0 > > 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2 > > 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; > > mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2 > > 96B %RDI<def> = COPY %vreg2; GR64:%vreg2 > > 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, > > %RDI<imp-use,kill> > > > > into > > > > 0B BB#0: derived from LLVM BB %entry > > 16B %vreg0<def> = MOV64rm %RIP, 1, %noreg, > > <ga:@val>[TF=5], %noreg; mem:LD8[GOT] GR64:%vreg0 > > 32B %vreg1<def> = MOV64rm %RIP, 1, %noreg, <ga:@p>[TF=5], > > %noreg; mem:LD8[GOT] GR64:%vreg1 > > 48B MOV64mr %vreg1, 1, %noreg, 0, %noreg, %vreg0; > > mem:ST8[@p](tbaa=!"any pointer") GR64:%vreg1,%vreg0 > > 64B %vreg2<def> = MOV64ri 12345123400; GR64:%vreg2 > > 80B MOV64mr %vreg0, 1, %noreg, 0, %noreg, %vreg2; > > mem:ST8[@val](tbaa=!"long long") GR64:%vreg0,%vreg2 > > 96B %RDI<def> = MOV64ri 12345123400 > > 112B TCRETURNdi64 <ga:@xtr>, 0, <regmask>, %RSP<imp-use>, > > %RDI<imp-use,kill> > > > > I am not sure why. Maybe this should be delayed until the register > > allocator, which can split the range if it cannot assign rdi to vreg2? > > > > Jakob, should I open a bug? > > MachineCSE skips cheap instructions on purpose: > > // Heuristics #1: Don't CSE "cheap" computation if the def is not local > or in > // an immediate predecessor. We don't want to increase register pressure > and > // end up causing other computation to be spilled. > if (MI->isAsCheapAsAMove()) { > MachineBasicBlock *CSBB = CSMI->getParent(); > MachineBasicBlock *BB = MI->getParent(); > if (CSBB != BB && !CSBB->isSuccessor(BB)) > return false; > } > > This code is older than the greedy register allocator. We could delete it > if somebody is willing to check for regressions in the test suite. > > I thought we had a PR about this already, but I can’t find it now. > > Thanks, > /jakob > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130805/8b7421e7/attachment.html>
Jakob Stoklund Olesen
2013-Aug-05 16:19 UTC
[LLVMdev] Missing optimization - constant parameter
On Aug 5, 2013, at 8:34 AM, Maurice Marks <maurice.marks at gmail.com> wrote:> Are you sure that's it? I commented that block out, rebuilt llvm 3.3, and it still duplicates the constant. > My concern is that long constant loads increase code size and if they can be avoided by better targeting it would be a win. My project's application of llvm tends to use a lot of long constants so this can be a significant optimization. > I'll do some more debugging now that you have pointed me in the right direction.It is also possible that the coalescer is duplicating the instruction like Rafael suggested. It will do that if it can't eliminate a copy. /jakob
Maybe Matching Threads
- [LLVMdev] Missing optimization - constant parameter
- [LLVMdev] Missing optimization - constant parameter
- [LLVMdev] Missing optimization - constant parameter
- [LLVMdev] Missing optimization - constant parameter
- [LLVMdev] Missing optimization - constant parameter