Hi,

I found that in some cases llvm generates duplicate loads of double constants,
e.g.

$ cat t.c
double f(double* p, int n)
{
    double s = 0;
    if (n)
        s += *p;
    return s;
}
$ clang -S -O3 t.c -o -
...
f:                                      # @f
        .cfi_startproc
# BB#0:
        xorps   %xmm0, %xmm0
        testl   %esi, %esi
        je      .LBB0_2
# BB#1:
        xorps   %xmm0, %xmm0
        addsd   (%rdi), %xmm0
.LBB0_2:
        ret
...

Note that there are 2 xorps instructions, the one in BB#1 being clearly
redundant as it's dominated by the first one. The two xorps come from 2
FsFLD0SD generated by instruction selection and never eliminated by machine
passes. My guess would be that machine CSE should take care of it.

A variation of this case without indirection shows the same problem, as well
as not commuting addsd, resulting in an extra movaps:

$ cat t.c
double f(double p, int n)
{
    double s = 0;
    if (n)
        s += p;
    return s;
}
$ clang -S -O3 t.c -o -
...
f:                                      # @f
        .cfi_startproc
# BB#0:
        xorps   %xmm1, %xmm1
        testl   %edi, %edi
        je      .LBB0_2
# BB#1:
        xorps   %xmm1, %xmm1
        addsd   %xmm1, %xmm0
        movaps  %xmm0, %xmm1
.LBB0_2:
        movaps  %xmm1, %xmm0
        ret
...

Thanks,
Eugene
On 18 August 2013 22:38, Eugene Toder <eltoder at gmail.com> wrote:

> Hi,
>
> I found that in some cases llvm generates duplicate loads of double
> constants,
> e.g.
>
> $ cat t.c
> double f(double* p, int n)
> {
>     double s = 0;
>     if (n)
>         s += *p;
>     return s;
> }
> $ clang -S -O3 t.c -o -
> ...
> f:                                      # @f
>         .cfi_startproc
> # BB#0:
>         xorps   %xmm0, %xmm0
>         testl   %esi, %esi
>         je      .LBB0_2
> # BB#1:
>         xorps   %xmm0, %xmm0
>         addsd   (%rdi), %xmm0
> .LBB0_2:
>         ret
> ...

Thanks. Please file a bug for this on llvm.org/bugs .

The crux of the problem is that machine CSE runs before register allocation
and is consequently extremely conservative when doing CSE to avoid potentially
increasing register pressure. Of course, with such a small testcase, register
pressure isn't a problem. MachineCSE might be able to do a better job here.

Nick
On Mon, Aug 19, 2013 at 8:50 PM, Nick Lewycky <nlewycky at google.com> wrote:

> Thanks. Please file a bug for this on llvm.org/bugs .

Done (PR16938).

> The crux of the problem is that machine CSE runs before register
> allocation and is consequently extremely conservative when doing CSE to
> avoid potentially increasing register pressure. Of course, with such a
> small testcase, register pressure isn't a problem. MachineCSE might be able
> to do a better job here.
>
> Nick

I figured it was trying to avoid adding register pressure, but shouldn't it be
more aggressive with constants? Isn't the register allocator smart enough to
spill constants when it runs out of registers?

Also, what do you think about commuting addsd in the second example?

Thanks,
Eugene
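
The trade-off discussed in this thread can be illustrated with a toy model in
C. This is only a sketch and is not LLVM code: the names dominates,
should_reuse, live_regs and reg_limit are hypothetical, and the "one extra live
register" cost is an assumption made for illustration, not MachineCSE's actual
heuristic. It shows that a recomputed value is a candidate for reuse only when
an identical definition sits in a dominating block, and that a conservative
pass may still decline when reuse would push the estimated register pressure
past a limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name below is hypothetical, not an LLVM API. */
enum { ENTRY = 0 };

/* idom[b] is the immediate dominator of block b; the entry block maps to
   itself. Returns true when block a dominates block b. */
bool dominates(const int *idom, int a, int b)
{
    /* Walk up the dominator tree from b until we meet a or pass the entry. */
    for (;;) {
        if (b == a)
            return true;
        if (b == ENTRY)
            return false;
        b = idom[b];
    }
}

/* Decide whether a recomputation in use_block should reuse the value that
   def_block already defines. Reuse keeps that value live for longer, so the
   model (by assumption) charges it one extra live register. */
bool should_reuse(const int *idom, int def_block, int use_block,
                  int live_regs, int reg_limit)
{
    assert(live_regs >= 0 && reg_limit >= 0);
    if (!dominates(idom, def_block, use_block))
        return false;            /* no dominating definition to reuse */
    return live_regs + 1 <= reg_limit;
}
```

For the first example, with BB#0, BB#1 and .LBB0_2 numbered 0, 1 and 2, the
dominator array is {0, 0, 0}: block 0 dominates block 1, so the second xorps is
a reuse candidate, and with only a couple of values live the pressure check
passes as well.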