Hi,
I found that in some cases LLVM generates duplicate loads of double
constants, e.g.
$ cat t.c
double f(double* p, int n)
{
    double s = 0;
    if (n)
        s += *p;
    return s;
}
$ clang -S -O3 t.c -o -
...
f:                                      # @f
        .cfi_startproc
# BB#0:
        xorps   %xmm0, %xmm0
        testl   %esi, %esi
        je      .LBB0_2
# BB#1:
        xorps   %xmm0, %xmm0
        addsd   (%rdi), %xmm0
.LBB0_2:
        ret
...
Note that there are 2 xorps instructions, the one in BB#1 being clearly
redundant as it's dominated by the first one. The two xorps come from 2
FsFLD0SD generated by instruction selection and never eliminated by machine
passes. My guess would be that machine CSE should take care of it.

A variation of this case without indirection shows the same problem, as well
as not commuting addsd, resulting in an extra movaps:
$ cat t.c
double f(double p, int n)
{
    double s = 0;
    if (n)
        s += p;
    return s;
}
$ clang -S -O3 t.c -o -
...
f:                                      # @f
        .cfi_startproc
# BB#0:
        xorps   %xmm1, %xmm1
        testl   %edi, %edi
        je      .LBB0_2
# BB#1:
        xorps   %xmm1, %xmm1
        addsd   %xmm1, %xmm0
        movaps  %xmm0, %xmm1
.LBB0_2:
        movaps  %xmm1, %xmm0
        ret
...
Thanks,
Eugene
On 18 August 2013 22:38, Eugene Toder <eltoder at gmail.com> wrote:
> Hi,
>
> I found that in some cases llvm generates duplicate loads of double
> constants, e.g.
>
> $ cat t.c
> double f(double* p, int n)
> {
>     double s = 0;
>     if (n)
>         s += *p;
>     return s;
> }
> $ clang -S -O3 t.c -o -
> ...
> f:                                      # @f
>         .cfi_startproc
> # BB#0:
>         xorps   %xmm0, %xmm0
>         testl   %esi, %esi
>         je      .LBB0_2
> # BB#1:
>         xorps   %xmm0, %xmm0
>         addsd   (%rdi), %xmm0
> .LBB0_2:
>         ret
> ...

Thanks. Please file a bug for this on llvm.org/bugs.

The crux of the problem is that machine CSE runs before register allocation
and is consequently extremely conservative when doing CSE to avoid
potentially increasing register pressure. Of course, with such a small
testcase, register pressure isn't a problem. MachineCSE might be able to do
a better job here.

Nick
On Mon, Aug 19, 2013 at 8:50 PM, Nick Lewycky <nlewycky at google.com> wrote:
> Thanks. Please file a bug for this on llvm.org/bugs.

Done (PR16938).

> The crux of the problem is that machine CSE runs before register
> allocation and is consequently extremely conservative when doing CSE to
> avoid potentially increasing register pressure. Of course, with such a
> small testcase, register pressure isn't a problem. MachineCSE might be able
> to do a better job here.
>
> Nick

I figured it was trying to avoid adding register pressure, but shouldn't it
be more aggressive with constants? Isn't the register allocator smart enough
to spill constants when it runs out of registers?

Also, what do you think about commuting addsd in the second example?

Thanks,
Eugene
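
For readers following the question about commuting the add in the second example, here is a minimal source-level sketch. It is not compiler output and says nothing about how LLVM decides to commute instructions. It only checks that swapping the operands of the add leaves the result unchanged for a handful of inputs, and that the add cannot simply be dropped. The names f_commuted, same_double, count_mismatches and add_preserves_negative_zero are hypothetical helpers introduced for illustration; f is the second test case from the thread.

```c
#include <math.h>
#include <stddef.h>

/* Same shape as the second test case in the thread. */
double f(double p, int n)
{
    double s = 0;
    if (n)
        s += p;            /* computes s + p */
    return s;
}

/* Hypothetical variant: operands of the add swapped by hand. */
double f_commuted(double p, int n)
{
    double s = 0;
    if (n)
        s = p + s;         /* computes p + s */
    return s;
}

/* Returns 1 when a and b compare equal and carry the same sign bit. */
static int same_double(double a, double b)
{
    return a == b && (signbit(a) != 0) == (signbit(b) != 0);
}

/* Counts the sample inputs on which the two variants disagree.
   NaN inputs are deliberately not tried. */
int count_mismatches(void)
{
    const double inputs[] = { 1.5, -2.25, 0.0, -0.0, 1e308, -3.0e-300 };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; ++i)
        for (int n = 0; n <= 1; ++n)
            if (!same_double(f(inputs[i], n), f_commuted(inputs[i], n)))
                ++mismatches;
    return mismatches;
}

/* Returns 1 if 0.0 + (-0.0) keeps a negative sign, 0 otherwise. */
int add_preserves_negative_zero(void)
{
    return signbit(f(-0.0, 1)) != 0;
}
```

Assuming the default rounding mode and no fast-math style flags, 0.0 + (-0.0) is +0.0, so f(-0.0, 1) does not return its argument unchanged. That is why the add has to stay in both examples even though s is a known zero; only the duplicated zeroing and the operand order are in question.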