Hi,
I found that in some cases llvm generates duplicate loads of double
constants,
e.g.
$ cat t.c
double f(double* p, int n)
{
double s = 0;
if (n)
s += *p;
return s;
}
$ clang -S -O3 t.c -o -
...
f: # @f
.cfi_startproc
# BB#0:
xorps %xmm0, %xmm0
testl %esi, %esi
je .LBB0_2
# BB#1:
xorps %xmm0, %xmm0
addsd (%rdi), %xmm0
.LBB0_2:
ret
...
Note that there are 2 xorps instructions, the one in BB#1 being clearly
redundant
as it's dominated by the first one. Two xorps come from 2 FsFLD0SD
generated by
instruction selection and never eliminated by machine passes. My guess
would be
machine CSE should take care of it.
A variation of this case without indirection shows the same problem, as
well as
not commuting addps, resulting in an extra movps:
$ cat t.c
double f(double p, int n)
{
double s = 0;
if (n)
s += p;
return s;
}
$ clang -S -O3 t.c -o -
...
f: # @f
.cfi_startproc
# BB#0:
xorps %xmm1, %xmm1
testl %edi, %edi
je .LBB0_2
# BB#1:
xorps %xmm1, %xmm1
addsd %xmm1, %xmm0
movaps %xmm0, %xmm1
.LBB0_2:
movaps %xmm1, %xmm0
ret
...
Thanks,
Eugene
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130819/96dd40e6/attachment.html>
On 18 August 2013 22:38, Eugene Toder <eltoder at gmail.com> wrote:> Hi, > > I found that in some cases llvm generates duplicate loads of double > constants, > e.g. > > $ cat t.c > double f(double* p, int n) > { > double s = 0; > if (n) > s += *p; > return s; > } > $ clang -S -O3 t.c -o - > ... > f: # @f > .cfi_startproc > # BB#0: > xorps %xmm0, %xmm0 > testl %esi, %esi > je .LBB0_2 > # BB#1: > xorps %xmm0, %xmm0 > addsd (%rdi), %xmm0 > .LBB0_2: > ret > ... >Thanks. Please file a bug for this on llvm.org/bugs . The crux of the problem is that machine CSE runs before register allocation and is consequently extremely conservative when doing CSE to avoid potentially increasing register pressure. Of course, with such a small testcase, register pressure isn't a problem. MachineCSE might be able to do a better job here. Nick Note that there are 2 xorps instructions, the one in BB#1 being clearly> redundant > as it's dominated by the first one. Two xorps come from 2 FsFLD0SD > generated by > instruction selection and never eliminated by machine passes. My guess > would be > machine CSE should take care of it. > > A variation of this case without indirection shows the same problem, as > well as > not commuting addps, resulting in an extra movps: > > $ cat t.c > double f(double p, int n) > { > double s = 0; > if (n) > s += p; > return s; > } > $ clang -S -O3 t.c -o - > ... > f: # @f > .cfi_startproc > # BB#0: > xorps %xmm1, %xmm1 > testl %edi, %edi > je .LBB0_2 > # BB#1: > xorps %xmm1, %xmm1 > addsd %xmm1, %xmm0 > movaps %xmm0, %xmm1 > .LBB0_2: > movaps %xmm1, %xmm0 > ret > ... > > Thanks, > Eugene > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130819/186c5c3b/attachment.html>
On Mon, Aug 19, 2013 at 8:50 PM, Nick Lewycky <nlewycky at google.com> wrote:> > Thanks. Please file a bug for this on llvm.org/bugs . >Done (PR16938).> The crux of the problem is that machine CSE runs before register > allocation and is consequently extremely conservative when doing CSE to > avoid potentially increasing register pressure. Of course, with such a > small testcase, register pressure isn't a problem. MachineCSE might be able > to do a better job here. > > Nick >I figured it was trying to avoid adding register pressure, but shouldn't it be more aggressive with constants? Isn't register allocator smart enough to spill constants when it runs out of registers? Also, what do you think on commuting addps in the second example? Thanks, Eugene -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130819/2acabbd0/attachment.html>