thr3ads.net - llvm dev - [llvm-dev] Should llvm optimize 1.0 / x ? [Aug 2020]


Alexandre Bique via llvm-dev

2020-Aug-31 21:21 UTC

[llvm-dev] Should llvm optimize 1.0 / x ?

Hi,

Here is a small C++ program:

vec.cc:

#include <cmath>

using v4f32 = float __attribute__((__vector_size__(16)));

v4f32 fct1(v4f32 x)
{
  return 1.0 / x;
}

v4f32 fct2(v4f32 x)
{
  return __builtin_ia32_rcpps(x);
}

Which is compiled to:

vec.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_Z4fct1Dv4_f>:
   0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1        # 9 <_Z4fct1Dv4_f+0x9>
   7: 00 00
   9: c5 f0 5e c0          vdivps %xmm0,%xmm1,%xmm0
   d: c3                    retq
   e: 66 90                xchg   %ax,%ax

0000000000000010 <_Z4fct2Dv4_f>:
  10: c5 f8 53 c0          vrcpps %xmm0,%xmm0
  14: c3                    retq


As you can see, 1.0 / x is not turned into vrcpps. Is it because of
precision or a missing optimization?

Regards,
-- 
Alexandre Bique

Quentin Colombet via llvm-dev

2020-Aug-31 22:59 UTC

head link

[llvm-dev] Should llvm optimize 1.0 / x ?

Hi Alexandre,

Have you tried to compile this with fast-math enabled (`-ffast-math`, see
https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior)?

I would expect LLVM to require the `arcp` flag to perform this optimization
(https://www.llvm.org/docs/LangRef.html#fast-math-flags).

Cheers,
-Quentin

> On Aug 31, 2020, at 2:21 PM, Alexandre Bique via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi,
> 
> Here is a small C++ program:
> 
> vec.cc:
> 
> #include <cmath>
> 
> using v4f32 = float __attribute__((__vector_size__(16)));
> 
> v4f32 fct1(v4f32 x)
> {
>  return 1.0 / x;
> }
> 
> v4f32 fct2(v4f32 x)
> {
>  return __builtin_ia32_rcpps(x);
> }
> 
> Which is compiled to:
> 
> vec.o:     file format elf64-x86-64
> 
> 
> Disassembly of section .text:
> 
> 0000000000000000 <_Z4fct1Dv4_f>:
>   0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1        # 9
> <_Z4fct1Dv4_f+0x9>
>   7: 00 00
>   9: c5 f0 5e c0          vdivps %xmm0,%xmm1,%xmm0
>   d: c3                    retq
>   e: 66 90                xchg   %ax,%ax
> 
> 0000000000000010 <_Z4fct2Dv4_f>:
>  10: c5 f8 53 c0          vrcpps %xmm0,%xmm0
>  14: c3                    retq
> 
> 
> As you can see, 1.0 / x is not turned into vrcpps. Is it because of
> precision or a missing optimization?
> 
> Regards,
> -- 
> Alexandre Bique
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Alexandre Bique via llvm-dev

2020-Sep-01 06:44 UTC

head link

[llvm-dev] Should llvm optimize 1.0 / x ?

Hi Quentin,

You are correct: I managed to get clang to emit vrcpps, but not in a
satisfying way:

clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
-Rpass-analysis=loop-vectorize \
-ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \
-c -o vec.o vec.cc

0000000000000140 <_Z4fct4Dv4_f>:
 140: c5 f8 53 c8          vrcpps %xmm0,%xmm1
 144: c4 e2 79 18 15 00 00 vbroadcastss 0x0(%rip),%xmm2        # 14d <_Z4fct4Dv4_f+0xd>
 14b: 00 00
 14d: c4 e2 71 ac c2        vfnmadd213ps %xmm2,%xmm1,%xmm0
 152: c4 e2 71 98 c1        vfmadd132ps %xmm1,%xmm1,%xmm0
 157: c3                    retq
 158: 0f 1f 84 00 00 00 00 nopl   0x0(%rax,%rax,1)
 15f: 00

0000000000000160 <_Z4fct5Dv4_f>:
 160: c5 f8 53 c0          vrcpps %xmm0,%xmm0
 164: c3                    retq

As you can see, fct4 is not equivalent to fct5: it follows vrcpps with an
extra vfnmadd213ps/vfmadd132ps pair.

Regards,
Alexandre Bique

On Tue, Sep 1, 2020 at 12:59 AM Quentin Colombet <qcolombet at apple.com>
wrote:>
> Hi Alexandre,
>
> Have you tried to compile this with fast-math enabled (`-ffast-math`
https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior)?
>
> I would expect LLVM to require the `arcp` flag to perform this optimization
(https://www.llvm.org/docs/LangRef.html#fast-math-flags).
>
> Cheers,
> -Quentin
>
>
> > On Aug 31, 2020, at 2:21 PM, Alexandre Bique via llvm-dev <llvm-dev
at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > Here is a small C++ program:
> >
> > vec.cc:
> >
> > #include <cmath>
> >
> > using v4f32 = float __attribute__((__vector_size__(16)));
> >
> > v4f32 fct1(v4f32 x)
> > {
> >  return 1.0 / x;
> > }
> >
> > v4f32 fct2(v4f32 x)
> > {
> >  return __builtin_ia32_rcpps(x);
> > }
> >
> > Which is compiled to:
> >
> > vec.o:     file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <_Z4fct1Dv4_f>:
> >   0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1        # 9
> > <_Z4fct1Dv4_f+0x9>
> >   7: 00 00
> >   9: c5 f0 5e c0          vdivps %xmm0,%xmm1,%xmm0
> >   d: c3                    retq
> >   e: 66 90                xchg   %ax,%ax
> >
> > 0000000000000010 <_Z4fct2Dv4_f>:
> >  10: c5 f8 53 c0          vrcpps %xmm0,%xmm0
> >  14: c3                    retq
> >
> >
> > As you can see, 1.0 / x is not turned into vrcpps. Is it because of
> > precision or a missing optimization?
> >
> > Regards,
> > --
> > Alexandre Bique
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

llvm dev - Aug 2020 - Should llvm optimize 1.0 / x ?

[llvm-dev] Should llvm optimize 1.0 / x ?

[llvm-dev] Should llvm optimize 1.0 / x ?

[llvm-dev] Should llvm optimize 1.0 / x ?