Displaying 3 results from an estimated 3 matches for "_z4fct1dv4_f".
2020 Aug 31
2
Should llvm optimize 1.0 / x ?
...#include <cmath>
using v4f32 = float __attribute__((__vector_size__(16)));
v4f32 fct1(v4f32 x)
{
return 1.0 / x;
}
v4f32 fct2(v4f32 x)
{
return __builtin_ia32_rcpps(x);
}
Which is compiled to:
vec.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z4fct1Dv4_f>:
0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9 <_Z4fct1Dv4_f+0x9>
7: 00 00
9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0
d: c3 retq
e: 66 90 xchg %ax,%ax
0000000000000010 <_Z4fct2Dv4_f>:
10: c5 f8 53 c0...
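A note on why the two functions above are not interchangeable: rcpps returns only an estimate (Intel documents a relative error of at most 1.5 * 2^-12), while vdivps is correctly rounded, so a compiler cannot swap one for the other under default floating-point semantics. The usual way to close the gap is a Newton-Raphson step, r' = r * (2 - x * r). Below is a minimal sketch of that refinement using the same vector type as the question. The names approx_recip, refine_recip and max_rel_error are hypothetical, and approx_recip only simulates an estimate with a 2^-12 relative error; it is not the real instruction.

```cpp
#include <cmath>

using v4f32 = float __attribute__((__vector_size__(16)));

// Hypothetical stand-in for a hardware reciprocal estimate:
// the exact quotient scaled by (1 + 2^-12). This only mimics the
// order of magnitude of the rcpps error, not its actual behaviour.
v4f32 approx_recip(v4f32 x)
{
    const float rel_err = 1.0f / 4096.0f; // 2^-12
    v4f32 exact = 1.0f / x;
    return exact * (1.0f + rel_err);
}

// One Newton-Raphson step for f(r) = 1/r - x:  r' = r * (2 - x * r).
// If r = (1 + e) / x, then r' = (1 - e^2) / x, so the error is squared.
v4f32 refine_recip(v4f32 x, v4f32 r)
{
    return r * (2.0f - x * r);
}

// Largest relative error of r against 1/x, measured in double precision.
double max_rel_error(v4f32 x, v4f32 r)
{
    double worst = 0.0;
    for (int i = 0; i < 4; ++i) {
        double exact = 1.0 / static_cast<double>(x[i]);
        double err = std::fabs((static_cast<double>(r[i]) - exact) / exact);
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

With the simulated estimate, the relative error starts near 2^-12 and a single step brings it down to roughly single-precision rounding level, at the cost of two extra multiplies and a subtract per vector.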
2020 Aug 31
2
Vectorization of math function failed?
...vectorize(enable)
for (int i = 0; i < 16; ++i)
x[i] = sinf(x[i]);
}
Which I compiled with: clang++ -O3 -march=native -mtune=native -c -o vec.o vec.cc -lmvec -fno-math-errno
And here is what I get:
vec.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z4fct1Dv4_f>:
0: 48 83 ec 48 sub $0x48,%rsp
4: c5 f8 29 04 24 vmovaps %xmm0,(%rsp)
9: e8 00 00 00 00 callq e <_Z4fct1Dv4_f+0xe>
e: c5 f8 29 44 24 30 vmovaps %xmm0,0x30(%rsp)
14: c5 fa 16 04 24 vmovshdup (%rsp),%xmm0
19: e8 00 00 00 00 callq...
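One thing that may be worth checking here: -lmvec is a linker option, so it has no effect on a -c compile. As far as I know, clang picks a vector math library through -fveclib, and newer releases accept libmvec as a value on x86; treat that flag as an assumption to verify against your clang version. A minimal, self-contained loop to experiment with could look like the sketch below. The function name sin_inplace is hypothetical and not taken from the original post.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: replaces each element of x[0..n) with its sine.
// Assumed (unverified) compile line to try:
//   clang++ -O3 -fno-math-errno -fveclib=libmvec -c vec.cc
void sin_inplace(float *x, std::size_t n)
{
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#endif
    for (std::size_t i = 0; i < n; ++i)
        x[i] = std::sin(x[i]); // float overload, lowers to sinf
}
```

If the loop is vectorized against a vector math library, the disassembly should show calls to a vector sine routine rather than one scalar sinf call per element.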
2020 Sep 01
2
Should llvm optimize 1.0 / x ?
...>
> > v4f32 fct2(v4f32 x)
> > {
> > return __builtin_ia32_rcpps(x);
> > }
> >
> > Which is compiled to:
> >
> > vec.o: file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <_Z4fct1Dv4_f>:
> > 0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9 <_Z4fct1Dv4_f+0x9>
> > 7: 00 00
> > 9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0
> > d: c3 retq
> > e: 66 90 xchg %ax,%ax
> >...