thr3ads.net - search: "_z4fct1dv4

Displaying 3 results from an estimated 3 matches for "_z4fct1dv4_f".

2020 Aug 31

Should llvm optimize 1.0 / x ?

...#include <cmath>

using v4f32 = float __attribute__((__vector_size__(16)));

v4f32 fct1(v4f32 x) { return 1.0 / x; }
v4f32 fct2(v4f32 x) { return __builtin_ia32_rcpps(x); }

Which is compiled to:

vec.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_Z4fct1Dv4_f>:
   0: c4 e2 79 18 0d 00 00    vbroadcastss 0x0(%rip),%xmm1    # 9 <_Z4fct1Dv4_f+0x9>
   7: 00 00
   9: c5 f0 5e c0             vdivps %xmm0,%xmm1,%xmm0
   d: c3                      retq
   e: 66 90                   xchg %ax,%ax

0000000000000010 <_Z4fct2Dv4_f>:
  10: c5 f8 53 c0...

2020 Aug 31

Vectorization of math function failed?

...vectorize(enable)
  for (int i = 0; i < 16; ++i)
    x[i] = sinf(x[i]);
}

Which I compiled with:

clang++ -O3 -march=native -mtune=native -c -o vec.o vec.cc -lmvec -fno-math-errno

And here is what I get:

vec.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_Z4fct1Dv4_f>:
   0: 48 83 ec 48          sub $0x48,%rsp
   4: c5 f8 29 04 24       vmovaps %xmm0,(%rsp)
   9: e8 00 00 00 00       callq e <_Z4fct1Dv4_f+0xe>
   e: c5 f8 29 44 24 30    vmovaps %xmm0,0x30(%rsp)
  14: c5 fa 16 04 24       vmovshdup (%rsp),%xmm0
  19: e8 00 00 00 00       callq...

2020 Sep 01

Should llvm optimize 1.0 / x ?

...gt; > > v4f32 fct2(v4f32 x) > > { > > return __builtin_ia32_rcpps(x); > > } > > > > Which is compiled to: > > > > vec.o: file format elf64-x86-64 > > > > > > Disassembly of section .text: > > > > 0000000000000000 <_Z4fct1Dv4_f>: > > 0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9 > > <_Z4fct1Dv4_f+0x9> > > 7: 00 00 > > 9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0 > > d: c3 retq > > e: 66 90 xchg %ax,%ax > &gt...
