2020 Sep 01
Should llvm optimize 1.0 / x ?
...\
-c -o vec.o vec.cc
0000000000000140 <_Z4fct4Dv4_f>:
140: c5 f8 53 c8 vrcpps %xmm0,%xmm1
144: c4 e2 79 18 15 00 00 vbroadcastss 0x0(%rip),%xmm2 # 14d <_Z4fct4Dv4_f+0xd>
14b: 00 00
14d: c4 e2 71 ac c2 vfnmadd213ps %xmm2,%xmm1,%xmm0
152: c4 e2 71 98 c1 vfmadd132ps %xmm1,%xmm1,%xmm0
157: c3 retq
158: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
15f: 00
0000000000000160 <_Z4fct5Dv4_f>:
160: c5 f8 53 c0 vrcpps %xmm0,%xmm0
164: c3 retq
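As you can see, fct4 is not equivalent to fct5: fct4 follows the vrcpps estimate with an FMA-based correction step (vfnmadd213ps, then vfmadd132ps), while fct5 returns the raw vrcpps approximation.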
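Read as arithmetic, and assuming the constant loaded by vbroadcastss is 1.0f (its relocation is unresolved in this object-file dump), fct4 computes r = rcp(x), then e = 1 - r*x, then r + r*e = r*(2 - r*x), which is one Newton-Raphson step for 1/x. The sketch below reproduces both variants with the same vector-extension type and builtin as the original post, and measures their relative error against the IEEE quotient. The names rcp_estimate, rcp_refined and max_rel_error are hypothetical, and the code assumes an x86-64 target (SSE is part of the baseline there).

```cpp
#include <cassert>
#include <cmath>

// Same GCC/Clang vector-extension type as in the original vec.cc.
using v4f32 = float __attribute__((__vector_size__(16)));

// Like fct5: the raw vrcpps estimate of 1/x, nothing more.
v4f32 rcp_estimate(v4f32 x)
{
    return __builtin_ia32_rcpps(x);
}

// Like fct4 (assuming its broadcast constant is 1.0f):
//   e  = 1 - r*x   (the vfnmadd213ps)
//   r' = r + r*e   (the vfmadd132ps), i.e. r * (2 - r*x)
// This is one Newton-Raphson step for 1/x. It is written with plain vector
// operators, so the compiler only fuses it into FMA instructions when FMA
// is enabled for the target and contraction is allowed.
v4f32 rcp_refined(v4f32 x)
{
    const v4f32 one = {1.0f, 1.0f, 1.0f, 1.0f};
    v4f32 r = __builtin_ia32_rcpps(x);
    v4f32 e = one - r * x;
    return r + r * e;
}

// Largest relative error of approx compared with the IEEE quotient 1.0f / x.
float max_rel_error(v4f32 x, v4f32 approx)
{
    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        assert(x[i] != 0.0f);  // 1/0 has no meaningful relative error
        float exact = 1.0f / x[i];
        float err = std::fabs((approx[i] - exact) / exact);
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Intel documents a maximum relative error of 1.5 * 2^-12 (about 3.7e-4) for rcpps; one refinement step roughly squares that error, which brings the result close to, but not guaranteed equal to, the correctly rounded 1.0f / x. That gap is why the refined sequence still cannot be substituted for a true division without relaxed floating-point semantics.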
Regards,
Alexandre...
2020 Aug 31
Should llvm optimize 1.0 / x ?
Hi,
Here is a small C++ program:
vec.cc:
#include <cmath>
using v4f32 = float __attribute__((__vector_size__(16)));
v4f32 fct1(v4f32 x)
{
    return 1.0 / x;
}
v4f32 fct2(v4f32 x)
{
    return __builtin_ia32_rcpps(x);
}
Which is compiled to:
vec.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z4fct1Dv4_f>:
0: c4 e2 79 18 0d 00 00 vbroadcastss
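For comparison, fct1 and fct2 from the original post can also be written with Intel SSE intrinsics: _mm_rcp_ps is the intrinsic that both GCC and Clang implement on top of __builtin_ia32_rcpps, while _mm_div_ps performs a true IEEE division. This is a minimal sketch assuming an x86-64 target; the function names recip_div, recip_rcp and lane0_rel_diff are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <immintrin.h>

// fct1 in intrinsic form: 1.0f / x as an IEEE division (divps).
__m128 recip_div(__m128 x)
{
    return _mm_div_ps(_mm_set1_ps(1.0f), x);
}

// fct2 in intrinsic form: the rcpps hardware estimate.
__m128 recip_rcp(__m128 x)
{
    return _mm_rcp_ps(x);
}

// Relative difference between the lowest lanes of approx and exact.
float lane0_rel_diff(__m128 approx, __m128 exact)
{
    float e = _mm_cvtss_f32(exact);
    assert(e != 0.0f);
    return std::fabs((_mm_cvtss_f32(approx) - e) / e);
}
```

For example, with x = _mm_set1_ps(3.0f), recip_div yields exactly 1.0f / 3.0f in every lane, whereas recip_rcp differs from it by a small, hardware-dependent relative error.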