2020 Aug 31
Should llvm optimize 1.0 / x ?
...:
0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9 <_Z4fct1Dv4_f+0x9>
7: 00 00
9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0
d: c3 retq
e: 66 90 xchg %ax,%ax
0000000000000010 <_Z4fct2Dv4_f>:
10: c5 f8 53 c0 vrcpps %xmm0,%xmm0
14: c3 retq
As you can see, 1.0 / x is not turned into vrcpps. Is it because of
precision or a missing optimization?
Regards,
--
Alexandre Bique
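
To make the precision question concrete, here is a minimal sketch of what the two functions might look like. The original source is not shown in the message, so the bodies of fct1 and fct2 are assumptions inferred from the symbol names in the disassembly, and max_rel_diff is a hypothetical helper added only for illustration. Intel documents the relative error of rcpps as at most 1.5 * 2^-12, whereas a plain division is correctly rounded, which is likely why a compiler would not swap one for the other unless fast-math style flags allow it.

```cpp
// Sketch only: fct1/fct2 bodies are assumed from the symbol names in the
// disassembly (_Z4fct1Dv4_f, _Z4fct2Dv4_f); the original source is not shown.
#include <xmmintrin.h>  // __m128, _mm_set1_ps, _mm_div_ps, _mm_rcp_ps
#include <cmath>        // std::fabs

// Correctly rounded reciprocal: expected to compile to (v)divps.
__m128 fct1(__m128 x) {
    return _mm_div_ps(_mm_set1_ps(1.0f), x);
}

// Approximate reciprocal: expected to compile to (v)rcpps.
__m128 fct2(__m128 x) {
    return _mm_rcp_ps(x);
}

// Hypothetical helper: largest relative difference between the two
// results across the four lanes of x.
float max_rel_diff(__m128 x) {
    float exact[4], approx[4];
    _mm_storeu_ps(exact, fct1(x));
    _mm_storeu_ps(approx, fct2(x));
    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        float d = std::fabs(approx[i] - exact[i]) / std::fabs(exact[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling max_rel_diff on a few inputs should give values far above single-precision rounding error but below the documented bound, which suggests the difference in generated code is about precision rather than a missed optimization.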
2020 Sep 01
Should llvm optimize 1.0 / x ?
Hi Quentin,
You are correct. I managed to get clang to use vrcpps, but not in
a satisfying way:
clang++ -O3 -march=native -mtune=native \
-Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
-Rpass-analysis=loop-vectorize \
-ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \
-c -o vec.o vec.cc
0000000000000140 <_Z4fct4Dv4_f>:
14...