Displaying 3 results from an estimated 3 matches for "_z4fct2dv4_f".
2020 Aug 31
2
Vectorization of math function failed?
...%rsp),%xmm1,%xmm1
57: 20
58: c4 e3 71 21 c0 30 vinsertps $0x30,%xmm0,%xmm1,%xmm0
5e: 48 83 c4 48 add $0x48,%rsp
62: c3 retq
63: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
6a: 00 00 00
6d: 0f 1f 00 nopl (%rax)
0000000000000070 <_Z4fct2Dv4_f>:
70: 48 83 ec 48 sub $0x48,%rsp
74: c5 f8 29 04 24 vmovaps %xmm0,(%rsp)
79: e8 00 00 00 00 callq 7e <_Z4fct2Dv4_f+0xe>
7e: c5 f8 29 44 24 30 vmovaps %xmm0,0x30(%rsp)
84: c5 fa 16 04 24 vmovshdup (%rsp),%xmm0
89: e8 00 00 00 00 callq...
2020 Aug 31
2
Should llvm optimize 1.0 / x ?
....text:
0000000000000000 <_Z4fct1Dv4_f>:
0: c4 e2 79 18 0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9 <_Z4fct1Dv4_f+0x9>
7: 00 00
9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0
d: c3 retq
e: 66 90 xchg %ax,%ax
0000000000000010 <_Z4fct2Dv4_f>:
10: c5 f8 53 c0 vrcpps %xmm0,%xmm0
14: c3 retq
As you can see, 1.0 / x is not turned into vrcpps. Is it because of
precision or a missing optimization?
Regards,
--
Alexandre Bique
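For context, the two symbols above demangle to fct1 and fct2 taking a 4 x float vector. The original source is not shown in the message, so the function bodies below are an assumption: a minimal sketch of code that could plausibly produce the vdivps and vrcpps listings. The helper rcp_rel_error is hypothetical and only there to make the precision difference visible. vrcpps is documented as an approximation with a relative error of at most 1.5 * 2^-12, whereas vdivps is correctly rounded, so the two are not interchangeable under strict floating-point semantics.

```cpp
#include <cassert>
#include <cmath>
#include <immintrin.h>

// Assumed body for fct1: an exact, correctly rounded division in each lane.
// This is the form that compiles to a broadcast of 1.0f followed by (v)divps.
__m128 fct1(__m128 x) { return _mm_div_ps(_mm_set1_ps(1.0f), x); }

// Assumed body for fct2: the approximate reciprocal intrinsic, i.e. (v)rcpps.
__m128 fct2(__m128 x) { return _mm_rcp_ps(x); }

// Hypothetical helper: relative error of the approximate reciprocal
// against the exact division, measured on lane 0 for a nonzero input.
float rcp_rel_error(float v) {
    assert(v != 0.0f);
    float exact = _mm_cvtss_f32(fct1(_mm_set1_ps(v)));
    float approx = _mm_cvtss_f32(fct2(_mm_set1_ps(v)));
    return std::fabs(approx - exact) / std::fabs(exact);
}
```

Calling rcp_rel_error(3.0f) on typical hardware gives a small nonzero value, which is the precision gap the question asks about. A compiler would normally need relaxed floating-point settings before it could substitute the estimate for the division.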
2020 Sep 01
2
Should llvm optimize 1.0 / x ?
...0d 00 00 vbroadcastss 0x0(%rip),%xmm1 # 9 <_Z4fct1Dv4_f+0x9>
> > 7: 00 00
> > 9: c5 f0 5e c0 vdivps %xmm0,%xmm1,%xmm0
> > d: c3 retq
> > e: 66 90 xchg %ax,%ax
> >
> > 0000000000000010 <_Z4fct2Dv4_f>:
> > 10: c5 f8 53 c0 vrcpps %xmm0,%xmm0
> > 14: c3 retq
> >
> >
> > As you can see, 1.0 / x is not turned into vrcpps. Is it because of
> > precision or a missing optimization?
> >
> > Regards,
> > --
> >...