Displaying 5 results from an estimated 5 matches for "vmulss".
2019 Jan 22
4
_Float16 support
...rgument 1 from single to half
vcvtph2ps xmm1, xmm1 # Convert argument 1 back to single
vcvtps2ph xmm0, xmm0, 4 # Convert argument 0 from single to half
vcvtph2ps xmm0, xmm0 # Convert argument 0 back to single
vmulss xmm0, xmm0, xmm1 # xmm0 = xmm0*xmm1 (single precision)
vcvtps2ph xmm1, xmm0, 4 # Convert the single precision result to half
vmovd eax, xmm1 # Move the half precision result to eax
mov word ptr [...
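The instruction sequence in this snippet rounds each argument to half precision, multiplies in single precision, and rounds the product back to half. Below is a minimal C sketch of that arithmetic, not of the compiler's actual lowering. The helper names `round_to_half_precision` and `half_mul` are hypothetical. The bit manipulation stands in for the `vcvtps2ph`/`vcvtph2ps` pair under the assumption that every value stays inside half's normal range, so overflow, subnormals, infinities and NaNs are not handled. It also assumes round-to-nearest-even, which is what immediate 4 (use the MXCSR rounding mode) gives under the default MXCSR setting.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to the nearest value that has an
 * 11-bit significand (10 stored bits), as IEEE half precision does.
 * Assumption: |x| lies in half's normal range (2^-14 .. 65504).
 * Overflow, subnormals, infinities and NaNs are NOT handled. */
static float round_to_half_precision(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret without aliasing UB */

    /* A float stores 23 fraction bits and a half stores 10, so the low
     * 13 bits are dropped. Adding 0x0FFF plus the lowest kept bit
     * implements round-to-nearest, ties-to-even; a carry out of the
     * fraction correctly bumps the exponent. */
    bits += 0x0FFFu + ((bits >> 13) & 1u);
    bits &= ~UINT32_C(0x1FFF);

    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical helper mirroring the snippet: round both arguments to
 * half, multiply in single precision, round the product to half. */
static float half_mul(float a, float b)
{
    float a16 = round_to_half_precision(a);   /* vcvtps2ph + vcvtph2ps */
    float b16 = round_to_half_precision(b);   /* vcvtps2ph + vcvtph2ps */
    float product = a16 * b16;                /* vmulss                */
    return round_to_half_precision(product);  /* final vcvtps2ph       */
}
```

For example, `half_mul(1.0009765625f, 1.0009765625f)` yields `1.001953125f`, whereas the plain single-precision product keeps the extra low-order bits.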
2014 Jul 31
2
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
Hi Tim,
Thanks for the thorough explanation. It makes perfect sense.
I was not aware fast-math is supposed to prevent more precision being used
than what is in the standard.
I came across this issue while looking into the output of different
compilers. The XL and Microsoft compilers seem
to have that turned on by default. But I assume that clang follows what gcc
does, and has that turned off.
2019 Jan 24
2
[cfe-dev] _Float16 support
...e to half
> vcvtph2ps xmm1, xmm1 # Convert argument 1 back to single
> vcvtps2ph xmm0, xmm0, 4 # Convert argument 0 from single to half
> vcvtph2ps xmm0, xmm0 # Convert argument 0 back to single
> vmulss xmm0, xmm0, xmm1 # xmm0 = xmm0*xmm1 (single precision)
> vcvtps2ph xmm1, xmm0, 4 # Convert the single precision result to half
> vmovd eax, xmm1 # Move the half precision result to eax
> mov...
2019 Jan 24
4
[cfe-dev] _Float16 support
...1, xmm1 # Convert argument 1 back to single
>> vcvtps2ph xmm0, xmm0, 4 # Convert argument 0 from single to half
>> vcvtph2ps xmm0, xmm0 # Convert argument 0 back to single
>> vmulss xmm0, xmm0, xmm1 # xmm0 = xmm0*xmm1 (single precision)
>> vcvtps2ph xmm1, xmm0, 4 # Convert the single precision result to half
>> vmovd eax, xmm1 # Move the half precision resu...
2014 Aug 07
2
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
...regarding FMA - at least
> by default. I don't have a PPC compiler to test with, but for x86-64
> using clang trunk and gcc 4.9:
>
> $ cat fma.c
> float foo(float x, float y, float z) { return x * y + z; }
>
> $ ./clang -march=core-avx2 -O2 -S fma.c -o - | grep ss
> vmulss %xmm1, %xmm0, %xmm0
> vaddss %xmm2, %xmm0, %xmm0
>
> $ ./gcc -march=core-avx2 -O2 -S fma.c -o - | grep ss
> vfmadd132ss %xmm1, %xmm2, %xmm0
>
> ----------------------------------------------------------------------
> This was brought up in Dec 2013 on this lis...
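The two compiler outputs quoted above differ in how many times `x * y + z` is rounded: `vmulss` followed by `vaddss` rounds the product and then the sum, while a fused multiply-add rounds once. A small C sketch can make the difference visible. The function names `mul_then_add` and `mul_add_single_rounding` are hypothetical. The second function does not execute an FMA instruction; it uses `double` as a stand-in, relying on the fact that the product of two floats is exact in double. The double addition can still round, so in rare cases this differs from a true fused operation such as C99 `fmaf`.

```c
/* Hypothetical helper: multiply, round to float, then add. The volatile
 * temporary forces the product to be rounded to single precision, so the
 * compiler cannot contract the expression into a fused multiply-add. */
static float mul_then_add(float x, float y, float z)
{
    volatile float product = x * y;   /* first rounding  (vmulss) */
    return product + z;               /* second rounding (vaddss) */
}

/* Hypothetical helper: approximate the fused behaviour by keeping the
 * product unrounded. Two 24-bit significands multiply into at most 48
 * bits, which fits in double's 53, so the product below is exact.
 * Assumption: the double addition followed by the conversion to float
 * matches the single rounding of a real FMA for the inputs of interest. */
static float mul_add_single_rounding(float x, float y, float z)
{
    double exact_product = (double)x * (double)y;
    return (float)(exact_product + (double)z);
}
```

With `x = y = 1.0f + 0x1p-12f` and `z = -(1.0f + 0x1p-11f)`, the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the 2^-24 term, so `mul_then_add` returns `0.0f`, while `mul_add_single_rounding` returns `0x1p-24f`.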