search for: vmulss

Displaying 5 results from an estimated 5 matches for "vmulss".

2019 Jan 22
4
_Float16 support
...rgument 1 from single to half
    vcvtph2ps xmm1, xmm1       # Convert argument 1 back to single
    vcvtps2ph xmm0, xmm0, 4    # Convert argument 0 from single to half
    vcvtph2ps xmm0, xmm0       # Convert argument 0 back to single
    vmulss xmm0, xmm0, xmm1    # xmm0 = xmm0*xmm1 (single precision)
    vcvtps2ph xmm1, xmm0, 4    # Convert the single precision result to half
    vmovd eax, xmm1            # Move the half precision result to eax
    mov word ptr [...
2014 Jul 31
2
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
Hi Tim, Thanks for the thorough explanation. It makes perfect sense. I was not aware fast-math is supposed to prevent more precision being used than what is in the standard. I came across this issue while looking into the output of different compilers. The XL and Microsoft compilers seem to have that turned on by default. But I assume that clang follows what gcc does, and has that turned off.
2019 Jan 24
2
[cfe-dev] _Float16 support
...e to half
> vcvtph2ps xmm1, xmm1 # Convert argument 1 back to single
> vcvtps2ph xmm0, xmm0, 4 # Convert argument 0 from single to half
> vcvtph2ps xmm0, xmm0 # Convert argument 0 back to single
> vmulss xmm0, xmm0, xmm1 # xmm0 = xmm0*xmm1 (single precision)
> vcvtps2ph xmm1, xmm0, 4 # Convert the single precision result to half
> vmovd eax, xmm1 # Move the half precision result to eax
> mov...
2019 Jan 24
4
[cfe-dev] _Float16 support
...1, xmm1 # Convert argument
>> 1 back to single
>> vcvtps2ph xmm0, xmm0, 4 # Convert argument 0
>> from single to half
>> vcvtph2ps xmm0, xmm0 # Convert argument
>> 0 back to single
>> vmulss xmm0, xmm0, xmm1 # xmm0 = xmm0*xmm1
>> (single precision)
>> vcvtps2ph xmm1, xmm0, 4 # Convert the single
>> precision result to half
>> vmovd eax, xmm1 # Move the
>> half precision resu...
2014 Aug 07
2
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
...regarding FMA - at least
> by default. I don't have a PPC compiler to test with, but for x86-64
> using clang trunk and gcc 4.9:
>
> $ cat fma.c
> float foo(float x, float y, float z) { return x * y + z; }
>
> $ ./clang -march=core-avx2 -O2 -S fma.c -o - | grep ss
>     vmulss    %xmm1, %xmm0, %xmm0
>     vaddss    %xmm2, %xmm0, %xmm0
>
> $ ./gcc -march=core-avx2 -O2 -S fma.c -o - | grep ss
>     vfmadd132ss    %xmm1, %xmm2, %xmm0
>
> ----------------------------------------------------------------------
> This was brought up in Dec 2013 on this lis...
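
The two code sequences quoted above can produce different numeric results: vmulss followed by vaddss rounds the product to single precision before the add, while vfmadd132ss rounds only once, after the add. The following is a minimal sketch of that difference in C, not code from the thread. The function names are hypothetical, and the single-rounding variant is only approximated by evaluating in double (the product of two floats is exact in double, but the final narrowing can in rare cases round twice, unlike a true hardware FMA).

```c
/* Hypothetical helper names; an illustrative sketch, not code from the thread. */

/* Two roundings, as with vmulss + vaddss: the product is rounded to float
 * before the add. The volatile store keeps the compiler from contracting
 * the expression into a fused multiply-add. */
float mul_add_unfused(float x, float y, float z) {
    volatile float product = x * y;   /* first rounding, to single precision */
    return product + z;               /* second rounding */
}

/* Approximation of a fused multiply-add (assumption: evaluating in double
 * stands in for the hardware instruction). The product of two floats is
 * exact in double, so the only rounding that matters for typical inputs
 * is the final conversion back to float. */
float mul_add_single_rounding(float x, float y, float z) {
    double wide = (double)x * (double)y + (double)z;
    return (float)wide;
}
```

With x = y = 1 + 2^-12 and z = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. The unfused version rounds that product to 1 + 2^-11 and returns 0, while the single-rounding version keeps the low-order term and returns 2^-24.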