thr3ads.net - search: "vmulss"

2019 Jan 22

4

_Float16 support

...rgument 1 from single to half
vcvtph2ps xmm1, xmm1 # Convert argument 1 back to single
vcvtps2ph xmm0, xmm0, 4 # Convert argument 0 from single to half
vcvtph2ps xmm0, xmm0 # Convert argument 0 back to single
vmulss xmm0, xmm0, xmm1 # xmm0 = xmm0*xmm1 (single precision)
vcvtps2ph xmm1, xmm0, 4 # Convert the single precision result to half
vmovd eax, xmm1 # Move the half precision result to eax
mov word ptr [...

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

2014 Jul 31

2


Hi Tim, Thanks for the thorough explanation. It makes perfect sense. I was not aware fast-math is supposed to prevent more precision being used than what is in the standard. I came across this issue while looking into the output of different compilers. The XL and Microsoft compilers seem to have that turned on by default. But I assume that clang follows what gcc does, and has that turned off.

[cfe-dev] _Float16 support

2019 Jan 24

2


...e to half
> vcvtph2ps xmm1, xmm1 # Convert argument 1 back to single
> vcvtps2ph xmm0, xmm0, 4 # Convert argument 0 from single to half
> vcvtph2ps xmm0, xmm0 # Convert argument 0 back to single
> vmulss xmm0, xmm0, xmm1 # xmm0 = xmm0*xmm1 (single precision)
> vcvtps2ph xmm1, xmm0, 4 # Convert the single precision result to half
> vmovd eax, xmm1 # Move the half precision result to eax
> mov...

[cfe-dev] _Float16 support

2019 Jan 24

4


...1, xmm1 # Convert argument 1 back to single
>> vcvtps2ph xmm0, xmm0, 4 # Convert argument 0 from single to half
>> vcvtph2ps xmm0, xmm0 # Convert argument 0 back to single
>> vmulss xmm0, xmm0, xmm1 # xmm0 = xmm0*xmm1 (single precision)
>> vcvtps2ph xmm1, xmm0, 4 # Convert the single precision result to half
>> vmovd eax, xmm1 # Move the half precision resu...

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

2014 Aug 07

2


...regarding FMA - at least
> by default. I don't have a PPC compiler to test with, but for x86-64
> using clang trunk and gcc 4.9:
>
> $ cat fma.c
> float foo(float x, float y, float z) { return x * y + z; }
>
> $ ./clang -march=core-avx2 -O2 -S fma.c -o - | grep ss
> vmulss %xmm1, %xmm0, %xmm0
> vaddss %xmm2, %xmm0, %xmm0
>
> $ ./gcc -march=core-avx2 -O2 -S fma.c -o - | grep ss
> vfmadd132ss %xmm1, %xmm2, %xmm0
>
> ----------------------------------------------------------------------
> This was brought up in Dec 2013 on this lis...
