thr3ads.net - search: "vaddss"

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

2014 Jul 31

Hi Tim, Thanks for the thorough explanation. It makes perfect sense. I was not aware that fast-math is supposed to prevent more precision being used than what is in the standard. I came across this issue while looking into the output of different compilers. The XL and Microsoft compilers seem to have that turned on by default. But I assume that clang follows what gcc does and has that turned off.

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

2014 Aug 07

...lt. I don't have a PPC compiler to test with, but for x86-64
> using clang trunk and gcc 4.9:
>
> $ cat fma.c
> float foo(float x, float y, float z) { return x * y + z; }
>
> $ ./clang -march=core-avx2 -O2 -S fma.c -o - | grep ss
> vmulss %xmm1, %xmm0, %xmm0
> vaddss %xmm2, %xmm0, %xmm0
>
> $ ./gcc -march=core-avx2 -O2 -S fma.c -o - | grep ss
> vfmadd132ss %xmm1, %xmm2, %xmm0
>
> ----------------------------------------------------------------------
> This was brought up in Dec 2013 on this list:
> http://lists.cs.uiuc.edu/piperm...

[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.

2013 Oct 15

...-bytes) and is 4-bytes aligned.
>> ;
>> ; STRESS-LABEL: t1:
>> -; Load out[out_start + 8].imm, this is base + 8 * 8 + 4.
>> -; STRESS: vmovss 68([[BASE:[^)]+]]), [[OUT_Imm:%xmm[0-9]+]]
>> -; Add high slice: out[out_start].imm, this is base + 4.
>> -; STRESS-NEXT: vaddss 4([[BASE]]), [[OUT_Imm]], [[RES_Imm:%xmm[0-9]+]]
>> ; Load out[out_start + 8].real, this is base + 8 * 8 + 0.
>> -; STRESS-NEXT: vmovss 64([[BASE]]), [[OUT_Real:%xmm[0-9]+]]
>> +; STRESS: vmovss 64([[BASE:[^(]+]]), [[OUT_Real:%xmm[0-9]+]]
>> ; Add low slice: out[out_start].r...