Displaying 3 results from an estimated 3 matches for "vaddss".
2014 Jul 31
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
Hi Tim,
Thanks for the thorough explanation. It makes perfect sense.
I was not aware that fast-math is supposed to prevent using more precision
than the standard specifies.
I came across this issue while looking into the output of different
compilers. The XL and Microsoft compilers seem
to have it turned on by default, but I assume that clang follows what gcc
does and has it turned off.
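The extra precision here comes from rounding: a fused multiply-add rounds once, while a separate multiply and add round twice. The sketch below is an illustration, not code from the thread. The function names are hypothetical, and instead of calling `fmaf` it emulates the single rounding by doing the arithmetic in `double`, which is exact for the inputs used here.

```c
/* Two roundings: the product is rounded to float before the add.
   The volatile store keeps the compiler from contracting x*y+z into
   a single FMA (GCC's -ffp-contract=fast can fuse across statements). */
float mul_then_add(float x, float y, float z) {
    volatile float p = x * y;   /* first rounding: product to float */
    return p + z;               /* second rounding: sum to float */
}

/* One rounding, emulating a fused multiply-add. A float*float product
   is exact in double (24 + 24 significand bits <= 53). For the inputs
   below the exact sum also fits in a double, so the only rounding is
   the final conversion to float. For arbitrary inputs the double add
   could round too, so this is not a general fmaf replacement. */
float fused_emulated(float x, float y, float z) {
    double exact = (double)x * (double)y + (double)z;
    return (float)exact;
}
```

For x = y = 1 + 2^-12 and z = -(1 + 2^-11), the exact value of x*y + z is 2^-24. `mul_then_add` returns 0.0f because rounding the product to float drops the 2^-24 term (a halfway case that rounds to even). `fused_emulated` returns 2^-24, the result a single-rounding FMA would produce.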
2014 Aug 07
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
...lt. I don't have a PPC compiler to test with, but for x86-64
> using clang trunk and gcc 4.9:
>
> $ cat fma.c
> float foo(float x, float y, float z) { return x * y + z; }
>
> $ ./clang -march=core-avx2 -O2 -S fma.c -o - | grep ss
> vmulss %xmm1, %xmm0, %xmm0
> vaddss %xmm2, %xmm0, %xmm0
>
> $ ./gcc -march=core-avx2 -O2 -S fma.c -o - | grep ss
> vfmadd132ss %xmm1, %xmm2, %xmm0
>
> ----------------------------------------------------------------------
> This was brought up in Dec 2013 on this list:
> http://lists.cs.uiuc.edu/piperm...
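The difference between the clang and gcc output above is whether the compiler may contract `x * y + z` into one fused instruction. Standard C lets code state that choice with `#pragma STDC FP_CONTRACT`. Clang honors the pragma. GCC's manual describes it as not implemented and controls contraction with `-ffp-contract=` instead. A minimal sketch, assuming a C99 compiler; the function names are hypothetical:

```c
/* Contraction permitted: a compiler that honors the pragma may emit a
   single FMA here when the target has one (e.g. -march=core-avx2). */
float madd_may_fuse(float x, float y, float z) {
#pragma STDC FP_CONTRACT ON
    return x * y + z;
}

/* Contraction forbidden: only compilers that honor the pragma are bound
   by it; GCC ignores it and follows -ffp-contract= instead. */
float madd_no_fuse(float x, float y, float z) {
#pragma STDC FP_CONTRACT OFF
    return x * y + z;
}
```

Which result each function returns depends on the compiler, the flags, and whether the target has FMA instructions. The inputs from the rounding example above give either 0.0f (two roundings) or 2^-24 (one rounding).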
2013 Oct 15
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...-bytes) and is 4-bytes aligned.
>> ;
>> ; STRESS-LABEL: t1:
>> -; Load out[out_start + 8].imm, this is base + 8 * 8 + 4.
>> -; STRESS: vmovss 68([[BASE:[^)]+]]), [[OUT_Imm:%xmm[0-9]+]]
>> -; Add high slice: out[out_start].imm, this is base + 4.
>> -; STRESS-NEXT: vaddss 4([[BASE]]), [[OUT_Imm]], [[RES_Imm:%xmm[0-9]+]]
>> ; Load out[out_start + 8].real, this is base + 8 * 8 + 0.
>> -; STRESS-NEXT: vmovss 64([[BASE]]), [[OUT_Real:%xmm[0-9]+]]
>> +; STRESS: vmovss 64([[BASE:[^(]+]]), [[OUT_Real:%xmm[0-9]+]]
>> ; Add low slice: out[out_start].r...