search for: addss

Displaying 20 results from an estimated 20 matches for "addss".

2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi all, I noticed that the x86 backend tends to emit unnecessary vector insert instructions immediately after SSE scalar FP instructions like addss/mulss. For example: ///////////////////////////////// __m128 foo(__m128 A, __m128 B) { _mm_add_ss(A, B); } ///////////////////////////////// produces the sequence: addss %xmm0, %xmm1 movss %xmm1, %xmm0 which could be easily optimized into addss %xmm1, %xmm0 The first step is to under...
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
...is. I can see two approaches to solving this problem. The first one (that you suggested) is to catch this pattern after register allocation. The second approach is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during ISel? The idea is that ADDSS does an ADD plus a BLEND operation, and you can easily catch them. You can add a new target-specific DAGCombine or a TableGen pattern. You should also handle mul/add/sub. define <4 x float> @foo(<4 x float> %A, <4 x float> %B) nounwind readnone ssp uwtable { %1 = extractelemen...
2010 Nov 20
2
[LLVMdev] Poor floating point optimizations?
I wanted to use LLVM for my math parser but it seems that floating point optimizations are poor. For example consider such C code: float foo(float x) { return x+x+x; } and here is the code generated with "optimized" live demo: define float @foo(float %x) nounwind readnone { entry: %0 = fmul float %x, 2.000000e+00 ; <float> [#uses=1] %1 = fadd float %0, %x
2010 Nov 20
0
[LLVMdev] Poor floating point optimizations?
And also the resulting assembly code is very poor: 00460013 movss xmm0,dword ptr [esp+8] 00460019 movaps xmm1,xmm0 0046001C addss xmm1,xmm1 00460020 pxor xmm2,xmm2 00460024 addss xmm2,xmm1 00460028 addss xmm2,xmm0 0046002C movss dword ptr [esp],xmm2 00460031 fld dword ptr [esp] Especially pxor&add instead of movss (which is unnecessary anyway) is just pure madness. Bob...
2008 Jan 24
2
[LLVMdev] llvm-gcc + abi stuff
...;float*> [#uses=1] %tmp14 = load float* %tmp13, align 4 ; <float> [#uses=1] %tmp15 = add float %tmp12, %tmp14 ; <float> [#uses=1] ret float %tmp15 } This yields correct but suboptimal code: _foo: subq $16, %rsp movsd %xmm0, (%rsp) movsd %xmm1, 8(%rsp) movss (%rsp), %xmm0 addss 4(%rsp), %xmm0 addss 8(%rsp), %xmm0 addss 12(%rsp), %xmm0 addq $16, %rsp ret We really want: _foo: movaps %xmm0, %xmm2 shufps $1, %xmm2, %xmm2 addss %xmm2, %xmm0 addss %xmm1, %xmm0 shufps $1, %xmm1, %xmm1 addss %xmm1, %xmm0 ret -Chris
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
...* @x, align 2, !dbg !29, !tbaa !22 ret void, !dbg !30 } Then SelectionDAG type legalization comes along and creates this as the final assembly pushq %rax .cfi_def_cfa_offset 16 movzwl y(%rip), %edi callq __gnu_h2f_ieee movss %xmm0, 4(%rsp) # 4-byte Spill movzwl z(%rip), %edi callq __gnu_h2f_ieee addss 4(%rsp), %xmm0 # 4-byte Folded Reload movss %xmm0, 4(%rsp) # 4-byte Spill movzwl w(%rip), %edi callq __gnu_h2f_ieee addss 4(%rsp), %xmm0 # 4-byte Folded Reload callq __gnu_f2h_ieee movw %ax, x(%rip) popq %rax I assumed SelectionDAG should produce something equivalent to the original clang code wi...
2013 Apr 03
2
[LLVMdev] Packed instructions generated by LoopVectorize?
...or(int i = 0; i < n; ++i) { sum += A[i] * B[i]; } return sum; } clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o - <loop body> .LBB1_1: movss (%rdi), %xmm1 addq $4, %rdi mulss (%rsi), %xmm1 addq $4, %rsi decl %edx addss %xmm1, %xmm0 jne .LBB1_1 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130403/529c8ae3/attachment.html>
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
..._asm + { + mov eax, num + mov ebx, den + mov ecx, mem + + mov edx, in1 + movss xmm0, [edx] + + movss xmm1, [ecx] + addss xmm1, xmm0 + + mov edx, in2 + movss [edx], xmm1 + + shufps xmm0, xmm0, 0x00 + shufps xmm1, xmm1, 0x00 + + movaps xmm2, [eax+4] + movaps xmm3, [ebx+4] +...
2019 Dec 10
2
TypePromoteFloat loses intermediate rounding operations
...comes along and creates this as the > final assembly > > > > pushq %rax > > .cfi_def_cfa_offset 16 > > movzwl y(%rip), %edi > > callq __gnu_h2f_ieee > > movss %xmm0, 4(%rsp) # 4-byte Spill > > movzwl z(%rip), %edi > > callq __gnu_h2f_ieee > > addss 4(%rsp), %xmm0 # 4-byte Folded Reload > > movss %xmm0, 4(%rsp) # 4-byte Spill > > movzwl w(%rip), %edi > > callq __gnu_h2f_ieee > > addss 4(%rsp), %xmm0 # 4-byte Folded Reload > > callq __gnu_f2h_ieee > > movw %ax, x(%rip) > > popq %rax > > > >...
2010 May 07
1
[LLVMdev] Misuse of xmm register on X86-64
All, I've been working on a new scheduler and have somehow affected register selection. My problem is that an xmm register is being used as an index expression. Specifically, addss (%xmm1,%rax,4), %xmm0 I like the idea of a floating-point index, but, like the assembler, I don't know what that means. Any suggestions on where I should look for a solution to my problem? Aran -------------- next part -------------- A non-text attachment was scrubbed... Name: signature....
2010 Nov 20
3
[LLVMdev] Poor floating point optimizations?
On Nov 20, 2010, at 2:41 PM, Sdadsda Sdasdaas wrote: > And also the resulting assembly code is very poor: > > 00460013 movss xmm0,dword ptr [esp+8] > 00460019 movaps xmm1,xmm0 > 0046001C addss xmm1,xmm1 > 00460020 pxor xmm2,xmm2 > 00460024 addss xmm2,xmm1 > 00460028 addss xmm2,xmm0 > 0046002C movss dword ptr [esp],xmm2 > 00460031 fld dword ptr [esp] > > Especially pxor&add instead of movss (which is unnecessary a...
2013 Apr 15
1
[LLVMdev] State of Loop Unrolling and Vectorization in LLVM
...t able to see the loop being unrolled / vectorized. The microbenchmark which runs the function g() over a billion times shows a sizeable performance difference between gcc and clang Gcc - 8.6 seconds Clang - 12.7 seconds Evidently, the addition operation can be vectorized to use addps (clang emits addss), and the loop can be unrolled for better performance. Any idea why this is happening ? Thanks Sriram -- Sriram Murali SSG/DPD/ECDL/DMP +1 (519) 772 - 2579 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/201...
2013 Apr 03
0
[LLVMdev] Packed instructions generated by LoopVectorize?
...eturn sum; > } > > clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o - > > <loop body> > .LBB1_1: > movss (%rdi), %xmm1 > addq $4, %rdi > mulss (%rsi), %xmm1 > addq $4, %rsi > decl %edx > addss %xmm1, %xmm0 > jne .LBB1_1 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130403/13b053c5/attachment.html>
2017 Sep 29
2
Trouble when suppressing a portion of fast-math-transformations
...ed, the reassociation enabled by it allows us to simply return the first argument (and that reassociation does happen with '-ffast-math', with both the old and new compilers): $ clang -c -O2 -o x.o assoc.cpp $ llvm-objdump -d x.o | grep "^ .*: " 0: f3 0f 58 c1 addss %xmm1, %xmm0 4: f3 0f 5c c1 subss %xmm1, %xmm0 8: c3 retq $ clang -c -O2 -ffast-math -o x.o assoc.cpp $ llvm-objdump -d x.o | grep "^ .*: " 0: c3 retq $ FTR, GCC also does the reassociation transformation here when '-ffast-m...
2013 Apr 04
1
[LLVMdev] Packed instructions generated by LoopVectorize?
...or(int i = 0; i < n; ++i) { sum += A[i] * B[i]; } return sum; } clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o - <loop body> .LBB1_1: movss (%rdi), %xmm1 addq $4, %rdi mulss (%rsi), %xmm1 addq $4, %rsi decl %edx addss %xmm1, %xmm0 jne .LBB1_1 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130404/25c661a3/attachment.html>
2014 Feb 19
2
[LLVMdev] better code for IV
...# imm = 0x100000000 .align 16, 0x90 .LBB0_1: # %L_entry # =>This Inner Loop Header: Depth=1 movq %r9, %rax sarq $32, %rax movss (%rdi,%rax,4), %xmm0 addss (%rsi,%rax,4), %xmm0 movss %xmm0, (%rdx,%rax,4) addq %r8, %r9 decq %rcx jne .LBB0_1 # BB#2: Ret This is what I want to get: ArrayAdd2: # @ArrayAdd2 .cfi_startproc # BB#0:...
2017 Sep 29
0
Trouble when suppressing a portion of fast-math-transformations
...us to > > simply return the first argument (and that reassociation does happen with > > '-ffast-math', with both the old and new compilers): > > $ clang -c -O2 -o x.o assoc.cpp > > $ llvm-objdump -d x.o | grep "^ .*: " > > 0: f3 0f 58 c1 addss %xmm1, %xmm0 > > 4: f3 0f 5c c1 subss %xmm1, %xmm0 > > 8: c3 retq > > $ clang -c -O2 -ffast-math -o x.o assoc.cpp > > $ llvm-objdump -d x.o | grep "^ .*: " > > 0: c3 retq > > $ > > FTR, GCC also does the...
2009 Dec 03
4
[LLVMdev] Win64 Calling Convention problem
...eax,0CCCCCCCCh 0000000140067AFC rep stos dword ptr [rdi] 0000000140067AFE mov rcx,qword ptr [rsp+20h] return v.x + v.y; 0000000140067B03 mov rax,qword ptr [v] 0000000140067B08 mov rcx,qword ptr [v] 0000000140067B0D movss xmm0,dword ptr [rax] 0000000140067B11 addss xmm0,dword ptr [rcx+4] 0000000140067B16 add rsp,10h 0000000140067B1A pop rdi 0000000140067B1B ret } --- snip --- noise4 is supposed to be called by jitted LLVM code, just like in the following example. --- snip --- target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16...
2004 Aug 06
2
Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others
...lea edx,[ecx+eax*4] 0041348C movss xmm0,dword ptr [edx] 00413490 shufps xmm0,xmm0,0 00413494 movaps xmmword ptr [xx],xmm0 260: yy = _mm_add_ss(xx, mem[0]); 00413498 movaps xmm0,xmmword ptr [ebp-60h] 0041349C movaps xmm1,xmmword ptr [xx] 004134A0 addss xmm1,xmm0 004134A4 movaps xmmword ptr [yy],xmm1 261: _mm_store_ss(y+i, yy); 004134AB movaps xmm0,xmmword ptr [yy] 004134B2 mov eax,dword ptr [ebp-64h] 004134B5 mov ecx,dword ptr [ebx+10h] 004134B8 lea edx,[ecx+eax*4] 004134BB movss dw...
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...llvm/trunk/test/CodeGen/X86/2007-01-08-InstrSched.ll (original) >> +++ llvm/trunk/test/CodeGen/X86/2007-01-08-InstrSched.ll Tue Oct 15 18:33:07 2013 >> @@ -13,10 +13,10 @@ define float @foo(float %x) nounwind { >> >> ; CHECK: mulss >> ; CHECK: mulss >> -; CHECK: addss >> ; CHECK: mulss >> -; CHECK: addss >> ; CHECK: mulss >> ; CHECK: addss >> +; CHECK: addss >> +; CHECK: addss >> ; CHECK: ret >> } >> >> Modified: llvm/trunk/test/CodeGen/X86/2009-02-26-MachineLICMBug.ll >> URL: http://llvm.org/vie...