search for: mulss

Displaying 14 results from an estimated 14 matches for "mulss".

2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
...with back-to-back dependencies. It is as if all p1 = .. expressions are collected at once followed by all p2 = .. expressions and so forth.

    p1 = p1 * a
    p1 = p1 * a
    ...
    p2 = p2 * b
    p2 = p2 * b
    ...
    p3 = p3 * c
    p3 = p3 * c
    ...

An actual excerpt of the generated x86 assembly follows:

    mulss %xmm8, %xmm10
    mulss %xmm8, %xmm10
    ... repeated 512 times ...
    mulss %xmm7, %xmm9
    mulss %xmm7, %xmm9
    ... repeated 512 times ...
    mulss %xmm6, %xmm3
    mulss %xmm6, %xmm3
    ... repeated 512 times ...

Since p1, p2, p3, and p4 are all indepe...
2007 Sep 24
2
[LLVMdev] RFC: Tail call optimization X86
...nce
> +; with new fastcc has std call semantics causing a stack adjustment
> +; after the function call
>
> Not sure if I understand this. Can you illustrate with an example?

Sure. The code generated used to be:

    _array:
        subl $12, %esp
        movss LCPI1_0, %xmm0
        mulss 16(%esp), %xmm0
        movss %xmm0, (%esp)
        call L_qux$stub
        mulss LCPI1_0, %xmm0
        addl $12, %esp
        ret

FastCC used to have caller-pops-arguments semantics, so there was no stack adjustment after the call to qux. Now FastCC has callee-pops-arguments-on-return seman...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
    ...m5, xmm1
    + addps xmm4, [ecx+20]
    + subps xmm2, xmm3
    + movups [ecx], xmm2
    + subps xmm4, xmm5
    + movups [ecx+16], xmm4
    +
    + movss xmm2, [eax+36]
    + mulss xmm2, xmm0
    + movss xmm3, [ebx+36]
    + mulss xmm3, xmm1
    + addss xmm2, [ecx+36]
    + movss xmm4, [eax+40]
    + mulss xmm4, xmm0
    + movss xmm5, [ebx+40]
    +...
2007 Sep 24
0
[LLVMdev] RFC: Tail call optimization X86
...ics causing a stack adjustment
>> +; after the function call
>>
>> Not sure if I understand this. Can you illustrate with an example?
>
> Sure
>
> The code generated used to be
> _array:
>     subl $12, %esp
>     movss LCPI1_0, %xmm0
>     mulss 16(%esp), %xmm0
>     movss %xmm0, (%esp)
>     call L_qux$stub
>     mulss LCPI1_0, %xmm0
>     addl $12, %esp
>     ret
>
> FastCC use to be caller pops arguments so there was no stack
> adjustment after the
> call to qux. Now FastCC has...
2013 Apr 03
2
[LLVMdev] Packed instructions generaetd by LoopVectorize?
...? Tyler

    float dotproduct(float *A, float *B, int n) {
      float sum = 0;
      for (int i = 0; i < n; ++i) {
        sum += A[i] * B[i];
      }
      return sum;
    }

    clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -

<loop body>

    .LBB1_1:
        movss (%rdi), %xmm1
        addq $4, %rdi
        mulss (%rsi), %xmm1
        addq $4, %rsi
        decl %edx
        addss %xmm1, %xmm0
        jne .LBB1_1

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130403/529c8ae3/attachment.html>
2007 Sep 25
2
[LLVMdev] RFC: Tail call optimization X86
...stment after the
> > call to qux. Now FastCC has callee pops arguments on return semantics
> > so the
> > x86 backend inserts a stack adjustment after the call.
> >
> > _array:
> >     subl $12, %esp
> >     movss LCPI1_0, %xmm0
> >     mulss 16(%esp), %xmm0
> >     movss %xmm0, (%esp)
> >     call L_qux$stub
> >     subl $4, %esp    << stack adjustment because qux pops
> > arguments on return
> >     mulss LCPI1_0, %xmm0
> >     addl $12, %esp
> >...
2007 Sep 24
0
[LLVMdev] RFC: Tail call optimization X86
Hi Arnold,

This is a very good first step! Thanks! Comments below.

Evan

Index: test/CodeGen/X86/constant-pool-remat-0.ll
===================================================================
--- test/CodeGen/X86/constant-pool-remat-0.ll (revision 42247)
+++ test/CodeGen/X86/constant-pool-remat-0.ll (working copy)
@@ -1,8 +1,10 @@
; RUN: llvm-as < %s | llc -march=x86-64 | grep LCPI | count 3
;
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi all,

I noticed that the x86 backend tends to emit unnecessary vector insert instructions immediately after sse scalar fp instructions like addss/mulss. For example:

    /////////////////////////////////
    __m128 foo(__m128 A, __m128 B) {
      _mm_add_ss(A, B);
    }
    /////////////////////////////////

produces the sequence:

    addss %xmm0, %xmm1
    movss %xmm1, %xmm0

which could be easily optimized into

    addss %xmm1, %xmm0

The first step is to understand...
2007 Sep 23
2
[LLVMdev] RFC: Tail call optimization X86
The patch is against revision 42247.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tailcall-src.patch
Type: application/octet-stream
Size: 62639 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070923/4770302f/attachment.obj>
2013 Apr 03
0
[LLVMdev] Packed instructions generaetd by LoopVectorize?
...= 0;
> for(int i = 0; i < n; ++i) {
>   sum += A[i] * B[i];
> }
> return sum;
> }
>
> clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
>
> <loop body>
> .LBB1_1:
>     movss (%rdi), %xmm1
>     addq $4, %rdi
>     mulss (%rsi), %xmm1
>     addq $4, %rsi
>     decl %edx
>     addss %xmm1, %xmm0
> jne .LBB1_1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130403/13b053c5/attachme...
2013 Apr 04
1
[LLVMdev] Packed instructions generaetd by LoopVectorize?
...? Tyler

    float dotproduct(float *A, float *B, int n) {
      float sum = 0;
      for (int i = 0; i < n; ++i) {
        sum += A[i] * B[i];
      }
      return sum;
    }

    clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -

<loop body>

    .LBB1_1:
        movss (%rdi), %xmm1
        addq $4, %rdi
        mulss (%rsi), %xmm1
        addq $4, %rsi
        decl %edx
        addss %xmm1, %xmm0
        jne .LBB1_1

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130404/25c661a3/attachment.html>
2007 Sep 25
0
[LLVMdev] RFC: Tail call optimization X86
...ux. Now FastCC has callee pops arguments on return
>>> semantics
>>> so the
>>> x86 backend inserts a stack adjustment after the call.
>>>
>>> _array:
>>>     subl $12, %esp
>>>     movss LCPI1_0, %xmm0
>>>     mulss 16(%esp), %xmm0
>>>     movss %xmm0, (%esp)
>>>     call L_qux$stub
>>>     subl $4, %esp    << stack adjustment because qux pops
>>> arguments on return
>>>     mulss LCPI1_0, %xmm0
>>>     addl $12, %esp...
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
...;4 x float> %4 }

Thanks,
Nadav

On Dec 5, 2013, at 7:35 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:

> Hi all,
>
> I noticed that the x86 backend tends to emit unnecessary vector insert
> instructions immediately after sse scalar fp instructions like
> addss/mulss.
>
> For example:
> /////////////////////////////////
> __m128 foo(__m128 A, __m128 B) {
>   _mm_add_ss(A, B);
> }
> /////////////////////////////////
>
> produces the sequence:
> addss %xmm0, %xmm1
> movss %xmm1, %xmm0
>
> which could be easily optimize...
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
>> ===================================================================
>> --- llvm/trunk/test/CodeGen/X86/2007-01-08-InstrSched.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/2007-01-08-InstrSched.ll Tue Oct 15 18:33:07 2013
>> @@ -13,10 +13,10 @@ define float @foo(float %x) nounwind {
>>
>> ; CHECK: mulss
>> ; CHECK: mulss
>> -; CHECK: addss
>> ; CHECK: mulss
>> -; CHECK: addss
>> ; CHECK: mulss
>> ; CHECK: addss
>> +; CHECK: addss
>> +; CHECK: addss
>> ; CHECK: ret
>> }
>>
>> Modified: llvm/trunk/test/CodeGen/X86/2009-02-26-Ma...