Displaying 14 results from an estimated 14 matches for "mulss".
2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
...with back-to-back dependencies. It is as if
all p1 = .. expressions are collected at once followed by all p2 = .. expressions and so
forth.
p1 = p1 * a
p1 = p1 * a
.
.
p2 = p2 * b
p2 = p2 * b
.
.
p3 = p3 * c
p3 = p3 * c
.
.
An actual excerpt of the generated x86 assembly follows:
mulss %xmm8, %xmm10
mulss %xmm8, %xmm10
.
. repeated 512 times
.
mulss %xmm7, %xmm9
mulss %xmm7, %xmm9
.
. repeated 512 times
.
mulss %xmm6, %xmm3
mulss %xmm6, %xmm3
.
. repeated 512 times
.
Since p1, p2, p3, and p4 are all indepe...
2007 Sep 24
2
[LLVMdev] RFC: Tail call optimization X86
...nce
> +; with new fastcc has std call semantics causing a stack adjustment
> +; after the function call
>
> Not sure if I understand this. Can you illustrate with an example?
Sure
The code generated used to be
_array:
subl $12, %esp
movss LCPI1_0, %xmm0
mulss 16(%esp), %xmm0
movss %xmm0, (%esp)
call L_qux$stub
mulss LCPI1_0, %xmm0
addl $12, %esp
ret
FastCC used to be caller pops arguments so there was no stack
adjustment after the
call to qux. Now FastCC has callee pops arguments on return seman...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...m5, xmm1
+ addps xmm4, [ecx+20]
+ subps xmm2, xmm3
+ movups [ecx], xmm2
+ subps xmm4, xmm5
+ movups [ecx+16], xmm4
+
+ movss xmm2, [eax+36]
+ mulss xmm2, xmm0
+ movss xmm3, [ebx+36]
+ mulss xmm3, xmm1
+ addss xmm2, [ecx+36]
+ movss xmm4, [eax+40]
+ mulss xmm4, xmm0
+ movss xmm5, [ebx+40]
+...
2007 Sep 24
0
[LLVMdev] RFC: Tail call optimization X86
...ics causing a stack adjustment
>> +; after the function call
>>
>> Not sure if I understand this. Can you illustrate with an example?
>
> Sure
>
> The code generated used to be
> _array:
> subl $12, %esp
> movss LCPI1_0, %xmm0
> mulss 16(%esp), %xmm0
> movss %xmm0, (%esp)
> call L_qux$stub
> mulss LCPI1_0, %xmm0
> addl $12, %esp
> ret
>
> FastCC used to be caller pops arguments so there was no stack
> adjustment after the
> call to qux. Now FastCC has...
2013 Apr 03
2
[LLVMdev] Packed instructions generated by LoopVectorize?
...?
Tyler
float dotproduct(float *A, float *B, int n) {
float sum = 0;
for(int i = 0; i < n; ++i) {
sum += A[i] * B[i];
}
return sum;
}
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
.LBB1_1:
movss (%rdi), %xmm1
addq $4, %rdi
mulss (%rsi), %xmm1
addq $4, %rsi
decl %edx
addss %xmm1, %xmm0
jne .LBB1_1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130403/529c8ae3/attachment.html>
2007 Sep 25
2
[LLVMdev] RFC: Tail call optimization X86
...stment after the
> > call to qux. Now FastCC has callee pops arguments on return semantics
> > so the
> > x86 backend inserts a stack adjustment after the call.
> >
> > _array:
> > subl $12, %esp
> > movss LCPI1_0, %xmm0
> > mulss 16(%esp), %xmm0
> > movss %xmm0, (%esp)
> > call L_qux$stub
> > subl $4, %esp << stack adjustment because qux pops
> > arguments on return
> > mulss LCPI1_0, %xmm0
> > addl $12, %esp
> >...
2007 Sep 24
0
[LLVMdev] RFC: Tail call optimization X86
Hi Arnold,
This is a very good first step! Thanks! Comments below.
Evan
Index: test/CodeGen/X86/constant-pool-remat-0.ll
===================================================================
--- test/CodeGen/X86/constant-pool-remat-0.ll (revision 42247)
+++ test/CodeGen/X86/constant-pool-remat-0.ll (working copy)
@@ -1,8 +1,10 @@
; RUN: llvm-as < %s | llc -march=x86-64 | grep LCPI | count 3
;
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi all,
I noticed that the x86 backend tends to emit unnecessary vector insert
instructions immediately after sse scalar fp instructions like
addss/mulss.
For example:
/////////////////////////////////
__m128 foo(__m128 A, __m128 B) {
return _mm_add_ss(A, B);
}
/////////////////////////////////
produces the sequence:
addss %xmm0, %xmm1
movss %xmm1, %xmm0
which could be easily optimized into
addss %xmm1, %xmm0
The first step is to understand...
2007 Sep 23
2
[LLVMdev] RFC: Tail call optimization X86
The patch is against revision 42247.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tailcall-src.patch
Type: application/octet-stream
Size: 62639 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070923/4770302f/attachment.obj>
2013 Apr 03
0
[LLVMdev] Packed instructions generated by LoopVectorize?
...= 0;
> for(int i = 0; i < n; ++i) {
> sum += A[i] * B[i];
> }
> return sum;
> }
>
> clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
>
> <loop body>
> .LBB1_1:
> movss (%rdi), %xmm1
> addq $4, %rdi
> mulss (%rsi), %xmm1
> addq $4, %rsi
> decl %edx
> addss %xmm1, %xmm0
> jne .LBB1_1
2013 Apr 04
1
[LLVMdev] Packed instructions generated by LoopVectorize?
...?
Tyler
float dotproduct(float *A, float *B, int n) {
float sum = 0;
for(int i = 0; i < n; ++i) {
sum += A[i] * B[i];
}
return sum;
}
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
.LBB1_1:
movss (%rdi), %xmm1
addq $4, %rdi
mulss (%rsi), %xmm1
addq $4, %rsi
decl %edx
addss %xmm1, %xmm0
jne .LBB1_1
2007 Sep 25
0
[LLVMdev] RFC: Tail call optimization X86
...ux. Now FastCC has callee pops arguments on return
>>> semantics
>>> so the
>>> x86 backend inserts a stack adjustment after the call.
>>>
>>> _array:
>>> subl $12, %esp
>>> movss LCPI1_0, %xmm0
>>> mulss 16(%esp), %xmm0
>>> movss %xmm0, (%esp)
>>> call L_qux$stub
>>> subl $4, %esp << stack adjustment because qux pops
>>> arguments on return
>>> mulss LCPI1_0, %xmm0
>>> addl $12, %esp...
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
...;4 x float> %4
}
Thanks,
Nadav
On Dec 5, 2013, at 7:35 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Hi all,
>
> I noticed that the x86 backend tends to emit unnecessary vector insert
> instructions immediately after sse scalar fp instructions like
> addss/mulss.
>
> For example:
> /////////////////////////////////
> __m128 foo(__m128 A, __m128 B) {
> return _mm_add_ss(A, B);
> }
> /////////////////////////////////
>
> produces the sequence:
> addss %xmm0, %xmm1
> movss %xmm1, %xmm0
>
> which could be easily optimize...
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...===================================
>> --- llvm/trunk/test/CodeGen/X86/2007-01-08-InstrSched.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/2007-01-08-InstrSched.ll Tue Oct 15 18:33:07 2013
>> @@ -13,10 +13,10 @@ define float @foo(float %x) nounwind {
>>
>> ; CHECK: mulss
>> ; CHECK: mulss
>> -; CHECK: addss
>> ; CHECK: mulss
>> -; CHECK: addss
>> ; CHECK: mulss
>> ; CHECK: addss
>> +; CHECK: addss
>> +; CHECK: addss
>> ; CHECK: ret
>> }
>>
>> Modified: llvm/trunk/test/CodeGen/X86/2009-02-26-Ma...