Displaying 14 results from an estimated 14 matches for "mulss".
2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
...with back-to-back dependencies. It is as if
all p1 = .. expressions are collected at once followed by all p2 = .. expressions and so
forth.
p1 = p1 * a
p1 = p1 * a
.
.
p2 = p2 * b
p2 = p2 * b
.
.
p3 = p3 * c
p3 = p3 * c
.
.
An actual excerpt of the generated x86 assembly follows:
mulss %xmm8, %xmm10
mulss %xmm8, %xmm10
.
. repeated 512 times
.
mulss %xmm7, %xmm9
mulss %xmm7, %xmm9
.
. repeated 512 times
.
mulss %xmm6, %xmm3
mulss %xmm6, %xmm3
.
. repeated 512 times
.
Since p1, p2, p3, and p4 are all indepe...
2007 Sep 24
2
[LLVMdev] RFC: Tail call optimization X86
...nce
> +; with new fastcc has std call semantics causing a stack adjustment
> +; after the function call
>
> Not sure if I understand this. Can you illustrate with an example?
Sure
The code generated used to be
_array:
subl $12, %esp
movss LCPI1_0, %xmm0
mulss 16(%esp), %xmm0
movss %xmm0, (%esp)
call L_qux$stub
mulss LCPI1_0, %xmm0
addl $12, %esp
ret
FastCC used to be caller pops arguments so there was no stack
adjustment after the
call to qux. Now FastCC has callee pops arguments on return seman...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...m5, xmm1
+ addps xmm4, [ecx+20]
+ subps xmm2, xmm3
+ movups [ecx], xmm2
+ subps xmm4, xmm5
+ movups [ecx+16], xmm4
+
+ movss xmm2, [eax+36]
+ mulss xmm2, xmm0
+ movss xmm3, [ebx+36]
+ mulss xmm3, xmm1
+ addss xmm2, [ecx+36]
+ movss xmm4, [eax+40]
+ mulss xmm4, xmm0
+ movss xmm5, [ebx+40]
+...
2007 Sep 24
0
[LLVMdev] RFC: Tail call optimization X86
...ics causing a stack adjustment
>> +; after the function call
>>
>> Not sure if I understand this. Can you illustrate with an example?
>
> Sure
>
> The code generated used to be
> _array:
> subl $12, %esp
> movss LCPI1_0, %xmm0
> mulss 16(%esp), %xmm0
> movss %xmm0, (%esp)
> call L_qux$stub
> mulss LCPI1_0, %xmm0
> addl $12, %esp
> ret
>
> FastCC used to be caller pops arguments so there was no stack
> adjustment after the
> call to qux. Now FastCC has...
2013 Apr 03
2
[LLVMdev] Packed instructions generated by LoopVectorize?
...?
Tyler
float dotproduct(float *A, float *B, int n) {
float sum = 0;
for(int i = 0; i < n; ++i) {
sum += A[i] * B[i];
}
return sum;
}
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
.LBB1_1:
movss (%rdi), %xmm1
addq $4, %rdi
mulss (%rsi), %xmm1
addq $4, %rsi
decl %edx
addss %xmm1, %xmm0
jne .LBB1_1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130403/529c8ae3/attachment.html>
2007 Sep 25
2
[LLVMdev] RFC: Tail call optimization X86
...stment after the
> > call to qux. Now FastCC has callee pops arguments on return semantics
> > so the
> > x86 backend inserts a stack adjustment after the call.
> >
> > _array:
> > subl $12, %esp
> > movss LCPI1_0, %xmm0
> > mulss 16(%esp), %xmm0
> > movss %xmm0, (%esp)
> > call L_qux$stub
> > subl $4, %esp << stack adjustment because qux pops
> > arguments on return
> > mulss LCPI1_0, %xmm0
> > addl $12, %esp
> >...
2007 Sep 24
0
[LLVMdev] RFC: Tail call optimization X86
Hi Arnold,
This is a very good first step! Thanks! Comments below.
Evan
Index: test/CodeGen/X86/constant-pool-remat-0.ll
===================================================================
--- test/CodeGen/X86/constant-pool-remat-0.ll (revision 42247)
+++ test/CodeGen/X86/constant-pool-remat-0.ll (working copy)
@@ -1,8 +1,10 @@
; RUN: llvm-as < %s | llc -march=x86-64 | grep LCPI | count 3
;
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
Hi all,
I noticed that the x86 backend tends to emit unnecessary vector insert
instructions immediately after sse scalar fp instructions like
addss/mulss.
For example:
/////////////////////////////////
__m128 foo(__m128 A, __m128 B) {
return _mm_add_ss(A, B);
}
/////////////////////////////////
produces the sequence:
addss %xmm0, %xmm1
movss %xmm1, %xmm0
which could be easily optimized into
addss %xmm1, %xmm0
The first step is to understand...
2007 Sep 23
2
[LLVMdev] RFC: Tail call optimization X86
The patch is against revision 42247.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tailcall-src.patch
Type: application/octet-stream
Size: 62639 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070923/4770302f/attachment.obj>
2013 Apr 03
0
[LLVMdev] Packed instructions generated by LoopVectorize?
...= 0;
> for(int i = 0; i < n; ++i) {
> sum += A[i] * B[i];
> }
> return sum;
> }
>
> clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
>
> <loop body>
> .LBB1_1:
> movss (%rdi), %xmm1
> addq $4, %rdi
> mulss (%rsi), %xmm1
> addq $4, %rsi
> decl %edx
> addss %xmm1, %xmm0
> jne .LBB1_1
2013 Apr 04
1
[LLVMdev] Packed instructions generated by LoopVectorize?
...?
Tyler
float dotproduct(float *A, float *B, int n) {
float sum = 0;
for(int i = 0; i < n; ++i) {
sum += A[i] * B[i];
}
return sum;
}
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
.LBB1_1:
movss (%rdi), %xmm1
addq $4, %rdi
mulss (%rsi), %xmm1
addq $4, %rsi
decl %edx
addss %xmm1, %xmm0
jne .LBB1_1
2007 Sep 25
0
[LLVMdev] RFC: Tail call optimization X86
...ux. Now FastCC has callee pops arguments on return
>>> semantics
>>> so the
>>> x86 backend inserts a stack adjustment after the call.
>>>
>>> _array:
>>> subl $12, %esp
>>> movss LCPI1_0, %xmm0
>>> mulss 16(%esp), %xmm0
>>> movss %xmm0, (%esp)
>>> call L_qux$stub
>>> subl $4, %esp << stack adjustment because qux pops
>>> arguments on return
>>> mulss LCPI1_0, %xmm0
>>> addl $12, %esp...
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
...;4 x float> %4
}
Thanks,
Nadav
On Dec 5, 2013, at 7:35 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Hi all,
>
> I noticed that the x86 backend tends to emit unnecessary vector insert
> instructions immediately after sse scalar fp instructions like
> addss/mulss.
>
> For example:
> /////////////////////////////////
> __m128 foo(__m128 A, __m128 B) {
> return _mm_add_ss(A, B);
> }
> /////////////////////////////////
>
> produces the sequence:
> addss %xmm0, %xmm1
> movss %xmm1, %xmm0
>
> which could be easily optimize...
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...===================================
>> --- llvm/trunk/test/CodeGen/X86/2007-01-08-InstrSched.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/2007-01-08-InstrSched.ll Tue Oct 15 18:33:07 2013
>> @@ -13,10 +13,10 @@ define float @foo(float %x) nounwind {
>>
>> ; CHECK: mulss
>> ; CHECK: mulss
>> -; CHECK: addss
>> ; CHECK: mulss
>> -; CHECK: addss
>> ; CHECK: mulss
>> ; CHECK: addss
>> +; CHECK: addss
>> +; CHECK: addss
>> ; CHECK: ret
>> }
>>
>> Modified: llvm/trunk/test/CodeGen/X86/2009-02-26-Ma...