Displaying 20 results from an estimated 22 matches for "mulps".
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...xmm0
+
+ mov edx, in2
+ movss [edx], xmm1
+
+ shufps xmm0, xmm0, 0x00
+ shufps xmm1, xmm1, 0x00
+
+ movaps xmm2, [eax+4]
+ movaps xmm3, [ebx+4]
+ mulps xmm2, xmm0
+ mulps xmm3, xmm1
+ movaps xmm4, [eax+20]
+ mulps xmm4, xmm0
+ addps xmm2, [ecx+4]
+ movaps xmm5, [ebx+20]
+ mulps xmm5, xmm1
+...
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...ual_from_qlp_coefficients_asm_ia32
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx
@@ -596,7 +597,7 @@
movss xmm3, xmm2
movss xmm2, xmm0
- ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2
+ ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2
movaps xmm1, xmm0
mulps xmm1, xmm2
addps xmm5, xmm1
@@ -619,6 +620,95 @@
ret
ALIGN 16
+cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
+ ;[ebp + 20] == autoc[]
+ ;[ebp + 16] == lag
+ ;[ebp + 12] == data_len
+ ;[ebp + 8] == data[]
+ ;[esp] == __m128
+ ;[esp + 16] == __m128
+
+ push ebp
+ mov ebp, es...
2008 Jul 10
3
[LLVMdev] InstructionCombining forgets alignment of globals
Hi all,
The InstructionCombining pass causes alignment of globals to be ignored.
I've attached a replacement of Fibonacci.cpp which reproduces this (I used
2.3 release). Here's the x86 code it produces:
03C20019 movaps xmm0,xmmword ptr ds:[164E799h]
03C20020 mulps xmm0,xmmword ptr ds:[164E79Ah]
03C20027 movaps xmmword ptr ds:[164E799h],xmm0
03C2002E mov esp,ebp
03C20030 pop ebp
03C20031 ret
All three SSE instructions will generate a fault for accessing unaligned
memory. Disabling InstructionCombining give...
2008 Jul 14
5
[LLVMdev] Spilled variables using unaligned moves
...ion opportunity.
The attached replacement of fibonacci.cpp generates x86 code like this:
03A70010 push ebp
03A70011 mov ebp,esp
03A70013 and esp,0FFFFFFF0h
03A70019 sub esp,1A0h
...
03A7006C movups xmmword ptr [esp+180h],xmm7
...
03A70229 mulps xmm1,xmmword ptr [esp+180h]
...
03A70682 movups xmm0,xmmword ptr [esp+180h]
Note how the stores and loads use unaligned moves where aligned moves
would do. It's also interesting that the multiply does correctly assume
the stack to be 16-byte aligned.
Is there something I...
2008 Jul 14
0
[LLVMdev] Spilled variables using unaligned moves
...lacement of fibonacci.cpp generates x86 code like
> this:
>
> 03A70010 push ebp
> 03A70011 mov ebp,esp
> 03A70013 and esp,0FFFFFFF0h
> 03A70019 sub esp,1A0h
> ...
> 03A7006C movups xmmword ptr [esp+180h],xmm7
> ...
> 03A70229 mulps xmm1,xmmword ptr [esp+180h]
> ...
> 03A70682 movups xmm0,xmmword ptr [esp+180h]
>
> Note how stores and loads use unaligned moves while it could use
> aligned moves. It’s also interesting that the multiply does
> correctly assume the stack to be 16-byte aligned.
>...
2008 Jul 10
0
[LLVMdev] InstructionCombining forgets alignment of globals
...globals
Hi all,
The InstructionCombining pass causes alignment of globals to be ignored.
I've attached a replacement of Fibonacci.cpp which reproduces this (I used
2.3 release). Here's the x86 code it produces:
03C20019 movaps xmm0,xmmword ptr ds:[164E799h]
03C20020 mulps xmm0,xmmword ptr ds:[164E79Ah]
03C20027 movaps xmmword ptr ds:[164E799h],xmm0
03C2002E mov esp,ebp
03C20030 pop ebp
03C20031 ret
All three SSE instructions will generate a fault for accessing unaligned
memory. Disabling InstructionCombining give...
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...%xmm0,%rdx
400533: movslq %edx,%rsi
400536: sar $0x20,%rdx
40053a: vmovss 0x4006c0(,%rcx,4),%xmm0
400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0
40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
400564: vmulps 0x144(%rip),%xmm0,%xmm0 # 4006b0
<__dso_handle+0x38>
40056c: vmovaps %xmm0,0x20046c(%rip) # 6009e0 <r>
400574: xor %eax,%eax
400576: retq
$ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math
-march=native -mtune=native -DSPILLING_ENSUES=1...
2012 Apr 09
1
[LLVMdev] Question about CriticalAntiDepBreaker.cpp
...ferences. This would be all well and good, were it not for the fact that the result of the expression needs to be in XMM0 because it is being returned as the function result in that register.
The end of the original code sequence, prior to being processed by CriticalAntiDepBreaker.cpp, is
...
mulps %xmm4, %xmm0
addps %xmm2, %xmm0
.LBB0_3: # %none_on
leaq 904(%rsp), %rsp
ret
The critical anti-dependency breaking code changes XMM0 to XMM9, preventing the proper value from being returned in XMM0.
The program that generates this code is the Intel SPMD Pr...
2008 Jul 15
1
[LLVMdev] Spilled variables using unaligned moves
...zation opportunity.
The attached replacement of fibonacci.cpp generates x86 code like this:
03A70010 push ebp
03A70011 mov ebp,esp
03A70013 and esp,0FFFFFFF0h
03A70019 sub esp,1A0h
...
03A7006C movups xmmword ptr [esp+180h],xmm7
...
03A70229 mulps xmm1,xmmword ptr [esp+180h]
...
03A70682 movups xmm0,xmmword ptr [esp+180h]
Note how the stores and loads use unaligned moves where aligned moves
would do. It's also interesting that the multiply does correctly assume
the stack to be 16-byte aligned.
Is there something I...
2009 Jul 07
1
Installation from source on Ubuntu 9.04, make kernel failure
Hi all,
I installed Xen on a fresh Ubuntu 9.04 from source. During the build it repeatedly shows "Warning: not literal format or no parameters". Running "make" on the Linux kernel ends after about an hour with the following (the installation steps are shown below):
---------------------------------------------------------------------------
WARNING: modpost: Found 1 section mismatch(es).
To
2004 Aug 06
2
Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others
...vaps xmm1,xmmword ptr [ebp-60h]
004134F0 shufps xmm1,xmm0,39h
004134F4 movaps xmmword ptr [ebp-60h],xmm1
267:
268: mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
004134F8 movaps xmm0,xmmword ptr [ebp-30h]
004134FC movaps xmm1,xmmword ptr [xx]
00413500 mulps xmm1,xmm0
00413503 movaps xmm0,xmmword ptr [ebp-60h]
00413507 addps xmm0,xmm1
0041350A movaps xmmword ptr [ebp-60h],xmm0
269:
270: mem[1] = _mm_move_ss(mem[1], mem[2]);
0041350E movaps xmm0,xmmword ptr [ebp-40h]
00413512 movaps xmm1,xmmword ptr [ebp-...
2010 Jan 03
21
Re: Xenified linux kernel
Known bug for 32-bit. Workaround is on the net.
http://mulps.wordpress.com/2009/05/29/compiling-xen-kernel-2-6-29-2/
Boris
________________________________
From: Mehdi Sheikhalishahi <mehdi.alishahi@gmail.com>
To: bderzhavets@yahoo.com
Sent: Sun, January 3, 2010 4:25:00 PM
Subject: Xenified linux kernel
Hi Dear:
I am refering to the
http://bderzh...
2004 Aug 06
0
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...l chips with SSE support
> do, however no current 32 bit AMD chips support the XMM registers. They
> will support the SSE instructions but not those registers. You are right
> about the SSE2 not being used.
I'm still not sure I get it. On an Athlon XP, I can do something like
"mulps xmm0, xmm1", which means that the xmm registers are indeed
supported. Besides, without the xmm registers, you can't use much of
SSE. You can use the prefetch instructions that were in the Athlon
T-Bird, but that's about it (and I don't think that makes it SSE1).
> The AMD Opter...
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
> Please note that dot products of simple vector floats are usually
> faster
> in the scalar units. The add across and transfer to scalar is just too
> expensive.
Or do four at once, with some shuffling (which is basically free);
almost the same code as a 4x4 matrix/vector multiply.
Segher
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage:
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc,
There is a big difference between SSE and SSEFP. The SSEFP means
that the CPU supports the xmm registers. All Intel chips with SSE support
do, however no current 32 bit AMD chips support the XMM registers. They
will support the SSE instructions but not those registers. You are right
about the SSE2 not being used.
The AMD Opterons are the first AMD CPUs which support
2007 Jul 11
1
[LLVMdev] New LLVM C front-end: "clang"
...[#uses=1]
ret <4 x float> %add
}
$ clang ~/t.c -emit-llvm | llvm-as | opt -std-compile-opts | llc -
march=ppc32 -mcpu=g5
..
_foo:
vmaddfp v2, v3, v2, v2
blr
$ clang ~/t.c -emit-llvm | llvm-as | opt -std-compile-opts | llc -
march=x86 -mcpu=yonah
..
_foo:
mulps %xmm0, %xmm1
addps %xmm0, %xmm1
movaps %xmm1, %xmm0
ret
etc.
In any case, we welcome questions, comments, and especially
patches :). To avoid spamming llvmdev, please take all discussion to
the cfe-dev mailing list, and patch submission/discussion to cfe-
commit...
2004 Aug 06
5
[PATCH] Make SSE Run Time option.
...ever need SSE2. There are some
> instructions that are legitimately useful to single precision floating
> point work, such as cvtps2dq and cvttps2dq.
There are so few conversions in Speex in the first place that it's not
even worth bothering with those. You get all the gain from just addps and
mulps (and the "glue" instructions that let you use them, like movaps and
shufps).
Jean-Marc
--
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada
-------------- next part --------------
A non-text attachment w...
2010 May 29
1
[LLVMdev] Vectorized LLVM IR
On Sat, May 29, 2010 at 1:18 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Sat, May 29, 2010 at 12:42 AM, Stéphane Letz <letz at grame.fr> wrote:
>>
>> Le 29 mai 2010 à 01:08, Bill Wendling a écrit :
>>
>>> Hi Stéphane,
>>>
>>> The SSE support is the LLVM backend is fine. What is the code that's generated? Do you have some
2006 Apr 13
3
[LLVMdev] Re: Creating Release 1.7 Branch at 1:00pm PDT
Here's what's left on Linux (GCC 4.1.0), after all updates that went
into the branch:
Running /proj/llvm/build/../llvm/test/Regression/CFrontend/dg.exp ...
FAIL: /proj/llvm/build/../llvm/test/Regression/CFrontend/2004-02-12-
LargeAggregateCopy.c.tr:
gccas: /proj/llvm/build/../llvm/lib/VMCore/Function.cpp:266: unsigned
int llvm::Function::getIntrinsicID() const: Assertion `0 &&