search for: mulps

Displaying 20 results from an estimated 22 matches for "mulps".

Did you mean: vmulps
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...xmm0 + + mov edx, in2 + movss [edx], xmm1 + + shufps xmm0, xmm0, 0x00 + shufps xmm1, xmm1, 0x00 + + movaps xmm2, [eax+4] + movaps xmm3, [ebx+4] + mulps xmm2, xmm0 + mulps xmm3, xmm1 + movaps xmm4, [eax+20] + mulps xmm4, xmm0 + addps xmm2, [ecx+4] + movaps xmm5, [ebx+20] + mulps xmm5, xmm1 +...
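The pattern in this flattened diff is a standard one: broadcast two scalar coefficients across whole registers (shufps with 0x00), multiply them against vectors loaded from two arrays, and accumulate. A minimal intrinsics sketch of that pattern, with invented names rather than the patch's actual operands:

    #include <xmmintrin.h>

    /* Hedged sketch only: the same broadcast/mulps/addps shape as the assembly
       in the patch, not the patch itself; a, b, acc, n0, n1 are illustrative. */
    void mul_accum_sketch(const float *a, const float *b, float *acc,
                          float n0, float n1)
    {
        __m128 x0 = _mm_set1_ps(n0);                    /* shufps xmm0, xmm0, 0x00 */
        __m128 x1 = _mm_set1_ps(n1);                    /* shufps xmm1, xmm1, 0x00 */
        __m128 va = _mm_mul_ps(_mm_loadu_ps(a), x0);    /* mulps xmm2, xmm0        */
        __m128 vb = _mm_mul_ps(_mm_loadu_ps(b), x1);    /* mulps xmm3, xmm1        */
        __m128 s  = _mm_add_ps(_mm_loadu_ps(acc), _mm_add_ps(va, vb));
        _mm_storeu_ps(acc, s);                          /* accumulate back         */
    }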
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...ual_from_qlp_coefficients_asm_ia32 cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx @@ -596,7 +597,7 @@ movss xmm3, xmm2 movss xmm2, xmm0 - ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2 + ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2 movaps xmm1, xmm0 mulps xmm1, xmm2 addps xmm5, xmm1 @@ -619,6 +620,95 @@ ret ALIGN 16 +cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16 + ;[ebp + 20] == autoc[] + ;[ebp + 16] == lag + ;[ebp + 12] == data_len + ;[ebp + 8] == data[] + ;[esp] == __m128 + ;[esp + 16] == __m128 + + push ebp + mov ebp, es...
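For context, the routine being added computes the same quantity as FLAC's plain C autocorrelation, just with the running sums packed four per xmm register. A scalar reference sketch (signature approximated, not copied from the FLAC source):

    /* Hedged reference sketch of an autocorrelation-with-lag computation; the
       SSE lag_16 routine keeps the 16 sums in four xmm accumulators instead. */
    void autocorrelation_ref(const float *data, unsigned data_len,
                             unsigned lag, float *autoc)
    {
        for (unsigned l = 0; l < lag; ++l) {
            float sum = 0.0f;
            for (unsigned i = l; i < data_len; ++i)
                sum += data[i] * data[i - l];
            autoc[l] = sum;
        }
    }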
2008 Jul 10
3
[LLVMdev] InstructionCombining forgets alignment of globals
Hi all, The InstructionCombining pass causes alignment of globals to be ignored. I've attached a replacement of Fibonacci.cpp which reproduces this (I used 2.3 release). Here's the x86 code it produces: 03C20019 movaps xmm0,xmmword ptr ds:[164E799h] 03C20020 mulps xmm0,xmmword ptr ds:[164E79Ah] 03C20027 movaps xmmword ptr ds:[164E799h],xmm0 03C2002E mov esp,ebp 03C20030 pop ebp 03C20031 ret All three SSE instructions will generate a fault for accessing unaligned memory. Disabling InstructionCombining give...
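The fault itself is easy to reproduce outside LLVM: movaps (_mm_load_ps) requires a 16-byte-aligned address, while movups (_mm_loadu_ps) does not. A small illustration with an invented buffer, not the Fibonacci.cpp test case:

    #include <xmmintrin.h>

    /* Hedged illustration of the aligned/unaligned distinction the post is about. */
    alignas(16) static float buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    __m128 load_demo()
    {
        __m128 a = _mm_load_ps(buf);        /* movaps, 16-byte aligned: fine   */
        __m128 b = _mm_loadu_ps(buf + 1);   /* movups, unaligned address: fine */
        /* _mm_load_ps(buf + 1) would emit movaps on a 4-byte-aligned address
           and fault at run time, which is what the under-aligned globals hit. */
        return _mm_add_ps(a, b);
    }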
2008 Jul 14
5
[LLVMdev] Spilled variables using unaligned moves
...ion opportunity. The attached replacement of fibonacci.cpp generates x86 code like this: 03A70010 push ebp 03A70011 mov ebp,esp 03A70013 and esp,0FFFFFFF0h 03A70019 sub esp,1A0h ... 03A7006C movups xmmword ptr [esp+180h],xmm7 ... 03A70229 mulps xmm1,xmmword ptr [esp+180h] ... 03A70682 movups xmm0,xmmword ptr [esp+180h] Note how stores and loads use unaligned moves while it could use aligned moves. It's also interesting that the multiply does correctly assume the stack to be 16-byte aligned. Is there something I...
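The observation checks out arithmetically: after "and esp, 0FFFFFFF0h" the stack pointer is a multiple of 16, and the spill offset 0x180 (384 = 24 * 16) preserves that, so [esp+180h] is a legal movaps operand and the movups is purely a codegen choice. A tiny check of that claim (a sketch, not LLVM code):

    #include <cstdint>

    /* Hedged arithmetic check of the alignment claim in the post. */
    bool spill_slot_aligned(std::uint32_t esp)
    {
        std::uint32_t aligned = esp & ~std::uint32_t{0xF};  /* and esp, 0FFFFFFF0h */
        return ((aligned + 0x180u) & 0xFu) == 0;            /* always true         */
    }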
2008 Jul 14
0
[LLVMdev] Spilled variables using unaligned moves
...lacement of fibonacci.cpp generates x86 code like > this: > > 03A70010 push ebp > 03A70011 mov ebp,esp > 03A70013 and esp,0FFFFFFF0h > 03A70019 sub esp,1A0h > ... > 03A7006C movups xmmword ptr [esp+180h],xmm7 > ... > 03A70229 mulps xmm1,xmmword ptr [esp+180h] > ... > 03A70682 movups xmm0,xmmword ptr [esp+180h] > > Note how stores and loads use unaligned moves while it could use > aligned moves. It’s also interesting that the multiply does > correctly assume the stack to be 16-byte aligned. >...
2008 Jul 10
0
[LLVMdev] InstructionCombining forgets alignment of globals
...globals Hi all, The InstructionCombining pass causes alignment of globals to be ignored. I've attached a replacement of Fibonacci.cpp which reproduces this (I used 2.3 release). Here's the x86 code it produces: 03C20019 movaps xmm0,xmmword ptr ds:[164E799h] 03C20020 mulps xmm0,xmmword ptr ds:[164E79Ah] 03C20027 movaps xmmword ptr ds:[164E799h],xmm0 03C2002E mov esp,ebp 03C20030 pop ebp 03C20031 ret All three SSE instructions will generate a fault for accessing unaligned memory. Disabling InstructionCombining give...
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...%xmm0,%rdx 400533: movslq %edx,%rsi 400536: sar $0x20,%rdx 40053a: vmovss 0x4006c0(,%rcx,4),%xmm0 400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0 40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0 400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0 400564: vmulps 0x144(%rip),%xmm0,%xmm0 # 4006b0 <__dso_handle+0x38> 40056c: vmovaps %xmm0,0x20046c(%rip) # 6009e0 <r> 400574: xor %eax,%eax 400576: retq $ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math -march=native -mtune=native -DSPILLING_ENSUES=1...
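The disassembly is a table gather: integer lanes are extracted from an xmm register, used to index a small float table at 0x4006c0, and the gathered values are reassembled with vinsertps before the vmulps. A hedged guess at the shape of the source, with invented table and variable names (the real test.cpp is not shown):

    #include <xmmintrin.h>

    /* Hedged reconstruction of the pattern only, not the original test case. */
    alignas(16) static const float table[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    alignas(16) static const float scale[4] = {0.5f, 0.5f, 0.5f, 0.5f};
    alignas(16) float r[4];

    void gather_scale(const int idx[4])
    {
        __m128 v = _mm_setr_ps(table[idx[0]], table[idx[1]],
                               table[idx[2]], table[idx[3]]);  /* vinsertps x3     */
        _mm_store_ps(r, _mm_mul_ps(v, _mm_load_ps(scale)));    /* vmulps + vmovaps */
    }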
2008 Jul 14
0
[LLVMdev] Spilled variables using unaligned moves
...lacement of fibonacci.cpp generates x86 code like > this: > > 03A70010 push ebp > 03A70011 mov ebp,esp > 03A70013 and esp,0FFFFFFF0h > 03A70019 sub esp,1A0h > ... > 03A7006C movups xmmword ptr [esp+180h],xmm7 > ... > 03A70229 mulps xmm1,xmmword ptr [esp+180h] > ... > 03A70682 movups xmm0,xmmword ptr [esp+180h] > > Note how stores and loads use unaligned moves while it could use > aligned moves. It’s also interesting that the multiply does > correctly assume the stack to be 16-byte aligned. >...
2012 Apr 09
1
[LLVMdev] Question about CriticalAntiDepBreaker.cpp
...ferences. This would be all well and good, were it not for the fact that the result of the expression needs to be in XMM0 because it is being returned as the function result in that register. The end of the original code sequence, prior to being processed by CriticalAntiDepBreaker.cpp, is ... mulps %xmm4, %xmm0 addps %xmm2, %xmm0 .LBB0_3: # %none_on leaq 904(%rsp), %rsp ret The critical anti-dependency breaking code changes XMM0 to XMM9, preventing the proper value from being returned in XMM0. The program that generates this code is the Intel SPMD Pr...
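For readers less familiar with the ABI detail in question: a __m128 return value travels in xmm0 under the x86-64 calling convention, so renaming the register that holds the final mulps/addps result silently changes what the caller receives. A minimal illustration, not the generator's actual output:

    #include <xmmintrin.h>

    /* Hedged illustration: the result of the multiply-add must be in xmm0 at ret. */
    __m128 madd(__m128 a, __m128 b, __m128 c)
    {
        return _mm_add_ps(_mm_mul_ps(a, b), c);   /* mulps then addps, result in xmm0 */
    }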
2008 Jul 15
1
[LLVMdev] Spilled variables using unaligned moves
...zation opportunity. The attached replacement of fibonacci.cpp generates x86 code like this: 03A70010 push ebp 03A70011 mov ebp,esp 03A70013 and esp,0FFFFFFF0h 03A70019 sub esp,1A0h ... 03A7006C movups xmmword ptr [esp+180h],xmm7 ... 03A70229 mulps xmm1,xmmword ptr [esp+180h] ... 03A70682 movups xmm0,xmmword ptr [esp+180h] Note how stores and loads use unaligned moves while it could use aligned moves. It's also interesting that the multiply does correctly assume the stack to be 16-byte aligned. Is there something I...
2009 Jul 07
1
Installation from source on Ubuntu 9.04, make kernel failure
Hi all,   I installed Xen on a fresh Ubuntu 9.04 from source. While the code is running it usually shows "Warning: not literal format or no paramaters". Running "make" on the Linux kernel ends, after about an hour of building, with the following (the installation steps are shown below):   --------------------------------------------------------------------------- WARNING: modpost: Found 1 section mismatch(es). To
2004 Aug 06
2
Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others
...vaps xmm1,xmmword ptr [ebp-60h] 004134F0 shufps xmm1,xmm0,39h 004134F4 movaps xmmword ptr [ebp-60h],xmm1 267: 268: mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0])); 004134F8 movaps xmm0,xmmword ptr [ebp-30h] 004134FC movaps xmm1,xmmword ptr [xx] 00413500 mulps xmm1,xmm0 00413503 movaps xmm0,xmmword ptr [ebp-60h] 00413507 addps xmm0,xmm1 0041350A movaps xmmword ptr [ebp-60h],xmm0 269: 270: mem[1] = _mm_move_ss(mem[1], mem[2]); 0041350E movaps xmm0,xmmword ptr [ebp-40h] 00413512 movaps xmm1,xmmword ptr [ebp-...
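The two source lines being single-stepped are ordinary SSE intrinsics; restated compactly, assuming mem, xx and num are __m128 values as in the Speex SSE test code under discussion:

    #include <xmmintrin.h>

    /* Hedged restatement of the fragment visible in the disassembly; the
       surrounding declarations are assumed, not shown in the snippet.   */
    void step(__m128 mem[3], __m128 xx, const __m128 *num)
    {
        mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));  /* mulps + addps        */
        mem[1] = _mm_move_ss(mem[1], mem[2]);                 /* movss, low lane only */
    }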
2010 Jan 03
21
Re: Xenified linux kernel
Known bug for 32-bit. Workaround is on the net. http://mulps.wordpress.com/2009/05/29/compiling-xen-kernel-2-6-29-2/ Boris ________________________________ From: Mehdi Sheikhalishahi <mehdi.alishahi@gmail.com> To: bderzhavets@yahoo.com Sent: Sun, January 3, 2010 4:25:00 PM Subject: Xenified linux kernel Hi Dear: I am refering to the http://bderzh...
2004 Aug 06
0
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...l chips with SSE support > do, however no current 32 bit AMD chips support the XMM registers. They > will support the SSE instructions but not those registers. You are right > about the SSE2 not being used. I'm still not sure I get it. On an Athlon XP, I can do something like "mulps xmm0, xmm1", which means that the xmm registers are indeed supported. Besides, without the xmm registers, you can't use much of SSE. You can use the prefetch instructions that were in the Athlon T-Bird, but that's about it (and I don't think that makes it SSE1). > The AMD Opter...
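Since the thread is about making SSE a run-time option, the relevant check is CPUID leaf 1: EDX bit 25 reports SSE and bit 26 reports SSE2, on AMD and Intel alike. A hedged sketch using the GCC/Clang helper (the patch itself may do this differently; MSVC builds would use __cpuid from <intrin.h>):

    #include <cpuid.h>

    /* Hedged sketch of a run-time SSE check; not the check used in the patch. */
    bool cpu_has_sse()
    {
        unsigned eax, ebx, ecx, edx;
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return false;
        return ((edx >> 25) & 1u) != 0;   /* bit 25 = SSE, bit 26 = SSE2 */
    }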
2004 Aug 06
2
[PATCH] Make SSE Run Time option.
> Please note that dot products of simple vector floats are usually > faster > in the scalar units. The add across and transfer to scalar is just too > expensive. Or do four at once, with some shuffling (which is basically free); almost the same code as a 4x4 matrix/vector multiply. Segher --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage:
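Doing four at once amounts to storing the four vectors transposed (component j of every vector in one register) and performing a 4x4 matrix-times-vector product built from broadcasts, mulps and addps. A sketch with invented names:

    #include <xmmintrin.h>

    /* Hedged sketch: four dot products at once. cols[j] holds component j of
       each of the four vectors; lane i of the result is dot(vector_i, v).   */
    __m128 four_dots(const __m128 cols[4], __m128 v)
    {
        __m128 r = _mm_mul_ps(cols[0], _mm_shuffle_ps(v, v, 0x00));          /* broadcast v.x */
        r = _mm_add_ps(r, _mm_mul_ps(cols[1], _mm_shuffle_ps(v, v, 0x55)));  /* v.y */
        r = _mm_add_ps(r, _mm_mul_ps(cols[2], _mm_shuffle_ps(v, v, 0xAA)));  /* v.z */
        r = _mm_add_ps(r, _mm_mul_ps(cols[3], _mm_shuffle_ps(v, v, 0xFF)));  /* v.w */
        return r;
    }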
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc, There is a big difference between SSE and SSEFP. SSEFP means that the CPU supports the xmm registers. All Intel chips with SSE support do; however, no current 32-bit AMD chips support the XMM registers. They will support the SSE instructions but not those registers. You are right about SSE2 not being used. The AMD Opterons are the first AMD CPUs which support
2007 Jul 11
1
[LLVMdev] New LLVM C front-end: "clang"
...[#uses=1] ret <4 x float> %add } $ clang ~/t.c -emit-llvm | llvm-as | opt -std-compile-opts | llc -march=ppc32 -mcpu=g5 .. _foo: vmaddfp v2, v3, v2, v2 blr $ clang ~/t.c -emit-llvm | llvm-as | opt -std-compile-opts | llc -march=x86 -mcpu=yonah .. _foo: mulps %xmm0, %xmm1 addps %xmm0, %xmm1 movaps %xmm1, %xmm0 ret etc. In any case, we welcome questions, comments, and especially patches :). To avoid spamming llvmdev, please take all discussion to the cfe-dev mailing list, and patch submission/discussion to cfe-commit...
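The snippet cuts off the t.c being compiled; a minimal source that yields the same single mulps/addps pair (and vmaddfp on PowerPC) would look something like the following, using clang's vector extension syntax. This is a guess at the example, since the original file is not shown:

    /* Hedged guess at a t.c producing "ret <4 x float> %add" and the code above. */
    typedef float float4 __attribute__((vector_size(16)));

    float4 foo(float4 a, float4 b)
    {
        return a * b + a;   /* one mulps, one addps; vmaddfp on AltiVec */
    }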
2004 Aug 06
5
[PATCH] Make SSE Run Time option.
...ever need SSE2. There are some > instructions that are legitimately useful to single precision floating > point work, such as cvtps2dq and cvttps2dq. There are so few conversions in Speex in the first place that it's not even worth bothering with that. You get all the gain from just addps and mulps (and the "glue instructions" that allow you to use them, like movaps and shufps). Jean-Marc -- Jean-Marc Valin, M.Sc.A., ing. jr. LABORIUS (http://www.gel.usherb.ca/laborius) Université de Sherbrooke, Québec, Canada -------------- next part -------------- A non-text attachment w...
2010 May 29
1
[LLVMdev] Vectorized LLVM IR
On Sat, May 29, 2010 at 1:18 AM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Sat, May 29, 2010 at 12:42 AM, Stéphane Letz <letz at grame.fr> wrote: >> >> On May 29, 2010, at 01:08, Bill Wendling wrote: >> >>> Hi Stéphane, >>> >>> The SSE support in the LLVM backend is fine. What is the code that's generated? Do you have some
2006 Apr 13
3
[LLVMdev] Re: Creating Release 1.7 Branch at 1:00pm PDT
Here's what's left on Linux (GCC 4.1.0), after all updates that went into the branch: Running /proj/llvm/build/../llvm/test/Regression/CFrontend/dg.exp ... FAIL: /proj/llvm/build/../llvm/test/Regression/CFrontend/2004-02-12- LargeAggregateCopy.c.tr: gccas: /proj/llvm/build/../llvm/lib/VMCore/Function.cpp:266: unsigned int llvm::Function::getIntrinsicID() const: Assertion `0 &&