thr3ads.net - search: "movaps"

[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX

2013 Jul 19

0

...ord ptr [ebp+8] 002B00CA movsd xmm0,mmword ptr [eax+10h] 002B00CF unpcklpd xmm0,xmm0 002B00D3 movsd xmm1,mmword ptr [eax] 002B00D7 movsd xmm2,mmword ptr [eax+8] 002B00DC unpcklpd xmm2,xmm2 002B00E0 unpcklpd xmm1,xmm1 002B00E4 xorps xmm3,xmm3 002B00E7 movaps xmm4,xmm3 002B00EA movaps xmm5,xmm3 002B00ED movaps xmmword ptr [esp+4F0h],xmm2 002B00F5 movaps xmmword ptr [esp+4E0h],xmm0 002B00FD movaps xmmword ptr [esp+4D0h],xmm1 002B0105 movaps xmmword ptr [esp+4C0h],xmm5 002B010D movaps xmmword ptr [esp+4B0h],x...

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

2

...on that computes an 8-point >> complex FFT, but from 16-point upwards, icc or gcc generates much >> better code. Here is an example of a sequence of instructions from a >> 32-point FFT, compiled with clang/LLVM 3.1 for x86_64 with SSE: >> >> [...] >> movaps 32(%rdi), %xmm3 >> movaps 48(%rdi), %xmm2 >> movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1 >> movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4 >> addps %xmm0, %xmm1 >> movaps %xmm1, -16(%rbp) ## 16-...

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

0

...2 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: > On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: >> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote: >>> [...] >>> movaps 32(%rdi), %xmm3 >>> movaps 48(%rdi), %xmm2 >>> movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1 >>> movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4 >>> addps %xmm0, %xmm1 >>> movaps %xmm1, -16...

[LLVMdev] Bug in X86CompilationCallback_SSE

2009 Mar 11

4

.../ alloca is 16byte aligned asm volatile ( "movl %%eax,(%0)\n" "movl %%edx,4(%0)\n" // Save EAX/EDX/ECX "movl %%ecx,8(%0)\n" :: "r"(SAVEBUF+64): "memory" ); asm volatile ( // Save all XMM arg registers "movaps %%xmm0, (%0)\n" "movaps %%xmm1, 16(%0)\n" "movaps %%xmm2, 32(%0)\n" "movaps %%xmm3, 48(%0)\n" :: "r"(SAVEBUF) : "memory" ); intptr_t *StackPtr=0, RetAddr=0; asm volatile ( // get stack ptr and retaddr "...

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

4

Hmm, I'm not able to get those .ll files to compile if I disable SSE and I end up with SSE instructions(including sqrtpd) if I don't disable it. On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote: > Is there something specifically required to enable SSE? If it's not > detected as available (based from the target triple?) then I don't think

[LLVMdev] Bug in X86CompilationCallback_SSE

2009 Mar 10

2

...k_SSE+4>: push %edx 0xb74544fd <X86CompilationCallback_SSE+5>: push %ecx 0xb74544fe <X86CompilationCallback_SSE+6>: and $0xfffffff0,%esp 0xb7454501 <X86CompilationCallback_SSE+9>: sub $0x40,%esp 0xb7454504 <X86CompilationCallback_SSE+12>: movaps %xmm0,(%esp) 0xb7454508 <X86CompilationCallback_SSE+16>: movaps %xmm1,0x10(%esp) 0xb745450d <X86CompilationCallback_SSE+21>: movaps %xmm2,0x20(%esp) 0xb7454512 <X86CompilationCallback_SSE+26>: movaps %xmm3,0x30(%esp) 0xb7454517 <X86CompilationCallback_SSE+31>:...

Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others

2004 Aug 06

2

...8: /* Compute next filter result */ 259: xx = _mm_load_ps1(x+i); 00413483 mov eax,dword ptr [ebp-64h] 00413486 mov ecx,dword ptr [ebx+8] 00413489 lea edx,[ecx+eax*4] 0041348C movss xmm0,dword ptr [edx] 00413490 shufps xmm0,xmm0,0 00413494 movaps xmmword ptr [xx],xmm0 260: yy = _mm_add_ss(xx, mem[0]); 00413498 movaps xmm0,xmmword ptr [ebp-60h] 0041349C movaps xmm1,xmmword ptr [xx] 004134A0 addss xmm1,xmm0 004134A4 movaps xmmword ptr [yy],xmm1 261: _mm_store_ss(y+i, yy); 004134AB movaps...

[LLVMdev] Bug in X86CompilationCallback_SSE

2009 Mar 11

0

Hello, Corrado > Before you can correctly invoke a function via the Procedure Linkage > Table (plt), the ABI mandates that ebx is pointing to the GOT (Global > Offset Table) (see http://www.greyhat.ch/lab/downloads/pic.html) This is known issue, just nobody realized, that we have bunch of non- PIC-aware assembler code. :) Fixing would be not so trivial though, mostly due to ABI

[LLVMdev] Bug in X86CompilationCallback_SSE

2009 Mar 12

0

...asm volatile ( > "movl %%eax,(%0)\n" > "movl %%edx,4(%0)\n" // Save EAX/EDX/ECX > "movl %%ecx,8(%0)\n" > :: "r"(SAVEBUF+64): "memory" ); > > asm volatile ( > // Save all XMM arg registers > "movaps %%xmm0, (%0)\n" > "movaps %%xmm1, 16(%0)\n" > "movaps %%xmm2, 32(%0)\n" > "movaps %%xmm3, 48(%0)\n" > :: "r"(SAVEBUF) : "memory" ); > > intptr_t *StackPtr=0, RetAddr=0; > > asm volatile ( // get...

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

0

...rates good code for a function that computes an 8-point > complex FFT, but from 16-point upwards, icc or gcc generates much > better code. Here is an example of a sequence of instructions from a > 32-point FFT, compiled with clang/LLVM 3.1 for x86_64 with SSE: > > [...] > movaps 32(%rdi), %xmm3 > movaps 48(%rdi), %xmm2 > movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1 > movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4 > addps %xmm0, %xmm1 > movaps %xmm1, -16(%rbp) ## 16-byte Spill > movaps 144(%rdi), %xmm3 ### <-...

[LLVMdev] movaps being generated despite alignment 1 being specified

2007 Oct 19

0

On Oct 18, 2007, at 1:52 PM, Chuck Rose III wrote: > > Here are the instructions for evaluateDependents. The JITter > hasn’t compiled foo yet. What’s confusing to me is why did my > movups suddenly become a movaps? All the stores and loads have > align 1 on them. Hi Chuck, I believe this is a bug but am unable to reproduce it with the test case you've provided. I should be able to see the same problem using llc since the code generator is going through all the same passes. The only differen...

[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW

2012 Jul 06

2

...generated by FFTW. LLVM generates good code for a function that computes an 8-point complex FFT, but from 16-point upwards, icc or gcc generates much better code. Here is an example of a sequence of instructions from a 32-point FFT, compiled with clang/LLVM 3.1 for x86_64 with SSE: [...] movaps 32(%rdi), %xmm3 movaps 48(%rdi), %xmm2 movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1 movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4 addps %xmm0, %xmm1 movaps %xmm1, -16(%rbp) ## 16-byte Spill movaps 144(%rdi), %xmm3 ### <-- new data mov'ed into xmm...

New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16

2013 Aug 22

2

...AC__lpc_compute_residual_from_qlp_coefficients_asm_ia32 cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx @@ -596,7 +597,7 @@ movss xmm3, xmm2 movss xmm2, xmm0 - ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2 + ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2 movaps xmm1, xmm0 mulps xmm1, xmm2 addps xmm5, xmm1 @@ -619,6 +620,95 @@ ret ALIGN 16 +cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16 + ;[ebp + 20] == autoc[] + ;[ebp + 16] == lag + ;[ebp + 12] == data_len + ;[ebp + 8] == data[] + ;[esp] == __m128 + ;[esp + 16] == __m128 + + pu...

[LLVMdev] Instruction MVT::ValueTypes

2008 Sep 03

3

...6:47, Evan Cheng wrote: > On Sep 2, 2008, at 10:42 AM, David Greene wrote: > > Is there an easy way to get the MVT::ValueType of a MachineInstruction > > MachineOperand? For example, the register operand of an x86 MOVAPD > > should > > have an MVT::ValueType of v2f64. A MOVAPS register operand should > > have an > > MVT::ValueType of v4f32. > > The short answer is no. A op of a number of different VTs can map to > the same instruction. In general, that may be true, but for most instructions isn't it 1:1? What are some examples where it isn...

[LLVMdev] Shuffle regression

2008 Jul 12

2

...fails, and I suspect that the issue is still present. 2.3 generates the following x86 code: 03A10010 push ebp 03A10011 mov ebp,esp 03A10013 and esp,0FFFFFFF0h 03A10019 movups xmm0,xmmword ptr ds:[141D280h] 03A10020 xorps xmm1,xmm1 03A10023 movaps xmm2,xmm0 03A10026 shufps xmm2,xmm1,32h 03A1002A movaps xmm1,xmm0 03A1002D shufps xmm1,xmm2,84h 03A10031 shufps xmm0,xmm1,23h 03A10035 shufps xmm1,xmm1,40h 03A10039 shufps xmm1,xmm0,2Eh 03A1003D movups xmmword ptr ds:[14262C0h],xmm1 03A...

[LLVMdev] movaps being generated despite alignment 1 being specified

2007 Oct 18

3

...oping you'll have an idea what's going on or at least know if it's a new issue I should log. It's related to the stack alignment issue that I know is being worked on, but seems sufficiently different to ask about it here. I checked the bug database for "align" and "movaps" and didn't see this issue raised. Ok, the first bit of code here seems to generate correct assembly for me. Basically, it copies the float4 stored at globalV and copies it into the address pointed to by dependentV. Along the way, it creates a <4 x float> and copies globalV int...

[LLVMdev] MCJIT generates MOVAPS on unaligned address

2014 Aug 07

3

...ough 'opt -slp-vectorizer' results in no code changes. What could I be missing here? Frank On 08/07/2014 04:29 PM, Arnold Schwaighofer wrote: >> On Aug 7, 2014, at 12:42 PM, Frank Winter <fwinter at jlab.org> wrote: >> >> MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address: >> >> movaps 88(%rdx), %xmm0 >> >> where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requires the memory address to be 16...

[LLVMdev] Instruction MVT::ValueTypes

2008 Sep 03

0

...> On Sep 2, 2008, at 10:42 AM, David Greene wrote: >>> Is there an easy way to get the MVT::ValueType of a >>> MachineInstruction >>> MachineOperand? For example, the register operand of an x86 MOVAPD >>> should >>> have an MVT::ValueType of v2f64. A MOVAPS register operand should >>> have an >>> MVT::ValueType of v4f32. >> >> The short answer is no. A op of a number of different VTs can map to >> the same instruction. > > In general, that may be true, but for most instructions isn't it > 1:1? What ...

[LLVMdev] MCJIT generates MOVAPS on unaligned address

2014 Aug 07

2

...ould I be missing here? >> >> Frank >> >> >> On 08/07/2014 04:29 PM, Arnold Schwaighofer wrote: >>>> On Aug 7, 2014, at 12:42 PM, Frank Winter <fwinter at jlab.org> wrote: >>>> >>>> MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address: >>>> >>>> movaps 88(%rdx), %xmm0 >>>> >>>> where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requi...
