Displaying 20 results from an estimated 130 matches for "movaps".
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...ord ptr [ebp+8]
002B00CA movsd xmm0,mmword ptr [eax+10h]
002B00CF unpcklpd xmm0,xmm0
002B00D3 movsd xmm1,mmword ptr [eax]
002B00D7 movsd xmm2,mmword ptr [eax+8]
002B00DC unpcklpd xmm2,xmm2
002B00E0 unpcklpd xmm1,xmm1
002B00E4 xorps xmm3,xmm3
002B00E7 movaps xmm4,xmm3
002B00EA movaps xmm5,xmm3
002B00ED movaps xmmword ptr [esp+4F0h],xmm2
002B00F5 movaps xmmword ptr [esp+4E0h],xmm0
002B00FD movaps xmmword ptr [esp+4D0h],xmm1
002B0105 movaps xmmword ptr [esp+4C0h],xmm5
002B010D movaps xmmword ptr [esp+4B0h],x...
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
...on that computes an 8-point
>> complex FFT, but from 16-point upwards, icc or gcc generates much
>> better code. Here is an example of a sequence of instructions from a
>> 32-point FFT, compiled with clang/LLVM 3.1 for x86_64 with SSE:
>>
>> [...]
>> movaps 32(%rdi), %xmm3
>> movaps 48(%rdi), %xmm2
>> movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1
>> movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4
>> addps %xmm0, %xmm1
>> movaps %xmm1, -16(%rbp) ## 16-...
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
...2 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>>> [...]
>>> movaps 32(%rdi), %xmm3
>>> movaps 48(%rdi), %xmm2
>>> movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1
>>> movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4
>>> addps %xmm0, %xmm1
>>> movaps %xmm1, -16...
2009 Mar 11
4
[LLVMdev] Bug in X86CompilationCallback_SSE
.../ alloca is 16byte aligned
asm volatile (
"movl %%eax,(%0)\n"
"movl %%edx,4(%0)\n" // Save EAX/EDX/ECX
"movl %%ecx,8(%0)\n"
:: "r"(SAVEBUF+64): "memory" );
asm volatile (
// Save all XMM arg registers
"movaps %%xmm0, (%0)\n"
"movaps %%xmm1, 16(%0)\n"
"movaps %%xmm2, 32(%0)\n"
"movaps %%xmm3, 48(%0)\n"
:: "r"(SAVEBUF) : "memory" );
intptr_t *StackPtr=0, RetAddr=0;
asm volatile ( // get stack ptr and retaddr
"...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE and I
end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based on the target triple?) then I don't think
2009 Mar 10
2
[LLVMdev] Bug in X86CompilationCallback_SSE
...k_SSE+4>: push %edx
0xb74544fd <X86CompilationCallback_SSE+5>: push %ecx
0xb74544fe <X86CompilationCallback_SSE+6>: and $0xfffffff0,%esp
0xb7454501 <X86CompilationCallback_SSE+9>: sub $0x40,%esp
0xb7454504 <X86CompilationCallback_SSE+12>: movaps %xmm0,(%esp)
0xb7454508 <X86CompilationCallback_SSE+16>: movaps %xmm1,0x10(%esp)
0xb745450d <X86CompilationCallback_SSE+21>: movaps %xmm2,0x20(%esp)
0xb7454512 <X86CompilationCallback_SSE+26>: movaps %xmm3,0x30(%esp)
0xb7454517 <X86CompilationCallback_SSE+31>:...
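The prologue in this disassembly rounds %esp down to a 16-byte boundary (`and $0xfffffff0,%esp`) and then reserves 0x40 bytes, so the four `movaps` stores at offsets 0x00 through 0x30 land on aligned addresses. A minimal sketch of that arithmetic in C, assuming 32-bit addresses; the function name `aligned_save_area` is hypothetical and not taken from the LLVM sources:

```c
#include <stdint.h>

/* Hypothetical model of the two prologue instructions:
 *   and $0xfffffff0,%esp
 *   sub $0x40,%esp
 * Returns the address of the XMM save area for a given incoming %esp. */
uint32_t aligned_save_area(uint32_t esp) {
    esp &= UINT32_C(0xfffffff0); /* round down to a multiple of 16 */
    esp -= UINT32_C(0x40);       /* reserve 64 bytes: 4 XMM registers x 16 bytes */
    return esp;
}
```

Because 0x40 is itself a multiple of 16, the subtraction preserves the alignment established by the mask, whatever the incoming stack pointer was.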
2004 Aug 06
2
Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others
...8: /* Compute next filter result */
259: xx = _mm_load_ps1(x+i);
00413483 mov eax,dword ptr [ebp-64h]
00413486 mov ecx,dword ptr [ebx+8]
00413489 lea edx,[ecx+eax*4]
0041348C movss xmm0,dword ptr [edx]
00413490 shufps xmm0,xmm0,0
00413494 movaps xmmword ptr [xx],xmm0
260: yy = _mm_add_ss(xx, mem[0]);
00413498 movaps xmm0,xmmword ptr [ebp-60h]
0041349C movaps xmm1,xmmword ptr [xx]
004134A0 addss xmm1,xmm0
004134A4 movaps xmmword ptr [yy],xmm1
261: _mm_store_ss(y+i, yy);
004134AB movaps...
2009 Mar 11
0
[LLVMdev] Bug in X86CompilationCallback_SSE
Hello, Corrado
> Before you can correctly invoke a function via the Procedure Linkage
> Table (plt), the ABI mandates that ebx is pointing to the GOT (Global
> Offset Table) (see http://www.greyhat.ch/lab/downloads/pic.html)
This is a known issue; nobody had realized that we have a bunch of
non-PIC-aware assembler code. :) Fixing it would not be trivial, though,
mostly due to ABI
2009 Mar 12
0
[LLVMdev] Bug in X86CompilationCallback_SSE
...asm volatile (
> "movl %%eax,(%0)\n"
> "movl %%edx,4(%0)\n" // Save EAX/EDX/ECX
> "movl %%ecx,8(%0)\n"
> :: "r"(SAVEBUF+64): "memory" );
>
> asm volatile (
> // Save all XMM arg registers
> "movaps %%xmm0, (%0)\n"
> "movaps %%xmm1, 16(%0)\n"
> "movaps %%xmm2, 32(%0)\n"
> "movaps %%xmm3, 48(%0)\n"
> :: "r"(SAVEBUF) : "memory" );
>
> intptr_t *StackPtr=0, RetAddr=0;
>
> asm volatile ( // get...
2009 Mar 12
0
[LLVMdev] Bug in X86CompilationCallback_SSE
...asm volatile (
> "movl %%eax,(%0)\n"
> "movl %%edx,4(%0)\n" // Save EAX/EDX/ECX
> "movl %%ecx,8(%0)\n"
> :: "r"(SAVEBUF+64): "memory" );
>
> asm volatile (
> // Save all XMM arg registers
> "movaps %%xmm0, (%0)\n"
> "movaps %%xmm1, 16(%0)\n"
> "movaps %%xmm2, 32(%0)\n"
> "movaps %%xmm3, 48(%0)\n"
> :: "r"(SAVEBUF) : "memory" );
>
> intptr_t *StackPtr=0, RetAddr=0;
>
> asm volatile ( // get...
2012 Jul 06
0
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
...rates good code for a function that computes an 8-point
> complex FFT, but from 16-point upwards, icc or gcc generates much
> better code. Here is an example of a sequence of instructions from a
> 32-point FFT, compiled with clang/LLVM 3.1 for x86_64 with SSE:
>
> [...]
> movaps 32(%rdi), %xmm3
> movaps 48(%rdi), %xmm2
> movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1
> movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4
> addps %xmm0, %xmm1
> movaps %xmm1, -16(%rbp) ## 16-byte Spill
> movaps 144(%rdi), %xmm3 ### <-...
2007 Oct 19
0
[LLVMdev] movaps being generated despite alignment 1 being specified
On Oct 18, 2007, at 1:52 PM, Chuck Rose III wrote:
>
> Here are the instructions for evaluateDependents. The JITter
> hasn’t compiled foo yet. What’s confusing to me is why did my
> movups suddenly become a movaps? All the stores and loads have
> align 1 on them.
Hi Chuck,
I believe this is a bug but am unable to reproduce it with the test
case you've provided. I should be able to see the same problem using
llc since the code generator is going through all the same passes.
The only differen...
2012 Jul 06
2
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
...generated by FFTW.
LLVM generates good code for a function that computes an 8-point
complex FFT, but from 16-point upwards, icc or gcc generates much
better code. Here is an example of a sequence of instructions from a
32-point FFT, compiled with clang/LLVM 3.1 for x86_64 with SSE:
[...]
movaps 32(%rdi), %xmm3
movaps 48(%rdi), %xmm2
movaps %xmm3, %xmm1 ### <-- xmm3 mov'ed into xmm1
movaps %xmm3, %xmm4 ### <-- xmm3 mov'ed into xmm4
addps %xmm0, %xmm1
movaps %xmm1, -16(%rbp) ## 16-byte Spill
movaps 144(%rdi), %xmm3 ### <-- new data mov'ed into xmm...
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...AC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx
@@ -596,7 +597,7 @@
movss xmm3, xmm2
movss xmm2, xmm0
- ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2
+ ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2
movaps xmm1, xmm0
mulps xmm1, xmm2
addps xmm5, xmm1
@@ -619,6 +620,95 @@
ret
ALIGN 16
+cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
+ ;[ebp + 20] == autoc[]
+ ;[ebp + 16] == lag
+ ;[ebp + 12] == data_len
+ ;[ebp + 8] == data[]
+ ;[esp] == __m128
+ ;[esp + 16] == __m128
+
+ pu...
2008 Sep 03
3
[LLVMdev] Instruction MVT::ValueTypes
...6:47, Evan Cheng wrote:
> On Sep 2, 2008, at 10:42 AM, David Greene wrote:
> > Is there an easy way to get the MVT::ValueType of a MachineInstruction
> > MachineOperand? For example, the register operand of an x86 MOVAPD
> > should
> > have an MVT::ValueType of v2f64. A MOVAPS register operand should
> > have an
> > MVT::ValueType of v4f32.
>
> The short answer is no. An op of a number of different VTs can map to
> the same instruction.
In general, that may be true, but for most instructions isn't it 1:1? What
are some examples where it isn...
2008 Jul 12
2
[LLVMdev] Shuffle regression
...fails, and I suspect that the issue is still present.
2.3 generates the following x86 code:
03A10010 push ebp
03A10011 mov ebp,esp
03A10013 and esp,0FFFFFFF0h
03A10019 movups xmm0,xmmword ptr ds:[141D280h]
03A10020 xorps xmm1,xmm1
03A10023 movaps xmm2,xmm0
03A10026 shufps xmm2,xmm1,32h
03A1002A movaps xmm1,xmm0
03A1002D shufps xmm1,xmm2,84h
03A10031 shufps xmm0,xmm1,23h
03A10035 shufps xmm1,xmm1,40h
03A10039 shufps xmm1,xmm0,2Eh
03A1003D movups xmmword ptr ds:[14262C0h],xmm1
03A...
2007 Oct 18
3
[LLVMdev] movaps being generated despite alignment 1 being specified
...oping you'll have an idea what's going on or at
least know if it's a new issue I should log. It's related to the stack
alignment issue that I know is being worked on, but seems sufficiently
different to ask about it here. I checked the bug database for "align"
and "movaps" and didn't see this issue raised.
Ok, the first bit of code here seems to generate correct assembly for
me. Basically, it takes the float4 stored at globalV and copies it
into the address pointed to by dependentV. Along the way, it creates a
<4 x float> and copies globalV int...
2014 Aug 07
3
[LLVMdev] MCJIT generates MOVAPS on unaligned address
...ough 'opt -slp-vectorizer' results in no code changes.
What could I be missing here?
Frank
On 08/07/2014 04:29 PM, Arnold Schwaighofer wrote:
>> On Aug 7, 2014, at 12:42 PM, Frank Winter <fwinter at jlab.org> wrote:
>>
>> MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address:
>>
>> movaps 88(%rdx), %xmm0
>>
>> where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requires the memory address to be 16...
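The alignment problem described in this message comes down to address arithmetic: since 88 = 5*16 + 8, the operand `88(%rdx)` is a multiple of 16 only when %rdx is congruent to 8 modulo 16, which the 4-byte natural alignment of a `float*` does not guarantee. A minimal sketch in C; the helper names `is_aligned_16` and `operand_address` are hypothetical and not part of LLVM or the thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when addr is a multiple of 16, the
 * alignment MOVAPS requires for a memory operand. */
bool is_aligned_16(uintptr_t addr) {
    return (addr & (uintptr_t)0xF) == 0;
}

/* Effective address of the operand `88(%rdx)` for a given base value. */
uintptr_t operand_address(uintptr_t base) {
    assert(base <= UINTPTR_MAX - 88u); /* guard against wraparound */
    return base + 88u;
}
```

For example, a base of 0x1000 gives 0x1058, which is not a multiple of 16, while a base of 0x1008 gives 0x1060, which is.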
2008 Sep 03
0
[LLVMdev] Instruction MVT::ValueTypes
...> On Sep 2, 2008, at 10:42 AM, David Greene wrote:
>>> Is there an easy way to get the MVT::ValueType of a
>>> MachineInstruction
>>> MachineOperand? For example, the register operand of an x86 MOVAPD
>>> should
>>> have an MVT::ValueType of v2f64. A MOVAPS register operand should
>>> have an
>>> MVT::ValueType of v4f32.
>>
>> The short answer is no. An op of a number of different VTs can map to
>> the same instruction.
>
> In general, that may be true, but for most instructions isn't it
> 1:1? What
>...
2014 Aug 07
2
[LLVMdev] MCJIT generates MOVAPS on unaligned address
...ould I be missing here?
>>
>> Frank
>>
>>
>> On 08/07/2014 04:29 PM, Arnold Schwaighofer wrote:
>>>> On Aug 7, 2014, at 12:42 PM, Frank Winter <fwinter at jlab.org> wrote:
>>>>
>>>> MCJIT when lowering to x86-64 generates a MOVAPS (Move Aligned Packed Single-Precision Floating-Point Values) on a non-aligned memory address:
>>>>
>>>> movaps 88(%rdx), %xmm0
>>>>
>>>> where %rdx comes in as a function argument with only natural alignment (float*). This x86 instruction requi...