thr3ads.net - search: "movdqa"

Displaying 20 results from an estimated 51 matches for "movdqa".

2017 Aug 18

[PATCH] fix alignment exceptions

Jonathan, Here's the code difference we see with the recent change -- what amounts to reverting your change from a couple years back. It doesn't look like we're getting superfluous instructions from clang now. The bad behavior for us was the alignment exception on the movdqa instructions when the input data wasn't 128-bit aligned. We had to change something because the code as-is was taking alignment faults on the movdqa instructions. For reference, the clang version I used for this is: | Android clang version 5.0.300080 (based on LLVM 5.0.300080) | Targ...

2017 Aug 18

[PATCH] fix alignment exceptions

We see the MOVQ instruction but this patch deliberately uses it rather than MOVDQA (load 128-bits aligned). We were seeing that with the trace below, the final invocation is not 128-bit aligned but MOVDQA insists on it (the calling function was pitch_sse4_1.c:90, in the 4-way N - i >= 4 loop). 07-31 11:00:13.469 210 2540 D opus_sse1: RBE celt_inner_prod_sse4_1: x
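
The alignment fault described in this thread can be illustrated with a small sketch. This is not the Opus code: `inner_prod_i16` is a hypothetical helper, and the sketch assumes an x86 target with SSE2 when the intrinsic path is compiled (it falls back to plain C otherwise). The point is only that an unaligned load intrinsic accepts any address, where the aligned form would fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: sum of products of two int16 arrays of length n. */
int32_t inner_prod_i16(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;

    assert(x != NULL && y != NULL);
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 8 <= n; i += 8) {
        /* _mm_loadu_si128 is the unaligned load (typically MOVDQU), so x + i
           and y + i may sit at any address. _mm_load_si128 (MOVDQA) would
           fault here unless both addresses were 16-byte aligned. */
        __m128i vx = _mm_loadu_si128((const __m128i *)(x + i));
        __m128i vy = _mm_loadu_si128((const __m128i *)(y + i));
        /* Multiply 16-bit lanes and add adjacent pairs into 32-bit lanes. */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(vx, vy));
    }
    int32_t lanes[4];
    memcpy(lanes, &acc, sizeof lanes);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail (and the whole loop when SSE2 is unavailable). */
    for (; i < n; i++)
        sum += (int32_t)x[i] * y[i];
    return sum;
}
```

Calling it on pointers such as `buf + 1` for an `int16_t buf[]` exercises the misaligned case that an aligned load would reject.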

2010 Aug 02

[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!

...al error: error in backend: Ran out of registers during register allocation! Please check your inline asm statement for invalid constraints: INLINEASM <es:movd %eax, %xmm3 pshuflw $$0, %xmm3, %xmm3 punpcklwd %xmm3, %xmm3 pxor %xmm7, %xmm7 pxor %xmm4, %xmm4 movdqa ($2), %xmm5 pxor %xmm6, %xmm6 psubw ($3), %xmm6 mov $$-128, %eax .align 1 << 4 1: movdqa ($1, %eax), %xmm0 movdqa %xmm0, %xmm1 pabsw %xmm0, %xmm0 psubusw %xmm6, %xmm0 pmulhw %xmm5, %xmm0 por %xmm0, %xmm4...

2017 Aug 04

Bug or incorrect use of inline asm?

...nstant: ``` source_filename = "asanasm.d" target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-windows-msvc" @globVar = global [2 x i32] [i32 66051, i32 66051] define void @_D7asanasm8offconstFZv() { call void asm sideeffect "movdqa 4$0, %xmm7", "*m,~{xmm7}"([2 x i32]* @globVar) ret void } ``` results in: <inline asm>:1:10: error: unexpected token in argument list movdqa 4globVar(%rip), %xmm7 So in that case, I do have to add the '+' to make it work ("4+$0"). So depending on...

2010 May 11

[LLVMdev] How does SSEDomainFix work?

Hello. This is my 1st post. I have tried SSE execution domain fixup pass. But I am not able to see any improvements. I expect for the example below to use MOVDQA, PAND &c. (On nehalem, ANDPS is extremely slower than PAND) Please tell me if something would be wrong for me. Thank you. Takumi Host: i386-mingw32 Build: trunk at 103373 foo.ll: define <4 x i32> @foo(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) nounwind readnone { ent...

2010 May 11

[LLVMdev] How does SSEDomainFix work?

...n May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote: > Hello. This is my 1st post. ようこそ！ > I have tried SSE execution domain fixup pass. > But I am not able to see any improvements. Did you actually measure runtime, or did you look at assembly? > I expect for the example below to use MOVDQA, PAND &c. > (On nehalem, ANDPS is extremely slower than PAND) Are you sure? The andps and pand instructions are actually the same speed, but on Nehalem there is a latency penalty for moving data between the int and float domains. The SSE execution domain pass tries to minimize the extra la...

2011 Feb 21

[LLVMdev] How to force stack alignment for particular target triple in JIT?

I get SEGV in gcc-compiled procedure in Solaris10-i386. This procedure is called from llvm JIT code. Exact instruction that crashes is this: movdqa %xmm0, 0x10(%esp) %esp is 8-aligned, and by definition of movdqa it expects 16-aligned stack. This leads me to believe that llvm uses wrong ABI when calling external procedures and doesn't align stack properly. llvm module executing in JIT has this target triple: i386-pc-solaris2.10 Isn'...
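
The arithmetic behind this report can be sketched in a few lines of C. The helper names and the example stack pointer value are hypothetical, chosen only to show why an 8-aligned %esp plus 0x10 is not a valid MOVDQA operand and what realigning the stack pointer would do.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when addr is a multiple of 16, which MOVDQA requires
   for a memory operand. */
int is_aligned16(uintptr_t addr)
{
    return (addr & (uintptr_t)15) == 0;
}

/* Rounds a stack pointer value down to the previous 16-byte boundary. */
uintptr_t align_down16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}

/* Address of a 16-byte spill slot at sp + offset after realigning sp.
   The offset must itself be a multiple of 16. */
uintptr_t spill_slot(uintptr_t sp, uintptr_t offset)
{
    assert(offset % 16 == 0);
    uintptr_t slot = align_down16(sp) + offset;
    assert(is_aligned16(slot));
    return slot;
}
```

With a made-up stack pointer of 0x8047008 (8-aligned), 0x8047008 + 0x10 fails `is_aligned16`, while `spill_slot(0x8047008, 0x10)` yields 0x8047010.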

2017 Aug 22

[PATCH] fix alignment exceptions

...m>> wrote: Jonathan, Here's the code difference we see with the recent change -- what amounts to reverting your change from a couple years back. It doesn't look like we're getting superfluous instructions from clang now. The bad behavior for us was the alignment exception on the movdqa instructions when the input data wasn't 128-bit aligned. We had to change something because the code as-is was taking alignment faults on the movdqa instructions. For reference, the clang version I used for this is: | Android clang version 5.0.300080 (based on LLVM 5.0.300080) | Targ...

2011 Feb 21

[LLVMdev] How to force stack alignment for particular target triple in JIT?

Hi Yuri, > I get SEGV in gcc-compiled procedure in Solaris10-i386. This procedure > is called from llvm JIT code. > Exact instruction that crashes is this: movdqa %xmm0, 0x10(%esp) > %esp is 8-aligned, and by definition of movdqa it expects 16-aligned stack. > This leads me to believe that llvm uses wrong ABI when calling external > procedures and doesn't align stack properly. > > llvm module executing in JIT has this target triple: i386-p...

2015 Jul 27

[LLVMdev] i1* function argument on x86-64

I am running into a problem with 'i1*' as a function's argument which seems to have appeared since I switched to LLVM 3.6 (but can have other source, of course). If I look at the assembler that the MCJIT generates for an x86-64 target I see that the array 'i1*' is taken as a sequence of 1 bit wide elements. (I guess that's correct). However, I used to call the function

2014 Jul 23

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

..., %r8 addq $16, %rdi movq %rsi, %rdx andq $-8, %rdx pxor %xmm0, %xmm0 pxor %xmm1, %xmm1 .align 16, 0x90 .LBB0_3: # %vector.body # =>This Inner Loop Header: Depth=1 movdqa %xmm1, %xmm2 movdqa %xmm0, %xmm3 movdqu -16(%rdi), %xmm0 movdqu (%rdi), %xmm1 paddd %xmm3, %xmm0 paddd %xmm2, %xmm1 addq $32, %rdi addq $-8, %rdx jne .LBB0_3 # BB#4: movq %r8, %rdi movq %rax, %rdx jmp...

2015 Jun 26

[LLVMdev] extractelement causes memory access violation - what to do?

Hi, Let's have a simple program: define i32 @main(i32 %n, i64 %idx) { %idxSafe = trunc i64 %idx to i5 %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, i64 %idx ret i32 %r } The assembly of that would be: pcmpeqd %xmm0, %xmm0 movdqa %xmm0, -24(%rsp) movl -24(%rsp,%rsi,4), %eax retq The language reference states that the extractelement instruction produces undefined value in case the index argument is invalid (our case). But the implementation simply dumps the vector to the stack memory, calculates the memory offset out of the...
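
A C model of the lowering described here (spill the vector to memory, then index into it) makes the out-of-bounds read easy to see. The sketch below is an illustration, not LLVM's behavior: `vec4i32`, `extract_masked` and `extract_checked` are hypothetical names, and masking the index is just one possible way for a front end to keep the access inside the vector.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for an LLVM <4 x i32> value that has been stored to memory. */
typedef struct { int32_t lane[4]; } vec4i32;

/* Mirrors the spill-and-index lowering, but wraps the index into 0..3
   first so the load can never leave the 16 bytes of the vector. */
int32_t extract_masked(vec4i32 v, uint64_t idx)
{
    return v.lane[idx & 3u];
}

/* Alternative: treat an out-of-range index as a caller bug. */
int32_t extract_checked(vec4i32 v, uint64_t idx)
{
    assert(idx < 4);
    return v.lane[idx];
}
```

Without either guard, an index such as 6 would read past the spilled vector, which is the memory access violation the post reports.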

2010 Apr 26

[LLVMdev] Proposal for a new LLVM concurrency memory model

...> do. Because of that, I'm not sure we should support vectors as elsewhere >> they degrade gracefully. > > Vector atomics are extremely useful on architectures that support them. I'm curious about the architectures/instructions you're thinking of. Something like 'lock; movdqa'? > I'm not sure we need atomicity across vector elements, so decomposing > shouldn't be a problem, but I will have to think about it a bit. That's interesting. Naïvely, it seems to violate the whole point of atomics, since it means their side-effects don't appear atomic...

2014 Oct 13

[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets

...ger lane accessor was used. Output from clang 3.4 for target corei7-avx: $ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math -march=native -mtune=native -DSPILLING_ENSUES=0 /* no spilling */ $ objdump -dC --no-show-raw-insn ./a.out ... 00000000004004f0 <main>: 4004f0: vmovdqa 0x2004c8(%rip),%xmm0 # 6009c0 <x> 4004f8: vpsrld $0x17,%xmm0,%xmm0 4004fd: vpaddd 0x17b(%rip),%xmm0,%xmm0 # 400680 <__dso_handle+0x8> 400505: vcvtdq2ps %xmm0,%xmm1 400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18> 400511:...

2013 Aug 30

[LLVMdev] Fix crash in llvm_gcda_emit_arcs()

Hi, I've been seeing a crash in llvm_gcda_emit_arcs() on x86_64. The crash occurs executing a movdqa instruction with an unaligned src address. The attached patch to the compiler-rt project fixes the problem by using memcpy() to read data from the write_buffer[] in GCDAProfiling.c. This is my first patch submission to llvm so please let me know if I've missed any steps. I'm not on the m...
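
The memcpy() approach mentioned in this post can be sketched as follows. This is not the compiler-rt patch itself: `read_u64_unaligned` and its parameters are hypothetical, and the range check is an added assumption rather than something the post says the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 64-bit value from an arbitrary byte offset in buf. Going through
   memcpy avoids dereferencing a misaligned pointer, so the compiler cannot
   assume alignment and emit an aligned vector load for the access. */
uint64_t read_u64_unaligned(const unsigned char *buf, size_t buf_size,
                            size_t offset)
{
    uint64_t value;

    /* Assumed range check: the whole value must lie inside the buffer. */
    assert(offset <= buf_size && buf_size - offset >= sizeof value);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Compilers usually turn a fixed-size memcpy like this into a single unaligned load, so the safer form tends to cost nothing at run time.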

2015 Jun 26

[LLVMdev] extractelement causes memory access violation - what to do?

...efine i32 @main(i32 %n, i64 %idx) { > %idxSafe = trunc i64 %idx to i5 > %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, > i64 %idx > ret i32 %r > } > > The assembly of that would be: > pcmpeqd %xmm0, %xmm0 > movdqa %xmm0, -24(%rsp) > movl -24(%rsp,%rsi,4), %eax > retq > > The language reference states that the extractelement instruction > produces undefined value in case the index argument is invalid > (our case). But the implementation simply dumps the vector to the >...

2013 Sep 05

[LLVMdev] Fix crash in llvm_gcda_emit_arcs()

...s obviously-correct to me, but I wish it did a compare against > cur_buffer_size to make sure it's in range. > > Nick > > Joseph Kain wrote: > >> Hi, >> >> I've been seeing a crash in llvm_gcda_emit_arcs() on x86_64. The crash >> occurs executing a movdqa instruction with an unaligned src address. >> The attached patch to the compiler-rt project fixes the problem by >> using memcpy() to read data from the write_buffer[] in GCDAProfiling.c. >> >> This is my first patch submission to llvm so please let me know if I've >...

2015 Jun 30

[LLVMdev] extractelement causes memory access violation - what to do?

...> define i32 @main(i32 %n, i64 %idx) { >> %idxSafe = trunc i64 %idx to i5 >> %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, i64 %idx >> ret i32 %r >> } >> >> The assembly of that would be: >> pcmpeqd %xmm0, %xmm0 >> movdqa %xmm0, -24(%rsp) >> movl -24(%rsp,%rsi,4), %eax >> retq >> >> The language reference states that the extractelement instruction >> produces undefined value in case the index argument is invalid (our case). >> But the implementation simply dumps the vector to the...

2010 Apr 27

[LLVMdev] Proposal for a new LLVM concurrency memory model

On Monday 26 April 2010 16:09:48 Jeffrey Yasskin wrote: > > Vector atomics are extremely useful on architectures that support them. > > I'm curious about the architectures/instructions you're thinking of. > Something like 'lock; movdqa'? Don't think X86. Think traditional vector machines like the Cray X1/X2. Atomic vector adds and logicals are common operations. > > I'm not sure we need atomicity across vector elements, so decomposing > > shouldn't be a problem, but I will have to think about it a b...

2011 Feb 21

[LLVMdev] How to force stack alignment for particular target triple in JIT?

On 02/20/2011 23:50, Duncan Sands wrote: > Hi Yuri, > > >> I get SEGV in gcc-compiled procedure in Solaris10-i386. This procedure >> is called from llvm JIT code. >> Exact instruction that crashes is this: movdqa %xmm0, 0x10(%esp) >> %esp is 8-aligned, and by definition of movdqa it expects 16-aligned stack. >> This leads me to believe that llvm uses wrong ABI when calling external >> procedures and doesn't align stack properly. >> >> llvm module executing in JIT has this t...
