Displaying 20 results from an estimated 51 matches for "movdqa".
2017 Aug 18
1
[PATCH] fix alignment exceptions
Jonathan,
Here's the code difference we see with the recent change -- what amounts to
reverting your change from a couple years back.
It doesn't look like we're getting superfluous instructions from clang now.
The bad behavior for us was the alignment exception on the movdqa
instructions when the input data wasn't 128-bit aligned.
We had to change something because the code as-is was taking alignment
faults on the movdqa instructions.
For reference, the clang version I used for this is:
| Android clang version 5.0.300080 (based on LLVM 5.0.300080)
| Targ...
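To make the failure mode concrete: inside a 16-byte-aligned float buffer, only every fourth element is 16-byte aligned, so a routine handed `x + 1` cannot safely use MOVDQA (or `_mm_load_si128`) on it, while MOVDQU has no such requirement. A minimal C sketch of the check; the names `is_aligned` and `samples` are hypothetical and not taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is aligned to `align` bytes (align must be a power of two).
 * MOVDQA faults unless its memory operand is 16-byte aligned;
 * MOVDQU accepts any address. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical buffer that is itself 16-byte aligned: &samples[0] is fine
 * for an aligned 128-bit load, but &samples[1] is only 4-byte aligned. */
static _Alignas(16) float samples[8];
```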
2017 Aug 18
2
[PATCH] fix alignment exceptions
We see the MOVQ instruction, but this patch uses it deliberately rather than
MOVDQA (aligned 128-bit load). With the trace below we saw that
the final invocation is not 128-bit aligned, but MOVDQA requires it (the
calling function was pitch_sse4_1.c:90, in the 4-way N - i >= 4 loop).
07-31 11:00:13.469 210 2540 D opus_sse1: RBE
celt_inner_prod_sse4_1: x
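The loop shape mentioned above (a 4-wide body while `N - i >= 4`, then a remainder) can be written so that it never assumes alignment. A minimal sketch, assuming float data and SSE intrinsics; `inner_prod` is a hypothetical name, and the real routine may use a different element type and instruction set:

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical inner product: 4-wide main loop plus scalar tail.
 * _mm_loadu_ps is an unaligned load (MOVUPS), so x and y may start
 * at any float boundary without faulting. */
static float inner_prod(const float *x, const float *y, int N)
{
    int i = 0;
    float sum = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (; N - i >= 4; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i);
        __m128 yv = _mm_loadu_ps(y + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(xv, yv));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail (or the whole computation when SSE is unavailable). */
    for (; i < N; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Passing `a + 1` from a 16-byte-aligned array exercises exactly the misaligned case from the trace.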
2010 Aug 02
0
[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!
...al error: error in backend: Ran out of registers during register
allocation!
Please check your inline asm statement for invalid constraints:
INLINEASM <es:movd %eax, %xmm3
pshuflw $$0, %xmm3, %xmm3
punpcklwd %xmm3, %xmm3
pxor %xmm7, %xmm7
pxor %xmm4, %xmm4
movdqa ($2), %xmm5
pxor %xmm6, %xmm6
psubw ($3), %xmm6
mov $$-128, %eax
.align 1 << 4
1:
movdqa ($1, %eax), %xmm0
movdqa %xmm0, %xmm1
pabsw %xmm0, %xmm0
psubusw %xmm6, %xmm0
pmulhw %xmm5, %xmm0
por %xmm0, %xmm4...
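As general background for this error: every operand passed in an `"r"` constraint needs its own general-purpose register, and registers named in the clobber list cannot be used for operands. So an asm block on 32-bit x86 with several pointer inputs plus GPR clobbers can plausibly leave the allocator with no register to assign. A minimal sketch of the conventions (explicit XMM clobbers, one `"r"` per pointer); `copy16` is a hypothetical helper, not the code from the thread:

```c
#include <string.h>

/* Hypothetical 16-byte copy through an XMM register using GNU inline asm.
 * The scratch register xmm0 is declared as a clobber, and "memory" tells
 * the compiler the asm reads and writes memory through the pointers. */
static void copy16(void *dst, const void *src)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__SSE2__)
    __asm__ volatile("movdqu (%1), %%xmm0\n\t"
                     "movdqu %%xmm0, (%0)"
                     :
                     : "r"(dst), "r"(src)
                     : "xmm0", "memory");
#else
    memcpy(dst, src, 16); /* portable fallback on non-x86 targets */
#endif
}
```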
2017 Aug 04
2
Bug or incorrect use of inline asm?
...nstant:
```
source_filename = "asanasm.d"
target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@globVar = global [2 x i32] [i32 66051, i32 66051]
define void @_D7asanasm8offconstFZv() {
call void asm sideeffect "movdqa 4$0, %xmm7", "*m,~{xmm7}"([2 x i32]*
@globVar)
ret void
}
```
results in:
<inline asm>:1:10: error: unexpected token in argument list
movdqa 4globVar(%rip), %xmm7
So in that case, I do have to add the '+' to make it work ("4+$0").
So depending on...
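The same pitfall exists in GNU C inline asm with AT&T syntax: an `"m"` operand expands to text such as `glob_var(%rip)`, so `4%1` would paste the digit onto the symbol, while `4+%1` produces the valid expression `4+glob_var(%rip)`. A minimal sketch; `glob_var` and `read_second` are hypothetical, modeled loosely on `@globVar` above:

```c
/* Hypothetical global mirroring @globVar above (second value made distinct
 * so the offset is observable). */
static int glob_var[2] = {66051, 66052};

/* Reads glob_var[1] via a memory operand plus a constant displacement.
 * "4+%1" expands to e.g. "4+glob_var(%rip)", a valid AT&T address. */
static int read_second(void)
{
    int out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl 4+%1, %0" : "=r"(out) : "m"(glob_var));
#else
    out = glob_var[1]; /* non-x86 fallback */
#endif
    return out;
}
```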
2010 May 11
2
[LLVMdev] How does SSEDomainFix work?
Hello. This is my 1st post.
I have tried SSE execution domain fixup pass.
But I am not able to see any improvements.
I expect for the example below to use MOVDQA, PAND &c.
(On nehalem, ANDPS is extremely slower than PAND)
Please tell me if something would be wrong for me.
Thank you.
Takumi
Host: i386-mingw32
Build: trunk at 103373
foo.ll:
define <4 x i32> @foo(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z)
nounwind readnone {
ent...
2010 May 11
0
[LLVMdev] How does SSEDomainFix work?
...n May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote:
> Hello. This is my 1st post.
Welcome!
> I have tried SSE execution domain fixup pass.
> But I am not able to see any improvements.
Did you actually measure runtime, or did you look at assembly?
> I expect for the example below to use MOVDQA, PAND &c.
> (On nehalem, ANDPS is extremely slower than PAND)
Are you sure? The andps and pand instructions are actually the same speed, but on Nehalem there is a latency penalty for moving data between the int and float domains.
The SSE execution domain pass tries to minimize the extra la...
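For readers unfamiliar with the integer/float split: PAND/POR and ANDPS/ORPS compute identical bits, and the choice between them is what the execution-domain pass makes. A minimal sketch of integer-domain bitwise code with SSE2 intrinsics; the function `and_or_4xi32` is hypothetical (the body of `@foo` above is truncated), and the compiler remains free to select either instruction form:

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical (x & y) | z on four 32-bit integers. _mm_and_si128 and
 * _mm_or_si128 correspond to PAND and POR, keeping the data in the
 * integer domain when the surrounding code is integer code. */
static void and_or_4xi32(const int32_t x[4], const int32_t y[4],
                         const int32_t z[4], int32_t out[4])
{
#if defined(__SSE2__)
    __m128i vx = _mm_loadu_si128((const __m128i *)x);
    __m128i vy = _mm_loadu_si128((const __m128i *)y);
    __m128i vz = _mm_loadu_si128((const __m128i *)z);
    __m128i r = _mm_or_si128(_mm_and_si128(vx, vy), vz);
    _mm_storeu_si128((__m128i *)out, r);
#else
    for (int i = 0; i < 4; i++)
        out[i] = (x[i] & y[i]) | z[i];
#endif
}
```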
2011 Feb 21
4
[LLVMdev] How to force stack alignment for particular target triple in JIT?
I get SEGV in gcc-compiled procedure in Solaris10-i386. This procedure
is called from llvm JIT code.
Exact instruction that crashes is this: movdqa %xmm0, 0x10(%esp)
%esp is 8-aligned, and by definition of movdqa it expects 16-aligned stack.
This leads me to believe that llvm uses wrong ABI when calling external
procedures and doesn't align stack properly.
llvm module executing in JIT has this target triple: i386-pc-solaris2.10
Isn'...
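One callee-side workaround, independent of how the JIT aligns its stack, is to have the GCC-compiled function realign the stack in its own prologue. GCC and Clang provide the x86 attribute `force_align_arg_pointer` for this (and `-mstackrealign` applies it to every function). A minimal sketch; `callback_sum` and `REALIGN_STACK` are hypothetical names:

```c
#if defined(__i386__) || defined(__x86_64__)
/* x86-only GCC/Clang attribute: emit a prologue that realigns the stack,
 * so the function is safe to call from code keeping only 4/8-byte alignment. */
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical callback invoked from JIT-compiled code. Without realignment,
 * the compiler assumes the incoming stack is already 16-byte aligned and may
 * access the 16-byte-aligned local below with MOVDQA/MOVAPS. */
static REALIGN_STACK int callback_sum(const int *v, int n)
{
    _Alignas(16) int tmp[4] = {0, 0, 0, 0};
    for (int i = 0; i < n; i++)
        tmp[i & 3] += v[i];
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```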
2017 Aug 22
0
[PATCH] fix alignment exceptions
...m>> wrote:
Jonathan,
Here's the code difference we see with the recent change -- what amounts to reverting your change from a couple years back.
It doesn't look like we're getting superfluous instructions from clang now.
The bad behavior for us was the alignment exception on the movdqa instructions when the input data wasn't 128-bit aligned.
We had to change something because the code as-is was taking alignment faults on the movdqa instructions.
For reference, the clang version I used for this is:
| Android clang version 5.0.300080 (based on LLVM 5.0.300080)
| Targ...
2011 Feb 21
0
[LLVMdev] How to force stack alignment for particular target triple in JIT?
Hi Yuri,
> I get SEGV in gcc-compiled procedure in Solaris10-i386. This procedure
> is called from llvm JIT code.
> Exact instruction that crashes is this: movdqa %xmm0, 0x10(%esp)
> %esp is 8-aligned, and by definition of movdqa it expects 16-aligned stack.
> This leads me to believe that llvm uses wrong ABI when calling external
> procedures and doesn't align stack properly.
>
> llvm module executing in JIT has this target triple: i386-p...
2015 Jul 27
3
[LLVMdev] i1* function argument on x86-64
I am running into a problem with 'i1*' as a function's argument, which
seems to have appeared since I switched to LLVM 3.6 (though it could have another
cause, of course). If I look at the assembler that the MCJIT generates
for an x86-64 target I see that the array 'i1*' is taken as a sequence
of 1 bit wide elements. (I guess that's correct). However, I used to
call the function
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
..., %r8
addq $16, %rdi
movq %rsi, %rdx
andq $-8, %rdx
pxor %xmm0, %xmm0
pxor %xmm1, %xmm1
.align 16, 0x90
.LBB0_3: # %vector.body
# =>This Inner Loop Header: Depth=1
movdqa %xmm1, %xmm2
movdqa %xmm0, %xmm3
movdqu -16(%rdi), %xmm0
movdqu (%rdi), %xmm1
paddd %xmm3, %xmm0
paddd %xmm2, %xmm1
addq $32, %rdi
addq $-8, %rdx
jne .LBB0_3
# BB#4:
movq %r8, %rdi
movq %rax, %rdx
jmp...
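Reading the loop above: the data loads are `movdqu` (unaligned), while the two `movdqa` instructions are register-to-register copies, which carry no alignment requirement. The body keeps two 4-lane accumulators and consumes 8 ints per iteration (`andq $-8` rounds the count down). A hypothetical scalar rendering of that structure, assuming the source was a plain integer sum (`sum_ints` is not from the thread):

```c
#include <stddef.h>

/* Hypothetical scalar equivalent of the vectorized loop: two 4-lane
 * accumulators (the two paddd chains), 8 elements per iteration,
 * then a scalar remainder for the last n % 8 elements. */
static int sum_ints(const int *a, size_t n)
{
    int acc0[4] = {0}, acc1[4] = {0};
    size_t i = 0;
    size_t n8 = n & ~(size_t)7; /* like andq $-8 */
    for (; i < n8; i += 8) {
        for (int l = 0; l < 4; l++) {
            acc0[l] += a[i + l];
            acc1[l] += a[i + 4 + l];
        }
    }
    int sum = 0;
    for (int l = 0; l < 4; l++)
        sum += acc0[l] + acc1[l];
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```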
2015 Jun 26
3
[LLVMdev] extractelement causes memory access violation - what to do?
Hi,
Let's have a simple program:
define i32 @main(i32 %n, i64 %idx) {
%idxSafe = trunc i64 %idx to i5
%r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, i64 %idx
ret i32 %r
}
The assembly of that would be:
pcmpeqd %xmm0, %xmm0
movdqa %xmm0, -24(%rsp)
movl -24(%rsp,%rsi,4), %eax
retq
The language reference states that the extractelement instruction produces
an undefined value in case the index argument is invalid (our case). But the
implementation simply dumps the vector to the stack memory, calculates the
memory offset out of the...
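Because the lowering above indexes a stack slot with the raw index, an out-of-range `%idx` reads arbitrary memory rather than merely producing an undefined value. A source-level guard is to mask the index into 0..3 (the effect a `trunc` to `i2` would have; the `i5` in the example still allows 0..31, and `%idxSafe` is not used by the `extractelement`). A minimal sketch with a hypothetical `extract_lane`:

```c
#include <stdint.h>

/* Hypothetical bounds-safe lane extraction: masking keeps the access
 * inside the 16-byte vector, whatever value idx holds. */
static int32_t extract_lane(const int32_t v[4], uint64_t idx)
{
    return v[idx & 3];
}
```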
2010 Apr 26
2
[LLVMdev] Proposal for a new LLVM concurrency memory model
>> ... do. Because of that, I'm not sure we should support vectors as elsewhere
>> they degrade gracefully.
>
> Vector atomics are extremely useful on architectures that support them.
I'm curious about the architectures/instructions you're thinking of.
Something like 'lock; movdqa'?
> I'm not sure we need atomicity across vector elements, so decomposing
> shouldn't be a problem, but I will have to think about it a bit.
That's interesting. Naïvely, it seems to violate the whole point of
atomics, since it means their side-effects don't appear atomic...
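What "decomposing" a vector atomic means can be shown with C11 atomics: each lane update is individually atomic, but another thread may observe some lanes updated and others not. A minimal sketch; `atomic_add_4xi32` is a hypothetical helper, not part of any proposal in the thread:

```c
#include <stdatomic.h>

/* Hypothetical "vector" atomic add decomposed into per-element atomics.
 * No atomicity across elements: a concurrent reader can see a mix of
 * old and new lane values. */
static void atomic_add_4xi32(atomic_int v[4], const int d[4])
{
    for (int i = 0; i < 4; i++)
        atomic_fetch_add_explicit(&v[i], d[i], memory_order_seq_cst);
}
```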
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
...ger lane accessor was used.
Output from clang 3.4 for target corei7-avx:
$ clang++ test.cpp -O3 -fstrict-aliasing -funroll-loops -ffast-math -march=native -mtune=native -DSPILLING_ENSUES=0 /* no spilling */
$ objdump -dC --no-show-raw-insn ./a.out
...
00000000004004f0 <main>:
4004f0: vmovdqa 0x2004c8(%rip),%xmm0 # 6009c0 <x>
4004f8: vpsrld $0x17,%xmm0,%xmm0
4004fd: vpaddd 0x17b(%rip),%xmm0,%xmm0 # 400680 <__dso_handle+0x8>
400505: vcvtdq2ps %xmm0,%xmm1
400509: vdivps 0x17f(%rip),%xmm1,%xmm1 # 400690 <__dso_handle+0x18>
400511:...
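For comparison, lane access through the GCC/Clang vector extension with constant subscripts gives the compiler the option of register-to-register extracts (for example MOVD/PEXTRD on x86) instead of storing the vector to the stack; whether it actually avoids the spill depends on the compiler version and target flags. A minimal sketch; `v4si` and `hsum` are hypothetical names:

```c
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit lanes in one 16-byte vector. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Horizontal sum using constant lane indices only. */
static int32_t hsum(v4si v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```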
2013 Aug 30
2
[LLVMdev] Fix crash in llvm_gcda_emit_arcs()
Hi,
I've been seeing a crash in llvm_gcda_emit_arcs() on x86_64. The crash
occurs executing a movdqa instruction with an unaligned src address. The
attached patch to the compiler-rt project fixes the problem by using
memcpy() to read data from the write_buffer[] in GCDAProfiling.c.
This is my first patch submission to llvm so please let me know if I've
missed any steps. I'm not on the m...
2015 Jun 26
2
[LLVMdev] extractelement causes memory access violation - what to do?
...efine i32 @main(i32 %n, i64 %idx) {
> %idxSafe = trunc i64 %idx to i5
> %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>,
> i64 %idx
> ret i32 %r
> }
>
> The assembly of that would be:
> pcmpeqd %xmm0, %xmm0
> movdqa %xmm0, -24(%rsp)
> movl -24(%rsp,%rsi,4), %eax
> retq
>
> The language reference states that the extractelement instruction
> produces undefined value in case the index argument is invalid
> (our case). But the implementation simply dumps the vector to the
>...
2013 Sep 05
2
[LLVMdev] Fix crash in llvm_gcda_emit_arcs()
...s obviously-correct to me, but I wish it did a compare against
> cur_buffer_size to make sure it's in range.
>
> Nick
>
> Joseph Kain wrote:
>
>> Hi,
>>
>> I've been seeing a crash in llvm_gcda_emit_arcs() on x86_64. The crash
>> occurs executing a movdqa instruction with an unaligned src address.
>> The attached patch to the compiler-rt project fixes the problem by
>> using memcpy() to read data from the write_buffer[] in GCDAProfiling.c.
>>
>> This is my first patch submission to llvm so please let me know if I've
>...
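The memcpy approach described in this patch, combined with the range check the reviewer asks for, looks roughly like the following. Dereferencing `*(uint64_t *)(buf + off)` lets the compiler assume natural alignment, which in optimized code can turn into aligned vector moves; `memcpy` makes no alignment assumption. A minimal sketch; `read_u64` and its parameters are hypothetical and not the actual GCDAProfiling.c code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader: copies 8 bytes from buf+off into *out.
 * Returns 0 on success, -1 if the read would run past buf_size. */
static int read_u64(const unsigned char *buf, size_t buf_size, size_t off,
                    uint64_t *out)
{
    if (off > buf_size || buf_size - off < sizeof *out)
        return -1; /* out of range */
    memcpy(out, buf + off, sizeof *out); /* no alignment assumption */
    return 0;
}
```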
2015 Jun 30
2
[LLVMdev] extractelement causes memory access violation - what to do?
>> ...define i32 @main(i32 %n, i64 %idx) {
>> %idxSafe = trunc i64 %idx to i5
>> %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, i64 %idx
>> ret i32 %r
>> }
>>
>> The assembly of that would be:
>> pcmpeqd %xmm0, %xmm0
>> movdqa %xmm0, -24(%rsp)
>> movl -24(%rsp,%rsi,4), %eax
>> retq
>>
>> The language reference states that the extractelement instruction
>> produces undefined value in case the index argument is invalid (our case).
>> But the implementation simply dumps the vector to the...
2010 Apr 27
0
[LLVMdev] Proposal for a new LLVM concurrency memory model
On Monday 26 April 2010 16:09:48 Jeffrey Yasskin wrote:
> > Vector atomics are extremely useful on architectures that support them.
>
> I'm curious about the architectures/instructions you're thinking of.
> Something like 'lock; movdqa'?
Don't think X86. Think traditional vector machines like the Cray X1/X2.
Atomic vector adds and logicals are common operations.
> > I'm not sure we need atomicity across vector elements, so decomposing
> > shouldn't be a problem, but I will have to think about it a b...
2011 Feb 21
1
[LLVMdev] How to force stack alignment for particular target triple in JIT?
On 02/20/2011 23:50, Duncan Sands wrote:
> Hi Yuri,
>
>
>> I get SEGV in gcc-compiled procedure in Solaris10-i386. This procedure
>> is called from llvm JIT code.
>> Exact instruction that crashes is this: movdqa %xmm0, 0x10(%esp)
>> %esp is 8-aligned, and by definition of movdqa it expects 16-aligned stack.
>> This leads me to believe that llvm uses wrong ABI when calling external
>> procedures and doesn't align stack properly.
>>
>> llvm module executing in JIT has this t...