Displaying 20 results from an estimated 62 matches for "xmm6".
2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...xmm3,xmm3,0EEh
002E017F subpd xmm4,xmm3
002E0183 subpd xmm4,xmm5
002E0187 fld qword ptr [esp+0F0h]
002E018E call 76719BA1 CALL
002E0193 imul ebx,eax,0Ch
002E0196 lea esi,[ebx+3]
002E0199 shl esi,4
002E019C movapd xmm6,xmmword ptr [esi+2C0030h]
002E01A4 mulpd xmm6,xmm4
002E01A8 mulpd xmm3,xmm7
002E01AC movapd xmm7,xmmword ptr [esp+60h]
002E01B2 subpd xmm7,xmm2
002E01B6 subpd xmm7,xmm3
002E01BA subpd xmm7,xmm5
002E01BE movapd xmm2,xmmword ptr [esi+2C0020h]
002...
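For context, the C-level shape that exercises this intrinsic (a minimal sketch; the function name is made up): with SSE2 code generation enabled, _mm_sqrt_pd should become a single sqrtpd rather than a call into a runtime routine, as happens in the trace above.

#include <emmintrin.h>

/* Minimal sketch: with SSE2 enabled this should lower to one sqrtpd.
 * The thread above concerns cases where it instead turns into a call
 * to a helper that clobbers ECX. */
__m128d sqrt_pd(__m128d v) {
    return _mm_sqrt_pd(v);
}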
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...; -> <2 x i32>. Any ideas to optimize these instructions?
Thanks.
%sub.ptr.sub.i6.i.i.i.i = sub <2 x i64> %sub.ptr.lhs.cast.i4.i.i.i.i, %sub.ptr.rhs.cast.i5.i.i.i.i
%sub.ptr.div.i7.i.i.i.i = sdiv <2 x i64> %sub.ptr.sub.i6.i.i.i.i, <i64 24, i64 24>
Assembly:
vpsubq %xmm6, %xmm5, %xmm5
vmovq %xmm5, %rax
movabsq $3074457345618258603, %rbx # imm = 0x2AAAAAAAAAAAAAAB
imulq %rbx
movq %rdx, %rcx
movq %rcx, %rax
shrq $63, %rax
shrq $2, %rcx
addl %eax, %ecx
vpextrq $1, %xmm5, %rax
imulq %rbx
movq %rdx...
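The movabsq/imulq/shr sequence above is the usual "magic number" lowering of signed division by a constant: 0x2AAAAAAAAAAAAAAB is ceil(2^66 / 24), the high 64 bits of the 128-bit product are shifted right by 2, and the sign bit of that high half is added back as a correction. One scalar lane, written as a C sketch (not what LLVM literally emits):

#include <stdint.h>

/* Sketch of the scalar lowering above: signed division by 24 using the
 * multiplier 0x2AAAAAAAAAAAAAAB = ceil(2^66 / 24). __int128 is a
 * GCC/Clang extension, used here only to get the high half of the product. */
int64_t div_by_24(int64_t x) {
    __int128 p  = (__int128)x * 3074457345618258603LL;
    int64_t  hi = (int64_t)(p >> 64);                 /* as in %rdx */
    return (hi >> 2) + (int64_t)((uint64_t)hi >> 63); /* shift + sign fix */
}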
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ougher at gmail.com>
wrote:
> Unfortunately, another team, while doing internal testing, has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
>...
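For reference, the insertps immediate is a single byte: bits [7:6] select the source element, bits [5:4] the destination lane, and bits [3:0] form a zero mask, so values such as $256 or $416 cannot be encoded at all, which is what makes these masks illegal. A small helper (illustration only, not LLVM code) that builds a valid immediate:

/* Illustration only: layout of the insertps imm8.
 *   [7:6] source element, [5:4] destination lane, [3:0] zero mask.
 * Anything above 255, like the $256 / $416 quoted above, does not fit. */
static inline unsigned insertps_imm(unsigned src_elt, unsigned dst_lane,
                                    unsigned zero_mask) {
    return ((src_elt & 3u) << 6) | ((dst_lane & 3u) << 4) | (zero_mask & 0xFu);
}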
2014 Sep 04
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Greetings all,
As you may have noticed, there is a new vector shuffle lowering path in the
X86 backend. You can try it out with the
'-x86-experimental-vector-shuffle-lowering' flag to llc, or '-mllvm
-x86-experimental-vector-shuffle-lowering' to clang. Please test it out!
There may be some correctness bugs; I'm still fuzz testing it to shake them out. But I expect fairly few
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...orrelation_asm_ia32_sse_lag_16
cglobal FLAC__lpc_compute_autocorrelation_asm_ia32_3dnow
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx
@@ -596,7 +597,7 @@
movss xmm3, xmm2
movss xmm2, xmm0
- ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2
+ ; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2
movaps xmm1, xmm0
mulps xmm1, xmm2
addps xmm5, xmm1
@@ -619,6 +620,95 @@
ret
ALIGN 16
+cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
+ ;[ebp + 20] == autoc[]
+ ;[ebp + 16] ==...
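For readers who don't have lpc.c open: the computation the lag-16 routine implements is plain autocorrelation, roughly the C below (a sketch only; types are simplified relative to FLAC's FLAC__real).

/* Plain-C sketch of what the lag-16 routine computes (types simplified;
 * the reference implementation is FLAC__lpc_compute_autocorrelation):
 *   autoc[l] = sum over i of data[i] * data[i + l], for l = 0..15. */
void autocorrelation_lag16(const float data[], unsigned data_len,
                           float autoc[16]) {
    for (unsigned lag = 0; lag < 16; ++lag) {
        float sum = 0.0f;
        for (unsigned i = 0; i + lag < data_len; ++i)
            sum += data[i] * data[i + lag];
        autoc[lag] = sum;
    }
}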
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...instructions? Thanks.
>>
>> %sub.ptr.sub.i6.i.i.i.i = sub <2 x i64> %sub.ptr.lhs.cast.i4.i.i.i.i, %sub.ptr.rhs.cast.i5.i.i.i.i
>> %sub.ptr.div.i7.i.i.i.i = sdiv <2 x i64> %sub.ptr.sub.i6.i.i.i.i, <i64 24, i64 24>
>>
>> Assembly:
>> vpsubq %xmm6, %xmm5, %xmm5
>> vmovq %xmm5, %rax
>> movabsq $3074457345618258603, %rbx # imm = 0x2AAAAAAAAAAAAAAB
>> imulq %rbx
>> movq %rdx, %rcx
>> movq %rcx, %rax
>> shrq $63, %rax
>> shrq $2, %rcx
>> addl...
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...deas to optimize these instructions? Thanks.
>
> %sub.ptr.sub.i6.i.i.i.i = sub <2 x i64> %sub.ptr.lhs.cast.i4.i.i.i.i, %sub.ptr.rhs.cast.i5.i.i.i.i
> %sub.ptr.div.i7.i.i.i.i = sdiv <2 x i64> %sub.ptr.sub.i6.i.i.i.i, <i64 24, i64 24>
>
> Assembly:
> vpsubq %xmm6, %xmm5, %xmm5
> vmovq %xmm5, %rax
> movabsq $3074457345618258603, %rbx # imm = 0x2AAAAAAAAAAAAAAB
> imulq %rbx
> movq %rdx, %rcx
> movq %...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based on the target triple?) then I don't think
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...p), %xmm2 ## xmm2 = mem[3,0,1,2]
> ...
> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 = xmm4[3],xmm2[1,2,3]
>
> Note that the second version does the shuffle in-place, in xmm2.
>
>
> Some are blends (har har) of those two:
> vpermilps $-0x6d, %xmm_mem_1, %xmm6 ## xmm6 = xmm_mem_1[3,0,1,2]
> vpermilps $-0x6d, -0xXX(%rax), %xmm1 ## xmm1 = mem_2[3,0,1,2]
> vblendps $0x1, %xmm1, %xmm6, %xmm0 ## xmm0 = xmm1[0],xmm6[1,2,3]
> becomes:
> vmovaps -0xXX(%rax), %xmm0 ## %xmm0 = mem_2[0,1,2,3]
> vpermilps $-0x6d, %xm...
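As a concrete reading of the quoted pattern, the "two vpermilps plus a vblendps" form corresponds to something like the intrinsics below (an illustration with assumed inputs a and b, not the original code; $-0x6d is the byte 0x93, i.e. the [3,0,1,2] rotation in the comments).

#include <immintrin.h>

/* Illustration only (requires AVX): rotate both inputs by one lane and
 * merge lane 0 of the second into the first, matching the pattern quoted
 * above. 0x93 is -0x6d as an unsigned byte and selects elements [3,0,1,2]. */
__m128 rotate_and_blend(__m128 a, __m128 b) {
    __m128 ra = _mm_permute_ps(a, 0x93);  /* a[3,0,1,2] */
    __m128 rb = _mm_permute_ps(b, 0x93);  /* b[3,0,1,2] */
    return _mm_blend_ps(ra, rb, 0x1);     /* rb[0], ra[1,2,3] */
}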
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...--------------------------------------
# BB#3: # %if.then.i.i.i.i.i.i
vpsllq $3, %xmm0, %xmm0
vpextrq $1, %xmm0, %rbx
movq %rbx, %rdi
vmovaps %xmm2, 96(%rsp) # 16-byte Spill
vmovaps %xmm5, 64(%rsp) # 16-byte Spill
vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
callq _Znam
movq %rax, 128(%rsp)
movq 16(%r12), %rsi
movq %rax, %rdi
movq %rbx, %rdx
callq memmove
vmovdqa 16(%rsp), %xmm6 # 16-byte Reload
vmovaps 64(%rsp), %xmm5 # 16-byte Reload
vmovaps 96(%rsp)...
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...;> Unfortunately, another team, while doing internal testing has seen the
>>> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
>>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =
>>>...
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
...>
> # BB#3: # %if.then.i.i.i.i.i.i
> vpsllq $3, %xmm0, %xmm0
> vpextrq $1, %xmm0, %rbx
> movq %rbx, %rdi
> vmovaps %xmm2, 96(%rsp) # 16-byte Spill
> vmovaps %xmm5, 64(%rsp) # 16-byte Spill
> vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
> callq _Znam
> movq %rax, 128(%rsp)
> movq 16(%r12), %rsi
> movq %rax, %rdi
> movq %rbx, %rdx
> callq memmove
> vmovdqa 16(%rsp), %xmm6 # 16-byte Reload
> vmovaps 64(%rsp), %xmm5...
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...tps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 =
>>> xmm4[3],xmm2[1,2,3]
>>>
>>> Note that the second version does the shuffle in-place, in xmm2.
>>>
>>>
>>> Some are blends (har har) of those two:
>>> vpermilps $-0x6d, %xmm_mem_1, %xmm6 ## xmm6 = xmm_mem_1[3,0,1,2]
>>> vpermilps $-0x6d, -0xXX(%rax), %xmm1 ## xmm1 = mem_2[3,0,1,2]
>>> vblendps $0x1, %xmm1, %xmm6, %xmm0 ## xmm0 = xmm1[0],xmm6[1,2,3]
>>> becomes:
>>> vmovaps -0xXX(%rax), %xmm0 ## %xmm0 = mem_2[0,1,2,3]
&g...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...>> ...
>> vinsertps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 = xmm4[3],xmm2[1,2,3]
>>
>> Note that the second version does the shuffle in-place, in xmm2.
>>
>>
>> Some are blends (har har) of those two:
>> vpermilps $-0x6d, %xmm_mem_1, %xmm6 ## xmm6 = xmm_mem_1[3,0,1,2]
>> vpermilps $-0x6d, -0xXX(%rax), %xmm1 ## xmm1 = mem_2[3,0,1,2]
>> vblendps $0x1, %xmm1, %xmm6, %xmm0 ## xmm0 = xmm1[0],xmm6[1,2,3]
>> becomes:
>> vmovaps -0xXX(%rax), %xmm0 ## %xmm0 = mem_2[0,1,2,3]
>> vperm...
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...>
>>
>> Unfortunately, another team, while doing internal testing, has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
>> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],...
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...%xmm2 ## xmm2 =
>>>> xmm4[3],xmm2[1,2,3]
>>>>
>>>> Note that the second version does the shuffle in-place, in xmm2.
>>>>
>>>>
>>>> Some are blends (har har) of those two:
>>>> vpermilps $-0x6d, %xmm_mem_1, %xmm6 ## xmm6 = xmm_mem_1[3,0,1,2]
>>>> vpermilps $-0x6d, -0xXX(%rax), %xmm1 ## xmm1 = mem_2[3,0,1,2]
>>>> vblendps $0x1, %xmm1, %xmm6, %xmm0 ## xmm0 = xmm1[0],xmm6[1,2,3]
>>>> becomes:
>>>> vmovaps -0xXX(%rax), %...
2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
...2 * b
.
.
p3 = p3 * c
p3 = p3 * c
.
.
An actual excerpt of the generated x86 assembly follows:
mulss %xmm8, %xmm10
mulss %xmm8, %xmm10
.
. repeated 512 times
.
mulss %xmm7, %xmm9
mulss %xmm7, %xmm9
.
. repeated 512 times
.
mulss %xmm6, %xmm3
mulss %xmm6, %xmm3
.
. repeated 512 times
.
Since p1, p2, p3, and p4 are all independent, this reordering is correct. This would have
the possible advantage of reducing live ranges of values. However, in this microbenchmark,
the number of live values is eight single-precision...
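A hypothetical reconstruction of the source pattern being described (names p1..p4 and a..d come from the excerpt; the loop stands in for the fully unrolled sequence in the real microbenchmark):

/* Hypothetical sketch: four independent dependence chains. Interleaving
 * their multiplies exposes ILP; emitting each chain's 512 multiplies
 * back-to-back, as in the generated code above, leaves only one
 * latency-bound chain in flight at a time. */
float chains(float a, float b, float c, float d) {
    float p1 = 1.0f, p2 = 1.0f, p3 = 1.0f, p4 = 1.0f;
    for (int i = 0; i < 512; ++i) {
        p1 *= a;
        p2 *= b;
        p3 *= c;
        p4 *= d;
    }
    return p1 + p2 + p3 + p4;
}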
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...%xmm3
je .LBB10_66
# BB#5: # %for.body.preheader
vpaddq %xmm15, %xmm2, %xmm3
vpand %xmm15, %xmm3, %xmm3
vpaddq .LCPI10_1(%rip), %xmm3, %xmm8
vpand .LCPI10_5(%rip), %xmm8, %xmm5
vpxor %xmm4, %xmm4, %xmm4
vpcmpeqq %xmm4, %xmm5, %xmm6
vptest %xmm6, %xmm6
jne .LBB10_9
It turned out that the vector one is way more complicated than the scalar one. I was expecting it would not be so tedious.
On Fri, Jun 26, 2015 at 3:49 AM, suyog sarda <sardask01 at gmail.com> wrote:
>
> >
> > Is LLVM be able t...
2010 Aug 02
0
[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!
...ut of registers during register
allocation!
Please check your inline asm statement for invalid constraints:
INLINEASM <es:movd %eax, %xmm3
pshuflw $$0, %xmm3, %xmm3
punpcklwd %xmm3, %xmm3
pxor %xmm7, %xmm7
pxor %xmm4, %xmm4
movdqa ($2), %xmm5
pxor %xmm6, %xmm6
psubw ($3), %xmm6
mov $$-128, %eax
.align 1 << 4
1:
movdqa ($1, %eax), %xmm0
movdqa %xmm0, %xmm1
pabsw %xmm0, %xmm0
psubusw %xmm6, %xmm0
pmulhw %xmm5, %xmm0
por %xmm0, %xmm4
psignw %xmm1, %xmm0...
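The error points at the asm statement's constraints. As a generic refresher (not the original block, whose constraint list is not visible in the excerpt), GCC/Clang-style inline asm that writes XMM registers by name should list them as clobbers so the register allocator knows they are unavailable across the statement:

/* Generic illustration, not the original statement: registers written by
 * name inside the template are declared as clobbers. Assumes 16-byte
 * aligned pointers because of movdqa. */
void double_words(short *dst, const short *src) {
    __asm__ volatile(
        "movdqa (%1), %%xmm0\n\t"
        "paddw  %%xmm0, %%xmm0\n\t"
        "movdqa %%xmm0, (%0)\n\t"
        : /* no outputs */
        : "r"(dst), "r"(src)
        : "xmm0", "memory");
}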
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...[eax+20]
+ mulps xmm2, xmm1
+ mulps xmm3, xmm1
+ movss xmm4, [eax+36]
+ movss xmm5, [eax+40]
+ mulss xmm4, xmm1
+ mulss xmm5, xmm1
+ movaps xmm6, [ebx+4]
+ subps xmm6, xmm2
+ movups [ebx], xmm6
+ movaps xmm7, [ebx+20]
+ subps xmm7, xmm3
+ movups [ebx+16], xmm7
+
+ movss xmm7, [ebx+36]
+...