search for: xmm6

Displaying 20 results from an estimated 62 matches for "xmm6".

2013 Jul 19
0
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
...xmm3,xmm3,0EEh
002E017F  subpd   xmm4,xmm3
002E0183  subpd   xmm4,xmm5
002E0187  fld     qword ptr [esp+0F0h]
002E018E  call    76719BA1                 CALL
002E0193  imul    ebx,eax,0Ch
002E0196  lea     esi,[ebx+3]
002E0199  shl     esi,4
002E019C  movapd  xmm6,xmmword ptr [esi+2C0030h]
002E01A4  mulpd   xmm6,xmm4
002E01A8  mulpd   xmm3,xmm7
002E01AC  movapd  xmm7,xmmword ptr [esp+60h]
002E01B2  subpd   xmm7,xmm2
002E01B6  subpd   xmm7,xmm3
002E01BA  subpd   xmm7,xmm5
002E01BE  movapd  xmm2,xmmword ptr [esi+2C0020h]
002...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...; -> <2 x i32>. Any ideas to optimize these instructions? Thanks.

%sub.ptr.sub.i6.i.i.i.i = sub <2 x i64> %sub.ptr.lhs.cast.i4.i.i.i.i, %sub.ptr.rhs.cast.i5.i.i.i.i
%sub.ptr.div.i7.i.i.i.i = sdiv <2 x i64> %sub.ptr.sub.i6.i.i.i.i, <i64 24, i64 24>

Assembly:
vpsubq   %xmm6, %xmm5, %xmm5
vmovq    %xmm5, %rax
movabsq  $3074457345618258603, %rbx    # imm = 0x2AAAAAAAAAAAAAAB
imulq    %rbx
movq     %rdx, %rcx
movq     %rcx, %rax
shrq     $63, %rax
shrq     $2, %rcx
addl     %eax, %ecx
vpextrq  $1, %xmm5, %rax
imulq    %rbx
movq     %rdx...
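As an aside, the movabsq/imulq/shrq/addl sequence is the standard multiply-high idiom for signed division by a constant. A minimal C sketch of what each extracted lane goes through, assuming a GCC/Clang-style __int128 (this only illustrates the arithmetic pattern; it is not LLVM's lowering code):

    #include <stdint.h>

    /* Signed 64-bit division by the constant 24.  The magic constant
     * 0x2AAAAAAAAAAAAAAB is ceil(2^66 / 24): take the high 64 bits of the
     * 128-bit signed product, arithmetic-shift right by 2, and add 1 when
     * the dividend is negative so the quotient truncates toward zero. */
    static int64_t sdiv_by_24(int64_t n)
    {
        const int64_t magic = 0x2AAAAAAAAAAAAAABLL;           /* movabsq */
        int64_t hi = (int64_t)(((__int128)n * magic) >> 64);  /* imulq, high half */
        int64_t q  = hi >> 2;
        return q + (int64_t)((uint64_t)n >> 63);              /* +1 if n < 0 */
    }

The thread's question is essentially why this work is done lane by lane (vmovq/vpextrq into scalar registers) instead of staying in the vector unit.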
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ougher at gmail.com> wrote:
> Unfortunately, another team, while doing internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0,  %xmm13, %xmm4   # xmm4  = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1,  %xmm0,  %xmm6   # xmm6  = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1,  %xmm7   # xmm7  = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1,  %xmm4,  %xmm14  # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
> vinsertps $416, %xmm13, %xmm6,  %xmm13  # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
>...
2014 Sep 04
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Greetings all, As you may have noticed, there is a new vector shuffle lowering path in the X86 backend. You can try it out with the '-x86-experimental-vector-shuffle-lowering' flag to llc, or '-mllvm -x86-experimental-vector-shuffle-lowering' to clang. Please test it out! There may be some correctness bugs, I'm still fuzz testing it to shake them out. But I expect fairly few
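For concreteness, with a hypothetical input file (test.ll for llc, test.c for clang), the flag from this announcement is passed along these lines:

    llc -x86-experimental-vector-shuffle-lowering test.ll -o test.s
    clang -O2 -mllvm -x86-experimental-vector-shuffle-lowering -c test.c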
2013 Aug 22
2
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
...orrelation_asm_ia32_sse_lag_16
 cglobal FLAC__lpc_compute_autocorrelation_asm_ia32_3dnow
 cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
 cglobal FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32_mmx
@@ -596,7 +597,7 @@
 	movss xmm3, xmm2
 	movss xmm2, xmm0
-	; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm3:xmm3:xmm2
+	; xmm7:xmm6:xmm5 += xmm0:xmm0:xmm0 * xmm4:xmm3:xmm2
 	movaps xmm1, xmm0
 	mulps xmm1, xmm2
 	addps xmm5, xmm1
@@ -619,6 +620,95 @@
 	ret
 ALIGN 16
+cident FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
+	;[ebp + 20] == autoc[]
+	;[ebp + 16] ==...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...instructions? Thanks.
>>
>> %sub.ptr.sub.i6.i.i.i.i = sub <2 x i64> %sub.ptr.lhs.cast.i4.i.i.i.i, %sub.ptr.rhs.cast.i5.i.i.i.i
>> %sub.ptr.div.i7.i.i.i.i = sdiv <2 x i64> %sub.ptr.sub.i6.i.i.i.i, <i64 24, i64 24>
>>
>> Assembly:
>> vpsubq   %xmm6, %xmm5, %xmm5
>> vmovq    %xmm5, %rax
>> movabsq  $3074457345618258603, %rbx    # imm = 0x2AAAAAAAAAAAAAAB
>> imulq    %rbx
>> movq     %rdx, %rcx
>> movq     %rcx, %rax
>> shrq     $63, %rax
>> shrq     $2, %rcx
>> addl...
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...deas to optimize these instructions? Thanks.
>
> %sub.ptr.sub.i6.i.i.i.i = sub <2 x i64> %sub.ptr.lhs.cast.i4.i.i.i.i, %sub.ptr.rhs.cast.i5.i.i.i.i
> %sub.ptr.div.i7.i.i.i.i = sdiv <2 x i64> %sub.ptr.sub.i6.i.i.i.i, <i64 24, i64 24>
>
> Assembly:
> vpsubq   %xmm6, %xmm5, %xmm5
> vmovq    %xmm5, %rax
> movabsq  $3074457345618258603, %rbx    # imm = 0x2AAAAAAAAAAAAAAB
> imulq    %rbx
> movq     %rdx, %rcx
> movq     %...
2013 Jul 19
4
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I end up with SSE instructions (including sqrtpd) if I don't disable it.

On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based on the target triple?) then I don't think
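For what it's worth, whether SSE is treated as available comes down to the subtarget features implied by the target triple, -mcpu, and -mattr. A hedged example of forcing it off or on with llc (file names hypothetical, feature spellings from memory rather than from this thread):

    llc -march=x86 -mattr=-sse,-sse2 test.ll -o test-nosse.s
    llc -march=x86 -mattr=+sse2 test.ll -o test-sse2.s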
2015 Jan 29
2
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...p), %xmm2 ## xmm2 = mem[3,0,1,2]
> ...
> vinsertps $0xc0, %xmm4, %xmm2, %xmm2   ## xmm2 = xmm4[3],xmm2[1,2,3]
>
> Note that the second version does the shuffle in-place, in xmm2.
>
>
> Some are blends (har har) of those two:
> vpermilps $-0x6d, %xmm_mem_1, %xmm6    ## xmm6 = xmm_mem_1[3,0,1,2]
> vpermilps $-0x6d, -0xXX(%rax), %xmm1   ## xmm1 = mem_2[3,0,1,2]
> vblendps  $0x1, %xmm1, %xmm6, %xmm0    ## xmm0 = xmm1[0],xmm6[1,2,3]
> becomes:
> vmovaps   -0xXX(%rax), %xmm0           ## %xmm0 = mem_2[0,1,2,3]
> vpermilps $-0x6d, %xm...
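A side note on reading those "##" comments: the disassembly prints the permute immediate as a signed byte, so $-0x6d is 0x93, and decoding it two bits per destination lane reproduces the [3,0,1,2] annotation. A tiny self-contained C sketch (the decoder is mine for illustration, not anything from the thread):

    #include <stdio.h>

    int main(void)
    {
        /* -0x6d as an unsigned byte is 0x93 = 0b10010011; bits [2*i+1:2*i]
         * pick the source element for destination lane i, so the selection
         * is src[3,0,1,2], matching "xmm6 = xmm_mem_1[3,0,1,2]" above. */
        unsigned imm = (unsigned char)-0x6d;
        for (int lane = 0; lane < 4; ++lane)
            printf("dst[%d] = src[%u]\n", lane, (imm >> (2 * lane)) & 3u);
        return 0;
    }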
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...--------------------------------------
# BB#3:                                 # %if.then.i.i.i.i.i.i
    vpsllq  $3, %xmm0, %xmm0
    vpextrq $1, %xmm0, %rbx
    movq    %rbx, %rdi
    vmovaps %xmm2, 96(%rsp)         # 16-byte Spill
    vmovaps %xmm5, 64(%rsp)         # 16-byte Spill
    vmovdqa %xmm6, 16(%rsp)         # 16-byte Spill
    callq   _Znam
    movq    %rax, 128(%rsp)
    movq    16(%r12), %rsi
    movq    %rax, %rdi
    movq    %rbx, %rdx
    callq   memmove
    vmovdqa 16(%rsp), %xmm6         # 16-byte Reload
    vmovaps 64(%rsp), %xmm5         # 16-byte Reload
    vmovaps 96(%rsp)...
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...;> Unfortunately, another team, while doing internal testing has seen the
>>> new path generating illegal insertps masks. A sample here:
>>>
>>> vinsertps $256, %xmm0,  %xmm13, %xmm4   # xmm4  = xmm0[0],xmm13[1,2,3]
>>> vinsertps $256, %xmm1,  %xmm0,  %xmm6   # xmm6  = xmm1[0],xmm0[1,2,3]
>>> vinsertps $256, %xmm13, %xmm1,  %xmm7   # xmm7  = xmm13[0],xmm1[1,2,3]
>>> vinsertps $416, %xmm1,  %xmm4,  %xmm14  # xmm14 =
>>> xmm4[0,1],xmm1[2],xmm4[3]
>>> vinsertps $416, %xmm13, %xmm6,  %xmm13  # xmm13 =
>>>...
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
...>
>
> # BB#3:                               # %if.then.i.i.i.i.i.i
> vpsllq  $3, %xmm0, %xmm0
> vpextrq $1, %xmm0, %rbx
> movq    %rbx, %rdi
> vmovaps %xmm2, 96(%rsp)        # 16-byte Spill
> vmovaps %xmm5, 64(%rsp)        # 16-byte Spill
> vmovdqa %xmm6, 16(%rsp)        # 16-byte Spill
> callq   _Znam
> movq    %rax, 128(%rsp)
> movq    16(%r12), %rsi
> movq    %rax, %rdi
> movq    %rbx, %rdx
> callq   memmove
> vmovdqa 16(%rsp), %xmm6        # 16-byte Reload
> vmovaps 64(%rsp), %xmm5...
2015 Jan 30
4
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...tps $0xc0, %xmm4, %xmm2, %xmm2 ## xmm2 =
>>> xmm4[3],xmm2[1,2,3]
>>>
>>> Note that the second version does the shuffle in-place, in xmm2.
>>>
>>>
>>> Some are blends (har har) of those two:
>>> vpermilps $-0x6d, %xmm_mem_1, %xmm6    ## xmm6 = xmm_mem_1[3,0,1,2]
>>> vpermilps $-0x6d, -0xXX(%rax), %xmm1   ## xmm1 = mem_2[3,0,1,2]
>>> vblendps  $0x1, %xmm1, %xmm6, %xmm0    ## xmm0 = xmm1[0],xmm6[1,2,3]
>>> becomes:
>>> vmovaps   -0xXX(%rax), %xmm0           ## %xmm0 = mem_2[0,1,2,3]
>...
2015 Jan 29
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...>> ...
>> vinsertps $0xc0, %xmm4, %xmm2, %xmm2   ## xmm2 = xmm4[3],xmm2[1,2,3]
>>
>> Note that the second version does the shuffle in-place, in xmm2.
>>
>>
>> Some are blends (har har) of those two:
>> vpermilps $-0x6d, %xmm_mem_1, %xmm6    ## xmm6 = xmm_mem_1[3,0,1,2]
>> vpermilps $-0x6d, -0xXX(%rax), %xmm1   ## xmm1 = mem_2[3,0,1,2]
>> vblendps  $0x1, %xmm1, %xmm6, %xmm0    ## xmm0 = xmm1[0],xmm6[1,2,3]
>> becomes:
>> vmovaps   -0xXX(%rax), %xmm0           ## %xmm0 = mem_2[0,1,2,3]
>> vperm...
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...>
>>
>> Unfortunately, another team, while doing internal testing has seen the
>> new path generating illegal insertps masks. A sample here:
>>
>> vinsertps $256, %xmm0,  %xmm13, %xmm4   # xmm4  = xmm0[0],xmm13[1,2,3]
>> vinsertps $256, %xmm1,  %xmm0,  %xmm6   # xmm6  = xmm1[0],xmm0[1,2,3]
>> vinsertps $256, %xmm13, %xmm1,  %xmm7   # xmm7  = xmm13[0],xmm1[1,2,3]
>> vinsertps $416, %xmm1,  %xmm4,  %xmm14  # xmm14 =
>> xmm4[0,1],xmm1[2],xmm4[3]
>> vinsertps $416, %xmm13, %xmm6,  %xmm13  # xmm13 =
>> xmm6[0,1],xmm13[2],...
2015 Jan 30
0
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
...%xmm2 ## xmm2 =
>>>> xmm4[3],xmm2[1,2,3]
>>>>
>>>> Note that the second version does the shuffle in-place, in xmm2.
>>>>
>>>>
>>>> Some are blends (har har) of those two:
>>>> vpermilps $-0x6d, %xmm_mem_1, %xmm6    ## xmm6 = xmm_mem_1[3,0,1,2]
>>>> vpermilps $-0x6d, -0xXX(%rax), %xmm1   ## xmm1 = mem_2[3,0,1,2]
>>>> vblendps  $0x1, %xmm1, %xmm6, %xmm0    ## xmm0 =
>>>> xmm1[0],xmm6[1,2,3]
>>>> becomes:
>>>> vmovaps   -0xXX(%rax), %...
2010 Nov 03
1
[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism
...2 * b
   .
   .
p3 = p3 * c
p3 = p3 * c
   .
   .

An actual excerpt of the generated x86 assembly follows:

mulss %xmm8, %xmm10
mulss %xmm8, %xmm10
   .
   .   repeated 512 times
   .
mulss %xmm7, %xmm9
mulss %xmm7, %xmm9
   .
   .   repeated 512 times
   .
mulss %xmm6, %xmm3
mulss %xmm6, %xmm3
   .
   .   repeated 512 times
   .

Since p1, p2, p3, and p4 are all independent, this reordering is correct. This would have the possible advantage of reducing live ranges of values. However, in this microbenchmark, the number of live values is eight single-precision...
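The shape of the microbenchmark being described is roughly the following C, with a loop standing in for the 512 repeated statements and with names invented for illustration:

    /* Four independent single-precision product chains.  Because p1..p4
     * never feed into one another, multiplies from different chains could
     * be interleaved for instruction-level parallelism; the complaint in
     * this thread is that the generated code instead emits each chain's
     * 512 mulss instructions back to back. */
    float chains(float a, float b, float c, float d)
    {
        float p1 = 1.0f, p2 = 1.0f, p3 = 1.0f, p4 = 1.0f;
        for (int i = 0; i < 512; ++i) {
            p1 = p1 * a;
            p2 = p2 * b;
            p3 = p3 * c;
            p4 = p4 * d;
        }
        return p1 + p2 + p3 + p4;
    }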
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...%xmm3
    je      .LBB10_66
# BB#5:                                 # %for.body.preheader
    vpaddq  %xmm15, %xmm2, %xmm3
    vpand   %xmm15, %xmm3, %xmm3
    vpaddq  .LCPI10_1(%rip), %xmm3, %xmm8
    vpand   .LCPI10_5(%rip), %xmm8, %xmm5
    vpxor   %xmm4, %xmm4, %xmm4
    vpcmpeqq %xmm4, %xmm5, %xmm6
    vptest  %xmm6, %xmm6
    jne     .LBB10_9

It turned out that the vector one is way more complicated than the scalar one. I was expecting that it would not be so tedious.

On Fri, Jun 26, 2015 at 3:49 AM, suyog sarda <sardask01 at gmail.com> wrote:
>
>
>
>
> Is LLVM be able t...
2010 Aug 02
0
[LLVMdev] Register Allocation ERROR! Ran out of registers during register allocation!
...ut of registers during register allocation!
Please check your inline asm statement for invalid constraints:
INLINEASM <es:movd %eax, %xmm3
    pshuflw   $$0, %xmm3, %xmm3
    punpcklwd %xmm3, %xmm3
    pxor      %xmm7, %xmm7
    pxor      %xmm4, %xmm4
    movdqa    ($2), %xmm5
    pxor      %xmm6, %xmm6
    psubw     ($3), %xmm6
    mov       $$-128, %eax
    .align 1 << 4
1:
    movdqa    ($1, %eax), %xmm0
    movdqa    %xmm0, %xmm1
    pabsw     %xmm0, %xmm0
    psubusw   %xmm6, %xmm0
    pmulhw    %xmm5, %xmm0
    por       %xmm0, %xmm4
    psignw    %xmm1, %xmm0...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...[eax+20]
+    mulps  xmm2, xmm1
+    mulps  xmm3, xmm1
+    movss  xmm4, [eax+36]
+    movss  xmm5, [eax+40]
+    mulss  xmm4, xmm1
+    mulss  xmm5, xmm1
+    movaps xmm6, [ebx+4]
+    subps  xmm6, xmm2
+    movups [ebx], xmm6
+    movaps xmm7, [ebx+20]
+    subps  xmm7, xmm3
+    movups [ebx+16], xmm7
+
+    movss  xmm7, [ebx+36]
+...