search for: vpunpcklqdq

Displaying 8 results from an estimated 8 matches for "vpunpcklqdq".

2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...shrq $63, %rax shrq $2, %rcx addl %eax, %ecx vpextrq $1, %xmm5, %rax imulq %rbx movq %rdx, %rax shrq $63, %rax shrq $2, %rdx addl %eax, %edx movslq %edx, %rax vmovq %rax, %xmm5 movslq %ecx, %rax vmovq %rax, %xmm6 vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...> imulq %rbx >> movq %rdx, %rax >> shrq $63, %rax >> shrq $2, %rdx >> addl %eax, %edx >> movslq %edx, %rax >> vmovq %rax, %xmm5 >> movslq %ecx, %rax >> vmovq %rax, %xmm6 >> vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0] > AVX2 doesn't have integer vector division instructions and LLVM lowers divides by constants into (128 bit) multiplies. However, AVX2 doesn't have a way to get to the upper 64 bits of a 64x64->128 bit multiply either, so LLVM uses the scal...
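The lowering described in this reply can be sketched in C. This is only a minimal sketch under stated assumptions, not LLVM's actual code: the divisor 24, the derived magic constant, and the names `mulhi_s64`, `sdiv24`, and `sdiv24_v2` are illustrative and not taken from the thread, and `__int128` is a GCC/Clang extension. Each lane is divided with a scalar 64x64->128-bit multiply (the role `imulq` plays in the listing), then the two quotients are packed back into one vector, which is the job of `vpunpcklqdq`.

```c
#include <stdint.h>

/* Hypothetical helper: high 64 bits of the signed 64x64 -> 128-bit product.
   Relies on __int128 (GCC/Clang extension) and on arithmetic right shift
   of negative values, which those compilers provide. */
int64_t mulhi_s64(int64_t a, int64_t b) {
    return (int64_t)(((__int128)a * (__int128)b) >> 64);
}

/* Signed (truncating) division by the constant 24 without a divide
   instruction. The divisor is an illustrative choice. */
int64_t sdiv24(int64_t n) {
    /* magic = ceil(2^66 / 24); 2^66 is not a multiple of 24, so floor + 1. */
    const int64_t magic = (int64_t)((((__int128)1) << 66) / 24 + 1);
    int64_t hi = mulhi_s64(n, magic);              /* imulq: keep the upper half */
    int64_t sign = (int64_t)((uint64_t)hi >> 63);  /* shrq $63: 1 if negative    */
    return (hi >> 2) + sign;                       /* sarq $2, then addq         */
}

/* Lane-wise version for a two-element vector: each lane goes through the
   scalar path, then the results are stored side by side. */
void sdiv24_v2(const int64_t in[2], int64_t out[2]) {
    out[0] = sdiv24(in[0]);
    out[1] = sdiv24(in[1]);
}
```

The extra shift by 2 spreads the total scaling over 66 bits (64 from taking the high half, 2 from the shift), and adding the sign bit turns the floor produced by the shift into the round-toward-zero result that `sdiv` requires.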
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...AAAAAAAAAAB imulq %rcx movq %rdx, %rax shrq $63, %rax sarq $2, %rdx addq %rax, %rdx vmovq %rdx, %xmm1 vmovq %xmm0, %rax imulq %rcx movq %rdx, %rax shrq $63, %rax sarq $2, %rdx addq %rax, %rdx vmovq %rdx, %xmm0 vpunpcklqdq %xmm1, %xmm0, %xmm1 # xmm1 = xmm0[0],xmm1[0] vpxor %xmm4, %xmm1, %xmm0 vpcmpgtq %xmm6, %xmm0, %xmm0 vptest %xmm0, %xmm0 je .LBB582_49 Thanks, Zhi On Fri, Jul 24, 2015 at 10:16 AM, Philip Reames <listmail at philipreames.com> wrote: > > > On 07/24/2015 03:42...
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...ddl %eax, %edx > movslq %edx, %rax > vmovq %rax, %xmm5 > movslq %ecx, %rax > vmovq %rax, %xmm6 > vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0] AVX2 doesn't have integer vector division instructions and LLVM lowers divides by constants into (128 bit) multiplies. However, AVX2 doesn't have a way to get to the upper 64 bits of a 64x64->128 bit multiply either, so LLVM uses the sc...
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
...shrq $63, %rax > sarq $2, %rdx > addq %rax, %rdx > vmovq %rdx, %xmm1 > vmovq %xmm0, %rax > imulq %rcx > movq %rdx, %rax > shrq $63, %rax > sarq $2, %rdx > addq %rax, %rdx > vmovq %rdx, %xmm0 > vpunpcklqdq %xmm1, %xmm0, %xmm1 # xmm1 = xmm0[0],xmm1[0] > vpxor %xmm4, %xmm1, %xmm0 > vpcmpgtq %xmm6, %xmm0, %xmm0 > vptest %xmm0, %xmm0 > je .LBB582_49 > > Thanks, > Zhi > > On Fri, Jul 24, 2015 at 10:16 AM, Philip Reames > <listmail at philipreames.co...
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...$32, %xmm4, %xmm4 vpaddq %xmm4, %xmm2, %xmm2 vpsrlq $32, %xmm0, %xmm4 vpmuludq %xmm7, %xmm4, %xmm4 vpsllq $32, %xmm4, %xmm4 vpaddq %xmm4, %xmm2, %xmm2 vpextrq $1, %xmm2, %rax cltq vmovq %rax, %xmm4 vmovq %xmm2, %rax cltq vmovq %rax, %xmm5 vpunpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0] vpcmpgtq %xmm3, %xmm4, %xmm3 vptest %xmm3, %xmm3 je .LBB10_66 # BB#5: # %for.body.preheader vpaddq %xmm15, %xmm2, %xmm3 vpand %xmm15, %xmm3, %xmm3 vpaddq .LCPI10_1(%rip), %xmm3, %xmm8 v...
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi, Is LLVM able to generate code for the following? %mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>. I am running it on a Haswell processor with LLVM-3.4.2. It seems to generate really complicated code using vpaddq, vpmuludq, vpsllq, and vpsrlq. Thanks, Zhi
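The instruction mix named in this post (vpmuludq, vpsrlq, vpsllq, vpaddq) is the usual way to build a 64-bit multiply out of 32x32->64-bit multiplies. The C sketch below shows that decomposition for a single lane. It assumes, without confirmation from the thread, that the <2 x i32> operands were widened to 64-bit lanes before the multiply; the function names `mul32x32` and `mul64_from_32` are hypothetical.

```c
#include <stdint.h>

/* Per-lane behaviour of vpmuludq: multiply the low 32 bits of each
   64-bit lane and keep the full 64-bit product. */
uint64_t mul32x32(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
}

/* Low 64 bits of a 64x64 product built from three 32x32 multiplies.
   The hi*hi term would land entirely above bit 63, so it is dropped. */
uint64_t mul64_from_32(uint64_t a, uint64_t b) {
    uint64_t lo_lo = mul32x32(a, b);        /* vpmuludq on the low halves     */
    uint64_t hi_lo = mul32x32(a >> 32, b);  /* vpsrlq $32, then vpmuludq      */
    uint64_t lo_hi = mul32x32(a, b >> 32);  /* vpsrlq $32, then vpmuludq      */
    /* vpsllq $32 moves the cross terms into place, vpaddq sums them;
       all arithmetic wraps modulo 2^64. */
    return lo_lo + ((hi_lo + lo_hi) << 32);
}
```

The listing in the related result shifts and adds each cross term separately, while this sketch adds them first and shifts once; both give the same result modulo 2^64.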
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote: > Awesome, thanks for all the information! > > See below: > > On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> > wrote: >> >> You have already mentioned how the new shuffle lowering is missing >> some features; for example, you explicitly