Displaying 8 results from an estimated 8 matches for "vpunpcklqdq".
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...shrq $63, %rax
shrq $2, %rcx
addl %eax, %ecx
vpextrq $1, %xmm5, %rax
imulq %rbx
movq %rdx, %rax
shrq $63, %rax
shrq $2, %rdx
addl %eax, %edx
movslq %edx, %rax
vmovq %rax, %xmm5
movslq %ecx, %rax
vmovq %rax, %xmm6
vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...> imulq %rbx
>> movq %rdx, %rax
>> shrq $63, %rax
>> shrq $2, %rdx
>> addl %eax, %edx
>> movslq %edx, %rax
>> vmovq %rax, %xmm5
>> movslq %ecx, %rax
>> vmovq %rax, %xmm6
>> vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]
> AVX2 doesn't have integer vector division instructions and LLVM lowers divides by constants into (128 bit) multiplies. However, AVX2 doesn't have a way to get to the upper 64 bits of a 64x64->128 bit multiply either, so LLVM uses the scal...
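The reply above says that a divide by a constant becomes a 64x64->128 bit multiply whose upper half has to be computed with scalar instructions. A minimal C sketch of that idea follows. The divisor 7, its magic constant, and the names mulhi_s64, sdiv7 and sdiv7_v2 are illustrative assumptions, not values taken from the thread (the constant in the quoted assembly is truncated). The sketch relies on the GCC/Clang __int128 extension and on arithmetic right shift of negative values.

```c
#include <stdint.h>

/* Upper 64 bits of the signed 64x64 -> 128-bit product, i.e. what a
   one-operand scalar imulq leaves in %rdx. __int128 is a GCC/Clang
   extension; >> on a negative value is assumed to be arithmetic. */
static int64_t mulhi_s64(int64_t a, int64_t b) {
    return (int64_t)(((__int128)a * (__int128)b) >> 64);
}

/* Signed division by the hypothetical constant 7 without a divide.
   magic = ceil(2^65 / 7), so the post-multiply shift is 1. */
static int64_t sdiv7(int64_t x) {
    const int64_t magic = 0x4924924924924925LL;
    int64_t hi = mulhi_s64(x, magic);              /* imulq; movq %rdx, ... */
    int64_t sign = (int64_t)((uint64_t)hi >> 63);  /* shrq $63: 1 if negative */
    return (hi >> 1) + sign;                       /* sarq $1; addq */
}

/* Two-lane version: each lane is extracted, divided with scalar code,
   and stored back, mirroring the vpextrq/vmovq ... vpunpcklqdq pattern. */
static void sdiv7_v2(const int64_t in[2], int64_t out[2]) {
    out[0] = sdiv7(in[0]);
    out[1] = sdiv7(in[1]);
}
```

Adding the sign bit rounds negative quotients toward zero, which is what sdiv requires; the shift alone would round toward negative infinity.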
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...AAAAAAAAAAB
imulq %rcx
movq %rdx, %rax
shrq $63, %rax
sarq $2, %rdx
addq %rax, %rdx
vmovq %rdx, %xmm1
vmovq %xmm0, %rax
imulq %rcx
movq %rdx, %rax
shrq $63, %rax
sarq $2, %rdx
addq %rax, %rdx
vmovq %rdx, %xmm0
vpunpcklqdq %xmm1, %xmm0, %xmm1 # xmm1 = xmm0[0],xmm1[0]
vpxor %xmm4, %xmm1, %xmm0
vpcmpgtq %xmm6, %xmm0, %xmm0
vptest %xmm0, %xmm0
je .LBB582_49
Thanks,
Zhi
On Fri, Jul 24, 2015 at 10:16 AM, Philip Reames <listmail at philipreames.com>
wrote:
>
>
> On 07/24/2015 03:42...
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...ddl %eax, %edx
> movslq %edx, %rax
> vmovq %rax, %xmm5
> movslq %ecx, %rax
> vmovq %rax, %xmm6
> vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]
AVX2 doesn't have integer vector division instructions and LLVM lowers divides by constants into (128 bit) multiplies. However, AVX2 doesn't have a way to get to the upper 64 bits of a 64x64->128 bit multiply either, so LLVM uses the sc...
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
...shrq $63, %rax
> sarq $2, %rdx
> addq %rax, %rdx
> vmovq %rdx, %xmm1
> vmovq %xmm0, %rax
> imulq %rcx
> movq %rdx, %rax
> shrq $63, %rax
> sarq $2, %rdx
> addq %rax, %rdx
> vmovq %rdx, %xmm0
> vpunpcklqdq %xmm1, %xmm0, %xmm1 # xmm1 = xmm0[0],xmm1[0]
> vpxor %xmm4, %xmm1, %xmm0
> vpcmpgtq %xmm6, %xmm0, %xmm0
> vptest %xmm0, %xmm0
> je .LBB582_49
>
> Thanks,
> Zhi
>
> On Fri, Jul 24, 2015 at 10:16 AM, Philip Reames
> <listmail at philipreames.co...
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...$32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpextrq $1, %xmm2, %rax
cltq
vmovq %rax, %xmm4
vmovq %xmm2, %rax
cltq
vmovq %rax, %xmm5
vpunpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0]
vpcmpgtq %xmm3, %xmm4, %xmm3
vptest %xmm3, %xmm3
je .LBB10_66
# BB#5: # %for.body.preheader
vpaddq %xmm15, %xmm2, %xmm3
vpand %xmm15, %xmm3, %xmm3
vpaddq .LCPI10_1(%rip), %xmm3, %xmm8
v...
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM able to generate code for the following?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>.
I am running it on a Haswell processor with LLVM-3.4.2. It seems to
generate really complicated code with vpaddq, vpmuludq, vpsllq, and
vpsrlq.
Thanks,
Zhi
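One plausible reading of the instruction mix mentioned above, stated here as an assumption and not confirmed by the post, is that the <2 x i32> operands are widened to 64-bit lanes and the 64-bit multiply is then assembled from 32x32->64 bit products, which is what vpmuludq computes per lane. A minimal C sketch of that decomposition for a single lane (the helper names mul32x32 and mul64_via_32 are hypothetical):

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64-bit multiply of the low halves of a and b,
   the per-lane operation performed by vpmuludq. */
static uint64_t mul32x32(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
}

/* Low 64 bits of a * b built from three 32x32 products.
   The hi*hi term is omitted because it only affects bits 64 and above. */
static uint64_t mul64_via_32(uint64_t a, uint64_t b) {
    uint64_t lo_lo = mul32x32(a, b);        /* vpmuludq on the low halves  */
    uint64_t hi_lo = mul32x32(a >> 32, b);  /* vpsrlq $32, then vpmuludq   */
    uint64_t lo_hi = mul32x32(a, b >> 32);  /* vpsrlq $32, then vpmuludq   */
    return lo_lo + ((hi_lo + lo_hi) << 32); /* vpsllq $32 and vpaddq       */
}
```

For inputs that fit in 32 bits the two cross terms are zero, so most of this work is redundant when the source type is really <2 x i32>.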
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 9, 2014 at 11:39 PM, Chandler Carruth <chandlerc at google.com> wrote:
> Awesome, thanks for all the information!
>
> See below:
>
> On Tue, Sep 9, 2014 at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com>
> wrote:
>>
>> You have already mentioned how the new shuffle lowering is missing
>> some features; for example, you explicitly