Displaying 7 results from an estimated 7 matches for "vpsllq".
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Can LLVM generate code for the following IR?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>.
I am running it on a Haswell processor with LLVM-3.4.2. It seems to
generate really complicated code using vpaddq, vpmuludq, vpsllq, and
vpsrlq.
Thanks,
Zhi
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...el %middle.block, label %vector.ph
Now the assembly for the above IR code is:
# BB#4: # %for.cond.preheader
vmovdqa 144(%rsp), %xmm0 # 16-byte Reload
vpmuludq %xmm7, %xmm0, %xmm2
vpsrlq $32, %xmm7, %xmm4
vpmuludq %xmm4, %xmm0, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpextrq $1, %xmm2, %rax
cltq
vmovq %rax, %xmm4
vmovq %xmm2, %rax
cltq
vmovq %rax, %xmm5...
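
The vpmuludq / vpsrlq / vpsllq / vpaddq run in the listing above matches the usual way to build a full 64-bit lane multiply out of 32x32->64 multiplies, since AVX2 has no packed 64-bit multiply instruction. Writing each lane as a = a_hi*2^32 + a_lo and b = b_hi*2^32 + b_lo, the low 64 bits of a*b are a_lo*b_lo + ((a_lo*b_hi + a_hi*b_lo) << 32). The following is a minimal Python sketch of that arithmetic, not the code LLVM emits: the helper names (pmuludq, psrlq, psllq, paddq, mul_v2i64) are hypothetical stand-ins for the instructions, and plain lists of integers are assumed to model the 64-bit lanes of an xmm register.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def pmuludq(a, b):
    # Models vpmuludq: multiply the low 32 bits of each 64-bit lane,
    # keeping the full 64-bit product.
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]

def psrlq(a, n):
    # Models vpsrlq: logical right shift of each 64-bit lane.
    return [(x & MASK64) >> n for x in a]

def psllq(a, n):
    # Models vpsllq: left shift of each 64-bit lane, discarding overflow.
    return [(x << n) & MASK64 for x in a]

def paddq(a, b):
    # Models vpaddq: wrapping 64-bit add per lane.
    return [(x + y) & MASK64 for x, y in zip(a, b)]

def mul_v2i64(a, b):
    # Same order of operations as the assembly listing:
    lo_lo = pmuludq(a, b)                          # a_lo * b_lo
    lo_hi = psllq(pmuludq(a, psrlq(b, 32)), 32)    # (a_lo * b_hi) << 32
    acc = paddq(lo_lo, lo_hi)
    hi_lo = psllq(pmuludq(psrlq(a, 32), b), 32)    # (a_hi * b_lo) << 32
    return paddq(acc, hi_lo)

if __name__ == "__main__":
    a = [0x0123456789ABCDEF, 0xFFFFFFFFFFFFFFFF]
    b = [0x0FEDCBA987654321, 2]
    expected = [(x * y) & MASK64 for x, y in zip(a, b)]
    print(mul_v2i64(a, b) == expected)
```

The a_hi*b_hi term never appears because it is shifted left by 64 bits and falls entirely outside the 64-bit result, which is why three multiplies are enough.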
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
...5].
>>>>> it will now become [0,0,0,0,8,9,10,11]. Am I correct? Please explain
>>>>> the purpose of this step.
>>>>> vpmuludq zmm15, zmm15, zmm2 ; similarly, I don't understand the
>>>>> need for this step.
>>>>> vpsllq zmm15, zmm15, 32 ; I don't understand the need for this
>>>>> step.
>>>>> vpaddq zmm14, zmm14, zmm3 ;
>>>>> vpaddq zmm14, zmm15, zmm14 ; I don't understand the need for this
>>>>> step.
>>>>>
>>>>
>>...
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...%zextS39_D to i128
%mskS39_D = icmp ne i128 %BCS39_D, 0
br i1 %mskS39_D, label %if.then11, label %if.else
-------------------------------------------- Assembly
-----------------------------------------------------------------
# BB#3: # %if.then.i.i.i.i.i.i
vpsllq $3, %xmm0, %xmm0
vpextrq $1, %xmm0, %rbx
movq %rbx, %rdi
vmovaps %xmm2, 96(%rsp) # 16-byte Spill
vmovaps %xmm5, 64(%rsp) # 16-byte Spill
vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
callq _Znam
movq %rax, 128(%rsp)
movq 16(%r12), %rsi...
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
...icmp ne i128 %BCS39_D, 0
> br i1 %mskS39_D, label %if.then11, label %if.else
>
> -------------------------------------------- Assembly
> -----------------------------------------------------------------
>
> # BB#3: # %if.then.i.i.i.i.i.i
> vpsllq $3, %xmm0, %xmm0
> vpextrq $1, %xmm0, %rbx
> movq %rbx, %rdi
> vmovaps %xmm2, 96(%rsp) # 16-byte Spill
> vmovaps %xmm5, 64(%rsp) # 16-byte Spill
> vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
> callq _Znam
> movq %rax, 128...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
On 07/24/2015 03:42 AM, Benjamin Kramer wrote:
>> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote:
>>
>> It seems that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there are no alternative AVX/2 instructions for int64? The same thing
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...===========
>> --- llvm/trunk/test/CodeGen/X86/avx-arith.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/avx-arith.ll Tue Oct 15 18:33:07 2013
>> @@ -240,15 +240,15 @@ define <16 x i16> @vpmullw(<16 x i16> %i
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> ; CHECK-NEXT: vpaddq %xmm
>> -; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsrlq $32, %xmm
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> +; CHECK-NEXT: vpaddq %xmm
>> +; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vp...