Displaying 4 results from an estimated 4 matches for "vpmuludq".
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...tcast <2 x i64> %sextS54_D to i128
%mskS54_D = icmp ne i128 %BCS54_D, 0
br i1 %mskS54_D, label %middle.block, label %vector.ph
Now the assembly for the above IR code is:
# BB#4: # %for.cond.preheader
vmovdqa 144(%rsp), %xmm0 # 16-byte Reload
vpmuludq %xmm7, %xmm0, %xmm2
vpsrlq $32, %xmm7, %xmm4
vpmuludq %xmm4, %xmm0, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpextrq $1, %xmm...
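
The excerpt above reads like the usual expansion of a 64-bit lane multiply into 32x32->64 multiplies: one low-by-low product plus two cross products shifted left by 32. The following is a minimal Python sketch of that arithmetic, not LLVM's actual lowering code. The helper names and the labels `a` (for %xmm0) and `b` (for %xmm7) are assumptions made for illustration, and lanes are modelled as Python integers in the range 0..2^64-1.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def pmuludq(a, b):
    """Per 64-bit lane: unsigned product of the low 32 bits of each operand."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]


def psrlq(a, count):
    """Logical right shift of every 64-bit lane."""
    return [x >> count for x in a]


def psllq(a, count):
    """Left shift of every 64-bit lane, discarding bits above bit 63."""
    return [(x << count) & MASK64 for x in a]


def paddq(a, b):
    """Wrapping 64-bit add per lane."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]


def mul64_lanes(a, b):
    """Low 64 bits of a*b per lane, in the same order as the excerpt."""
    acc = pmuludq(a, b)                                    # a_lo * b_lo
    acc = paddq(acc, psllq(pmuludq(a, psrlq(b, 32)), 32))  # + (a_lo * b_hi) << 32
    acc = paddq(acc, psllq(pmuludq(psrlq(a, 32), b), 32))  # + (a_hi * b_lo) << 32
    return acc


# Hypothetical lane values, chosen only to exercise the high halves.
a = [0x0000000300000005, 0xFFFFFFFFFFFFFFFF]
b = [0x0000000700000002, 0x00000000FFFFFFFE]
assert mul64_lanes(a, b) == [(x * y) & MASK64 for x, y in zip(a, b)]
```

The a_hi * b_hi term never appears because it would be shifted left by 64 bits and so contributes nothing to the low 64 bits of the product.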
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM able to generate code for the following?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>.
I am running it on a Haswell processor with LLVM-3.4.2. It seems to
generate really complicated code with vpaddq, vpmuludq, vpsllq, and
vpsrlq.
Thanks,
Zhi
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you.
So this means that
vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]
loads 64-bit constants into zmm22. These are the indexes themselves
(zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from
those locations. zmm2 contains the constant 4000, so
vpmuludq zmm14, zmm10, zmm2 ; multiplies the index values by 4000,
since the stride of array b is 4000, giving
zmm14 = 32000, 36000, 40000, ..., 60000.
Now, as you said,
vpsrlq zmm15, zmm10, 32 ; shifts each 64-bit element of zmm10 (= zmm22)
right by 32 bits, so
zmm15 = ? (Can you compute the value of zmm15 here?)...
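
The two instructions discussed in this message can be checked with a small Python model. This is only a sketch under the assumptions stated in the message itself: that zmm10 holds the same indexes as zmm22 (8 through 15) and that zmm2 holds 4000 in every lane. The function names simply mirror the mnemonics and are not a real library API.

```python
MASK32 = (1 << 32) - 1


def vpmuludq(a, b):
    """Per 64-bit lane: unsigned product of the low 32 bits of each operand."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]


def vpsrlq(a, count):
    """Logical right shift of every 64-bit lane."""
    return [x >> count for x in a]


indexes = list(range(8, 16))  # assumed contents of zmm10 (= zmm22)
stride = [4000] * 8           # assumed contents of zmm2, one copy per lane

zmm14 = vpmuludq(indexes, stride)
zmm15 = vpsrlq(indexes, 32)

print(zmm14)  # [32000, 36000, 40000, 44000, 48000, 52000, 56000, 60000]
print(zmm15)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

Under these assumptions the shift yields zero in every lane, because each index fits entirely in the low 32 bits of its 64-bit element.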
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...================================================
>> --- llvm/trunk/test/CodeGen/X86/avx-arith.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/avx-arith.ll Tue Oct 15 18:33:07 2013
>> @@ -240,15 +240,15 @@ define <16 x i16> @vpmullw(<16 x i16> %i
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> ; CHECK-NEXT: vpaddq %xmm
>> -; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsrlq $32, %xmm
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> +; CHECK-NEXT: vpaddq %xmm
>> +; CHECK-NEXT: vpmu...