search for: vpsrlq

Displaying 6 results from an estimated 6 matches for "vpsrlq".

2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi, Is LLVM able to generate code for the following? %mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>. I am running it on a Haswell processor with LLVM-3.4.2. It seems to generate really complicated code with vpaddq, vpmuludq, vpsllq, vpsrlq. Thanks, Zhi
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...i128
%mskS54_D = icmp ne i128 %BCS54_D, 0
br i1 %mskS54_D, label %middle.block, label %vector.ph
Now the assembly for the above IR code is:
# BB#4:                                 # %for.cond.preheader
    vmovdqa 144(%rsp), %xmm0            # 16-byte Reload
    vpmuludq %xmm7, %xmm0, %xmm2
    vpsrlq $32, %xmm7, %xmm4
    vpmuludq %xmm4, %xmm0, %xmm4
    vpsllq $32, %xmm4, %xmm4
    vpaddq %xmm4, %xmm2, %xmm2
    vpsrlq $32, %xmm0, %xmm4
    vpmuludq %xmm7, %xmm4, %xmm4
    vpsllq $32, %xmm4, %xmm4
    vpaddq %xmm4, %xmm2, %xmm2
    vpextrq $1, %xmm2, %rax
    cltq
    vmovq %rax,...
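The instruction sequence in this snippet matches the usual way a 64-bit per-lane multiply is assembled from 32-bit partial products: lo(a)*lo(b) plus the two cross terms shifted left by 32. The following is a minimal scalar sketch of that pattern in Python, not the code LLVM itself runs. The helper names simply mirror the mnemonics, each vector is modelled as a list of unsigned 64-bit lanes, and `mul_lanes_64` is a hypothetical name chosen for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def vpmuludq(a, b):
    """Multiply the low 32 bits of each 64-bit lane, keeping the full 64-bit product."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]


def vpsrlq(a, count):
    """Logical right shift of each 64-bit lane; counts above 63 give zero."""
    if count > 63:
        return [0] * len(a)
    return [(x & MASK64) >> count for x in a]


def vpsllq(a, count):
    """Logical left shift of each 64-bit lane, truncated to 64 bits."""
    if count > 63:
        return [0] * len(a)
    return [(x << count) & MASK64 for x in a]


def vpaddq(a, b):
    """Lane-wise 64-bit addition with wraparound."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]


def mul_lanes_64(a, b):
    """64-bit lane multiply built from three 32x32->64 partial products.

    The hi(a)*hi(b) term is omitted because it only affects bits 64 and up.
    """
    acc = vpmuludq(a, b)                    # lo(a) * lo(b)
    cross = vpmuludq(a, vpsrlq(b, 32))      # lo(a) * hi(b)
    acc = vpaddq(acc, vpsllq(cross, 32))
    cross = vpmuludq(vpsrlq(a, 32), b)      # hi(a) * lo(b)
    acc = vpaddq(acc, vpsllq(cross, 32))
    return acc


if __name__ == "__main__":
    a = [0x123456789ABCDEF0, 0xFFFFFFFFFFFFFFFF]
    b = [0x0FEDCBA987654321, 3]
    print([hex(v) for v in mul_lanes_64(a, b)])
```

Read this way, the two vpsrlq/vpmuludq/vpsllq/vpaddq groups in the snippet would be the two cross terms, which is one plausible reason a small vector multiply expands into so many instructions when its lanes are handled as 64-bit values.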
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
...indexes here zmm22 = 8, 9, 10, 11, 12, 13, 14, 15, not the values loaded from these locations, and zmm2 contains the constant 4000. So vpmuludq zmm14, zmm10, zmm2 ; will multiply the index values by 4000, since the stride for array b is 4000. zmm14 = 3200, 3600, 40000, ............28000. Now, as you said, vpsrlq zmm15, zmm10, 32 ; will shift each 64-bit element of zmm10 (= zmm22) by 32 bits, so zmm15 = ? (Can you compute the value of zmm15 here?) On Sat, Jul 1, 2017 at 4:45 AM, Craig Topper <craig.topper at gmail.com> wrote: > If you see a comment after an instruction that contains LCP in the >...
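The effect of the shift asked about here can be sketched in a few lines of Python. vpsrlq is a logical right shift applied independently to each 64-bit element. What it produces depends on how the indexes sit in the register, which the snippet does not fully show, so both layouts below are assumptions: one index per 64-bit element, or two 32-bit indexes packed into each 64-bit element. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def vpsrlq(lanes, count):
    """Logical right shift of each 64-bit lane; counts above 63 give zero."""
    if count > 63:
        return [0] * len(lanes)
    return [(x & MASK64) >> count for x in lanes]


def pack_pairs(values32):
    """Pack consecutive 32-bit values into 64-bit lanes, first value in the low half."""
    return [
        (values32[i] & 0xFFFFFFFF) | ((values32[i + 1] & 0xFFFFFFFF) << 32)
        for i in range(0, len(values32), 2)
    ]


indexes = list(range(8, 16))  # 8, 9, ..., 15 as in the snippet

# Assumption 1: each 64-bit lane holds one zero-extended index.
one_per_lane = vpsrlq(indexes, 32)

# Assumption 2: 32-bit indexes are packed two per 64-bit lane.
two_per_lane = vpsrlq(pack_pairs(indexes), 32)

if __name__ == "__main__":
    print(one_per_lane)   # every lane is 0: the indexes fit in the low 32 bits
    print(two_per_lane)   # the upper (odd-position) index of each pair: 9, 11, 13, 15
```

Under the first assumption the shift clears every lane. Under the second it moves the odd-position index of each pair into the low half, where a following vpmuludq can multiply it.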
2016 Jan 18
2
Lets do a 1.3.2 release
Dave Yeo wrote: > Seems that the default binutils on OS/2 is too old to support AVX2, > attached patch works around this. Not the best solution as best would be > configure tests, but simple. Are you sure that these binutils support AVX and FMA? (Currently libFLAC doesn't contain AVX and FMA instructions). If they aren't supported then it's better to include them too into
2016 Jan 18
0
Lets do a 1.3.2 release
...ot much up on assembly,
make[4]: Entering directory `K:/usr/local/src/flac/src/libFLAC'
CC lpc_intrin_avx2.lo
R:/tmp/ccwvrScM.s: Assembler messages:
R:/tmp/ccwvrScM.s:495: Error: operand type mismatch for `vbroadcastss'
...
R:/tmp/ccwvrScM.s:8773: Error: operand type mismatch for `vpsrlq'
R:/tmp/ccwvrScM.s:8778: Error: no such instruction: `vpermd %ymm1,%ymm5,%ymm0'
R:/tmp/ccwvrScM.s:8859: Error: operand type mismatch for `vpmovzxdq'
...
Best to be safe so updated patch attached. I've also opened a ticket, http://trac.netlabs.org/rpm/ticket/165#ticket
Dave
-------...
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
.../X86/avx-arith.ll Tue Oct 15 18:33:07 2013
>> @@ -240,15 +240,15 @@ define <16 x i16> @vpmullw(<16 x i16> %i
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> ; CHECK-NEXT: vpaddq %xmm
>> -; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsrlq $32, %xmm
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> +; CHECK-NEXT: vpaddq %xmm
>> +; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsrlq $32, %xmm
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> ; CHECK-NEXT...