Displaying 6 results from an estimated 6 matches for "vpsrlq".
2015 Jun 24
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM able to generate code for the following code?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are <2 x i32> type.
I am running it on a Haswell processor with LLVM-3.4.2. It seems that it
generates really complicated code with vpaddq, vpmuludq, vpsllq, and
vpsrlq.
Thanks,
Zhi
2015 Jun 26
2
[LLVMdev] Can LLVM vectorize <2 x i32> type
...i128
%mskS54_D = icmp ne i128 %BCS54_D, 0
br i1 %mskS54_D, label %middle.block, label %vector.ph
Now the assembly for the above IR code is:
# BB#4: # %for.cond.preheader
vmovdqa 144(%rsp), %xmm0 # 16-byte Reload
vpmuludq %xmm7, %xmm0, %xmm2
vpsrlq $32, %xmm7, %xmm4
vpmuludq %xmm4, %xmm0, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpextrq $1, %xmm2, %rax
cltq
vmovq %rax,...
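The vpmuludq / vpsrlq / vpsllq / vpaddq pattern in the listing above matches the usual way of building a 64-bit lane multiply out of 32x32->64 partial products. The following is only a sketch of that arithmetic in Python, not the compiler's actual lowering code: the helper names mirror the instruction mnemonics, each vector is modelled as a plain list of 64-bit lane values, and `mul_v2i64` is a hypothetical name introduced for illustration.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def vpmuludq(a, b):
    # Multiply the low 32 bits of each 64-bit lane, giving a 64-bit product.
    return [((x & MASK32) * (y & MASK32)) & MASK64 for x, y in zip(a, b)]

def vpsrlq(a, imm):
    # Logical right shift of each 64-bit lane.
    return [(x & MASK64) >> imm for x in a]

def vpsllq(a, imm):
    # Left shift of each 64-bit lane, discarding bits above bit 63.
    return [(x << imm) & MASK64 for x in a]

def vpaddq(a, b):
    # Lane-wise 64-bit addition with wraparound.
    return [(x + y) & MASK64 for x, y in zip(a, b)]

def mul_v2i64(a, b):
    # a*b mod 2^64 = lo(a)*lo(b) + ((lo(a)*hi(b)) << 32) + ((hi(a)*lo(b)) << 32)
    acc = vpmuludq(a, b)                                   # lo(a) * lo(b)
    acc = vpaddq(acc, vpsllq(vpmuludq(a, vpsrlq(b, 32)), 32))  # lo(a) * hi(b)
    acc = vpaddq(acc, vpsllq(vpmuludq(vpsrlq(a, 32), b), 32))  # hi(a) * lo(b)
    return acc

if __name__ == "__main__":
    a = [3, 0xFFFFFFFFFFFFFFFF]
    b = [5, 2]
    print([hex(v) for v in mul_v2i64(a, b)])
```

The hi(a)*hi(b) term is omitted because it would be shifted left by 64 bits and so cannot affect the low 64 bits of the result.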
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
...indexes here: zmm22 = 8, 9, 10, 11, 12, 13, 14, 15, not the values loaded
from these locations, and zmm2 contains the constant 4000. So,
vpmuludq zmm14, zmm10, zmm2 ; will multiply the index values by 4000,
as the stride for array b is 4000.
zmm14= 3200, 3600, 40000, ............28000.
Now, as you said,
vpsrlq zmm15, zmm10, 32 ; will shift each 64-bit element of zmm10 (= zmm22)
right by 32 bits, so
zmm15 = ? (Can you compute the value of zmm15 here?)
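One way to reason about the question above is to model the shift directly. This is a small sketch, assuming zmm10 holds eight 64-bit lanes containing exactly the index values 8 through 15 quoted in the message; the `vpsrlq` helper is a hypothetical Python stand-in for the instruction, not a real API.

```python
def vpsrlq(lanes, imm):
    # Logical right shift of each 64-bit lane; a count above 63 yields 0.
    mask = (1 << 64) - 1
    if imm > 63:
        return [0 for _ in lanes]
    return [(x & mask) >> imm for x in lanes]

# Assumed lane contents, taken from the index values quoted in the message.
zmm10 = [8, 9, 10, 11, 12, 13, 14, 15]

zmm15 = vpsrlq(zmm10, 32)
print(zmm15)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

Under that assumption every index fits in the low 32 bits of its lane, so shifting right by 32 leaves only the (zero) upper halves.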
On Sat, Jul 1, 2017 at 4:45 AM, Craig Topper <craig.topper at gmail.com> wrote:
> If you see a comment after an instruction that contains LCP in the
>...
2016 Jan 18
2
Lets do a 1.3.2 release
Dave Yeo wrote:
> Seems that the default binutils on OS/2 is too old to support AVX2,
> attached patch works around this. Not the best solution as best would be
> configure tests, but simple.
Are you sure that these binutils support AVX and FMA? (Currently libFLAC
doesn't contain AVX and FMA instructions). If they aren't supported then
it's better to include them too into
2016 Jan 18
0
Lets do a 1.3.2 release
...ot
much up on assembly,
make[4]: Entering directory `K:/usr/local/src/flac/src/libFLAC'
CC lpc_intrin_avx2.lo
R:/tmp/ccwvrScM.s: Assembler messages:
R:/tmp/ccwvrScM.s:495: Error: operand type mismatch for `vbroadcastss'
...
R:/tmp/ccwvrScM.s:8773: Error: operand type mismatch for `vpsrlq'
R:/tmp/ccwvrScM.s:8778: Error: no such instruction: `vpermd %ymm1,%ymm5,%ymm0'
R:/tmp/ccwvrScM.s:8859: Error: operand type mismatch for `vpmovzxdq'
...
Best to be safe so updated patch attached.
I've also opened a ticket, http://trac.netlabs.org/rpm/ticket/165#ticket
Dave
-------...
2013 Oct 15
0
[LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
.../X86/avx-arith.ll Tue Oct 15 18:33:07 2013
>> @@ -240,15 +240,15 @@ define <16 x i16> @vpmullw(<16 x i16> %i
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> ; CHECK-NEXT: vpaddq %xmm
>> -; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsrlq $32, %xmm
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> +; CHECK-NEXT: vpaddq %xmm
>> +; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsrlq $32, %xmm
>> ; CHECK-NEXT: vpmuludq %xmm
>> ; CHECK-NEXT: vpsllq $32, %xmm
>> ; CHECK-NEXT...