search for: zmmword

Displaying 4 results from an estimated 4 matches for "zmmword".

Did you mean: xmmword
2017 Jun 24
4
AVX Scheduling and Parallelism
...are 2 vmov with different registers executed in parallel? it can be because each core has an AVX unit. does compiler exploit it? secondly i am generating assembly for intel and there are some offset like rip register or some constant addition in memory index. why is that so? eg.1 vmovdqu32 zmm0, zmmword ptr [rip + c] vpaddd zmm0, zmm0, zmmword ptr [rip + b] vmovdqu32 zmmword ptr [rip + a], zmm0 vmovdqu32 zmm0, zmmword ptr [rip + c+64] vpaddd zmm0, zmm0, zmmword ptr [rip + b+64] and eg. 2 mov rax, -393216 .p2align 4, 0x90 .LBB0_1: # %vector.body...
2017 Jun 25
2
AVX Scheduling and Parallelism
...different registers executed in parallel? it can be because each core has an AVX unit. does compiler exploit it? secondly i am generating assembly for intel and there are some offset like rip register or some constant addition in memory index. why is that so? eg.1 vmovdqu32 zmm0, zmmword ptr [rip + c] vpaddd zmm0, zmm0, zmmword ptr [rip + b] vmovdqu32 zmmword ptr [rip + a], zmm0 vmovdqu32 zmm0, zmmword ptr [rip + c+64] vpaddd zmm0, zmm0, zmmword ptr [rip + b+64] and eg. 2 mov rax, -393216...
2017 Jun 25
0
AVX Scheduling and Parallelism
Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding: > vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in > zmm0 > > vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] > ; zmm1<-zmm1+b[401344] The KNL can only deliver 16 bytes per cycle from the icache to the decoder. Essentially all of the instructions in the lo...
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank You, It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64 bit constant values which are indexes here zmm22=8, 9, 10, 11, 12,13,14,15. not the values loaded from these locations. and zmm2 contains constant 4000. so, vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values w...