thr3ads.net - search: "zmmword"

Displaying 4 results from an estimated 4 matches for "zmmword".

2017 Jun 24

AVX Scheduling and Parallelism

...are 2 vmov instructions with different registers executed in parallel? That could be the case, because each core has an AVX unit. Does the compiler exploit it? Secondly, I am generating assembly for Intel, and there are some offsets, such as the rip register or a constant added to the memory index. Why is that so?

E.g. 1:

    vmovdqu32 zmm0, zmmword ptr [rip + c]
    vpaddd zmm0, zmm0, zmmword ptr [rip + b]
    vmovdqu32 zmmword ptr [rip + a], zmm0
    vmovdqu32 zmm0, zmmword ptr [rip + c+64]
    vpaddd zmm0, zmm0, zmmword ptr [rip + b+64]

and e.g. 2:

    mov rax, -393216
    .p2align 4, 0x90
    .LBB0_1: # %vector.body...
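On the constant offsets in the question above: in x86-64, an operand such as `[rip + c]` addresses the symbol relative to the instruction pointer, and the `+64` in the second group of instructions matches the size of one zmm register (512 bits, or 64 bytes). The sketch below only works through that arithmetic. It assumes the arrays hold 32-bit integers, which `vpaddd` and `vmovdqu32` suggest but the snippet does not confirm, and the helper names are hypothetical.

```python
ZMM_BYTES = 64  # one zmm register is 512 bits wide


def lanes_per_zmm(element_bytes):
    """Number of elements of the given size that fit in one zmm register."""
    if element_bytes <= 0 or ZMM_BYTES % element_bytes:
        raise ValueError("element size must divide the register width")
    return ZMM_BYTES // element_bytes


def unrolled_offsets(n_vectors):
    """Byte offsets of consecutive full-register loads from an array base."""
    return [i * ZMM_BYTES for i in range(n_vectors)]


def first_element_index(byte_offset, element_bytes):
    """Array index of the first element covered by a load at byte_offset."""
    return byte_offset // element_bytes


# Assumption: 4-byte (32-bit) integer elements.
lanes = lanes_per_zmm(4)
offsets = unrolled_offsets(3)
starts = [first_element_index(off, 4) for off in offsets]
```

Under that assumption, each zmm load covers 16 elements, so the load at `c+64` would start at element 16 of `c`.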

2017 Jun 25

AVX Scheduling and Parallelism

...different registers executed in parallel? That could be the case, because each core has an AVX unit. Does the compiler exploit it? Secondly, I am generating assembly for Intel, and there are some offsets, such as the rip register or a constant added to the memory index. Why is that so?

E.g. 1:

    vmovdqu32 zmm0, zmmword ptr [rip + c]
    vpaddd zmm0, zmm0, zmmword ptr [rip + b]
    vmovdqu32 zmmword ptr [rip + a], zmm0
    vmovdqu32 zmm0, zmmword ptr [rip + c+64]
    vpaddd zmm0, zmm0, zmmword ptr [rip + b+64]

and e.g. 2:

    mov rax, -393216...

2017 Jun 25

AVX Scheduling and Parallelism

Hi, Zvi,

I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding:

> vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in zmm0
>
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] ; zmm1<-zmm1+b[401344]

The KNL can only deliver 16 bytes per cycle from the icache to the decoder. Essentially all of the instructions in the lo...
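To put rough numbers on the encoding-size concern above, here is a minimal sketch that adds up the bytes of an EVEX-encoded instruction with a memory operand. It assumes the usual layout (a 4-byte EVEX prefix, one opcode byte, one ModRM byte, an optional SIB byte, and a 0-, 1-, or 4-byte displacement). The 16-byte figure is taken from the message above, and the function names are hypothetical.

```python
EVEX_PREFIX_BYTES = 4       # 0x62 followed by three payload bytes
OPCODE_BYTES = 1
MODRM_BYTES = 1
FETCH_BYTES_PER_CYCLE = 16  # figure quoted in the message above


def evex_length(disp_bytes, sib=False, imm_bytes=0):
    """Encoded length in bytes of an EVEX instruction with one opcode byte."""
    if disp_bytes not in (0, 1, 4):
        raise ValueError("displacement is encoded in 0, 1, or 4 bytes")
    sib_bytes = 1 if sib else 0
    return (EVEX_PREFIX_BYTES + OPCODE_BYTES + MODRM_BYTES
            + sib_bytes + disp_bytes + imm_bytes)


def fits_disp8(disp, vector_bytes=64):
    """True if disp fits EVEX compressed disp8*N for a full-vector operand."""
    if disp % vector_bytes:
        return False
    return -128 <= disp // vector_bytes <= 127


long_form = evex_length(disp_bytes=4)   # e.g. [rax + c+401280]
short_form = evex_length(disp_bytes=1)  # e.g. [rax + 64]
whole_long = FETCH_BYTES_PER_CYCLE // long_form
whole_short = FETCH_BYTES_PER_CYCLE // short_form
```

With these assumptions, an instruction carrying a 32-bit displacement is 10 bytes long, so only one such instruction fits entirely in a 16-byte fetch window, while two 7-byte disp8 forms would fit. Actual front-end throughput depends on more than this byte count.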

2017 Jul 01

KNL Assembly Code for Matrix Multiplication

Thank you. It means that after

    vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]

zmm22 will contain 64-bit constant values, which are the indexes here (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from those locations. And zmm2 contains the constant 4000. So

    vpmuludq zmm14, zmm10, zmm2 ; will multiply the indexes values w...
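As a rough model of the multiply discussed above: `vpmuludq` takes the low unsigned 32 bits of each 64-bit lane in both sources and writes the full 64-bit product per lane. The sketch below imitates that with Python lists. The contents of zmm10 are not shown in the snippet, so the index values 8 to 15 are used purely as an illustrative assumption, and the function name simply mirrors the mnemonic.

```python
MASK32 = 0xFFFFFFFF


def vpmuludq(a, b):
    """Per 64-bit lane: multiply the low unsigned 32 bits of a and b."""
    if len(a) != len(b):
        raise ValueError("both operands need the same number of lanes")
    # A 32-bit by 32-bit unsigned product always fits in 64 bits.
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]


# Assumption: one operand holds the indexes 8..15 (the values the
# snippet shows for zmm22), the other holds 4000 in every lane.
indexes = list(range(8, 16))
stride = [4000] * 8
products = vpmuludq(indexes, stride)
```

Any bits above the low 32 in a source lane are ignored by this model, as they are by the instruction.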