thr3ads.net - search: "vmovdqu32"

Displaying 3 results from an estimated 3 matches for "vmovdqu32".

2017 Jun 25

AVX Scheduling and Parallelism

...ike are 2 vmov with different registers executed in parallel? it can be because each core has an AVX unit. does compiler exploit it? secondly i am generating assembly for intel and there are some offset like rip register or some constant addition in memory index. why is that so? eg.1 vmovdqu32 zmm0, zmmword ptr [rip + c] vpaddd zmm0, zmm0, zmmword ptr [rip + b] vmovdqu32 zmmword ptr [rip + a], zmm0 vmovdqu32 zmm0, zmmword ptr [rip + c+64] vpaddd zmm0, zmm0, zmmword ptr [rip + b+64] and eg. 2 mov rax...

AVX Scheduling and Parallelism

2017 Jun 24

AVX Scheduling and Parallelism

...parallel? like are 2 vmov with different registers executed in parallel? it can be because each core has an AVX unit. does compiler exploit it? secondly i am generating assembly for intel and there are some offset like rip register or some constant addition in memory index. why is that so? eg.1 vmovdqu32 zmm0, zmmword ptr [rip + c] vpaddd zmm0, zmm0, zmmword ptr [rip + b] vmovdqu32 zmmword ptr [rip + a], zmm0 vmovdqu32 zmm0, zmmword ptr [rip + c+64] vpaddd zmm0, zmm0, zmmword ptr [rip + b+64] and eg. 2 mov rax, -393216 .p2align 4, 0x90 .LBB0_1: # %vector.body...

AVX Scheduling and Parallelism

2017 Jun 25

AVX Scheduling and Parallelism

Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding: > vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in > zmm0 > > vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] > ; zmm1<-zmm1+b[401344] The KNL can only deliver 16 bytes per cycle from the icache to the decoder. Essentially all of the inst...

search for: vmovdqu32