thr3ads.net - search: "zmm1"

Displaying 8 results from an estimated 8 matches for "zmm1".

2017 Jun 24

AVX Scheduling and Parallelism

Hello, After generating AVX code for a large number of iterations, I came to realize that it still uses only 2 registers, zmm0 and zmm1, when the loop unroll factor = 1024. I wonder if this register allocation allows operations in parallel? Also, I know all the elements within a single vector instruction are computed in parallel, but are the elements of multiple instructions computed in parallel? Like, are 2 vmov with different regis...

2017 Jun 25

AVX Scheduling and Parallelism

...ht cause problems by making the instruction encodings large. cc'ing some Intel folks for further comments. -Hal On 06/23/2017 09:02 PM, hameeza ahmed via llvm-dev wrote: Hello, After generating AVX code for a large number of iterations, I came to realize that it still uses only 2 registers, zmm0 and zmm1, when the loop unroll factor = 1024. I wonder if this register allocation allows operations in parallel? Also, I know all the elements within a single vector instruction are computed in parallel, but are the elements of multiple instructions computed in parallel? Like, are 2 vmov with different regis...

2017 Jun 25

AVX Scheduling and Parallelism

Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding:

> vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ; load b[401280] in zmm0
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] ; zmm1 <- zmm1 + b[401344]

The KNL can only deliver 16 bytes per cycle from the icache to the decoder. Essentially all of the instructions in the loop, as we seem to generate it, have 10-byte encodings: 10: 62 f1 7e 48 6f 80 00 vmovdqu...
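
The fetch-bandwidth concern in this message can be worked through as simple arithmetic. The sketch below takes the two figures stated in the message (16 bytes per cycle, 10-byte encodings) at face value. The byte breakdown (4-byte EVEX prefix, opcode, ModRM, 32-bit displacement) is a reading of the bytes shown in the snippet, and the variable names are illustrative only.

```python
# Figures quoted in the message.
FETCH_BYTES_PER_CYCLE = 16   # icache -> decoder bandwidth stated for the KNL
INSTR_BYTES = 10             # encoding size of the loop instructions

# Assumed breakdown of one 10-byte encoding, read off "62 f1 7e 48 6f 80 00 ...":
# a 4-byte EVEX prefix, 1 opcode byte, 1 ModRM byte, and a 4-byte displacement.
encoding_parts = {"evex_prefix": 4, "opcode": 1, "modrm": 1, "disp32": 4}
assert sum(encoding_parts.values()) == INSTR_BYTES

# Upper bound on instructions delivered per cycle by fetch alone.
instrs_per_cycle = FETCH_BYTES_PER_CYCLE / INSTR_BYTES

# Cycles needed just to fetch one load + add pair like the two quoted above.
cycles_per_pair = 2 * INSTR_BYTES / FETCH_BYTES_PER_CYCLE

print(f"{instrs_per_cycle:.2f} instructions/cycle")   # 1.60 instructions/cycle
print(f"{cycles_per_pair:.2f} cycles per load+add")   # 1.25 cycles per load+add
```

Under these assumptions, fetch alone caps the loop at fewer than two instructions per cycle, which is the size-of-encoding worry the message raises.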

2017 Jul 01

KNL Assembly Code for Matrix Multiplication

Thank you. It means vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] zmm22 will contain 64-bit constant values, which are the indexes here (zmm22 = 8, 9, 10, 11, 12, 13, 14, 15), not the values loaded from these locations. And zmm2 contains the constant 4000. So vpmuludq zmm14, zmm10, zmm2 ; will multiply the index values by 4000, as the stride for array b is 4000: zmm14 = 32000, 36000, 40000, ..., 60000. Now, as you said, vpsrlq zmm15, zmm10, 32 ; will shift each 64-bit element of zmm10 (= zmm22) right by 32 bits, so zmm15 = ? (can you compute the value of zmm15 here)?...
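
The arithmetic asked about here can be checked with a small lane-by-lane model. This is only a sketch: the Python functions `vpmuludq` and `vpsrlq` below are hypothetical stand-ins that model each zmm register as a list of eight 64-bit lanes, and the inputs assume, as the message does, that zmm10 holds the indexes 8 to 15 and zmm2 holds 4000 in every lane.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def vpmuludq(a, b):
    """Per 64-bit lane: multiply the low 32 bits of each operand (unsigned)."""
    return [((x & MASK32) * (y & MASK32)) & MASK64 for x, y in zip(a, b)]

def vpsrlq(a, count):
    """Per 64-bit lane: logical right shift; counts of 64 or more give zero."""
    return [0 if count >= 64 else (x & MASK64) >> count for x in a]

zmm22 = list(range(8, 16))   # [8, 9, ..., 15], the index constants
zmm10 = zmm22                # assumption taken from the message: zmm10 = zmm22
zmm2 = [4000] * 8            # stride of array b, one copy per lane

zmm14 = vpmuludq(zmm10, zmm2)
zmm15 = vpsrlq(zmm10, 32)

print(zmm14)  # [32000, 36000, 40000, 44000, 48000, 52000, 56000, 60000]
print(zmm15)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

Because every index fits in the low 32 bits, the shift by 32 leaves only the (empty) upper halves, so under these assumptions zmm15 is zero in every lane.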

2017 Jan 24

[X86][AVX512] RFC: make i1 illegal in the Codegen

...%r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef) ret <8 x i32> %r } can be lowered to # BB#0: kxnorw %k0, %k0, %k1 vpgatherqd (,%zmm1), %ymm0 {%k1} retq Legal vectors of i1's require support for BUILD_VECTOR(i1, i1, .., i1), i1 EXTRACT_VEC_ELEMENT (...) and INSERT_VEC_ELEMENT(i1, ...), so making i1 legal seemed like a sensible decision, and this is the current state at the top of trunk. However, making i1 legal affe...
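
The lowering in this snippet relies on `kxnorw %k0, %k0, %k1` producing an all-ones mask, which matches the all-true `<8 x i1>` operand of the gather. A minimal model of that idea follows. The helper names are hypothetical, memory is modeled as a Python list indexed by position rather than by address, and the pass-through handling follows the intrinsic's last operand rather than the exact register semantics of the instruction.

```python
def kxnorw(a, b):
    """16-bit mask XNOR: a bit is set wherever the inputs agree."""
    return ~(a ^ b) & 0xFFFF

def masked_gather(memory, indices, mask, passthru):
    """Load memory[idx] for lanes whose mask bit is set; otherwise keep passthru."""
    return [
        memory[idx] if (mask >> lane) & 1 else passthru[lane]
        for lane, idx in enumerate(indices)
    ]

k0 = 0x1234              # arbitrary contents: XNOR of a value with itself ignores them
k1 = kxnorw(k0, k0)      # every bit set

memory = [10, 20, 30, 40, 50, 60, 70, 80]   # illustrative data
indices = [7, 6, 5, 4, 3, 2, 1, 0]          # illustrative lane "pointers"
passthru = [-1] * 8                          # stands in for the undef operand

print(hex(k1))                                        # 0xffff
print(masked_gather(memory, indices, k1, passthru))   # [80, 70, 60, 50, 40, 30, 20, 10]
```

With all mask bits set, every lane is loaded and the pass-through value is never observed, which is why `undef` is acceptable there.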

2017 Aug 06

VBROADCAST Implementation Issues

... .long 1045220557 # float 0.200000003
vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
vmulps zmm2, zmm2, zmm1 ...
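
The constant in this snippet is a 32-bit integer whose bit pattern is the float shown in the comment. The sketch below decodes it and models the broadcast and multiply lane by lane. The helper name `bits_to_float32` and the input values for zmm2 are illustrative assumptions, and Python multiplies in double precision, so the products are not rounded to float32 the way `vmulps` would round them.

```python
import struct

def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

scalar = bits_to_float32(1045220557)   # the .long constant at .LCPI0_0
print(hex(1045220557))                 # 0x3e4ccccd
print(f"{scalar:.9f}")                 # 0.200000003

# vbroadcastss: copy the scalar into all 16 single-precision lanes of zmm1.
zmm1 = [scalar] * 16

# vmulps zmm2, zmm2, zmm1: lane-wise multiply (hypothetical input values).
zmm2 = [float(i) for i in range(16)]
zmm2 = [a * b for a, b in zip(zmm2, zmm1)]
print(len(zmm2))                       # 16
```

The decoded value agrees with the `# float 0.200000003` annotation in the assembly listing.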

2017 Aug 07

VBROADCAST Implementation Issues

... .long 1045220557 # float 0.200000003
vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
vmulps zmm2, zmm2, zmm1 ...

2017 Aug 07

VBROADCAST Implementation Issues

...0.200000003
vbroadcastss zmm1, dword ptr [rip + .LCPI0_0]
vmulps zmm2, zmm2, zmm1 ...
