thr3ads.net - search: "p_2048b

Displaying 3 results from an estimated 3 matches for "p_2048b_vadd".

2017 Aug 26

...to enable it to use all 8? Also, can I control the ordering, e.g. use R_5 right after R_0, without changes in registerinfo.td? What changes are required here, in either the scheduling or the register allocation phase?

    P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b]
    P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c]
    P_2048B_VADD R_0, R_1, R_0
    P_2048B_STORE_DWORD Pword ptr [rip + a], R_0
    P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b+2048]
    P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c+2048]
    P_2048B_VADD R_0, R_1, R_0
    P_2048B_STORE_DWORD Pword ptr [rip + a+2048], R_0
    P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b+4096]
    P_2048B_LOAD_DWO...

AVX Scheduling and Parallelism

2017 Jun 25

AVX Scheduling and Parallelism

Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding:

> vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ; load b[401280] in zmm0
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] ; zmm1 <- zmm1 + b[401344]

The KNL can only

2017 Jun 25

AVX Scheduling and Parallelism

Hi Ahmed, From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism. Modern X86 processors use register renaming that can eliminate the dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as
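To illustrate the register-renaming point in the reply above, here is a minimal Python sketch, not a model of any real microarchitecture. It assumes an unlimited pool of physical registers (named `p0`, `p1`, ... purely for illustration), ignores memory operands, and uses made-up opcode names (`vload`, `vadd`, `vstore`) that stand in for the 2-vloads + vadd + vstore sequence. Each write to an architectural register gets a fresh physical register, so two unrolled iterations that both use `xmm0` and `xmm1` end up sharing no registers at all.

```python
from itertools import count


def rename(instrs):
    """Give every register write a fresh physical register (simplified renaming).

    instrs: list of (opcode, dest_or_None, tuple_of_source_registers).
    Assumption: the physical register pool is unlimited.
    """
    fresh = count()
    mapping = {}  # architectural register -> current physical register
    renamed = []
    for op, dest, srcs in instrs:
        # Sources are looked up before the destination is remapped, so an
        # instruction such as "vadd xmm0, xmm1, xmm0" reads the old xmm0.
        new_srcs = tuple(mapping[s] for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = f"p{next(fresh)}"
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


def registers(instrs):
    """Collect every register named by a list of instructions."""
    regs = set()
    for _, dest, srcs in instrs:
        regs.update(srcs)
        if dest is not None:
            regs.add(dest)
    return regs


# One unrolled iteration: two loads, an add, a store (memory operands omitted).
ITERATION = [
    ("vload", "xmm0", ()),
    ("vload", "xmm1", ()),
    ("vadd", "xmm0", ("xmm1", "xmm0")),
    ("vstore", None, ("xmm0",)),
]

program = ITERATION * 2
renamed = rename(program)

# Registers shared between iteration 1 and iteration 2, before and after renaming.
shared_before = registers(program[:4]) & registers(program[4:])
shared_after = registers(renamed[:4]) & registers(renamed[4:])

print("shared before renaming:", sorted(shared_before))
print("shared after renaming: ", sorted(shared_after))
```

Before renaming, both iterations name `xmm0` and `xmm1`; after renaming, the two iterations touch disjoint physical registers, which is why the reuse of register names alone does not serialize them in this simplified picture.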
