
Displaying 3 results from an estimated 3 matches for "p_2048b_load_dword".

2017 Aug 26
Register Allocation and Scheduling Issues
..._2, R_3, R_4, R_5, R_6, R_7. But the generated assembly code only uses 2 registers. How can I enable it to use all 8? Also, can I control the ordering, e.g. use R_5 right after R_0, without changes in registerinfo.td? What changes are required here, either in the scheduling or the register allocation phase?

    P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b]
    P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c]
    P_2048B_VADD R_0, R_1, R_0
    P_2048B_STORE_DWORD Pword ptr [rip + a], R_0
    P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b+2048]
    P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c+2048]
    P_2048B_VADD R_0, R_1, R_0
    P_2048B_STORE_DWORD Pword ptr [ri...
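One possible explanation for the output above is that each unrolled chunk is scheduled sequentially, so its live ranges never overlap with the next chunk's and the allocator can keep reusing the first registers in its allocation order; interleaving the chunks in the scheduler would raise register pressure and force more registers into use. The toy Python model below is not LLVM code. It only shows the register assignment the question is after: each unrolled copy gets its own register pair, taken round-robin from R_0..R_7, and a helper counts cross-copy WAR/WAW hazards. `emit_unrolled`, `cross_copy_false_deps` and `VEC_BYTES` are hypothetical names; the 2048-byte offset step is taken from the snippet.

```python
# Toy model only (not LLVM): emit the unrolled a = b + c chunks from the
# question, assigning each copy its own register pair round-robin.

VEC_BYTES = 2048  # offset step between unrolled copies, taken from the snippet


def emit_unrolled(copies, num_regs):
    """Return a list of (copy, dests, srcs, asm_text) tuples.

    Copy k uses R_(2k mod num_regs) and R_(2k+1 mod num_regs): with
    num_regs == 2 every copy reuses R_0/R_1 (what the question observes);
    with num_regs == 8, four copies get disjoint registers.
    """
    code = []
    for k in range(copies):
        x = f"R_{(2 * k) % num_regs}"
        y = f"R_{(2 * k + 1) % num_regs}"
        off = f"+{k * VEC_BYTES}" if k else ""
        code += [
            (k, {x}, set(), f"P_2048B_LOAD_DWORD {x}, Pword ptr [rip + b{off}]"),
            (k, {y}, set(), f"P_2048B_LOAD_DWORD {y}, Pword ptr [rip + c{off}]"),
            (k, {x}, {x, y}, f"P_2048B_VADD {x}, {y}, {x}"),
            (k, set(), {x}, f"P_2048B_STORE_DWORD Pword ptr [rip + a{off}], {x}"),
        ]
    return code


def cross_copy_false_deps(code):
    """Count instruction pairs from different copies that form a WAW or WAR
    hazard on a register; such pairs constrain interleaving the copies."""
    count = 0
    for i, (ci, di, si, _) in enumerate(code):
        for cj, dj, _sj, _ in code[i + 1:]:
            if ci == cj:
                continue
            if di & dj:  # write-after-write
                count += 1
            if si & dj:  # write-after-read
                count += 1
    return count


if __name__ == "__main__":
    for _, _, _, text in emit_unrolled(4, 8):
        print(text)
    print("2 regs:", cross_copy_false_deps(emit_unrolled(4, 2)))
    print("8 regs:", cross_copy_false_deps(emit_unrolled(4, 8)))
```

With 8 registers the four copies share no register, so the hazard count is zero and a scheduler would be free to interleave them; with 2 registers every copy conflicts with its neighbours.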
2017 Jun 25
AVX Scheduling and Parallelism
Hi, Zvi, I agree. In the context of targeting the KNL, however, I'm a bit concerned about the addressing, and specifically, the size of the resulting encoding:

> vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ;load b[401280] in zmm0
>
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] ; zmm1<-zmm1+b[401344]

The KNL can only
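For context on the encoding-size concern: AVX-512 (EVEX) instructions use a compressed 8-bit displacement ("disp8*N"), where the stored byte is scaled by the memory operand size N, which is 64 for a full zmm operand such as the one in `vmovdqu32` or a non-broadcast `vpaddd`. A displacement fits in one byte only if it is a multiple of N and the quotient lies in [-128, 127]; otherwise a 4-byte disp32 is needed, and a symbolic displacement like `c+401280` always needs disp32 because it carries a relocation. The sketch below encodes those rules. The function names are hypothetical, and the length helper assumes no SIB byte (base register other than RSP/R12, no index) and no immediate.

```python
# Sketch of the AVX-512 (EVEX) compressed displacement rule, disp8*N.

EVEX_PREFIX = 4  # 0x62 plus three payload bytes
OPCODE = 1       # EVEX selects the opcode map, so one opcode byte follows
MODRM = 1


def evex_disp_size(disp, n=64, symbolic=False):
    """Bytes needed for a nonzero displacement in an EVEX memory operand.

    n is the memory operand size used for scaling (64 for a full zmm
    operand without broadcast). A symbolic displacement such as c+401280
    carries a relocation and is always emitted as disp32.
    """
    if symbolic:
        return 4
    if disp % n == 0 and -128 <= disp // n <= 127:
        return 1  # stored as disp // n in a signed byte
    return 4


def evex_insn_len(disp, n=64, symbolic=False):
    """Length of e.g. vmovdqu32 zmm, [base + disp], assuming no SIB byte
    (base is not RSP/R12 and there is no index) and no immediate."""
    return EVEX_PREFIX + OPCODE + MODRM + evex_disp_size(disp, n, symbolic)


if __name__ == "__main__":
    for d in (128, 8128, 8192, 100, 401280):
        print(d, evex_disp_size(d), evex_insn_len(d))
```

Under these assumptions both quoted instructions are 10 bytes long, versus 7 bytes when the offset fits in disp8*64. Note that an unscaled offset such as 100 cannot use disp8 under EVEX even though it would fit in a legacy signed byte.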
2017 Jun 25
AVX Scheduling and Parallelism
Hi Ahmed, From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism. Modern X86 processors use register renaming, which eliminates the false (write-after-read and write-after-write) dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as
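To make the renaming argument concrete, here is a toy model, not a description of any particular microarchitecture: every register write is renamed to a fresh physical register through an alias table, and the longest register-dependence chain is then measured assuming unit latency and unlimited execution units. Memory dependences are ignored. The unrolled load/load/add/store pattern reusing xmm0 and xmm1 is a hypothetical stand-in for the code in the thread; `unrolled_xmm`, `rename` and `chain_depth` are illustrative names.

```python
# Toy model of register renaming (no real microarchitecture is modelled).
# Each instruction is (dest or None, [sources]); memory is ignored.


def unrolled_xmm(copies):
    """Hypothetical unrolled loop body reusing xmm0/xmm1 in every copy:
    load xmm0; load xmm1; xmm0 = xmm1 + xmm0; store xmm0."""
    body = [("xmm0", []), ("xmm1", []), ("xmm0", ["xmm1", "xmm0"]), (None, ["xmm0"])]
    return body * copies


def rename(code):
    """Give every register write a fresh physical register via an alias table."""
    rat = {}  # architectural register -> current physical register
    next_phys = 0
    renamed = []
    for dest, srcs in code:
        phys_srcs = [rat.get(s, s) for s in srcs]  # read mappings before the write
        phys_dest = None
        if dest is not None:
            phys_dest = f"p{next_phys}"
            next_phys += 1
            rat[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed


def chain_depth(code):
    """Longest chain of RAW, WAR or WAW register hazards, unit latency."""
    level = []
    for j, (dj, sj) in enumerate(code):
        depth = 1
        for i in range(j):
            di, si = code[i]
            raw = di is not None and di in sj
            waw = di is not None and di == dj
            war = dj is not None and dj in si
            if raw or waw or war:
                depth = max(depth, level[i] + 1)
        level.append(depth)
    return max(level)


if __name__ == "__main__":
    code = unrolled_xmm(4)
    print("architectural:", chain_depth(code))  # WAR/WAW on xmm0/xmm1 serialize copies
    print("renamed:", chain_depth(rename(code)))  # copies become independent
```

With four copies the chain is 12 instructions long when the WAR and WAW hazards on xmm0/xmm1 are honored, and 3 after renaming: each copy becomes an independent load, add, store chain, which is the parallelism the reply describes.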