Displaying 3 results from an estimated 3 matches for "p_2048b_store_dword".
2017 Aug 26
2
Register Allocation and Scheduling Issues
...Also, can I control the ordering, for example use R_5 right after R_0,
without changes in registerinfo.td?
What changes are required here, in the scheduling phase or in the register
allocation phase?
P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b]
P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c]
P_2048B_VADD R_0, R_1, R_0
P_2048B_STORE_DWORD Pword ptr [rip + a], R_0
P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b+2048]
P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c+2048]
P_2048B_VADD R_0, R_1, R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+2048], R_0
P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b+4096]
P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c+4096]
P...
2017 Jun 25
0
AVX Scheduling and Parallelism
Hi, Zvi,
I agree. In the context of targeting the KNL, however, I'm a bit
concerned about the addressing, and specifically, the size of the
resulting encoding:
> vmovdqu32 zmm0, zmmword ptr [rax + c+401280] ; load b[401280] in zmm0
>
> vpaddd zmm1, zmm1, zmmword ptr [rax + b+401344] ; zmm1<-zmm1+b[401344]
The KNL can only
2017 Jun 25
2
AVX Scheduling and Parallelism
Hi Ahmed,
From what can be seen in the code snippet you provided, the reuse of XMM0 and XMM1 across loop-unroll instances does not inhibit instruction-level parallelism.
Modern X86 processors use register renaming that can eliminate the dependencies in the instruction stream. In the example you provided, the processor should be able to identify the 2-vloads + vadd + vstore sequences as
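
The renaming argument in the excerpt above can be illustrated with a toy model. This is only a sketch under stated assumptions: the instruction tuples, the `rename` and `hazards` helpers, and the physical register names `p0`, `p1`, ... are hypothetical, memory dependencies are ignored, and no real microarchitecture is modeled. It takes two unrolled load/load/add/store groups that reuse R_0 and R_1, gives every register write a fresh physical register, and checks which register conflicts remain between the two groups.

```python
from itertools import count

# Each instruction: (opcode, destination register or None, source registers).
# Two unrolled groups reusing R_0 and R_1, modeled on the load/load/vadd/store
# pattern quoted in the first excerpt. Memory operands are omitted.
PROGRAM = [
    ("load",  "R_0", []),
    ("load",  "R_1", []),
    ("vadd",  "R_0", ["R_1", "R_0"]),
    ("store", None,  ["R_0"]),
    ("load",  "R_0", []),
    ("load",  "R_1", []),
    ("vadd",  "R_0", ["R_1", "R_0"]),
    ("store", None,  ["R_0"]),
]
GROUP_SIZE = 4  # instructions per unrolled group


def rename(program):
    """Give every register write a fresh physical register (idealized renaming)."""
    fresh = count()
    mapping = {}  # architectural register -> current physical register
    renamed = []
    for op, dest, srcs in program:
        # Sources are looked up before the destination is remapped.
        new_srcs = [mapping[s] for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = f"p{next(fresh)}"
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


def hazards(program):
    """Conservative name-based conflicts between every earlier/later pair."""
    found = []
    for i, (_, dest_i, srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            _, dest_j, srcs_j = program[j]
            if dest_i is not None and dest_i in srcs_j:
                found.append((i, j, "RAW"))
            if dest_j is not None and dest_j in srcs_i:
                found.append((i, j, "WAR"))
            if dest_i is not None and dest_i == dest_j:
                found.append((i, j, "WAW"))
    return found


def cross_group(found):
    """Keep only conflicts that link the first group to the second."""
    return [h for h in found if h[0] < GROUP_SIZE <= h[1]]


cross_before = cross_group(hazards(PROGRAM))
cross_after = cross_group(hazards(rename(PROGRAM)))

print("cross-group conflicts before renaming:", len(cross_before))
print("cross-group conflicts after renaming: ", cross_after)
```

In this model the second group conflicts with the first only through the reused names R_0 and R_1. Once each write has its own physical register, the list of cross-group conflicts is empty, so the two groups could in principle execute in parallel.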