Tom Stellard
2015-Apr-24 15:11 UTC
[LLVMdev] MISched: What does it mean when PressureChange objects are not valid?
Hi,

I've been trying to debug an issue where the scheduler does not hide latency well and schedules an ALU instruction before a load.

  v_add_i32_e32 v0, s10, v0    <-- This should be scheduled after the load.
  s_load_dwordx4 s[0:3], s[8:9], 0x4
  s_waitcnt lgkmcnt(0)
  buffer_load_format_xyzw v[0:3], v0, s[0:3], 0 idxen
  v_mov_b32_e32 v4, 0
  s_waitcnt vmcnt(0)
  exp 15, 32, 0, 0, 0, v0, v1, v4, v4
  s_endpgm

v_add_i32_e32 is scheduled first because the tryPressure call checking CriticalMax returns true. The CriticalMax PressureChange for v_add_i32_e32 is invalid, which gives it a higher rank than s_load_dwordx4.

I'm wondering what it means to have an invalid PressureChange value for CriticalMax, and why an instruction with an invalid PressureChange is always scheduled first.

For more context, here is some debug output from the machine scheduler:

SU(3):   %vreg20<def> = V_ADD_I32_e32 %vreg5, %vreg7, %EXEC<imp-use>, %VCC<imp-def,dead>; VGPR_32:%vreg20,%vreg7 SGPR_32:%vreg5
  # preds left       : 2
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 1
  Depth              : 0
  Height             : 451
  Predecessors:
   val SU(1): Latency=0 Reg=%vreg5
   val SU(0): Latency=0 Reg=%vreg7
  Successors:
   val SU(5): Latency=1 Reg=%vreg20

SU(4):   %vreg13<def> = S_LOAD_DWORDX4_IMM %vreg4, 4; mem:LD16[%3(addrspace=2)](tbaa=<0x1631310>) SReg_128:%vreg13 SReg_64:%vreg4
  # preds left       : 1
  # succs left       : 1
  # rdefs left       : 0
  Latency            : 10
  Depth              : 0
  Height             : 460
  Predecessors:
   val SU(2): Latency=0 Reg=%vreg4
  Successors:
   val SU(5): Latency=10 Reg=%vreg13

===tryCandidate( Cand = 3, tryCand = 4)==
biasPhysRegCopy:
tryPressure Execess:
+tryPressure() TryCand = 4, Cand = 3
 TryRank = 65535 CandRank = 65535
 Both candidates affect the same set
tryPressure CriticalMax:
+tryPressure() TryCand = 4, Cand = 3
 TryRank = 65535 CandRank = 65535
 Both candidates affect the same set
tryLatency: tryLess, getLatencyStallCycles: tryGreater, cluster: tryLess, getWeakLeft:
tryPression CurrentMax:
+tryPressure() TryCand = 4, Cand = 3
 TryRank = 12 CandRank = 65535
 tryGreater
Pick Top REG-MAX
Scheduling SU(3) %vreg20<def> = V_ADD_I32_e32 %vreg5, %vreg7, %EXEC<imp-use>, %VCC<imp-def,dead>; VGPR_32:%vreg20,%vreg7 %SGPR_32:%vreg5
  Ready @0c
  HWVALU +1x3255u
  *** Max MOps 1 at cycle 0
Cycle: 1 TopQ.A
TopQ.A @1c
  Retired: 1
  Executed: 1c
  Critical: 1c, 1 MOps
  ExpectedLatency: 0c
  - Latency limited.
TopQ.A: 6 4
SU(6) ORDER

-Tom
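
To make the ranking question concrete, here is a minimal C++ sketch of how an invalid PressureChange can end up with rank 65535 and win a "greater rank" comparison, as in the last tryPressure step of the debug output. It is a simplified model, not the LLVM source: PressureChangeModel, Winner and comparePressure are hypothetical names, the off-by-one set ID encoding and the handling of pressure decreases are assumptions about how the real code behaves, and the unit increments in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a PressureChange: the pressure set ID is stored
// off by one so that 0 can mean "invalid" (assumed encoding).
struct PressureChangeModel {
  uint16_t PSetID = 0;  // 0 = invalid, otherwise real set ID + 1
  int16_t UnitInc = 0;  // signed change in pressure units

  static PressureChangeModel forSet(unsigned PSet, int Inc) {
    assert(PSet < UINT16_MAX && "set ID out of range");
    PressureChangeModel P;
    P.PSetID = static_cast<uint16_t>(PSet + 1);
    P.UnitInc = static_cast<int16_t>(Inc);
    return P;
  }

  bool isValid() const { return PSetID > 0; }

  // An invalid change wraps around to 65535, the largest possible rank.
  unsigned getPSetOrMax() const { return (PSetID - 1) & UINT16_MAX; }
};

enum class Winner { Cand, TryCand, Tie };

// Simplified model of a tryPressure-style comparison between the current
// best candidate (Cand) and a challenger (TryCand).
inline Winner comparePressure(const PressureChangeModel &TryP,
                              const PressureChangeModel &CandP) {
  unsigned TryRank = TryP.getPSetOrMax();
  unsigned CandRank = CandP.getPSetOrMax();

  // Same set (or both invalid): the smaller increase wins.
  if (TryRank == CandRank) {
    if (TryP.UnitInc < CandP.UnitInc) return Winner::TryCand;
    if (TryP.UnitInc > CandP.UnitInc) return Winner::Cand;
    return Winner::Tie;
  }

  // Assumption: a candidate that decreases pressure beats one that does not.
  bool TryDec = TryP.UnitInc < 0;
  bool CandDec = CandP.UnitInc < 0;
  if (TryDec != CandDec) return TryDec ? Winner::TryCand : Winner::Cand;

  // Assumption: when both decrease pressure, the rank priority is reversed.
  if (TryDec) std::swap(TryRank, CandRank);

  // Otherwise the greater rank wins, so an invalid change (65535) beats
  // any valid pressure set.
  return TryRank > CandRank ? Winner::TryCand : Winner::Cand;
}
```

Under these assumptions, comparePressure(PressureChangeModel::forSet(12, 4), PressureChangeModel{}) returns Winner::Cand, which mirrors the "TryRank = 12 CandRank = 65535 tryGreater" lines in the log: the challenger on set 12 loses to the candidate whose change is invalid. Two invalid changes compare as a tie, matching the "Both candidates affect the same set" lines.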