Hi, Can anyone help me to understand the ScheduleDAGInstrs::buildSchedGraph() method? I find the handling of AliasChain is disturbing since: 1. A new alias chain add deps to all possibly aliasing SUs, and then clears those lists. 2. When AliasChain is present, the addChainDependency() method is called, but the target hook areMemAccessesTriviallyDisjoint() called inside MIsNeedChainEdge() allows this edge to be skipped. This means that I get a case where SU0 SU1, AliasChain /\ / \ // Aliasing memory accesses SU2 all SUs have memory operands, but the underlying objects vectors are empty. SU1 became AliasChain, added dep to SU2, and cleared the lists for aliasing memory accesses. SU0 has a dependency towards SU2, but towards SU1 it is trivially disjoint, and therefore it gets no dependency to SU1 and *neither to SU2*. The AliasChain concept is bypassed. I don't understand how it can first be assumed that an SU becomes AliasChain, and then an SU with lower NodeNum that may alias is allowed to skip its dep to the AliasChain? The BarrierChain is never skipped because addPred() is called directly, and I don't see how it is safe to skip the AliasChain for aliasing SUs, I think it should always be added? In other words, it may be that SU0 has a dep towards SU2 in the example, but not towards SU1, so therefore it is not safe to skip this dep. Calling addPred() instead of addChainDependency() for AliasChain should fix this, however it would lead to overly constrained DAGs compared to putting the edges where they actually should be (SU0<-SU2 instead of SU0<-SU1). Most thankful for any help, Jonas Paulsson -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141208/f26332e5/attachment.html>
Hi Jonas, Tom Stellard (cc'd) just recently fixed a bug whereby dependencies on the barrier chain would not be added when the SU had no underlying objects (r223717). Does this fix the problem you're highlighting? I've also cc'd Sanjin who developed the areMemAccessesTriviallyDisjoint() callback (which I'll note is a relatively-recent addition -- added on Sept. 8th of this year). In short, if r223717 does not resolve the problem, it sounds like you likely found a bug. What target are you using? I'm trying to get some idea of what your areMemAccessesTriviallyDisjoint() callback is doing. -Hal ----- Original Message -----> From: "Jonas Paulsson" <jonas.paulsson at ericsson.com> > To: hfinkel at anl.gov, llvmdev at cs.uiuc.edu > Cc: "Mattias Eriksson V" <mattias.v.eriksson at ericsson.com> > Sent: Monday, December 8, 2014 8:06:27 AM > Subject: ScheduleDAGInstrs.cpp > > Hi, > > Can anyone help me to understand the > ScheduleDAGInstrs::buildSchedGraph() method? > > > > I find the handling of AliasChain is disturbing since: > > > > 1. A new alias chain add deps to all possibly aliasing SUs, and then > clears those lists. > > 2. When AliasChain is present, the addChainDependency() method is > called, > > but the target hook areMemAccessesTriviallyDisjoint() called inside > > MIsNeedChainEdge() allows this edge to be skipped. > > > > This means that I get a case where > > > > SU0 > > > > SU1, AliasChain > > /\ > > / \ // Aliasing memory accesses > > SU2 > > > > all SUs have memory operands, but the underlying objects vectors are > empty. > > SU1 became AliasChain, added dep to SU2, and cleared the lists for > aliasing memory accesses. > > SU0 has a dependency towards SU2, but towards SU1 it is trivially > disjoint, and therefore it gets > > no dependency to SU1 and * neither to SU2 *. The AliasChain concept > is bypassed. > > > > I don’t understand how it can first be assumed that an SU becomes > AliasChain, and then an SU > > with lower NodeNum that may alias is allowed to skip its dep to the > AliasChain? > > The BarrierChain is never skipped because addPred() is called > directly, and I don’t see how it is > > safe to skip the AliasChain for aliasing SUs, I think it should > always be added? In other words, > > it may be that SU0 has a dep towards SU2 in the example, but not > towards SU1, so therefore it > > is not safe to skip this dep. > > > > Calling addPred() instead of addChainDependency() for AliasChain > should fix this, however it > > would lead to overly constrained DAGs compared to putting the edges > where they actually should > > be (SU0<-SU2 instead of SU0<-SU1). > > > > Most thankful for any help, > > > > Jonas Paulsson > > > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Hi, Thank you for your reply. I have tried Tom Stellards patch, but it did not resolve my particular problem. I have tried to expose this problem on public target without success. I am still however fearing that the scheduler is incorrect. I have a (bit unoptimized) case where the LLVM I/R code looks something like %_tmp255 = load i64** %cse6.103 store i64 %_tmp250, i64* %_tmp255 %_tmp256 = load i64** %cse6.103 %_tmp257 = load i64* %_tmp256 which means the same address is loaded twice, first to store a value, and then to load it back. The MachineInstructions to load an i64 are first pseudos and are then split post regalloc and become SU(0): addrReg0 = load *fp(-50) // Two address registers with same value SU(1): addrReg1 = load *fp(-50) SU(2): store dataReg0:lo, *addrReg0(2) // Store a register in two parts SU(3): store dataReg0:hi, *addrReg0 SU(4): dataReg1:lo = load *addrReg1(2) // Load a register in two parts SU(5): dataReg1:hi = load *addrReg1 Since the addresses are loaded from memory, the underlying Objs become empty for all MIs. SU(4) and SU(5) get pushed to PendingLoads. SU(3) becomes AliasChain, and edges are added to SU(4) and SU(5). PendingLoads is cleared. SU(2) becomes AliasChain. This time TII->areMemAccessesTriviallyDisjoint() return true, since it can see that addrReg0 with and without offset are disjoint. This was not the case between SU(3) and SU(4), since they were different address registers (with the same value). The result is that there is no edge added between SU(2) and SU(3) and not to SU(4) since PendlingLoads was cleared earlier. SU(2) and SU(4) are however a store and a load against the same address. The problem is that TII->areMemAccessesTriviallyDisjoint() is incomplete and therefore inconsistent depending on if the two memory accesses are trivial to analyze or not. Since TII->areMemAccessesTriviallyDisjoint() are called when resetting AliasChain, I think it is necessary to call iterateChainSucs() on the old AliasChain, to make sure there are no SUs succeeding the old AliasChain (and are thus not in RejectMemNodes) that have a dependency on the new AliasChain. Or am I missing something? Very interested in your opinion. I am thinking of something like new_alias_chain: // Chain all possibly aliasing memory references through SU. if (AliasChain) { unsigned ChainLatency = 0; if (AliasChain->getInstr()->mayLoad()) ChainLatency = TrueMemOrderLatency; addChainDependency(AAForDep, MFI, SU, AliasChain, RejectMemNodes, ChainLatency); + + // Since TII->areMemAccessesTriviallyDisjoint() is not + // complete, iterate all chain successors of old + // AliasChain. Otherwise, if an address is stored in two + // registers, and only one of the registers is trivially + // disjoint to old AliasChain, a dependency might be missed. + // SU(0: Address A) -> old AliasChain (no edge) SU(2: Address A) + // SU(0) and SU(2) get a missed dependency. + SmallPtrSet<const SUnit*, 16> Visited; + unsigned Depth = 0; + for (SUnit::const_succ_iterator J = AliasChain->Succs.begin(), + JE = AliasChain->Succs.end(); J != JE; ++J) + if (J->isCtrl()) + iterateChainSucc (AA, MFI, SU, J->getSUnit(), + &ExitSU, &Depth, Visited); } AliasChain = SU; Best regards, Jonas Paulsson -----Original Message----- From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: den 9 december 2014 19:43 To: Jonas Paulsson Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin Sijaric; Tom Stellard; Andrew Trick Subject: Re: ScheduleDAGInstrs.cpp Hi Jonas, Tom Stellard (cc'd) just recently fixed a bug whereby dependencies on the barrier chain would not be added when the SU had no underlying objects (r223717). Does this fix the problem you're highlighting? I've also cc'd Sanjin who developed the areMemAccessesTriviallyDisjoint() callback (which I'll note is a relatively-recent addition -- added on Sept. 8th of this year). In short, if r223717 does not resolve the problem, it sounds like you likely found a bug. What target are you using? I'm trying to get some idea of what your areMemAccessesTriviallyDisjoint() callback is doing. -Hal ----- Original Message -----> From: "Jonas Paulsson" <jonas.paulsson at ericsson.com> > To: hfinkel at anl.gov, llvmdev at cs.uiuc.edu > Cc: "Mattias Eriksson V" <mattias.v.eriksson at ericsson.com> > Sent: Monday, December 8, 2014 8:06:27 AM > Subject: ScheduleDAGInstrs.cpp > > Hi, > > Can anyone help me to understand the > ScheduleDAGInstrs::buildSchedGraph() method? > > > > I find the handling of AliasChain is disturbing since: > > > > 1. A new alias chain add deps to all possibly aliasing SUs, and then > clears those lists. > > 2. When AliasChain is present, the addChainDependency() method is > called, > > but the target hook areMemAccessesTriviallyDisjoint() called inside > > MIsNeedChainEdge() allows this edge to be skipped. > > > > This means that I get a case where > > > > SU0 > > > > SU1, AliasChain > > /\ > > / \ // Aliasing memory accesses > > SU2 > > > > all SUs have memory operands, but the underlying objects vectors are > empty. > > SU1 became AliasChain, added dep to SU2, and cleared the lists for > aliasing memory accesses. > > SU0 has a dependency towards SU2, but towards SU1 it is trivially > disjoint, and therefore it gets > > no dependency to SU1 and * neither to SU2 *. The AliasChain concept is > bypassed. > > > > I don’t understand how it can first be assumed that an SU becomes > AliasChain, and then an SU > > with lower NodeNum that may alias is allowed to skip its dep to the > AliasChain? > > The BarrierChain is never skipped because addPred() is called > directly, and I don’t see how it is > > safe to skip the AliasChain for aliasing SUs, I think it should always > be added? In other words, > > it may be that SU0 has a dep towards SU2 in the example, but not > towards SU1, so therefore it > > is not safe to skip this dep. > > > > Calling addPred() instead of addChainDependency() for AliasChain > should fix this, however it > > would lead to overly constrained DAGs compared to putting the edges > where they actually should > > be (SU0<-SU2 instead of SU0<-SU1). > > > > Most thankful for any help, > > > > Jonas Paulsson > > > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Hello again, Sorry -- I think I found the problem somewhere else. I was a bit confused and missed the fact that adjustChainDeps() is called a few lines down and does just what I wanted :-) I would like to instead ask another question: Why is I->isCtrl() used in code like // Iterate over chain dependencies only. for (SUnit::const_succ_iterator I = SUb->Succs.begin(), E = SUb->Succs.end(); I != E; ++I) if (I->isCtrl()) iterateChainSucc (AA, MFI, SUa, I->getSUnit(), ExitSU, Depth, Visited); ? I thought only chain edges are relevant, and would instead use if (J->getKind() == SDep::Order) I got strange edges, from memory accesses to normal operation instructions that do not touch memory, because also anti and output edges are followed. Best regards, Jonas Paulsson -----Original Message----- From: Jonas Paulsson Sent: den 13 december 2014 14:47 To: 'Hal Finkel' Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin Sijaric; Tom Stellard; Andrew Trick Subject: RE: ScheduleDAGInstrs.cpp Hi, Thank you for your reply. I have tried Tom Stellards patch, but it did not resolve my particular problem. I have tried to expose this problem on public target without success. I am still however fearing that the scheduler is incorrect. I have a (bit unoptimized) case where the LLVM I/R code looks something like %_tmp255 = load i64** %cse6.103 store i64 %_tmp250, i64* %_tmp255 %_tmp256 = load i64** %cse6.103 %_tmp257 = load i64* %_tmp256 which means the same address is loaded twice, first to store a value, and then to load it back. The MachineInstructions to load an i64 are first pseudos and are then split post regalloc and become SU(0): addrReg0 = load *fp(-50) // Two address registers with same value SU(1): addrReg1 = load *fp(-50) SU(2): store dataReg0:lo, *addrReg0(2) // Store a register in two parts SU(3): store dataReg0:hi, *addrReg0 SU(4): dataReg1:lo = load *addrReg1(2) // Load a register in two parts SU(5): dataReg1:hi = load *addrReg1 Since the addresses are loaded from memory, the underlying Objs become empty for all MIs. SU(4) and SU(5) get pushed to PendingLoads. SU(3) becomes AliasChain, and edges are added to SU(4) and SU(5). PendingLoads is cleared. SU(2) becomes AliasChain. This time TII->areMemAccessesTriviallyDisjoint() return true, since it can see that addrReg0 with and without offset are disjoint. This was not the case between SU(3) and SU(4), since they were different address registers (with the same value). The result is that there is no edge added between SU(2) and SU(3) and not to SU(4) since PendlingLoads was cleared earlier. SU(2) and SU(4) are however a store and a load against the same address. The problem is that TII->areMemAccessesTriviallyDisjoint() is incomplete and therefore inconsistent depending on if the two memory accesses are trivial to analyze or not. Since TII->areMemAccessesTriviallyDisjoint() are called when resetting AliasChain, I think it is necessary to call iterateChainSucs() on the old AliasChain, to make sure there are no SUs succeeding the old AliasChain (and are thus not in RejectMemNodes) that have a dependency on the new AliasChain. Or am I missing something? Very interested in your opinion. I am thinking of something like new_alias_chain: // Chain all possibly aliasing memory references through SU. if (AliasChain) { unsigned ChainLatency = 0; if (AliasChain->getInstr()->mayLoad()) ChainLatency = TrueMemOrderLatency; addChainDependency(AAForDep, MFI, SU, AliasChain, RejectMemNodes, ChainLatency); + + // Since TII->areMemAccessesTriviallyDisjoint() is not + // complete, iterate all chain successors of old + // AliasChain. Otherwise, if an address is stored in two + // registers, and only one of the registers is trivially + // disjoint to old AliasChain, a dependency might be missed. + // SU(0: Address A) -> old AliasChain (no edge) SU(2: Address A) + // SU(0) and SU(2) get a missed dependency. + SmallPtrSet<const SUnit*, 16> Visited; + unsigned Depth = 0; + for (SUnit::const_succ_iterator J = AliasChain->Succs.begin(), + JE = AliasChain->Succs.end(); J != JE; ++J) + if (J->isCtrl()) + iterateChainSucc (AA, MFI, SU, J->getSUnit(), + &ExitSU, &Depth, Visited); } AliasChain = SU; Best regards, Jonas Paulsson -----Original Message----- From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: den 9 december 2014 19:43 To: Jonas Paulsson Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin Sijaric; Tom Stellard; Andrew Trick Subject: Re: ScheduleDAGInstrs.cpp Hi Jonas, Tom Stellard (cc'd) just recently fixed a bug whereby dependencies on the barrier chain would not be added when the SU had no underlying objects (r223717). Does this fix the problem you're highlighting? I've also cc'd Sanjin who developed the areMemAccessesTriviallyDisjoint() callback (which I'll note is a relatively-recent addition -- added on Sept. 8th of this year). In short, if r223717 does not resolve the problem, it sounds like you likely found a bug. What target are you using? I'm trying to get some idea of what your areMemAccessesTriviallyDisjoint() callback is doing. -Hal ----- Original Message -----> From: "Jonas Paulsson" <jonas.paulsson at ericsson.com> > To: hfinkel at anl.gov, llvmdev at cs.uiuc.edu > Cc: "Mattias Eriksson V" <mattias.v.eriksson at ericsson.com> > Sent: Monday, December 8, 2014 8:06:27 AM > Subject: ScheduleDAGInstrs.cpp > > Hi, > > Can anyone help me to understand the > ScheduleDAGInstrs::buildSchedGraph() method? > > > > I find the handling of AliasChain is disturbing since: > > > > 1. A new alias chain add deps to all possibly aliasing SUs, and then > clears those lists. > > 2. When AliasChain is present, the addChainDependency() method is > called, > > but the target hook areMemAccessesTriviallyDisjoint() called inside > > MIsNeedChainEdge() allows this edge to be skipped. > > > > This means that I get a case where > > > > SU0 > > > > SU1, AliasChain > > /\ > > / \ // Aliasing memory accesses > > SU2 > > > > all SUs have memory operands, but the underlying objects vectors are > empty. > > SU1 became AliasChain, added dep to SU2, and cleared the lists for > aliasing memory accesses. > > SU0 has a dependency towards SU2, but towards SU1 it is trivially > disjoint, and therefore it gets > > no dependency to SU1 and * neither to SU2 *. The AliasChain concept is > bypassed. > > > > I don’t understand how it can first be assumed that an SU becomes > AliasChain, and then an SU > > with lower NodeNum that may alias is allowed to skip its dep to the > AliasChain? > > The BarrierChain is never skipped because addPred() is called > directly, and I don’t see how it is > > safe to skip the AliasChain for aliasing SUs, I think it should always > be added? In other words, > > it may be that SU0 has a dep towards SU2 in the example, but not > towards SU1, so therefore it > > is not safe to skip this dep. > > > > Calling addPred() instead of addChainDependency() for AliasChain > should fix this, however it > > would lead to overly constrained DAGs compared to putting the edges > where they actually should > > be (SU0<-SU2 instead of SU0<-SU1). > > > > Most thankful for any help, > > > > Jonas Paulsson > > > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory