Jonas Paulsson
2015-Feb-10 14:54 UTC
[LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
Hi, I have committed the patch now (svn id 228686). Regarding the commenting you requested, I attach a patch. Feel free to make changes. I found it difficult to explain what the code does in isolated places, and thus kept my commenting quite short. This makes me feel like the code needs a bit of refactorization to make it more simple and understandable. Looking at the possibility of refactorization / redesign, I wonder what are the main strong points of this implementation right now, in your opinion? The mapping to underlying objects looks nice to me, but I am not sure I understand why to go through the trouble of clearing lists and using a set of RejectMemNodes with adjustChainDeps(). It is very complex code, and I wonder what is gained in the end here. Does anyone have any thoughts about it? Is it to handle huge regions with tons of memory accesses well? Regarding the other two patches: The bugfix I just commited makes sure that adjustChainDeps() is called on an SU regardless of MayAlias. It occoured to me that RejectMemNodes contains a mix of SUs from both Alias and NonAlias sets, and that the separation of the two domains is lost. I became worried that some performance might therefore be lost because of my bugfix. Since I was not sure how big this impact might be, I made a fix for it quickly just to illustrate what the problem is (one of the previously attached patches). Basically, when an SU's underlying objects are analyzed, the results are remembered by putting the SU into one or both of the (added) sets AliasMemUseDefMIs and NonAliasMemUseDefMIs. Later, MIsNeedChainEdge() can return false, if they were originally in different domains: // A NonAliasing node cannot alias an AliasingNode. This is the case // where MIa had only non-aliasing underlying objects and MIb had // only aliasing underlying objects, or vice versa. if ((AliasMemUseDefMIs.count(MIa) && !NonAliasMemUseDefMIs.count(MIa) && !AliasMemUseDefMIs.count(MIb) && NonAliasMemUseDefMIs.count(MIb)) || (!AliasMemUseDefMIs.count(MIa) && NonAliasMemUseDefMIs.count(MIa) && AliasMemUseDefMIs.count(MIb) && !NonAliasMemUseDefMIs.count(MIb))) return false; This should then be an improvement to MIsNeedChainEdge(), although I am not sure if there is a more elegant solution, which I invite you to suggest. I also supplied a patch with a lot of ugly code that had the original purpose of finding a test case on an in-tree-target for the bug I had discovered. You could ignore that patch now, I think. ( An assert would trigger if there was - with the bugfix in place - an edge inserted in the NonAlias case, which would not have been done before the bugfix.) I attach the patch for MIsNeedChainEdge() again, and would appreciate some feedback. /Jonas Paulsson -----Original Message----- From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: den 7 februari 2015 02:07 To: Jonas Paulsson Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei Larin; Patrik Hägglund H; Sanjin Sijaric; Andrew Trick (atrick at apple.com); llvm-commits at cs.uiuc.edu Subject: Re: [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph(). ----- Original Message -----> From: "Jonas Paulsson" <jonas.paulsson at ericsson.com> > To: "Hal Finkel" <hfinkel at anl.gov>, "Sanjin Sijaric" <ssijaric at codeaurora.org>, "Andrew Trick (atrick at apple.com)" > <atrick at apple.com>, llvm-commits at cs.uiuc.edu > Cc: "Mattias Eriksson V" <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu, "Tom Stellard" > <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org>, "Patrik Hägglund H" > <patrik.h.hagglund at ericsson.com> > Sent: Friday, January 30, 2015 4:23:50 AM > Subject: [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph(). > > Hi, > > I have revisited the issue in buildSchedGraph() I talked about > previously, and attached a few patches. The first tries to fix the > issue, and the other two try to illustrate associated issues, emerged > from applying it. > > Is it OK to commit the first patch? > > [PATCH] Bugfix for missed dependency from store to load in > buildSchedGraph(). > > Bugfix for missed dependency from store to load in > buildSchedGraph(). > > Background: When handling underlying objects for a store, the > vector > of previous mem uses, mapped to the same Value, is afterwards > cleared > (regardless of ThisMayAlias). This means that during handling of > the > next store using the same Value, adjustChainDeps() must be > called, > otherwise a dependency might be missed.Yes, LGTM. However, I'd like an extensive comment added to that part of the code explaining what is going on. That function has far too little in the way of commentary (RejectMemNodes itself even has no description). It is crucial that we change this. As far as I'm concerned, a comment that is as long and explanatory as this patch description is perfectly appropriate. [I say this admitting to having touched this function myself]. Also, I'd really prefer you use reviews.llvm.org for these patches in the future, and upload them with full context (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface), it makes it much easier to see what is going on. Regarding the other two patches, can you please explain what is going on? It is not obvious to me from the patches. Thanks again, Hal> > For example, three spill/reload (NonAliasing) memory accesses > using > the same Value 'a', with different offsets: > > SU(2): store @a > SU(1): store @a, Offset:1 > SU(0): load @a > > In this case we have: > > * SU(1) does not need a dep against SU(0). Therefore,SU(0) ends > up in > RejectMemNodes and is removed from the mem-uses list > (AliasMemUses > or NonAliasMemUses), as this list is cleared. > > * SU(2) needs a dep against SU(0). Therefore, SU(2) must check > RejectMemNodes by calling adjustChainDeps(). > > Previously, for store SUs, adjustChainDeps() was only called if > MayAlias was true, missing the S(2) to S(0) dependency in the > case > above. The fix is to always call adjustChainDeps(), regardless of > MayAlias, since this applies both for AliasMemUses and > NonAliasMemUses. > > No test case found for any in-tree target. > > Jonas Paulsson > > -----Original Message----- > From: Jonas Paulsson > Sent: den 8 januari 2015 16:01 > To: 'Hal Finkel'; Sanjin Sijaric > Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei > Larin; Andrew Trick; llvm-commits at cs.uiuc.edu > Subject: [PATCH] Call adjustChainDeps() always when handling a store. > > Hi, > > here is a patch on a problem I described in my previous mail (number > 1) for your review. In short, I think adjustChainDeps() must be called > regardless of MayAlias while handling a store. > > This problem was found on an out-of-tree target, and no test case can > be provided for an official target. > > Let me know if I can commit this patch, or if you have any comments. > > /Jonas > > PS. Sanjin, regarding TII->areMemAccessesTriviallyDisjoint(), my > target does the same things as you mentioned: register / offset based > analysis, both pre/post RA. It is adding very little improvement over > AA, it seems. I don't have any failing test cases right now, so it may > be that this patch actually fixed my problem. > However, right now it is confusing to have the whole algorithm written > around memory operands and at the same time allow register / offsets > analyzis in the midst of things. It would be good to rewrite that part > like you explained. What are your current thoughts about this? > > -----Original Message----- > From: Hal Finkel [mailto:hfinkel at anl.gov] > Sent: den 23 december 2014 18:11 > To: Sanjin Sijaric > Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei > Larin; Jonas Paulsson; Andrew Trick > Subject: Re: ScheduleDAGInstrs.cpp > > ----- Original Message ----- > > From: "Sanjin Sijaric" <ssijaric at codeaurora.org> > > To: "Jonas Paulsson" <jonas.paulsson at ericsson.com>, "Andrew Trick" > > <atrick at apple.com> > > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Mattias Eriksson V" > > <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu, "Tom > > Stellard" > > <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org> > > Sent: Friday, December 19, 2014 1:51:44 PM > > Subject: RE: ScheduleDAGInstrs.cpp > > > > Hi Jonas, > > > > How is your target implementing areMemAccessesTriviallyDisjoint? > > The > > callback is there so that we don't get into the situation where the > > call to isIdentifiedObject (which is called from > > isUnsafeMemoryObject from MIsNeedChainEdge) results in the edge > > being added between two memory locations that the target can easily > > prove are different (e.g based on base register + index + offset, > > etc). > > > > Are you seeing the problem during pre-RA scheduling or post-RA > > scheduling? > > > > I think we may be able to get rid of > > areMemAccessesTriviallyDisjoint, and let the AA code in > > MIsNeedChainEdgee handle it. This works for me if I don't call > > isIdentifiedObject from isUnsafeMemoryObject if we are using AA. > > Currently, the aliasing code won't get a chance to run in some cases > > as isUnsafeMemoryObject returns true, resulting in > > the edge being added. Maybe someone can shed some light as to why > > the call to isIdentifedMemoryObject in isUnsafeMemoryObject is > > needed if we are using AA. > > I think that it might not be needed. Here's my thinking: > > There is a call to getUnderlyingObjectsForInstr, and that needs to be > there because of how the calling code groups the resulting SUs by > underlying object. Specifically, ScheduleDAGInstrs::buildSchedGraph > first gets the underlying objects for a value, and then only calls > addChainDependency on SUs that share one of these underlying objects. > The reason why we insist on having an identified underlying object is > so that we know we're not being non-uniformly limited by the depth > constraint. I mean the following. Assume I have two globals (@a and > @b), and I have a third value formed using a combination of them: %c = > select i1 %something, @a, @b -- so I have > > SU(0): underlying objects: @a > SU(1): underlying objects: @b > SU(2): underlying objects: @a and @b > > now imagine that our getUnderlyingObjectsForInstr had a trivial depth > limit, and we did not check that the underlying objects returned were > identified objects, then we might have: > > > SU(0): underlying objects: @a > SU(1): underlying objects: @b > SU(2): underlying objects: %c > > and we might never give AA the chance to conclude that %c and @a or @b > might alias (because addChainDependency might never be called on > SU(0) <-> SU(2) because @a != %c). > > To prevent this problem, we clear the underlying objects list in > getUnderlyingObjectsForInstr if any of the underlying objects are not > identified (so that we can fall back to the conservative behavior). > > Now in isUnsafeMemoryObject we have the same check, so that when > addChainDependency is called on an SU with a cleared underlying > objects list (which we did specifically to get conservative behavior), > we always get that conservative behavior. However, this check seems > unnecessary (except, perhaps, as noted in the next paragraph). When > not using AA, by the time we get to MIsNeedChainEdge, we're already > destined to return true from MIsNeedChainEdge (except for load/load > edges). When using AA, we might not return true (depending on what AA > returns), but the AA check is specific to the two instructions being > queried, and AA understands how to handle selects/phis/etc. in a > reasonable way. > > Nevertheless, I'm somewhat worried about a subtlety here: I believe > that we cannot fail to addChainDependency on the AliasChain because AA > is smarter about underlying objects because the AliasChain actually > can represent many other dependencies. So maybe when we call > addChainDependency with SU on the AliasChain, we need to always pass > nullptr for AA. > > -Hal > > > > > Thanks, > > Sanjin > > > > -----Original Message----- > > From: Jonas Paulsson [mailto:jonas.paulsson at ericsson.com] > > Sent: Friday, December 19, 2014 6:51 AM > > To: Andrew Trick > > Cc: Hal Finkel; Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin > > Sijaric; Tom Stellard; Sergei Larin > > Subject: ScheduleDAGInstrs.cpp > > > > Hi, > > > > I write again regarding buildSchedGraph(), as I am still not happy > > about things there. > > > > I have found at least two examples which do not work out: > > > > 1) > > > > SU(2) Store "Value A" > > > > SU(1) Store "Value A" > > > > SU(0) Load "Value A" > > > > If MIsNeedChainEdge() returns false for SU(0) and SU(1), SU(0) is > > inserted into RejectedMemNodes and removed from its MemUses SU list, > > as this list is cleared. Therefore SU(2) must be handled with > > adjustChainDeps(), because it needs an edge from SU(0). > > For some reason adjustChainDeps() was only called for may-aliasing > > stores. I think this is wrong, as a store will clear the MemUses SU > > list also in the non-aliasing case. > > > > If MIsNeedChainEdge() returns true for SU(0) and SU(1), what happens > > if MIsNeedChainEdge() returns false between SU(2) and SU(1)? The > > dependency against SU(0) will not be checked, because it is not in > > RejectMemNodes, nor in the MemUses SU list. > > > > 2) > > > > SU(2) Load "Value A" > > > > SU(1) Store "Value A" > > > > SU(0) Store "Value A" > > > > If not using alias analysis, then the MemDefs list is cleared after > > *maybe* having inserted an edge from SU(0) to SU(1) with > > addChainDependency(). If there was not an edge inserted, then SU(2) > > must get a chance to check its deps against SU(0) by use of > > adjustChainDeps(). Again, this is only done if MayAlias is true. > > > > I suspect therefore that areMemAccessesTriviallyDisjoint() is > > currently used in a very dangerous way, since it might get lucky in > > just one of two (equivalent) queries, and a dependency might be > > missed later on as seen above. > > > > Or am I missing something here? > > > > Senjin, I am working on an out-of-tree target. The problem I had was > > the first scenario in #1, and I fixed it by moving the > > adjustChainDeps() call to be called also in the non-aliasing case. > > > > Another question I have is regarding adjustChainDeps(). If > > MIsNeedChainEdge() returns true, is it still then always necessary > > to continue recursively with successors? If this is a latency issue, > > perhaps it would be ok to do "continue" after adding the pred if the > > latency is non-zero? Right now this seems to be always zero, or one > > in the store-load case per the comment on line 856. > > > > On line 947: "A store to a specific PseudoSourceValue" should > > probably say "... a specific Value / PseudoSourceValue", right? The > > same goes for line 1029. > > > > /Jonas > > > > > > > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Two-comments-in-ScheduleDAGInstrs.cpp.patch Type: application/octet-stream Size: 1481 bytes Desc: 0001-Two-comments-in-ScheduleDAGInstrs.cpp.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150210/26b7e2df/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-Initial-Illustratory-fix-for-avoiding-NonAlias-Alias.patch Type: application/octet-stream Size: 4152 bytes Desc: 0002-Initial-Illustratory-fix-for-avoiding-NonAlias-Alias.patch URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150210/26b7e2df/attachment-0001.obj>
Andrew Trick
2015-Feb-10 21:12 UTC
[LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
> On Feb 10, 2015, at 6:54 AM, Jonas Paulsson <jonas.paulsson at ericsson.com> wrote: > > Looking at the possibility of refactorization / redesign, I wonder what are the main strong > points of this implementation right now, in your opinion? The mapping to underlying objects > looks nice to me, but I am not sure I understand why to go through the trouble of clearing > lists and using a set of RejectMemNodes with adjustChainDeps(). It is very complex code, and > I wonder what is gained in the end here. Does anyone have any thoughts about it? Is it to handle > huge regions with tons of memory accesses well?Since you’ve taken the time to understand the algorithm, and are actively benchmarking, it would be great to take advantage of the opportunity to rewrite. Feel free to modify even the core DAG builder implementation, which hasn’t really changed since inception--way before my time. The alias-aware scheduling has been around long enough that it makes sense to redesign the core DAG builder code around it, rather than bolt it on. I agree, in its current form, the AA-aware logic is too hard to follow. Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150210/538f5cb7/attachment.html>
Jonas Paulsson
2015-Feb-11 17:40 UTC
[LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
Hi, I would be happy to give it a try :-) The fact that AA was added at a later point explains the situation a bit, as much fewer SUs should end up in RejectMemNodes without it. RejectMemNodes is bad in that it mixes all the SUs together again, after having gone through the work of separating them by analyzing their underlying objects. It is also very confusing to have two "stages" of dependency checking, first by mapping an SU to one or more Values, and then to still "check everything" that may have been missed. I would like to try to remove RejectMemNodes (and adjustChainDeps() and iterateChainSucc()) and then simply not clear the MemUses list (while handling a store). “If an SU gets a Value mapping, keep it”. If that list grows bigger than a certain limit, intelligent alias querying could stop, just as it stops now in iterateChainSucc when *Depth > 200. But it would then at least be done against the right set of SUs, not against "all SUs that sometime did not need an edge". What do you think about this, anyone? / Jonas Paulsson From: Andrew Trick [mailto:atrick at apple.com] Sent: den 10 februari 2015 22:12 To: Jonas Paulsson Cc: Hal Finkel; Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei Larin; Patrik Hägglund H; Sanjin Sijaric; llvm commits Subject: Re: [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph(). On Feb 10, 2015, at 6:54 AM, Jonas Paulsson <jonas.paulsson at ericsson.com<mailto:jonas.paulsson at ericsson.com>> wrote: Looking at the possibility of refactorization / redesign, I wonder what are the main strong points of this implementation right now, in your opinion? The mapping to underlying objects looks nice to me, but I am not sure I understand why to go through the trouble of clearing lists and using a set of RejectMemNodes with adjustChainDeps(). It is very complex code, and I wonder what is gained in the end here. Does anyone have any thoughts about it? Is it to handle huge regions with tons of memory accesses well? Since you’ve taken the time to understand the algorithm, and are actively benchmarking, it would be great to take advantage of the opportunity to rewrite. Feel free to modify even the core DAG builder implementation, which hasn’t really changed since inception--way before my time. The alias-aware scheduling has been around long enough that it makes sense to redesign the core DAG builder code around it, rather than bolt it on. I agree, in its current form, the AA-aware logic is too hard to follow. Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150211/cf789582/attachment.html>
Apparently Analagous Threads
- [LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
- [LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
- [LLVMdev] ScheduleDAGInstrs.cpp
- [LLVMdev] ScheduleDAGInstrs.cpp
- [LLVMdev] ScheduleDAGInstrs.cpp