Jonas Paulsson
2015-Jan-30  10:23 UTC
[LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
Hi,
I have revisited the issue in buildSchedGraph() I talked about previously, and
attached a few patches. The first tries to fix the issue, and the other two try
to illustrate associated issues, emerged from applying it.
Is it OK to commit the first patch?
    [PATCH] Bugfix for missed dependency from store to load in
buildSchedGraph().
    Bugfix for missed dependency from store to load in buildSchedGraph().
    Background: When handling underlying objects for a store, the vector
    of previous mem uses, mapped to the same Value, is afterwards cleared
    (regardless of ThisMayAlias). This means that during handling of the
    next store using the same Value, adjustChainDeps() must be called,
    otherwise a dependency might be missed.
    For example, three spill/reload (NonAliasing) memory accesses using
    the same Value 'a', with different offsets:
        SU(2): store  @a
        SU(1): store  @a, Offset:1
        SU(0): load   @a
    In this case we have:
    * SU(1) does not need a dep against SU(0). Therefore,SU(0) ends up in
      RejectMemNodes and is removed from the mem-uses list (AliasMemUses
      or NonAliasMemUses), as this list is cleared.
    * SU(2) needs a dep against SU(0). Therefore, SU(2) must check
      RejectMemNodes by calling adjustChainDeps().
    Previously, for store SUs, adjustChainDeps() was only called if
    MayAlias was true, missing the S(2) to S(0) dependency in the case
    above. The fix is to always call adjustChainDeps(), regardless of
    MayAlias, since this applies both for AliasMemUses and
    NonAliasMemUses.
    No test case found for any in-tree target.
Jonas Paulsson
-----Original Message-----
From: Jonas Paulsson 
Sent: den 8 januari 2015 16:01
To: 'Hal Finkel'; Sanjin Sijaric
Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei Larin;
Andrew Trick; llvm-commits at cs.uiuc.edu
Subject: [PATCH] Call adjustChainDeps() always when handling a store.
Hi,
here is a patch on a problem I described in my previous mail (number 1) for your
review. In short, I think adjustChainDeps() must be called regardless of
MayAlias while handling a store.
This problem was found on an out-of-tree target, and no test case can be
provided for an official target.
Let me know if I can commit this patch, or if you have any comments.
/Jonas
PS. Sanjin, regarding TII->areMemAccessesTriviallyDisjoint(), my target does
the same things as you mentioned: register / offset based analysis, both
pre/post RA. It is adding very little improvement over AA, it seems. I don't
have any failing test cases right now, so it may be that this patch actually
fixed my problem. However, right now it is confusing to have the whole algorithm
written around memory operands and at the same time allow register / offsets
analyzis in the midst of things. It would be good to rewrite that part like you
explained. What are your current thoughts about this?
-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: den 23 december 2014 18:11
To: Sanjin Sijaric
Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei Larin;
Jonas Paulsson; Andrew Trick
Subject: Re: ScheduleDAGInstrs.cpp
----- Original Message -----> From: "Sanjin Sijaric" <ssijaric at codeaurora.org>
> To: "Jonas Paulsson" <jonas.paulsson at ericsson.com>,
"Andrew Trick"
> <atrick at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Mattias
Eriksson V"
> <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu,
"Tom Stellard"
> <thomas.stellard at amd.com>, "Sergei Larin" <slarin at
codeaurora.org>
> Sent: Friday, December 19, 2014 1:51:44 PM
> Subject: RE: ScheduleDAGInstrs.cpp
> 
> Hi Jonas,
> 
> How is your target implementing areMemAccessesTriviallyDisjoint?  The 
> callback is there so that we don't get into the situation where the 
> call to isIdentifiedObject (which is called from isUnsafeMemoryObject 
> from MIsNeedChainEdge) results in the edge being added between two 
> memory locations that the target can easily prove are different (e.g 
> based on base register + index + offset, etc).
> 
> Are you seeing the problem during pre-RA scheduling or post-RA 
> scheduling?
> 
> I think we may be able to get rid of areMemAccessesTriviallyDisjoint, 
> and let the AA code in MIsNeedChainEdgee handle it.  This works for me 
> if I don't call isIdentifiedObject from isUnsafeMemoryObject if we are 
> using AA.  Currently, the aliasing code won't get a chance to run in 
> some cases as isUnsafeMemoryObject returns true, resulting in
> the edge being added.   Maybe someone can shed some light as to why
> the call to isIdentifedMemoryObject in isUnsafeMemoryObject is needed 
> if we are using AA.
I think that it might not be needed. Here's my thinking:
There is a call to getUnderlyingObjectsForInstr, and that needs to be there
because of how the calling code groups the resulting SUs by underlying object.
Specifically, ScheduleDAGInstrs::buildSchedGraph first gets the underlying
objects for a value, and then only calls addChainDependency on SUs that share
one of these underlying objects. The reason why we insist on having an
identified underlying object is so that we know we're not being
non-uniformly limited by the depth constraint. I mean the following. Assume I
have two globals (@a and @b), and I have a third value formed using a
combination of them: %c = select i1 %something, @a, @b -- so I have
 SU(0): underlying objects: @a
 SU(1): underlying objects: @b
 SU(2): underlying objects: @a and @b
now imagine that our getUnderlyingObjectsForInstr had a trivial depth limit, and
we did not check that the underlying objects returned were identified objects,
then we might have:
 SU(0): underlying objects: @a
 SU(1): underlying objects: @b
 SU(2): underlying objects: %c
and we might never give AA the chance to conclude that %c and @a or @b might
alias (because addChainDependency might never be called on SU(0) <-> SU(2)
because @a != %c).
To prevent this problem, we clear the underlying objects list in
getUnderlyingObjectsForInstr if any of the underlying objects are not identified
(so that we can fall back to the conservative behavior).
Now in isUnsafeMemoryObject we have the same check, so that when
addChainDependency is called on an SU with a cleared underlying objects list
(which we did specifically to get conservative behavior), we always get that
conservative behavior. However, this check seems unnecessary (except, perhaps,
as noted in the next paragraph). When not using AA, by the time we get to
MIsNeedChainEdge, we're already destined to return true from
MIsNeedChainEdge (except for load/load edges). When using AA, we might not
return true (depending on what AA returns), but the AA check is specific to the
two instructions being queried, and AA understands how to handle
selects/phis/etc. in a reasonable way.
Nevertheless, I'm somewhat worried about a subtlety here: I believe that we
cannot fail to addChainDependency on the AliasChain because AA is smarter about
underlying objects because the AliasChain actually can represent many other
dependencies. So maybe when we call addChainDependency with SU on the
AliasChain, we need to always pass nullptr for AA.
 -Hal
> 
> Thanks,
> Sanjin
> 
> -----Original Message-----
> From: Jonas Paulsson [mailto:jonas.paulsson at ericsson.com]
> Sent: Friday, December 19, 2014 6:51 AM
> To: Andrew Trick
> Cc: Hal Finkel; Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin 
> Sijaric; Tom Stellard; Sergei Larin
> Subject: ScheduleDAGInstrs.cpp
> 
> Hi,
> 
> I write again regarding buildSchedGraph(), as I am still not happy 
> about things there.
> 
> I have found at least two examples which do not work out:
> 
> 1)
> 
> SU(2) Store "Value A"
> 
> SU(1) Store "Value A"
> 
> SU(0) Load  "Value A"
> 
> If MIsNeedChainEdge() returns false for SU(0) and SU(1), SU(0) is 
> inserted into RejectedMemNodes and removed from its MemUses SU list, 
> as this list is cleared. Therefore SU(2) must be handled with 
> adjustChainDeps(), because it needs an edge from SU(0).
> For some reason adjustChainDeps() was only called for may-aliasing 
> stores. I think this is wrong, as a store will clear the MemUses SU 
> list also in the non-aliasing case.
> 
> If MIsNeedChainEdge() returns true for SU(0) and SU(1), what happens 
> if MIsNeedChainEdge() returns false between SU(2) and SU(1)? The 
> dependency against SU(0) will not be checked, because it is not in 
> RejectMemNodes, nor in the MemUses SU list.
> 
> 2)
> 
> SU(2) Load  "Value A"
> 
> SU(1) Store "Value A"
> 
> SU(0) Store "Value A"
> 
> If not using alias analysis, then the MemDefs list is cleared after
> *maybe* having inserted an edge from SU(0) to SU(1) with 
> addChainDependency(). If there was not an edge inserted, then SU(2) 
> must get a chance to check its deps against SU(0) by use of 
> adjustChainDeps(). Again, this is only done if MayAlias is true.
> 
> I suspect therefore that areMemAccessesTriviallyDisjoint() is 
> currently used in a very dangerous way, since it might get lucky in 
> just one of two (equivalent) queries, and a dependency might be missed 
> later on as seen above.
> 
> Or am I missing something here?
> 
> Senjin, I am working on an out-of-tree target. The problem I had was 
> the first scenario in #1, and I fixed it by moving the
> adjustChainDeps() call to be called also in the non-aliasing case.
> 
> Another question I have is regarding adjustChainDeps(). If
> MIsNeedChainEdge() returns true, is it still then always necessary to 
> continue recursively with successors? If this is a latency issue, 
> perhaps it would be ok to do "continue" after adding the pred if
the
> latency is non-zero? Right now this seems to be always zero, or one in 
> the store-load case per the comment on line 856.
> 
> On line 947: "A store to a specific PseudoSourceValue" should
probably
> say "... a specific Value / PseudoSourceValue", right? The same
goes
> for line 1029.
> 
> /Jonas
> 
> 
> 
> 
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Bugfix-for-missed-dependency-from-store-to-load-in-b.patch
Type: application/octet-stream
Size: 2354 bytes
Desc: 0001-Bugfix-for-missed-dependency-from-store-to-load-in-b.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150130/59aecbd5/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Assert-added-to-trigger-on-edges-between-NonAlias-an.patch
Type: application/octet-stream
Size: 5040 bytes
Desc: 0002-Assert-added-to-trigger-on-edges-between-NonAlias-an.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150130/59aecbd5/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Initial-Illustratory-fix-for-avoiding-NonAlias-Alias.patch
Type: application/octet-stream
Size: 4152 bytes
Desc: 0003-Initial-Illustratory-fix-for-avoiding-NonAlias-Alias.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150130/59aecbd5/attachment-0002.obj>
Hal Finkel
2015-Feb-07  01:06 UTC
[LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
----- Original Message -----> From: "Jonas Paulsson" <jonas.paulsson at ericsson.com> > To: "Hal Finkel" <hfinkel at anl.gov>, "Sanjin Sijaric" <ssijaric at codeaurora.org>, "Andrew Trick (atrick at apple.com)" > <atrick at apple.com>, llvm-commits at cs.uiuc.edu > Cc: "Mattias Eriksson V" <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu, "Tom Stellard" > <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org>, "Patrik Hägglund H" > <patrik.h.hagglund at ericsson.com> > Sent: Friday, January 30, 2015 4:23:50 AM > Subject: [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph(). > > Hi, > > I have revisited the issue in buildSchedGraph() I talked about > previously, and attached a few patches. The first tries to fix the > issue, and the other two try to illustrate associated issues, > emerged from applying it. > > Is it OK to commit the first patch? > > [PATCH] Bugfix for missed dependency from store to load in > buildSchedGraph(). > > Bugfix for missed dependency from store to load in > buildSchedGraph(). > > Background: When handling underlying objects for a store, the > vector > of previous mem uses, mapped to the same Value, is afterwards > cleared > (regardless of ThisMayAlias). This means that during handling of > the > next store using the same Value, adjustChainDeps() must be > called, > otherwise a dependency might be missed.Yes, LGTM. However, I'd like an extensive comment added to that part of the code explaining what is going on. That function has far too little in the way of commentary (RejectMemNodes itself even has no description). It is crucial that we change this. As far as I'm concerned, a comment that is as long and explanatory as this patch description is perfectly appropriate. [I say this admitting to having touched this function myself]. Also, I'd really prefer you use reviews.llvm.org for these patches in the future, and upload them with full context (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface), it makes it much easier to see what is going on. Regarding the other two patches, can you please explain what is going on? It is not obvious to me from the patches. Thanks again, Hal> > For example, three spill/reload (NonAliasing) memory accesses > using > the same Value 'a', with different offsets: > > SU(2): store @a > SU(1): store @a, Offset:1 > SU(0): load @a > > In this case we have: > > * SU(1) does not need a dep against SU(0). Therefore,SU(0) ends > up in > RejectMemNodes and is removed from the mem-uses list > (AliasMemUses > or NonAliasMemUses), as this list is cleared. > > * SU(2) needs a dep against SU(0). Therefore, SU(2) must check > RejectMemNodes by calling adjustChainDeps(). > > Previously, for store SUs, adjustChainDeps() was only called if > MayAlias was true, missing the S(2) to S(0) dependency in the > case > above. The fix is to always call adjustChainDeps(), regardless of > MayAlias, since this applies both for AliasMemUses and > NonAliasMemUses. > > No test case found for any in-tree target. > > Jonas Paulsson > > -----Original Message----- > From: Jonas Paulsson > Sent: den 8 januari 2015 16:01 > To: 'Hal Finkel'; Sanjin Sijaric > Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei > Larin; Andrew Trick; llvm-commits at cs.uiuc.edu > Subject: [PATCH] Call adjustChainDeps() always when handling a store. > > Hi, > > here is a patch on a problem I described in my previous mail (number > 1) for your review. In short, I think adjustChainDeps() must be > called regardless of MayAlias while handling a store. > > This problem was found on an out-of-tree target, and no test case can > be provided for an official target. > > Let me know if I can commit this patch, or if you have any comments. > > /Jonas > > PS. Sanjin, regarding TII->areMemAccessesTriviallyDisjoint(), my > target does the same things as you mentioned: register / offset > based analysis, both pre/post RA. It is adding very little > improvement over AA, it seems. I don't have any failing test cases > right now, so it may be that this patch actually fixed my problem. > However, right now it is confusing to have the whole algorithm > written around memory operands and at the same time allow register / > offsets analyzis in the midst of things. It would be good to rewrite > that part like you explained. What are your current thoughts about > this? > > -----Original Message----- > From: Hal Finkel [mailto:hfinkel at anl.gov] > Sent: den 23 december 2014 18:11 > To: Sanjin Sijaric > Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei > Larin; Jonas Paulsson; Andrew Trick > Subject: Re: ScheduleDAGInstrs.cpp > > ----- Original Message ----- > > From: "Sanjin Sijaric" <ssijaric at codeaurora.org> > > To: "Jonas Paulsson" <jonas.paulsson at ericsson.com>, "Andrew Trick" > > <atrick at apple.com> > > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Mattias Eriksson V" > > <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu, "Tom > > Stellard" > > <thomas.stellard at amd.com>, "Sergei Larin" <slarin at codeaurora.org> > > Sent: Friday, December 19, 2014 1:51:44 PM > > Subject: RE: ScheduleDAGInstrs.cpp > > > > Hi Jonas, > > > > How is your target implementing areMemAccessesTriviallyDisjoint? > > The > > callback is there so that we don't get into the situation where the > > call to isIdentifiedObject (which is called from > > isUnsafeMemoryObject > > from MIsNeedChainEdge) results in the edge being added between two > > memory locations that the target can easily prove are different > > (e.g > > based on base register + index + offset, etc). > > > > Are you seeing the problem during pre-RA scheduling or post-RA > > scheduling? > > > > I think we may be able to get rid of > > areMemAccessesTriviallyDisjoint, > > and let the AA code in MIsNeedChainEdgee handle it. This works for > > me > > if I don't call isIdentifiedObject from isUnsafeMemoryObject if we > > are > > using AA. Currently, the aliasing code won't get a chance to run > > in > > some cases as isUnsafeMemoryObject returns true, resulting in > > the edge being added. Maybe someone can shed some light as to why > > the call to isIdentifedMemoryObject in isUnsafeMemoryObject is > > needed > > if we are using AA. > > I think that it might not be needed. Here's my thinking: > > There is a call to getUnderlyingObjectsForInstr, and that needs to be > there because of how the calling code groups the resulting SUs by > underlying object. Specifically, ScheduleDAGInstrs::buildSchedGraph > first gets the underlying objects for a value, and then only calls > addChainDependency on SUs that share one of these underlying > objects. The reason why we insist on having an identified underlying > object is so that we know we're not being non-uniformly limited by > the depth constraint. I mean the following. Assume I have two > globals (@a and @b), and I have a third value formed using a > combination of them: %c = select i1 %something, @a, @b -- so I have > > SU(0): underlying objects: @a > SU(1): underlying objects: @b > SU(2): underlying objects: @a and @b > > now imagine that our getUnderlyingObjectsForInstr had a trivial depth > limit, and we did not check that the underlying objects returned > were identified objects, then we might have: > > > SU(0): underlying objects: @a > SU(1): underlying objects: @b > SU(2): underlying objects: %c > > and we might never give AA the chance to conclude that %c and @a or > @b might alias (because addChainDependency might never be called on > SU(0) <-> SU(2) because @a != %c). > > To prevent this problem, we clear the underlying objects list in > getUnderlyingObjectsForInstr if any of the underlying objects are > not identified (so that we can fall back to the conservative > behavior). > > Now in isUnsafeMemoryObject we have the same check, so that when > addChainDependency is called on an SU with a cleared underlying > objects list (which we did specifically to get conservative > behavior), we always get that conservative behavior. However, this > check seems unnecessary (except, perhaps, as noted in the next > paragraph). When not using AA, by the time we get to > MIsNeedChainEdge, we're already destined to return true from > MIsNeedChainEdge (except for load/load edges). When using AA, we > might not return true (depending on what AA returns), but the AA > check is specific to the two instructions being queried, and AA > understands how to handle selects/phis/etc. in a reasonable way. > > Nevertheless, I'm somewhat worried about a subtlety here: I believe > that we cannot fail to addChainDependency on the AliasChain because > AA is smarter about underlying objects because the AliasChain > actually can represent many other dependencies. So maybe when we > call addChainDependency with SU on the AliasChain, we need to always > pass nullptr for AA. > > -Hal > > > > > Thanks, > > Sanjin > > > > -----Original Message----- > > From: Jonas Paulsson [mailto:jonas.paulsson at ericsson.com] > > Sent: Friday, December 19, 2014 6:51 AM > > To: Andrew Trick > > Cc: Hal Finkel; Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin > > Sijaric; Tom Stellard; Sergei Larin > > Subject: ScheduleDAGInstrs.cpp > > > > Hi, > > > > I write again regarding buildSchedGraph(), as I am still not happy > > about things there. > > > > I have found at least two examples which do not work out: > > > > 1) > > > > SU(2) Store "Value A" > > > > SU(1) Store "Value A" > > > > SU(0) Load "Value A" > > > > If MIsNeedChainEdge() returns false for SU(0) and SU(1), SU(0) is > > inserted into RejectedMemNodes and removed from its MemUses SU > > list, > > as this list is cleared. Therefore SU(2) must be handled with > > adjustChainDeps(), because it needs an edge from SU(0). > > For some reason adjustChainDeps() was only called for may-aliasing > > stores. I think this is wrong, as a store will clear the MemUses SU > > list also in the non-aliasing case. > > > > If MIsNeedChainEdge() returns true for SU(0) and SU(1), what > > happens > > if MIsNeedChainEdge() returns false between SU(2) and SU(1)? The > > dependency against SU(0) will not be checked, because it is not in > > RejectMemNodes, nor in the MemUses SU list. > > > > 2) > > > > SU(2) Load "Value A" > > > > SU(1) Store "Value A" > > > > SU(0) Store "Value A" > > > > If not using alias analysis, then the MemDefs list is cleared after > > *maybe* having inserted an edge from SU(0) to SU(1) with > > addChainDependency(). If there was not an edge inserted, then SU(2) > > must get a chance to check its deps against SU(0) by use of > > adjustChainDeps(). Again, this is only done if MayAlias is true. > > > > I suspect therefore that areMemAccessesTriviallyDisjoint() is > > currently used in a very dangerous way, since it might get lucky in > > just one of two (equivalent) queries, and a dependency might be > > missed > > later on as seen above. > > > > Or am I missing something here? > > > > Senjin, I am working on an out-of-tree target. The problem I had > > was > > the first scenario in #1, and I fixed it by moving the > > adjustChainDeps() call to be called also in the non-aliasing case. > > > > Another question I have is regarding adjustChainDeps(). If > > MIsNeedChainEdge() returns true, is it still then always necessary > > to > > continue recursively with successors? If this is a latency issue, > > perhaps it would be ok to do "continue" after adding the pred if > > the > > latency is non-zero? Right now this seems to be always zero, or one > > in > > the store-load case per the comment on line 856. > > > > On line 947: "A store to a specific PseudoSourceValue" should > > probably > > say "... a specific Value / PseudoSourceValue", right? The same > > goes > > for line 1029. > > > > /Jonas > > > > > > > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Jonas Paulsson
2015-Feb-10  14:54 UTC
[LLVMdev] [PATCH] Bugfix for missed dependency from store to load in buildSchedGraph().
Hi,
I have committed the patch now (svn id 228686).
Regarding the commenting you requested, I attach a patch. Feel free to make
changes.
I found it difficult to explain what the code does in isolated places, and thus
kept my
commenting quite short. This makes me feel like the code needs a bit of
refactorization
to make it more simple and understandable.
Looking at the possibility of refactorization / redesign, I wonder what are the
main strong
points of this implementation right now, in your opinion? The mapping to
underlying objects
looks nice to me, but I am not sure I understand why to go through the trouble
of clearing
lists and using a set of RejectMemNodes with adjustChainDeps(). It is very
complex code, and
I wonder what is gained in the end here. Does anyone have any thoughts about it?
Is it to handle
huge regions with tons of memory accesses well?
Regarding the other two patches:
The bugfix I just commited makes sure that adjustChainDeps() is called on an SU
regardless of
MayAlias. It occoured to me that RejectMemNodes contains a mix of SUs from both
Alias and
NonAlias sets, and that the separation of the two domains is lost. I became
worried
that some performance might therefore be lost because of my bugfix.
Since I was not sure how big this impact might be, I made a fix for it quickly
just to illustrate
what the problem is (one of the previously attached patches). Basically, when an
SU's underlying
objects are analyzed, the results are remembered by putting the SU into one or
both of the (added)
sets AliasMemUseDefMIs and NonAliasMemUseDefMIs. Later, MIsNeedChainEdge() can
return
false, if they were originally in different domains:
  // A NonAliasing node cannot alias an AliasingNode. This is the case
  // where MIa had only non-aliasing underlying objects and MIb had
  // only aliasing underlying objects, or vice versa.
  if ((AliasMemUseDefMIs.count(MIa) && !NonAliasMemUseDefMIs.count(MIa)
&&
       !AliasMemUseDefMIs.count(MIb) && NonAliasMemUseDefMIs.count(MIb))
||
      (!AliasMemUseDefMIs.count(MIa) && NonAliasMemUseDefMIs.count(MIa)
&&
       AliasMemUseDefMIs.count(MIb) &&
!NonAliasMemUseDefMIs.count(MIb)))
    return false;
This should then be an improvement to MIsNeedChainEdge(), although I am not sure
if there is
a more elegant solution, which I invite you to suggest.
I also supplied a patch with a lot of ugly code that had the original purpose of
finding a test case
on an in-tree-target for the bug I had discovered. You could ignore that patch
now, I think.
( An assert would trigger if there was - with the bugfix in place - an edge
inserted in the NonAlias case,
which would not have been done before the bugfix.)
I attach the patch for MIsNeedChainEdge() again,  and would appreciate some
feedback.
/Jonas Paulsson
-----Original Message-----
From: Hal Finkel [mailto:hfinkel at anl.gov] 
Sent: den 7 februari 2015 02:07
To: Jonas Paulsson
Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei Larin;
Patrik Hägglund H; Sanjin Sijaric; Andrew Trick (atrick at apple.com);
llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] Bugfix for missed dependency from store to load in
buildSchedGraph().
----- Original Message -----> From: "Jonas Paulsson" <jonas.paulsson at ericsson.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Sanjin
Sijaric" <ssijaric at codeaurora.org>, "Andrew Trick (atrick at
apple.com)"
> <atrick at apple.com>, llvm-commits at cs.uiuc.edu
> Cc: "Mattias Eriksson V" <mattias.v.eriksson at
ericsson.com>, llvmdev at cs.uiuc.edu, "Tom Stellard"
> <thomas.stellard at amd.com>, "Sergei Larin" <slarin at
codeaurora.org>, "Patrik Hägglund H"
> <patrik.h.hagglund at ericsson.com>
> Sent: Friday, January 30, 2015 4:23:50 AM
> Subject: [PATCH] Bugfix for missed dependency from store to load in
buildSchedGraph().
> 
> Hi,
> 
> I have revisited the issue in buildSchedGraph() I talked about 
> previously, and attached a few patches. The first tries to fix the 
> issue, and the other two try to illustrate associated issues, emerged 
> from applying it.
> 
> Is it OK to commit the first patch?
> 
>     [PATCH] Bugfix for missed dependency from store to load in
>     buildSchedGraph().
> 
>     Bugfix for missed dependency from store to load in
>     buildSchedGraph().
> 
>     Background: When handling underlying objects for a store, the
>     vector
>     of previous mem uses, mapped to the same Value, is afterwards
>     cleared
>     (regardless of ThisMayAlias). This means that during handling of
>     the
>     next store using the same Value, adjustChainDeps() must be
>     called,
>     otherwise a dependency might be missed.
Yes, LGTM. However, I'd like an extensive comment added to that part of the
code explaining what is going on. That function has far too little in the way of
commentary (RejectMemNodes itself even has no description). It is crucial that
we change this. As far as I'm concerned, a comment that is as long and
explanatory as this patch description is perfectly appropriate. [I say this
admitting to having touched this function myself].
Also, I'd really prefer you use reviews.llvm.org for these patches in the
future, and upload them with full context
(http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface),
it makes it much easier to see what is going on.
Regarding the other two patches, can you please explain what is going on? It is
not obvious to me from the patches.
Thanks again,
Hal
> 
>     For example, three spill/reload (NonAliasing) memory accesses
>     using
>     the same Value 'a', with different offsets:
> 
>         SU(2): store  @a
>         SU(1): store  @a, Offset:1
>         SU(0): load   @a
> 
>     In this case we have:
> 
>     * SU(1) does not need a dep against SU(0). Therefore,SU(0) ends
>     up in
>       RejectMemNodes and is removed from the mem-uses list
>       (AliasMemUses
>       or NonAliasMemUses), as this list is cleared.
> 
>     * SU(2) needs a dep against SU(0). Therefore, SU(2) must check
>       RejectMemNodes by calling adjustChainDeps().
> 
>     Previously, for store SUs, adjustChainDeps() was only called if
>     MayAlias was true, missing the S(2) to S(0) dependency in the
>     case
>     above. The fix is to always call adjustChainDeps(), regardless of
>     MayAlias, since this applies both for AliasMemUses and
>     NonAliasMemUses.
> 
>     No test case found for any in-tree target.
> 
> Jonas Paulsson
> 
> -----Original Message-----
> From: Jonas Paulsson
> Sent: den 8 januari 2015 16:01
> To: 'Hal Finkel'; Sanjin Sijaric
> Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei 
> Larin; Andrew Trick; llvm-commits at cs.uiuc.edu
> Subject: [PATCH] Call adjustChainDeps() always when handling a store.
> 
> Hi,
> 
> here is a patch on a problem I described in my previous mail (number
> 1) for your review. In short, I think adjustChainDeps() must be called 
> regardless of MayAlias while handling a store.
> 
> This problem was found on an out-of-tree target, and no test case can 
> be provided for an official target.
> 
> Let me know if I can commit this patch, or if you have any comments.
> 
> /Jonas
> 
> PS. Sanjin, regarding TII->areMemAccessesTriviallyDisjoint(), my 
> target does the same things as you mentioned: register / offset based 
> analysis, both pre/post RA. It is adding very little improvement over 
> AA, it seems. I don't have any failing test cases right now, so it may 
> be that this patch actually fixed my problem.
> However, right now it is confusing to have the whole algorithm written 
> around memory operands and at the same time allow register / offsets 
> analyzis in the midst of things. It would be good to rewrite that part 
> like you explained. What are your current thoughts about this?
> 
> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: den 23 december 2014 18:11
> To: Sanjin Sijaric
> Cc: Mattias Eriksson V; llvmdev at cs.uiuc.edu; Tom Stellard; Sergei 
> Larin; Jonas Paulsson; Andrew Trick
> Subject: Re: ScheduleDAGInstrs.cpp
> 
> ----- Original Message -----
> > From: "Sanjin Sijaric" <ssijaric at codeaurora.org>
> > To: "Jonas Paulsson" <jonas.paulsson at ericsson.com>,
"Andrew Trick"
> > <atrick at apple.com>
> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Mattias
Eriksson V"
> > <mattias.v.eriksson at ericsson.com>, llvmdev at cs.uiuc.edu,
"Tom
> > Stellard"
> > <thomas.stellard at amd.com>, "Sergei Larin"
<slarin at codeaurora.org>
> > Sent: Friday, December 19, 2014 1:51:44 PM
> > Subject: RE: ScheduleDAGInstrs.cpp
> > 
> > Hi Jonas,
> > 
> > How is your target implementing areMemAccessesTriviallyDisjoint?
> >  The
> > callback is there so that we don't get into the situation where
the
> > call to isIdentifiedObject (which is called from 
> > isUnsafeMemoryObject from MIsNeedChainEdge) results in the edge 
> > being added between two memory locations that the target can easily 
> > prove are different (e.g based on base register + index + offset, 
> > etc).
> > 
> > Are you seeing the problem during pre-RA scheduling or post-RA 
> > scheduling?
> > 
> > I think we may be able to get rid of 
> > areMemAccessesTriviallyDisjoint, and let the AA code in 
> > MIsNeedChainEdgee handle it.  This works for me if I don't call 
> > isIdentifiedObject from isUnsafeMemoryObject if we are using AA.  
> > Currently, the aliasing code won't get a chance to run in some
cases
> > as isUnsafeMemoryObject returns true, resulting in
> > the edge being added.   Maybe someone can shed some light as to why
> > the call to isIdentifedMemoryObject in isUnsafeMemoryObject is 
> > needed if we are using AA.
> 
> I think that it might not be needed. Here's my thinking:
> 
> There is a call to getUnderlyingObjectsForInstr, and that needs to be 
> there because of how the calling code groups the resulting SUs by 
> underlying object. Specifically, ScheduleDAGInstrs::buildSchedGraph
> first gets the underlying objects for a value, and then only calls 
> addChainDependency on SUs that share one of these underlying objects. 
> The reason why we insist on having an identified underlying object is 
> so that we know we're not being non-uniformly limited by the depth 
> constraint. I mean the following. Assume I have two globals (@a and 
> @b), and I have a third value formed using a combination of them: %c = 
> select i1 %something, @a, @b -- so I have
> 
>  SU(0): underlying objects: @a
>  SU(1): underlying objects: @b
>  SU(2): underlying objects: @a and @b
> 
> now imagine that our getUnderlyingObjectsForInstr had a trivial depth 
> limit, and we did not check that the underlying objects returned were 
> identified objects, then we might have:
> 
> 
>  SU(0): underlying objects: @a
>  SU(1): underlying objects: @b
>  SU(2): underlying objects: %c
> 
> and we might never give AA the chance to conclude that %c and @a or @b 
> might alias (because addChainDependency might never be called on
> SU(0) <-> SU(2) because @a != %c).
> 
> To prevent this problem, we clear the underlying objects list in 
> getUnderlyingObjectsForInstr if any of the underlying objects are not 
> identified (so that we can fall back to the conservative behavior).
> 
> Now in isUnsafeMemoryObject we have the same check, so that when 
> addChainDependency is called on an SU with a cleared underlying 
> objects list (which we did specifically to get conservative behavior), 
> we always get that conservative behavior. However, this check seems 
> unnecessary (except, perhaps, as noted in the next paragraph). When 
> not using AA, by the time we get to MIsNeedChainEdge, we're already 
> destined to return true from MIsNeedChainEdge (except for load/load 
> edges). When using AA, we might not return true (depending on what AA 
> returns), but the AA check is specific to the two instructions being 
> queried, and AA understands how to handle selects/phis/etc. in a 
> reasonable way.
> 
> Nevertheless, I'm somewhat worried about a subtlety here: I believe 
> that we cannot fail to addChainDependency on the AliasChain because AA 
> is smarter about underlying objects because the AliasChain actually 
> can represent many other dependencies. So maybe when we call 
> addChainDependency with SU on the AliasChain, we need to always pass 
> nullptr for AA.
> 
>  -Hal
> 
> > 
> > Thanks,
> > Sanjin
> > 
> > -----Original Message-----
> > From: Jonas Paulsson [mailto:jonas.paulsson at ericsson.com]
> > Sent: Friday, December 19, 2014 6:51 AM
> > To: Andrew Trick
> > Cc: Hal Finkel; Mattias Eriksson V; llvmdev at cs.uiuc.edu; Sanjin 
> > Sijaric; Tom Stellard; Sergei Larin
> > Subject: ScheduleDAGInstrs.cpp
> > 
> > Hi,
> > 
> > I write again regarding buildSchedGraph(), as I am still not happy 
> > about things there.
> > 
> > I have found at least two examples which do not work out:
> > 
> > 1)
> > 
> > SU(2) Store "Value A"
> > 
> > SU(1) Store "Value A"
> > 
> > SU(0) Load  "Value A"
> > 
> > If MIsNeedChainEdge() returns false for SU(0) and SU(1), SU(0) is 
> > inserted into RejectedMemNodes and removed from its MemUses SU list, 
> > as this list is cleared. Therefore SU(2) must be handled with 
> > adjustChainDeps(), because it needs an edge from SU(0).
> > For some reason adjustChainDeps() was only called for may-aliasing 
> > stores. I think this is wrong, as a store will clear the MemUses SU 
> > list also in the non-aliasing case.
> > 
> > If MIsNeedChainEdge() returns true for SU(0) and SU(1), what happens 
> > if MIsNeedChainEdge() returns false between SU(2) and SU(1)? The 
> > dependency against SU(0) will not be checked, because it is not in 
> > RejectMemNodes, nor in the MemUses SU list.
> > 
> > 2)
> > 
> > SU(2) Load  "Value A"
> > 
> > SU(1) Store "Value A"
> > 
> > SU(0) Store "Value A"
> > 
> > If not using alias analysis, then the MemDefs list is cleared after
> > *maybe* having inserted an edge from SU(0) to SU(1) with 
> > addChainDependency(). If there was not an edge inserted, then SU(2) 
> > must get a chance to check its deps against SU(0) by use of 
> > adjustChainDeps(). Again, this is only done if MayAlias is true.
> > 
> > I suspect therefore that areMemAccessesTriviallyDisjoint() is 
> > currently used in a very dangerous way, since it might get lucky in 
> > just one of two (equivalent) queries, and a dependency might be 
> > missed later on as seen above.
> > 
> > Or am I missing something here?
> > 
> > Senjin, I am working on an out-of-tree target. The problem I had was 
> > the first scenario in #1, and I fixed it by moving the
> > adjustChainDeps() call to be called also in the non-aliasing case.
> > 
> > Another question I have is regarding adjustChainDeps(). If
> > MIsNeedChainEdge() returns true, is it still then always necessary 
> > to continue recursively with successors? If this is a latency issue, 
> > perhaps it would be ok to do "continue" after adding the
pred if the
> > latency is non-zero? Right now this seems to be always zero, or one 
> > in the store-load case per the comment on line 856.
> > 
> > On line 947: "A store to a specific PseudoSourceValue"
should
> > probably say "... a specific Value / PseudoSourceValue",
right? The
> > same goes for line 1029.
> > 
> > /Jonas
> > 
> > 
> > 
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Two-comments-in-ScheduleDAGInstrs.cpp.patch
Type: application/octet-stream
Size: 1481 bytes
Desc: 0001-Two-comments-in-ScheduleDAGInstrs.cpp.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150210/26b7e2df/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Initial-Illustratory-fix-for-avoiding-NonAlias-Alias.patch
Type: application/octet-stream
Size: 4152 bytes
Desc: 0002-Initial-Illustratory-fix-for-avoiding-NonAlias-Alias.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150210/26b7e2df/attachment-0001.obj>