Jakub (Kuba) Kuderski via llvm-dev
2021-Sep-21 15:14 UTC
[llvm-dev] DominatorTree, JumpThreading and EarlyCSE non-determinism
> > Although, that still doesn’t solve the potential problem (fault?) that the > nodes aren’t inserted in those vectors with child nodes in the first place. >> And neither does it help EarlyCSE to find the optimal order of iterating > through child nodes (to eliminate all possible PHI-nodes in my example). > I.e. the outcome from EarlyCSE would still depend on the sorting order for > the DomTree with such a solution. But maybe that just is how the algorithm > in EarlyCSE is supposed to work.Can you explain these two? I'm not very familiar with either JT or EarlyCSE and don't follow. I was thinking about something like this: - Iterate over the whole function to create a mapping BB -> idx - If a pass depends on the order of children, add a helper function like reorder(tree_node_range, bb_to_idx_map) and wrap all calls to children with that - If a pass depends on DFS numbers, use the BB -> idx map if possible, or have a helper function that fixes up dfs numbers based on the map - If it's inconvenient to implement the same helpers in multiple places in the codebase, add a function to DT that will actually change the children order based on a given comparison function On Tue, Sep 21, 2021 at 11:03 AM Björn Pettersson A < bjorn.a.pettersson at ericsson.com> wrote:> Hi Jakub, > > > > I imagine sorting the children would solve the problem with > non-determinism for the test case (just like it helps to > invalidate/recalculate the DomTree before EarlyCSE). I haven’t tested it > though (I’m not quite sure how to get the basic block order number quickly > for a BasicBlock). > > > > Although, that still doesn’t solve the potential problem (fault?) that the > nodes aren’t inserted in those vectors with child nodes in the first place. > > > > And neither does it help EarlyCSE to find the optimal order of iterating > through child nodes (to eliminate all possible PHI-nodes in my example). > I.e. the outcome from EarlyCSE would still depend on the sorting order for > the DomTree with such a solution. But maybe that just is how the algorithm > in EarlyCSE is supposed to work. > > > > /Björn > > > > *From:* Jakub (Kuba) Kuderski <kubakuderski at gmail.com> > *Sent:* den 21 september 2021 16:27 > *To:* Björn Pettersson A <bjorn.a.pettersson at ericsson.com> > *Cc:* Roman Lebedev <lebedev.ri at gmail.com>; llvm-dev < > llvm-dev at lists.llvm.org>; Johan Ringström <johan.ringstrom at ericsson.com> > *Subject:* Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE > non-determinism > > > > Hi Björn, > > From the perspective of DomTree, node children are generally unordered. I > think a pass can be sensitive to DT order by either directly relying on > tree children order or on DFS numbers. > If that's the case, one workaround would be to re-order children based on > the block order in the parent. If that's a common enough requirement, we > could add it to DT. > > Would that solve the problem? > -Jakub > > > > On Tue, Sep 21, 2021 at 9:33 AM Björn Pettersson A via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > Yes, the IR looks the same out from jump-threading for this specific input > (at least when comparing output from -print-after-all). > > It is the order of the DominatorTree updates that I suspect differ a bit > (at least according to the -debug printouts), giving slightly different > node orders in the DominatorTree. And I have no idea if JumpThreading can > be blamed for that or if it is the Lazy strategy in the DomTreeUpdater. > > I guess I'll file a PR for this. But I'm still not quite sure what the > rules are here (if both passes are wrong, or which one to blame), and what > to expect more generally in situation like this one. > > /Björn > > > -----Original Message----- > > From: Roman Lebedev <lebedev.ri at gmail.com> > > Sent: den 21 september 2021 15:15 > > To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com> > > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Johan Ringström > > <johan.ringstrom at ericsson.com> > > Subject: Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE non- > > determinism > > > > Does JumpThreading produce exactly the same IR in all of the situations? > > But even if it does, this does seem like a bug. > > > > Roman > > > > On Tue, Sep 21, 2021 at 4:10 PM Björn Pettersson A via llvm-dev > > <llvm-dev at lists.llvm.org> wrote: > > > > > > Hello llvm-dev, > > > > > > Recently I've been debugging a non-determinism problem. > > > > > > I've managed to narrow it down to running: > > > > > > opt -passes='function(jump-threading,print<domtree>,early-cse)' > > > > > > > > > From debug printouts (also adding -debug to the cmd line) it looks like > > jump-threading is doing doing dominator tree updates in a > non-deterministic > > order (when comparing different runs). > > > > > > I randomly get either of these two DominatorTrees in the print<domtree> > > printouts: > > > > > > DominatorTree for function: g > > > =============================-------------------------------- > > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries. > > > [1] %entry {4294967295,4294967295} [0] > > > [2] %for.cond1 {4294967295,4294967295} [1] > > > [3] %if.then {4294967295,4294967295} [2] > > > [4] %for.cond5.preheader {4294967295,4294967295} [3] > > > [5] %cleanup {4294967295,4294967295} [4] > > > [6] %cleanup16 {4294967295,4294967295} [5] > > > [7] %unreachable {4294967295,4294967295} [6] > > > [7] %for.end21 {4294967295,4294967295} [6] > > > [5] %for.body7 {4294967295,4294967295} [4] > > > [6] %for.inc {4294967295,4294967295} [5] > > > [5] %return {4294967295,4294967295} [4] > > > [3] %cleanup16.thread {4294967295,4294967295} [2] > > > [3] %for.inc19 {4294967295,4294967295} [2] > > > [2] %for.cond {4294967295,4294967295} [1] > > > Roots: %entry > > > > > > DominatorTree for function: g > > > =============================-------------------------------- > > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries. > > > [1] %entry {4294967295,4294967295} [0] > > > [2] %for.cond1 {4294967295,4294967295} [1] > > > [3] %for.inc19 {4294967295,4294967295} [2] > > > [3] %if.then {4294967295,4294967295} [2] > > > [4] %for.cond5.preheader {4294967295,4294967295} [3] > > > [5] %cleanup {4294967295,4294967295} [4] > > > [6] %cleanup16 {4294967295,4294967295} [5] > > > [7] %unreachable {4294967295,4294967295} [6] > > > [7] %for.end21 {4294967295,4294967295} [6] > > > [5] %for.body7 {4294967295,4294967295} [4] > > > [6] %for.inc {4294967295,4294967295} [5] > > > [5] %return {4294967295,4294967295} [4] > > > [3] %cleanup16.thread {4294967295,4294967295} [2] > > > [2] %for.cond {4294967295,4294967295} [1] > > > Roots: %entry > > > > > > > > > I think both trees are correct, the nodes are just in a different > order. > > This can also be seen if I add invalidate<domtree> before print<domtree> > to > > force a recalculation. Then it will look like this instead (same content, > > but the level 3 nodes printed in yet another order): > > > > > > DominatorTree for function: g > > > =============================-------------------------------- > > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries. > > > [1] %entry {4294967295,4294967295} [0] > > > [2] %for.cond1 {4294967295,4294967295} [1] > > > [3] %cleanup16.thread {4294967295,4294967295} [2] > > > [3] %for.inc19 {4294967295,4294967295} [2] > > > [3] %if.then {4294967295,4294967295} [2] > > > [4] %for.cond5.preheader {4294967295,4294967295} [3] > > > [5] %cleanup {4294967295,4294967295} [4] > > > [6] %cleanup16 {4294967295,4294967295} [5] > > > [7] %unreachable {4294967295,4294967295} [6] > > > [7] %for.end21 {4294967295,4294967295} [6] > > > [5] %return {4294967295,4294967295} [4] > > > [5] %for.body7 {4294967295,4294967295} [4] > > > [6] %for.inc {4294967295,4294967295} [5] > > > [2] %for.cond {4294967295,4294967295} [1] > > > Roots: %entry > > > > > > > > > *** Question one *** > > > Maybe it is OK from a correctness point-of-view that JumpThreading > isn't > > producing the same node order every time (given same input), even though > it > > might be a bit confusing when debugging. Or should this be seen as a bug? > > > > > > > > > Next problem/question is related to EarlyCSE. It looks like the output > > from EarlyCSE depends on the node order in the dominator tree. So > depending > > on if JumpThreading has produced the first or second of the DominatorTree > > structures above we might end up eliminating one more/less PHI node > during > > EarlyCSE. > > > > > > *** Question two *** > > > Should be seen as a bug? Or are passes in general sensitive to how > > analysis information is produced (such as the node order in a dominator > > tree) and thus being allowed to produce better/worse code depending on > such > > things? > > > > > > > > > To summarize: > > > JumpThreading is non-deterministic but produces semantically > "equivalent" > > results. > > > EarlyCSE is deterministic, but the result depends on order of nodes in > > the DominatorTree. > > > Those two things together gives a non-deterministic IR result. And I > > figure that should be seen as a bug. Just not sure if the fault is in > > DominatorTreeUpdater, JumpThreading or EarlyCSE (or if it should be seen > as > > multiple faults). > > > > > > Regards, > > > Björn > > > _______________________________________________ > > > LLVM Developers mailing list > > > llvm-dev at lists.llvm.org > > > https://protect2.fireeye.com/v1/url?k=dfb69fe9-802da6ec-dfb6df72- > > 86959e472243-b94d5bb9ef523712&q=1&e=3ff5c3d9-dda7-472c-aef6- > > a705903687cb&u=https%3A%2F%2Flists.llvm.org > <https://protect2.fireeye.com/v1/url?k=3721e336-68bad973-3721a3ad-867b36d1634c-355d724705f0f488&q=1&e=de75eae6-7545-4fa3-a900-922cb7fd03cc&u=http%3A%2F%2F2flists.llvm.org%2F> > %2Fcgi- > > bin%2Fmailman%2Flistinfo%2Fllvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > <https://protect2.fireeye.com/v1/url?k=d909ae33-86929476-d909eea8-867b36d1634c-f7988c4d918d2be0&q=1&e=de75eae6-7545-4fa3-a900-922cb7fd03cc&u=https%3A%2F%2Flists.llvm.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fllvm-dev> > > > > > -- > > Jakub Kuderski >-- Jakub Kuderski -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210921/4de7f208/attachment.html>
Björn Pettersson A via llvm-dev
2021-Sep-21 20:30 UTC
[llvm-dev] DominatorTree, JumpThreading and EarlyCSE non-determinism
>> Although, that still doesn’t solve the potential problem (fault?) that the nodes aren’t inserted in those vectors with child nodes in the first place. >> And neither does it help EarlyCSE to find the optimal order of iterating through child nodes (to eliminate all possible PHI-nodes in my example). I.e. the outcome from EarlyCSE would still depend on the sorting order for the DomTree with such a solution. But maybe that just is how the algorithm in EarlyCSE is supposed to work.> Can you explain these two? I'm not very familiar with either JT or EarlyCSE and don't follow.With the first one I was thinking that there wouldn’t be any need to sort child nodes if the DomTree updates made by JumpThreading were made in a deterministic order. I.e. sorting post updating the dominator tree would only hide the actual problem, at least if the idea is that the vector with child nodes is sorted by the insertion order. After some more debugging I’ve found that the culprit here seem to be that llvm::MergeBasicBlockIntoOnlyPred (in Local.cpp) is using a SmallPtrSet for the PredsOfPredBB temporary list. And then the Updates sent to DomTreeUpdater ends up in a non-deterministic order based memory allocations. If changing from SmallPtrSet to SmallVector, then the updates will be deterministic. (I’ll prepare a patch for that!) My point/question related to EarlyCSE is that the transformation done by that pass seem to depend on the internal order of nodes inside the DominatorTree. And I wonder if that is a known fact, or if it should be considered as a fault. So at the moment, the order in which DominatorTree is updated by a previous pass could impact if EarlyCSE is doing a certain transform or not. And if for example modifying the pipeline slightly to invalidate the DominatorTree analysis before EarlyCSE, then I might get different IR in the output from EarlyCSE. Feels a bit strange (and unlucky) IMO that not only the input IR to EarlyCSE will impact the result, but also how the DominatorTree is calculated. So for EarlyCSE, it could be possible to make it less sensitive to the DominatorTree internal structure/sorting by adding some kind of sorting by basic block order when processing the dom tree nodes. But making EarlyCSE insensitive to how nodes are ordered in the DominatorTree would only make sure that the EarlyCSE transformation are applied in a given order. But apparently we get different results depending on in which order we process the nodes. And that is what I meant by talking about “optimal order”. I don’t know if EarlyCSE should run twice or something (such as processing nodes in forward+reverse order) to make sure we do not miss out on transformation that would have been applied if using a different node processing order. Is that making sense? /Björn From: Jakub (Kuba) Kuderski <kubakuderski at gmail.com> Sent: den 21 september 2021 17:14 To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com> Cc: Roman Lebedev <lebedev.ri at gmail.com>; llvm-dev <llvm-dev at lists.llvm.org>; Johan Ringström <johan.ringstrom at ericsson.com> Subject: Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE non-determinism Although, that still doesn’t solve the potential problem (fault?) that the nodes aren’t inserted in those vectors with child nodes in the first place. And neither does it help EarlyCSE to find the optimal order of iterating through child nodes (to eliminate all possible PHI-nodes in my example). I.e. the outcome from EarlyCSE would still depend on the sorting order for the DomTree with such a solution. But maybe that just is how the algorithm in EarlyCSE is supposed to work. Can you explain these two? I'm not very familiar with either JT or EarlyCSE and don't follow. I was thinking about something like this: - Iterate over the whole function to create a mapping BB -> idx - If a pass depends on the order of children, add a helper function like reorder(tree_node_range, bb_to_idx_map) and wrap all calls to children with that - If a pass depends on DFS numbers, use the BB -> idx map if possible, or have a helper function that fixes up dfs numbers based on the map - If it's inconvenient to implement the same helpers in multiple places in the codebase, add a function to DT that will actually change the children order based on a given comparison function On Tue, Sep 21, 2021 at 11:03 AM Björn Pettersson A <bjorn.a.pettersson at ericsson.com<mailto:bjorn.a.pettersson at ericsson.com>> wrote: Hi Jakub, I imagine sorting the children would solve the problem with non-determinism for the test case (just like it helps to invalidate/recalculate the DomTree before EarlyCSE). I haven’t tested it though (I’m not quite sure how to get the basic block order number quickly for a BasicBlock). Although, that still doesn’t solve the potential problem (fault?) that the nodes aren’t inserted in those vectors with child nodes in the first place. And neither does it help EarlyCSE to find the optimal order of iterating through child nodes (to eliminate all possible PHI-nodes in my example). I.e. the outcome from EarlyCSE would still depend on the sorting order for the DomTree with such a solution. But maybe that just is how the algorithm in EarlyCSE is supposed to work. /Björn From: Jakub (Kuba) Kuderski <kubakuderski at gmail.com<mailto:kubakuderski at gmail.com>> Sent: den 21 september 2021 16:27 To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com<mailto:bjorn.a.pettersson at ericsson.com>> Cc: Roman Lebedev <lebedev.ri at gmail.com<mailto:lebedev.ri at gmail.com>>; llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>; Johan Ringström <johan.ringstrom at ericsson.com<mailto:johan.ringstrom at ericsson.com>> Subject: Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE non-determinism Hi Björn, From the perspective of DomTree, node children are generally unordered. I think a pass can be sensitive to DT order by either directly relying on tree children order or on DFS numbers. If that's the case, one workaround would be to re-order children based on the block order in the parent. If that's a common enough requirement, we could add it to DT. Would that solve the problem? -Jakub On Tue, Sep 21, 2021 at 9:33 AM Björn Pettersson A via llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: Yes, the IR looks the same out from jump-threading for this specific input (at least when comparing output from -print-after-all). It is the order of the DominatorTree updates that I suspect differ a bit (at least according to the -debug printouts), giving slightly different node orders in the DominatorTree. And I have no idea if JumpThreading can be blamed for that or if it is the Lazy strategy in the DomTreeUpdater. I guess I'll file a PR for this. But I'm still not quite sure what the rules are here (if both passes are wrong, or which one to blame), and what to expect more generally in situation like this one. /Björn> -----Original Message----- > From: Roman Lebedev <lebedev.ri at gmail.com<mailto:lebedev.ri at gmail.com>> > Sent: den 21 september 2021 15:15 > To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com<mailto:bjorn.a.pettersson at ericsson.com>> > Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>; Johan Ringström > <johan.ringstrom at ericsson.com<mailto:johan.ringstrom at ericsson.com>> > Subject: Re: [llvm-dev] DominatorTree, JumpThreading and EarlyCSE non- > determinism > > Does JumpThreading produce exactly the same IR in all of the situations? > But even if it does, this does seem like a bug. > > Roman > > On Tue, Sep 21, 2021 at 4:10 PM Björn Pettersson A via llvm-dev > <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote: > > > > Hello llvm-dev, > > > > Recently I've been debugging a non-determinism problem. > > > > I've managed to narrow it down to running: > > > > opt -passes='function(jump-threading,print<domtree>,early-cse)' > > > > > > From debug printouts (also adding -debug to the cmd line) it looks like > jump-threading is doing doing dominator tree updates in a non-deterministic > order (when comparing different runs). > > > > I randomly get either of these two DominatorTrees in the print<domtree> > printouts: > > > > DominatorTree for function: g > > =============================-------------------------------- > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries. > > [1] %entry {4294967295,4294967295} [0] > > [2] %for.cond1 {4294967295,4294967295} [1] > > [3] %if.then {4294967295,4294967295} [2] > > [4] %for.cond5.preheader {4294967295,4294967295} [3] > > [5] %cleanup {4294967295,4294967295} [4] > > [6] %cleanup16 {4294967295,4294967295} [5] > > [7] %unreachable {4294967295,4294967295} [6] > > [7] %for.end21 {4294967295,4294967295} [6] > > [5] %for.body7 {4294967295,4294967295} [4] > > [6] %for.inc {4294967295,4294967295} [5] > > [5] %return {4294967295,4294967295} [4] > > [3] %cleanup16.thread {4294967295,4294967295} [2] > > [3] %for.inc19 {4294967295,4294967295} [2] > > [2] %for.cond {4294967295,4294967295} [1] > > Roots: %entry > > > > DominatorTree for function: g > > =============================-------------------------------- > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries. > > [1] %entry {4294967295,4294967295} [0] > > [2] %for.cond1 {4294967295,4294967295} [1] > > [3] %for.inc19 {4294967295,4294967295} [2] > > [3] %if.then {4294967295,4294967295} [2] > > [4] %for.cond5.preheader {4294967295,4294967295} [3] > > [5] %cleanup {4294967295,4294967295} [4] > > [6] %cleanup16 {4294967295,4294967295} [5] > > [7] %unreachable {4294967295,4294967295} [6] > > [7] %for.end21 {4294967295,4294967295} [6] > > [5] %for.body7 {4294967295,4294967295} [4] > > [6] %for.inc {4294967295,4294967295} [5] > > [5] %return {4294967295,4294967295} [4] > > [3] %cleanup16.thread {4294967295,4294967295} [2] > > [2] %for.cond {4294967295,4294967295} [1] > > Roots: %entry > > > > > > I think both trees are correct, the nodes are just in a different order. > This can also be seen if I add invalidate<domtree> before print<domtree> to > force a recalculation. Then it will look like this instead (same content, > but the level 3 nodes printed in yet another order): > > > > DominatorTree for function: g > > =============================-------------------------------- > > Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries. > > [1] %entry {4294967295,4294967295} [0] > > [2] %for.cond1 {4294967295,4294967295} [1] > > [3] %cleanup16.thread {4294967295,4294967295} [2] > > [3] %for.inc19 {4294967295,4294967295} [2] > > [3] %if.then {4294967295,4294967295} [2] > > [4] %for.cond5.preheader {4294967295,4294967295} [3] > > [5] %cleanup {4294967295,4294967295} [4] > > [6] %cleanup16 {4294967295,4294967295} [5] > > [7] %unreachable {4294967295,4294967295} [6] > > [7] %for.end21 {4294967295,4294967295} [6] > > [5] %return {4294967295,4294967295} [4] > > [5] %for.body7 {4294967295,4294967295} [4] > > [6] %for.inc {4294967295,4294967295} [5] > > [2] %for.cond {4294967295,4294967295} [1] > > Roots: %entry > > > > > > *** Question one *** > > Maybe it is OK from a correctness point-of-view that JumpThreading isn't > producing the same node order every time (given same input), even though it > might be a bit confusing when debugging. Or should this be seen as a bug? > > > > > > Next problem/question is related to EarlyCSE. It looks like the output > from EarlyCSE depends on the node order in the dominator tree. So depending > on if JumpThreading has produced the first or second of the DominatorTree > structures above we might end up eliminating one more/less PHI node during > EarlyCSE. > > > > *** Question two *** > > Should be seen as a bug? Or are passes in general sensitive to how > analysis information is produced (such as the node order in a dominator > tree) and thus being allowed to produce better/worse code depending on such > things? > > > > > > To summarize: > > JumpThreading is non-deterministic but produces semantically "equivalent" > results. > > EarlyCSE is deterministic, but the result depends on order of nodes in > the DominatorTree. > > Those two things together gives a non-deterministic IR result. And I > figure that should be seen as a bug. Just not sure if the fault is in > DominatorTreeUpdater, JumpThreading or EarlyCSE (or if it should be seen as > multiple faults). > > > > Regards, > > Björn > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> > > https://protect2.fireeye.com/v1/url?k=dfb69fe9-802da6ec-dfb6df72- > 86959e472243-b94d5bb9ef523712&q=1&e=3ff5c3d9-dda7-472c-aef6- > a705903687cb&u=https%3A%2F%2Flists.llvm.org<https://protect2.fireeye.com/v1/url?k=3721e336-68bad973-3721a3ad-867b36d1634c-355d724705f0f488&q=1&e=de75eae6-7545-4fa3-a900-922cb7fd03cc&u=http%3A%2F%2F2flists.llvm.org%2F>%2Fcgi- > bin%2Fmailman%2Flistinfo%2Fllvm-dev_______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<https://protect2.fireeye.com/v1/url?k=d909ae33-86929476-d909eea8-867b36d1634c-f7988c4d918d2be0&q=1&e=de75eae6-7545-4fa3-a900-922cb7fd03cc&u=https%3A%2F%2Flists.llvm.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fllvm-dev> -- Jakub Kuderski -- Jakub Kuderski -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210921/f75181bd/attachment.html>