Ariel Ben-Yehuda via llvm-dev
2017-Nov-15 13:15 UTC
[llvm-dev] CFG normalization: avoiding `br i1 false`
> I'm not necessarily sympathetic to the idea of adding another canonicalization pass only for this purpose.The problem is that as you said, SimplifyCfg does all sorts of stuff, and I suspect is not the fastest pass in the world. Also, in the case that annoys me, there is an LCSSA pass in the middle, and I suspect it would be a better idea to only do the LCSSA etc. transform again if no changes were made, which I suspect is the most common case. On Fri, Nov 10, 2017 at 1:53 AM, Davide Italiano <davide at freebsd.org> wrote:> On Thu, Nov 9, 2017 at 3:38 PM, Ariel Ben-Yehuda via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Hi, >> >> I was looking at Rust programs which are poorly optimized by LLVM, and >> one occasional factor is that LLVM allows for long-lived `br i1 false` >> instructions. >> >> The problem is that if a propagation pass discovers the condition of a >> branch, the branch itself will not be eliminated until a SimpllfyCfg >> pass is reached, of which there are few in the pipeline. >> >> One example is https://github.com/rust-lang/rust/issues/44041 - the >> branch conditions are found by loop unrolling, but the `br i1 false` >> instruction leaks to codegen. >> >> Another example is https://github.com/rust-lang/rust/issues/45466. >> >> In that case, induction variable simplification discovers that an >> integer overflow check is unnecessary, but the remaining `br i1 false` >> still blocks the following LoopIdiomRecognize pass and prevents the >> loop from being optimized to a memset. >> > > SimplifyCFG is the correct pass for this kind of transformation. I > don't think it's entirely unreasonable to run it more often, but the > compile time cost needs to be taken in account. > > I'm not necessarily sympathetic to the idea of adding another canonicalization > pass only for this purpose. > > Side note: > IMHO, at this point SimplifyCFG is doing even more than it should > (including, e.g. sinking of variable from "almost empty" BBs, i.e. BBs > which have only a single instruction & terminator, if I recall > correctly). If anything, I'd rather split the sinking logic and other operations > to a different pass, rather than moving simplifying proven unconditional > branches elsewhere :) > > > Also, FWIW, there has been a recent effort to split SimplifyCFG in multiple > variants (with several cl::opt flags associated to enable specific > transformations). > > >> While it is possible to have point-fixes for both problems, this might >> be widespread enough problem that a more general solution might be >> better - for example, to have some sort of canonicalization that >> automatically removes these branches in some situations (this requires >> things such as the dominator tree to be rebuilt, so it shouldn't be >> done literally everywhere, but it can be done fairly often). >> > > Now that LLVM has an incremental dominator API it should be more > feasible to run SimplifyCFG more times without recomputing the > dominator all the time. I haven't checked whether there are other > expensive analyses that need to be preserved. > > Thanks, > > -- > Davide > > "There are no solved problems; there are only problems that are more > or less solved" -- Henri Poincare
Davide Italiano via llvm-dev
2017-Nov-15 18:03 UTC
[llvm-dev] CFG normalization: avoiding `br i1 false`
On Wed, Nov 15, 2017 at 5:15 AM, Ariel Ben-Yehuda <ariel.byd at gmail.com> wrote:>> I'm not necessarily sympathetic to the idea of adding another canonicalization pass only for this purpose. > > > The problem is that as you said, SimplifyCfg does all sorts of stuff, > and I suspect is not the fastest pass in the world. >I've not seen it showing up in a profile myself, although I'm not sure whether it will be true in the future. Sinking, e.g., could be moved in a separate pass (in fact, there's one already, disabled by default).> Also, in the case that annoys me, there is an LCSSA pass in the > middle, and I suspect it would be a better idea to only do the LCSSA > etc. transform again if no changes were made, which I suspect is the > most common case. >LCSSA should be linear. Currently it's not because it uses the SSA updater which computes dominance frontiers, so it might get O(N^3). The alternative SSAupdater proposed solves this problem using the novel algorithm from Braun et al. linked in the review https://reviews.llvm.org/D28934 It still has the problem that if leaves redundant phis around at the beginning of the block, e.g. %patatino1 = phi [ %tinky, %somebb, %winky, %someotherbb ] %patatino2 = phi [ %tinky, %somebb, %winky, %someotherbb ] The current SSA updater pays a (sometimes high) cost to perform this deduplication/removal (actually, checks whether there's already a PHI, and if not inserts one). -- Davide
Philip Reames via llvm-dev
2017-Nov-29 17:48 UTC
[llvm-dev] CFG normalization: avoiding `br i1 false`
There's already a LoopSimplifyCFG which is a loop-pass (and thus can iterate with other loop passes) and eliminates trivial branches. Having a simple pass which just strips unreachable blocks and converts conditional branches to unconditional ones while updating appropriate analyzes (LoopInfo, DomTree, LCSSA, etc..) seems very reasonable. This could also be a utility function called from the end of other passes. The hard bit is the analysis preservation. A good place to copy from might be the recent loop-unswitch work Chandler did. Philip On 11/15/2017 10:03 AM, Davide Italiano via llvm-dev wrote:> On Wed, Nov 15, 2017 at 5:15 AM, Ariel Ben-Yehuda <ariel.byd at gmail.com> wrote: >>> I'm not necessarily sympathetic to the idea of adding another canonicalization pass only for this purpose. >> >> The problem is that as you said, SimplifyCfg does all sorts of stuff, >> and I suspect is not the fastest pass in the world. >> > I've not seen it showing up in a profile myself, although I'm not sure > whether it will be true in the future. > Sinking, e.g., could be moved in a separate pass (in fact, there's one > already, disabled by default). > >> Also, in the case that annoys me, there is an LCSSA pass in the >> middle, and I suspect it would be a better idea to only do the LCSSA >> etc. transform again if no changes were made, which I suspect is the >> most common case. >> > LCSSA should be linear. Currently it's not because it uses the SSA > updater which computes dominance frontiers, so it might get O(N^3). > The alternative SSAupdater proposed solves this problem using the > novel algorithm from Braun et al. linked in the review > https://reviews.llvm.org/D28934 > It still has the problem that if leaves redundant phis around at the > beginning of the block, e.g. > > %patatino1 = phi [ %tinky, %somebb, %winky, %someotherbb ] > %patatino2 = phi [ %tinky, %somebb, %winky, %someotherbb ] > > The current SSA updater pays a (sometimes high) cost to perform this > deduplication/removal (actually, checks whether there's already a PHI, > and if not inserts one). > > -- > Davide > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev