Ariel Ben-Yehuda via llvm-dev
2017-Nov-09 23:38 UTC
[llvm-dev] CFG normalization: avoiding `br i1 false`
Hi, I was looking at Rust programs which are poorly optimized by LLVM, and one occasional factor is that LLVM allows for long-lived `br i1 false` instructions. The problem is that if a propagation pass discovers the condition of a branch, the branch itself will not be eliminated until a SimpllfyCfg pass is reached, of which there are few in the pipeline. One example is https://github.com/rust-lang/rust/issues/44041 - the branch conditions are found by loop unrolling, but the `br i1 false` instruction leaks to codegen. Another example is https://github.com/rust-lang/rust/issues/45466. In that case, induction variable simplification discovers that an integer overflow check is unnecessary, but the remaining `br i1 false` still blocks the following LoopIdiomRecognize pass and prevents the loop from being optimized to a memset. While it is possible to have point-fixes for both problems, this might be widespread enough problem that a more general solution might be better - for example, to have some sort of canonicalization that automatically removes these branches in some situations (this requires things such as the dominator tree to be rebuilt, so it shouldn't be done literally everywhere, but it can be done fairly often). What are your opinions? Thanks, - Ariel
Davide Italiano via llvm-dev
2017-Nov-09 23:53 UTC
[llvm-dev] CFG normalization: avoiding `br i1 false`
On Thu, Nov 9, 2017 at 3:38 PM, Ariel Ben-Yehuda via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Hi, > > I was looking at Rust programs which are poorly optimized by LLVM, and > one occasional factor is that LLVM allows for long-lived `br i1 false` > instructions. > > The problem is that if a propagation pass discovers the condition of a > branch, the branch itself will not be eliminated until a SimpllfyCfg > pass is reached, of which there are few in the pipeline. > > One example is https://github.com/rust-lang/rust/issues/44041 - the > branch conditions are found by loop unrolling, but the `br i1 false` > instruction leaks to codegen. > > Another example is https://github.com/rust-lang/rust/issues/45466. > > In that case, induction variable simplification discovers that an > integer overflow check is unnecessary, but the remaining `br i1 false` > still blocks the following LoopIdiomRecognize pass and prevents the > loop from being optimized to a memset. >SimplifyCFG is the correct pass for this kind of transformation. I don't think it's entirely unreasonable to run it more often, but the compile time cost needs to be taken in account. I'm not necessarily sympathetic to the idea of adding another canonicalization pass only for this purpose. Side note: IMHO, at this point SimplifyCFG is doing even more than it should (including, e.g. sinking of variable from "almost empty" BBs, i.e. BBs which have only a single instruction & terminator, if I recall correctly). If anything, I'd rather split the sinking logic and other operations to a different pass, rather than moving simplifying proven unconditional branches elsewhere :) Also, FWIW, there has been a recent effort to split SimplifyCFG in multiple variants (with several cl::opt flags associated to enable specific transformations).> While it is possible to have point-fixes for both problems, this might > be widespread enough problem that a more general solution might be > better - for example, to have some sort of canonicalization that > automatically removes these branches in some situations (this requires > things such as the dominator tree to be rebuilt, so it shouldn't be > done literally everywhere, but it can be done fairly often). >Now that LLVM has an incremental dominator API it should be more feasible to run SimplifyCFG more times without recomputing the dominator all the time. I haven't checked whether there are other expensive analyses that need to be preserved. Thanks, -- Davide "There are no solved problems; there are only problems that are more or less solved" -- Henri Poincare
Ariel Ben-Yehuda via llvm-dev
2017-Nov-15 13:15 UTC
[llvm-dev] CFG normalization: avoiding `br i1 false`
> I'm not necessarily sympathetic to the idea of adding another canonicalization pass only for this purpose.The problem is that as you said, SimplifyCfg does all sorts of stuff, and I suspect is not the fastest pass in the world. Also, in the case that annoys me, there is an LCSSA pass in the middle, and I suspect it would be a better idea to only do the LCSSA etc. transform again if no changes were made, which I suspect is the most common case. On Fri, Nov 10, 2017 at 1:53 AM, Davide Italiano <davide at freebsd.org> wrote:> On Thu, Nov 9, 2017 at 3:38 PM, Ariel Ben-Yehuda via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Hi, >> >> I was looking at Rust programs which are poorly optimized by LLVM, and >> one occasional factor is that LLVM allows for long-lived `br i1 false` >> instructions. >> >> The problem is that if a propagation pass discovers the condition of a >> branch, the branch itself will not be eliminated until a SimpllfyCfg >> pass is reached, of which there are few in the pipeline. >> >> One example is https://github.com/rust-lang/rust/issues/44041 - the >> branch conditions are found by loop unrolling, but the `br i1 false` >> instruction leaks to codegen. >> >> Another example is https://github.com/rust-lang/rust/issues/45466. >> >> In that case, induction variable simplification discovers that an >> integer overflow check is unnecessary, but the remaining `br i1 false` >> still blocks the following LoopIdiomRecognize pass and prevents the >> loop from being optimized to a memset. >> > > SimplifyCFG is the correct pass for this kind of transformation. I > don't think it's entirely unreasonable to run it more often, but the > compile time cost needs to be taken in account. > > I'm not necessarily sympathetic to the idea of adding another canonicalization > pass only for this purpose. > > Side note: > IMHO, at this point SimplifyCFG is doing even more than it should > (including, e.g. sinking of variable from "almost empty" BBs, i.e. BBs > which have only a single instruction & terminator, if I recall > correctly). If anything, I'd rather split the sinking logic and other operations > to a different pass, rather than moving simplifying proven unconditional > branches elsewhere :) > > > Also, FWIW, there has been a recent effort to split SimplifyCFG in multiple > variants (with several cl::opt flags associated to enable specific > transformations). > > >> While it is possible to have point-fixes for both problems, this might >> be widespread enough problem that a more general solution might be >> better - for example, to have some sort of canonicalization that >> automatically removes these branches in some situations (this requires >> things such as the dominator tree to be rebuilt, so it shouldn't be >> done literally everywhere, but it can be done fairly often). >> > > Now that LLVM has an incremental dominator API it should be more > feasible to run SimplifyCFG more times without recomputing the > dominator all the time. I haven't checked whether there are other > expensive analyses that need to be preserved. > > Thanks, > > -- > Davide > > "There are no solved problems; there are only problems that are more > or less solved" -- Henri Poincare