thr3ads.net - llvm dev - [llvm-dev] CFG normalization: avoiding `br i1 false` [Nov 2017]

If this information is useful, please help other people find it:
Share via:

Ariel Ben-Yehuda via llvm-dev

2017-Nov-15 13:15 UTC

[llvm-dev] CFG normalization: avoiding `br i1 false`

> I'm not necessarily sympathetic to the idea of adding another
canonicalization pass only for this purpose.

The problem is that as you said, SimplifyCfg does all sorts of stuff,
and I suspect is not the fastest pass in the world.

Also, in the case that annoys me, there is an LCSSA pass in the
middle, and I suspect it would be a better idea to only do the LCSSA
etc. transform again if no changes were made, which I suspect is the
most common case.

On Fri, Nov 10, 2017 at 1:53 AM, Davide Italiano <davide at freebsd.org>
wrote:> On Thu, Nov 9, 2017 at 3:38 PM, Ariel Ben-Yehuda via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hi,
>>
>> I was looking at Rust programs which are poorly optimized by LLVM, and
>> one occasional factor is that LLVM allows for long-lived `br i1 false`
>> instructions.
>>
>> The problem is that if a propagation pass discovers the condition of a
>> branch, the branch itself will not be eliminated until a SimpllfyCfg
>> pass is reached, of which there are few in the pipeline.
>>
>> One example is https://github.com/rust-lang/rust/issues/44041 - the
>> branch conditions are found by loop unrolling, but the `br i1 false`
>> instruction leaks to codegen.
>>
>> Another example is https://github.com/rust-lang/rust/issues/45466.
>>
>> In that case, induction variable simplification discovers that an
>> integer overflow check is unnecessary, but the remaining `br i1 false`
>> still blocks the following LoopIdiomRecognize pass and prevents the
>> loop from being optimized to a memset.
>>
>
> SimplifyCFG is the correct pass for this kind of transformation. I
> don't think it's entirely unreasonable to run it more often, but
the
> compile time cost needs to be taken in account.
>
> I'm not necessarily sympathetic to the idea of adding another
canonicalization
> pass only for this purpose.
>
> Side note:
> IMHO, at this point SimplifyCFG is doing even more than it should
> (including, e.g. sinking of variable from "almost empty" BBs,
i.e. BBs
> which have only a single instruction & terminator, if I recall
> correctly). If anything, I'd rather split the sinking logic and other
operations
> to a different pass, rather than moving simplifying proven unconditional
> branches elsewhere :)
>
>
> Also, FWIW, there has been a recent effort to split SimplifyCFG in multiple
> variants (with several cl::opt flags associated to enable specific
> transformations).
>
>
>> While it is possible to have point-fixes for both problems, this might
>> be widespread enough problem that a more general solution might be
>> better - for example, to have some sort of canonicalization that
>> automatically removes these branches in some situations (this requires
>> things such as the dominator tree to be rebuilt, so it shouldn't be
>> done literally everywhere, but it can be done fairly often).
>>
>
> Now that LLVM has an incremental dominator API it should be more
> feasible to run SimplifyCFG more times without recomputing the
> dominator all the time. I haven't checked whether there are other
> expensive analyses that need to be preserved.
>
> Thanks,
>
> --
> Davide
>
> "There are no solved problems; there are only problems that are more
> or less solved" -- Henri Poincare

Davide Italiano via llvm-dev

2017-Nov-15 18:03 UTC

head link

[llvm-dev] CFG normalization: avoiding `br i1 false`

On Wed, Nov 15, 2017 at 5:15 AM, Ariel Ben-Yehuda <ariel.byd at gmail.com>
wrote:>> I'm not necessarily sympathetic to the idea of adding another
canonicalization pass only for this purpose.
>
>
> The problem is that as you said, SimplifyCfg does all sorts of stuff,
> and I suspect is not the fastest pass in the world.
>
I've not seen it showing up in a profile myself, although I'm not sure
whether it will be true in the future.
Sinking, e.g., could be moved in a separate pass (in fact, there's one
already, disabled by default).
> Also, in the case that annoys me, there is an LCSSA pass in the
> middle, and I suspect it would be a better idea to only do the LCSSA
> etc. transform again if no changes were made, which I suspect is the
> most common case.
>
LCSSA should be linear. Currently it's not because it uses the SSA
updater which computes dominance frontiers, so it might get O(N^3).
The alternative SSAupdater proposed solves this problem using the
novel algorithm from Braun et al. linked in the review
https://reviews.llvm.org/D28934
It still has the problem that if leaves redundant phis around at the
beginning of the block, e.g.

%patatino1 = phi [ %tinky, %somebb, %winky, %someotherbb ]
%patatino2 = phi [ %tinky, %somebb, %winky, %someotherbb ]

The current SSA updater pays a (sometimes high) cost to perform this
deduplication/removal (actually, checks whether there's already a PHI,
and if not inserts one).

--
Davide

Philip Reames via llvm-dev

2017-Nov-29 17:48 UTC

head link

[llvm-dev] CFG normalization: avoiding `br i1 false`

There's already a LoopSimplifyCFG which is a loop-pass (and thus can 
iterate with other loop passes) and eliminates trivial branches.  Having 
a simple pass which just strips unreachable blocks and converts 
conditional branches to unconditional ones while updating appropriate 
analyzes (LoopInfo, DomTree, LCSSA, etc..) seems very reasonable.  This 
could also be a utility function called from the end of other passes.  
The hard bit is the analysis preservation.  A good place to copy from 
might be the recent loop-unswitch work Chandler did.

Philip


On 11/15/2017 10:03 AM, Davide Italiano via llvm-dev
wrote:> On Wed, Nov 15, 2017 at 5:15 AM, Ariel Ben-Yehuda <ariel.byd at
gmail.com> wrote:
>>> I'm not necessarily sympathetic to the idea of adding another
canonicalization pass only for this purpose.
>>
>> The problem is that as you said, SimplifyCfg does all sorts of stuff,
>> and I suspect is not the fastest pass in the world.
>>
> I've not seen it showing up in a profile myself, although I'm not
sure
> whether it will be true in the future.
> Sinking, e.g., could be moved in a separate pass (in fact, there's one
> already, disabled by default).
>
>> Also, in the case that annoys me, there is an LCSSA pass in the
>> middle, and I suspect it would be a better idea to only do the LCSSA
>> etc. transform again if no changes were made, which I suspect is the
>> most common case.
>>
> LCSSA should be linear. Currently it's not because it uses the SSA
> updater which computes dominance frontiers, so it might get O(N^3).
> The alternative SSAupdater proposed solves this problem using the
> novel algorithm from Braun et al. linked in the review
> https://reviews.llvm.org/D28934
> It still has the problem that if leaves redundant phis around at the
> beginning of the block, e.g.
>
> %patatino1 = phi [ %tinky, %somebb, %winky, %someotherbb ]
> %patatino2 = phi [ %tinky, %somebb, %winky, %someotherbb ]
>
> The current SSA updater pays a (sometimes high) cost to perform this
> deduplication/removal (actually, checks whether there's already a PHI,
> and if not inserts one).
>
> --
> Davide
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Nov 2017 - CFG normalization: avoiding `br i1 false`

[llvm-dev] CFG normalization: avoiding `br i1 false`

[llvm-dev] CFG normalization: avoiding `br i1 false`

[llvm-dev] CFG normalization: avoiding `br i1 false`

Possibly Parallel Threads