thr3ads.net - llvm dev - [llvm-dev] CFG normalization: avoiding `br i1 false` [Nov 2017]

If this information is useful, please help other people find it:
Share via:

Ariel Ben-Yehuda via llvm-dev

2017-Nov-09 23:38 UTC

[llvm-dev] CFG normalization: avoiding `br i1 false`

Hi,

I was looking at Rust programs which are poorly optimized by LLVM, and
one occasional factor is that LLVM allows for long-lived `br i1 false`
instructions.

The problem is that if a propagation pass discovers the condition of a
branch, the branch itself will not be eliminated until a SimpllfyCfg
pass is reached, of which there are few in the pipeline.

One example is https://github.com/rust-lang/rust/issues/44041 - the
branch conditions are found by loop unrolling, but the `br i1 false`
instruction leaks to codegen.

Another example is https://github.com/rust-lang/rust/issues/45466.

In that case, induction variable simplification discovers that an
integer overflow check is unnecessary, but the remaining `br i1 false`
still blocks the following LoopIdiomRecognize pass and prevents the
loop from being optimized to a memset.

While it is possible to have point-fixes for both problems, this might
be widespread enough problem that a more general solution might be
better - for example, to have some sort of canonicalization that
automatically removes these branches in some situations (this requires
things such as the dominator tree to be rebuilt, so it shouldn't be
done literally everywhere, but it can be done fairly often).

What are your opinions?

Thanks,
 - Ariel

Davide Italiano via llvm-dev

2017-Nov-09 23:53 UTC

head link

[llvm-dev] CFG normalization: avoiding `br i1 false`

On Thu, Nov 9, 2017 at 3:38 PM, Ariel Ben-Yehuda via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Hi,
>
> I was looking at Rust programs which are poorly optimized by LLVM, and
> one occasional factor is that LLVM allows for long-lived `br i1 false`
> instructions.
>
> The problem is that if a propagation pass discovers the condition of a
> branch, the branch itself will not be eliminated until a SimpllfyCfg
> pass is reached, of which there are few in the pipeline.
>
> One example is https://github.com/rust-lang/rust/issues/44041 - the
> branch conditions are found by loop unrolling, but the `br i1 false`
> instruction leaks to codegen.
>
> Another example is https://github.com/rust-lang/rust/issues/45466.
>
> In that case, induction variable simplification discovers that an
> integer overflow check is unnecessary, but the remaining `br i1 false`
> still blocks the following LoopIdiomRecognize pass and prevents the
> loop from being optimized to a memset.
>
SimplifyCFG is the correct pass for this kind of transformation. I
don't think it's entirely unreasonable to run it more often, but the
compile time cost needs to be taken in account.

I'm not necessarily sympathetic to the idea of adding another
canonicalization
pass only for this purpose.

Side note:
IMHO, at this point SimplifyCFG is doing even more than it should
(including, e.g. sinking of variable from "almost empty" BBs, i.e. BBs
which have only a single instruction & terminator, if I recall
correctly). If anything, I'd rather split the sinking logic and other
operations
to a different pass, rather than moving simplifying proven unconditional
branches elsewhere :)


Also, FWIW, there has been a recent effort to split SimplifyCFG in multiple
variants (with several cl::opt flags associated to enable specific
transformations).

> While it is possible to have point-fixes for both problems, this might
> be widespread enough problem that a more general solution might be
> better - for example, to have some sort of canonicalization that
> automatically removes these branches in some situations (this requires
> things such as the dominator tree to be rebuilt, so it shouldn't be
> done literally everywhere, but it can be done fairly often).
>
Now that LLVM has an incremental dominator API it should be more
feasible to run SimplifyCFG more times without recomputing the
dominator all the time. I haven't checked whether there are other
expensive analyses that need to be preserved.

Thanks,

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Ariel Ben-Yehuda via llvm-dev

2017-Nov-15 13:15 UTC

head link

[llvm-dev] CFG normalization: avoiding `br i1 false`

> I'm not necessarily sympathetic to the idea of adding another
canonicalization pass only for this purpose.

The problem is that as you said, SimplifyCfg does all sorts of stuff,
and I suspect is not the fastest pass in the world.

Also, in the case that annoys me, there is an LCSSA pass in the
middle, and I suspect it would be a better idea to only do the LCSSA
etc. transform again if no changes were made, which I suspect is the
most common case.

On Fri, Nov 10, 2017 at 1:53 AM, Davide Italiano <davide at freebsd.org>
wrote:> On Thu, Nov 9, 2017 at 3:38 PM, Ariel Ben-Yehuda via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hi,
>>
>> I was looking at Rust programs which are poorly optimized by LLVM, and
>> one occasional factor is that LLVM allows for long-lived `br i1 false`
>> instructions.
>>
>> The problem is that if a propagation pass discovers the condition of a
>> branch, the branch itself will not be eliminated until a SimpllfyCfg
>> pass is reached, of which there are few in the pipeline.
>>
>> One example is https://github.com/rust-lang/rust/issues/44041 - the
>> branch conditions are found by loop unrolling, but the `br i1 false`
>> instruction leaks to codegen.
>>
>> Another example is https://github.com/rust-lang/rust/issues/45466.
>>
>> In that case, induction variable simplification discovers that an
>> integer overflow check is unnecessary, but the remaining `br i1 false`
>> still blocks the following LoopIdiomRecognize pass and prevents the
>> loop from being optimized to a memset.
>>
>
> SimplifyCFG is the correct pass for this kind of transformation. I
> don't think it's entirely unreasonable to run it more often, but
the
> compile time cost needs to be taken in account.
>
> I'm not necessarily sympathetic to the idea of adding another
canonicalization
> pass only for this purpose.
>
> Side note:
> IMHO, at this point SimplifyCFG is doing even more than it should
> (including, e.g. sinking of variable from "almost empty" BBs,
i.e. BBs
> which have only a single instruction & terminator, if I recall
> correctly). If anything, I'd rather split the sinking logic and other
operations
> to a different pass, rather than moving simplifying proven unconditional
> branches elsewhere :)
>
>
> Also, FWIW, there has been a recent effort to split SimplifyCFG in multiple
> variants (with several cl::opt flags associated to enable specific
> transformations).
>
>
>> While it is possible to have point-fixes for both problems, this might
>> be widespread enough problem that a more general solution might be
>> better - for example, to have some sort of canonicalization that
>> automatically removes these branches in some situations (this requires
>> things such as the dominator tree to be rebuilt, so it shouldn't be
>> done literally everywhere, but it can be done fairly often).
>>
>
> Now that LLVM has an incremental dominator API it should be more
> feasible to run SimplifyCFG more times without recomputing the
> dominator all the time. I haven't checked whether there are other
> expensive analyses that need to be preserved.
>
> Thanks,
>
> --
> Davide
>
> "There are no solved problems; there are only problems that are more
> or less solved" -- Henri Poincare

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Nov 2017 - CFG normalization: avoiding `br i1 false`

[llvm-dev] CFG normalization: avoiding `br i1 false`

[llvm-dev] CFG normalization: avoiding `br i1 false`

[llvm-dev] CFG normalization: avoiding `br i1 false`

Apparently Analagous Threads