thr3ads.net - llvm dev - [llvm-dev] InstCombine, graph rewriting, Equality saturation [Sep 2017]

If this information is useful, please help other people find it:
Share via:

Hal Finkel via llvm-dev

2017-Sep-11 16:48 UTC

[llvm-dev] InstCombine, graph rewriting, Equality saturation

On 09/11/2017 10:50 AM, Sean Silva via llvm-dev wrote:>
>
> On Mon, Sep 11, 2017 at 8:14 AM, Daniel Neilson via llvm-dev 
> <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>
>
>     Just thinking out loud…. I’m really not familiar with the vast
>     majority of what instcombine does, so please excuse me if my
>     naiveté is too obvious. I can’t help but notice all of the uses of
>     ‘and’ in Daniel B’s description of what instcombine is right now:
>
>     > On Sep 8, 2017, at 11:27 PM, Daniel Berlin via llvm-dev
>     <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>> wrote:
>     >
>     > FWIW, Before getting to "how to do it", I think
InstCombine
>     needs an actual goal.
>     > Right now it's "do a bunch of instruction combination and
>     canonicalization and random kinds of semi-local optimization with
>     some weird analysis and other stuff in the middle.
>     > Iterate this as hard as we can"
>     > Nobody can really draw any real dividing line for what should be
>     there or not except based on how easy or expensive it is to shove
>     it in.
>     > That's a recipe for pass creep.
>
>     This makes me wonder… is it sensible to be talking about splitting
>     up instcombine into multiple separate passes? Would such a thing
>     even be possible? For example, split by functionality into
>     separate passes that each do one of:
>     * instruction combinations
>     * canonicalization
>     * semi-local optimizations
>     * etc…
>
The problem is that all of these are really the same thing. Almost all 
canonicalizations are also simplifications, the common underlying factor 
being that they're mostly-local transformations that likely enable other 
optimizations.
>
>      Or something like that.
>
>      As separate passes, each would probably have a natural way to be
>     implemented effectively and those implementations might vary.
>
>
> One obstacle to that is that currently instcombine has an internal 
> fixed-point iteration that it does.
>
> So when splitting it into separate passes we would need to either add 
> fixed-point iteration to the pass manager running the separate 
> instcombine passes (extending the pass management in this way is 
> doable and has been explored in the past, e.g. 
> https://www.youtube.com/watch?v=c7iP43an5_Q ) or demonstrate/measure 
> that we don't regress without the fixed-point iteration across 
> separate instcombine passes.
I agree, but I'll add that I view the fixed-point iteration here to be 
an asset. It increases the robustness and consistency of the compiler, 
and allows later passes to depend on the canonicalization (*) without 
worrying about phase-ordering effects (**).

(*) In my experience, the largest problem we have in this regard is our 
lack of documentation on what our canonical form is.
(**) To be clear, we still have phase-ordering effects within 
InstCombine. Using a better system for dealing with the rewriting rules, 
I hope, will help. Nevertheless, these effects are much rarer than if we 
ran InstCombine only as discrete passe.

I'd like to see us do more fixed-point iteration in our optimization 
pipeline, and our past work has shown this would be practical (i.e., 
only a few iterations will be needed in practice), but even that won't 
remove the advantages of using a fixed-point iteration within InstCombine.

Regardless, I think that if we had a model for what InstCombine did 
(i.e., as a graph-rewriting system), then it would be clear what could 
be part of that system and what couldn't. Otherwise, I think that the 
real challenge here is figuring out the underlying structural 
foundations to make the process efficient and sound.

  -Hal
>
> -- Sean Silva
>
>
>     -Daniel
>
>     ---
>     Daniel Neilson, Ph.D.
>     Azul Systems
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170911/23ecd0fb/attachment.html>

Daniel Berlin via llvm-dev

2017-Sep-11 17:49 UTC

head link

[llvm-dev] InstCombine, graph rewriting, Equality saturation

On Mon, Sep 11, 2017 at 9:48 AM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
> On 09/11/2017 10:50 AM, Sean Silva via llvm-dev wrote:
>
>
>
> On Mon, Sep 11, 2017 at 8:14 AM, Daniel Neilson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>> Just thinking out loud…. I’m really not familiar with the vast majority
>> of what instcombine does, so please excuse me if my naiveté is too
obvious.
>> I can’t help but notice all of the uses of ‘and’ in Daniel B’s
description
>> of what instcombine is right now:
>>
>> > On Sep 8, 2017, at 11:27 PM, Daniel Berlin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> > FWIW, Before getting to "how to do it", I think
InstCombine needs an
>> actual goal.
>> > Right now it's "do a bunch of instruction combination and
>> canonicalization and random kinds of semi-local optimization with some
>> weird analysis and other stuff in the middle.
>> > Iterate this as hard as we can"
>> > Nobody can really draw any real dividing line for what should be
there
>> or not except based on how easy or expensive it is to shove it in.
>> > That's a recipe for pass creep.
>>
>> This makes me wonder… is it sensible to be talking about splitting up
>> instcombine into multiple separate passes? Would such a thing even be
>> possible? For example, split by functionality into separate passes that
>> each do one of:
>> * instruction combinations
>> * canonicalization
>> * semi-local optimizations
>> * etc…
>>
>
> The problem is that all of these are really the same thing. Almost all
> canonicalizations are also simplifications, the common underlying factor
> being that they're mostly-local transformations that likely enable
other
> optimizations.
>
>
>>  Or something like that.
>>
>>  As separate passes, each would probably have a natural way to be
>> implemented effectively and those implementations might vary.
>>
>
> One obstacle to that is that currently instcombine has an internal
> fixed-point iteration that it does.
>
> So when splitting it into separate passes we would need to either add
> fixed-point iteration to the pass manager running the separate instcombine
> passes (extending the pass management in this way is doable and has been
> explored in the past, e.g. https://www.youtube.com/watch?v=c7iP43an5_Q )
> or demonstrate/measure that we don't regress without the fixed-point
> iteration across separate instcombine passes.
>
>
> I agree, but I'll add that I view the fixed-point iteration here to be
an
> asset. It increases the robustness and consistency of the compiler, and
> allows later passes to depend on the canonicalization (*) without worrying
> about phase-ordering effects (**).
>FWIW: I completely agree.  At least as you  are using it here, which is in
the sense of "system is built to apply a set of transformation. Fixed point
iteration ensures that all such transformations that can occur, do, even if
they are only exposed as a result of earlier transformations".

In that sense, a vast majority of our passes do or should be using fixed
point iteration.
The usual concern is cost, and it requires careful engineering (as you say
below) to ensure you are doing this in an effective manner.  IE ad-hoc
solutions work but rarely scale, it requires a good understanding of theory
and engineering to get something that will scale really well and work well

Integrating multiple transformations/capabilities into a single pass is
much trickier than one would expect.  You can easily screw stuff up by
doing things you think might only improve things.

A simple example from value numbering https://pastebin.com/Tp11bfCa

Note that if we ignore unreachable edges at first and just shove it in the
algorithm, we no longer get optimal answers in certain cases.  Even though
we only changed exactly one initial state, and we're trying to make things
better!
This is fixable, but requires a lot of thought to see that this will happen
in the first case.

(the reason in this case is because the algorithm relies on the TOP state
to resolve cycles. There is no perfect comes-before iteration ordering in
this case.  By changing the initial state that fed the cycle, without
resetting that cycle to TOP, it can no longer resolve the cycle to the
optimal answer. )


> I'd like to see us do more fixed-point iteration in our optimization
> pipeline, and our past work has shown this would be practical (i.e., only a
> few iterations will be needed in practice), but even that won't remove
the
> advantages of using a fixed-point iteration within InstCombine.
>
Completely agree on this
>
> Regardless, I think that if we had a model for what InstCombine did (i.e.,
> as a graph-rewriting system), then it would be clear what could be part of
> that system and what couldn't. Otherwise, I think that the real
challenge
> here is figuring out the underlying structural foundations to make the
> process efficient and sound.
>
Also completely agree.

>
>  -Hal
>
>
> -- Sean Silva
>
>
>>
>> -Daniel
>>
>> ---
>> Daniel Neilson, Ph.D.
>> Azul Systems
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170911/0cbab4c4/attachment.html>

Sean Silva via llvm-dev

2017-Sep-11 20:32 UTC

head link

[llvm-dev] InstCombine, graph rewriting, Equality saturation

On Mon, Sep 11, 2017 at 9:48 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 09/11/2017 10:50 AM, Sean Silva via llvm-dev wrote:
>
>
>
> On Mon, Sep 11, 2017 at 8:14 AM, Daniel Neilson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>> Just thinking out loud…. I’m really not familiar with the vast majority
>> of what instcombine does, so please excuse me if my naiveté is too
obvious.
>> I can’t help but notice all of the uses of ‘and’ in Daniel B’s
description
>> of what instcombine is right now:
>>
>> > On Sep 8, 2017, at 11:27 PM, Daniel Berlin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> > FWIW, Before getting to "how to do it", I think
InstCombine needs an
>> actual goal.
>> > Right now it's "do a bunch of instruction combination and
>> canonicalization and random kinds of semi-local optimization with some
>> weird analysis and other stuff in the middle.
>> > Iterate this as hard as we can"
>> > Nobody can really draw any real dividing line for what should be
there
>> or not except based on how easy or expensive it is to shove it in.
>> > That's a recipe for pass creep.
>>
>> This makes me wonder… is it sensible to be talking about splitting up
>> instcombine into multiple separate passes? Would such a thing even be
>> possible? For example, split by functionality into separate passes that
>> each do one of:
>> * instruction combinations
>> * canonicalization
>> * semi-local optimizations
>> * etc…
>>
>
> The problem is that all of these are really the same thing. Almost all
> canonicalizations are also simplifications, the common underlying factor
> being that they're mostly-local transformations that likely enable
other
> optimizations.
>
>
>>  Or something like that.
>>
>>  As separate passes, each would probably have a natural way to be
>> implemented effectively and those implementations might vary.
>>
>
> One obstacle to that is that currently instcombine has an internal
> fixed-point iteration that it does.
>
> So when splitting it into separate passes we would need to either add
> fixed-point iteration to the pass manager running the separate instcombine
> passes (extending the pass management in this way is doable and has been
> explored in the past, e.g. https://www.youtube.com/watch?v=c7iP43an5_Q )
> or demonstrate/measure that we don't regress without the fixed-point
> iteration across separate instcombine passes.
>
>
> I agree, but I'll add that I view the fixed-point iteration here to be
an
> asset. It increases the robustness and consistency of the compiler, and
> allows later passes to depend on the canonicalization (*) without worrying
> about phase-ordering effects (**).
>
> (*) In my experience, the largest problem we have in this regard is our
> lack of documentation on what our canonical form is.
> (**) To be clear, we still have phase-ordering effects within InstCombine.
> Using a better system for dealing with the rewriting rules, I hope, will
> help. Nevertheless, these effects are much rarer than if we ran InstCombine
> only as discrete passe.
>
> I'd like to see us do more fixed-point iteration in our optimization
> pipeline, and our past work has shown this would be practical (i.e., only a
> few iterations will be needed in practice), but even that won't remove
the
> advantages of using a fixed-point iteration within InstCombine.
>
> Regardless, I think that if we had a model for what InstCombine did (i.e.,
> as a graph-rewriting system), then it would be clear what could be part of
> that system and what couldn't. Otherwise, I think that the real
challenge
> here is figuring out the underlying structural foundations to make the
> process efficient and sound.
>
As a data point for this, I did a quick walkthrough of the top-level of
instcombine, and one thing that stuck out to me is that it tries to sink
instructions in certain "easy" scenarios. That doesn't fit a graph
rewriting system.

As a side note, TryToSinkInstruction does a whole-BB scan to check safety
for sinking loads, which will go quadratic (in the number of loads) on an
input like:

```
void use(int);
void foo(int *p, int *q, bool b) {
int p0 = p[0];
int p1 = p[1];
int p2 = p[2];
int p3 = p[3];
if (b) {
use(p0);
use(p1);
use(p2);
use(p3);
}
}
```
(script attached)

The commit where it was introduced (https://reviews.llvm.org/rL18677 /
https://reviews.llvm.org/rL18692 yes that's from 2004) indicated that it
actually caught a bunch of cases in SPEC, but it isn't clear that it's
really providing much value integrated into the fixed-point iteration
compared to an appropriate standalone global code motion pass (which would
not have the quadratic complexity and would be more general to boot).

-- Sean Silva

>
>
>  -Hal
>
>
> -- Sean Silva
>
>
>>
>> -Daniel
>>
>> ---
>> Daniel Neilson, Ph.D.
>> Azul Systems
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170911/8f13fe0e/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: instcombine_quadratic_load_sinking.py
Type: text/x-python
Size: 477 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170911/8f13fe0e/attachment.py>

Craig Topper via llvm-dev

2017-Sep-11 20:50 UTC

head link

[llvm-dev] InstCombine, graph rewriting, Equality saturation

The sinking in InstCombine also has a bogus single use check on the front
of it. Why does it matter how many uses there are as long as they are all
in the same basic block?

~Craig

On Mon, Sep 11, 2017 at 1:32 PM, Sean Silva via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On Mon, Sep 11, 2017 at 9:48 AM, Hal Finkel <hfinkel at anl.gov>
wrote:
>
>>
>> On 09/11/2017 10:50 AM, Sean Silva via llvm-dev wrote:
>>
>>
>>
>> On Mon, Sep 11, 2017 at 8:14 AM, Daniel Neilson via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>> Just thinking out loud…. I’m really not familiar with the vast
majority
>>> of what instcombine does, so please excuse me if my naiveté is too
obvious.
>>> I can’t help but notice all of the uses of ‘and’ in Daniel B’s
description
>>> of what instcombine is right now:
>>>
>>> > On Sep 8, 2017, at 11:27 PM, Daniel Berlin via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> >
>>> > FWIW, Before getting to "how to do it", I think
InstCombine needs an
>>> actual goal.
>>> > Right now it's "do a bunch of instruction combination
and
>>> canonicalization and random kinds of semi-local optimization with
some
>>> weird analysis and other stuff in the middle.
>>> > Iterate this as hard as we can"
>>> > Nobody can really draw any real dividing line for what should
be there
>>> or not except based on how easy or expensive it is to shove it in.
>>> > That's a recipe for pass creep.
>>>
>>> This makes me wonder… is it sensible to be talking about splitting
up
>>> instcombine into multiple separate passes? Would such a thing even
be
>>> possible? For example, split by functionality into separate passes
that
>>> each do one of:
>>> * instruction combinations
>>> * canonicalization
>>> * semi-local optimizations
>>> * etc…
>>>
>>
>> The problem is that all of these are really the same thing. Almost all
>> canonicalizations are also simplifications, the common underlying
factor
>> being that they're mostly-local transformations that likely enable
other
>> optimizations.
>>
>>
>>>  Or something like that.
>>>
>>>  As separate passes, each would probably have a natural way to be
>>> implemented effectively and those implementations might vary.
>>>
>>
>> One obstacle to that is that currently instcombine has an internal
>> fixed-point iteration that it does.
>>
>> So when splitting it into separate passes we would need to either add
>> fixed-point iteration to the pass manager running the separate
instcombine
>> passes (extending the pass management in this way is doable and has
been
>> explored in the past, e.g. https://www.youtube.com/watch?v=c7iP43an5_Q
)
>> or demonstrate/measure that we don't regress without the
fixed-point
>> iteration across separate instcombine passes.
>>
>>
>> I agree, but I'll add that I view the fixed-point iteration here to
be an
>> asset. It increases the robustness and consistency of the compiler, and
>> allows later passes to depend on the canonicalization (*) without
worrying
>> about phase-ordering effects (**).
>>
>> (*) In my experience, the largest problem we have in this regard is our
>> lack of documentation on what our canonical form is.
>> (**) To be clear, we still have phase-ordering effects within
>> InstCombine. Using a better system for dealing with the rewriting
rules, I
>> hope, will help. Nevertheless, these effects are much rarer than if we
ran
>> InstCombine only as discrete passe.
>>
>> I'd like to see us do more fixed-point iteration in our
optimization
>> pipeline, and our past work has shown this would be practical (i.e.,
only a
>> few iterations will be needed in practice), but even that won't
remove the
>> advantages of using a fixed-point iteration within InstCombine.
>>
>> Regardless, I think that if we had a model for what InstCombine did
>> (i.e., as a graph-rewriting system), then it would be clear what could
be
>> part of that system and what couldn't. Otherwise, I think that the
real
>> challenge here is figuring out the underlying structural foundations to
>> make the process efficient and sound.
>>
>
> As a data point for this, I did a quick walkthrough of the top-level of
> instcombine, and one thing that stuck out to me is that it tries to sink
> instructions in certain "easy" scenarios. That doesn't fit a
graph
> rewriting system.
>
> As a side note, TryToSinkInstruction does a whole-BB scan to check safety
> for sinking loads, which will go quadratic (in the number of loads) on an
> input like:
>
> ```
> void use(int);
> void foo(int *p, int *q, bool b) {
> int p0 = p[0];
> int p1 = p[1];
> int p2 = p[2];
> int p3 = p[3];
> if (b) {
> use(p0);
> use(p1);
> use(p2);
> use(p3);
> }
> }
> ```
> (script attached)
>
> The commit where it was introduced (https://reviews.llvm.org/rL18677 /
> https://reviews.llvm.org/rL18692 yes that's from 2004) indicated that
it
> actually caught a bunch of cases in SPEC, but it isn't clear that
it's
> really providing much value integrated into the fixed-point iteration
> compared to an appropriate standalone global code motion pass (which would
> not have the quadratic complexity and would be more general to boot).
>
> -- Sean Silva
>
>
>>
>>
>>  -Hal
>>
>>
>> -- Sean Silva
>>
>>
>>>
>>> -Daniel
>>>
>>> ---
>>> Daniel Neilson, Ph.D.
>>> Azul Systems
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170911/d8786de1/attachment-0001.html>

David Chisnall via llvm-dev

2017-Sep-12 08:33 UTC

head link

[llvm-dev] InstCombine, graph rewriting, Equality saturation

On 11 Sep 2017, at 21:32, Sean Silva via llvm-dev <llvm-dev at
lists.llvm.org> wrote:> 
> As a data point for this, I did a quick walkthrough of the top-level of
instcombine, and one thing that stuck out to me is that it tries to sink
instructions in certain "easy" scenarios. That doesn't fit a graph
rewriting system.
The sinking in instcombine is not always a win.  We hit a case recently where
instcombine made a hot loop in a benchmark significantly slower by moving an
address-space cast instruction into a loop as part of its gep of addrspacecast
to addrspacecast of gep transformation.  On our architecture, this translates to
an extra instruction in a hot loop for the address-space cast *and* means that
we then can’t use the complex addressing modes for the loads and stores because
we have an address space cast in between the GEP and the memory op.  We end up
increasing the number of instruction in the loop by around 20%, with a
corresponding decrease in performance.

My somewhat long-winded point is that I’m not convinced that InstCombine is
doing sufficient reasoning when it moves instructions between basic blocks to
determine that the moves actually make sense.  In this benchmark, this one
misoptimisation offset all of the other gains from InstCombine and simply
disabling it gave us better code.

I don’t know how many other examples of this there are, but it would probably be
good to have an InstCombine that ran on single basic blocks (or sequences with
trivial flow control) and did folding that was always a win and something
separate that’s more path-aware (and probably doesn’t run at O1).

David

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Sep 2017 - InstCombine, graph rewriting, Equality saturation

[llvm-dev] InstCombine, graph rewriting, Equality saturation

[llvm-dev] InstCombine, graph rewriting, Equality saturation

[llvm-dev] InstCombine, graph rewriting, Equality saturation

[llvm-dev] InstCombine, graph rewriting, Equality saturation

[llvm-dev] InstCombine, graph rewriting, Equality saturation

Possibly Parallel Threads