David Chisnall via llvm-dev
2017-Sep-12 08:33 UTC
[llvm-dev] InstCombine, graph rewriting, Equality saturation
On 11 Sep 2017, at 21:32, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> As a data point for this, I did a quick walkthrough of the top-level of instcombine, and one thing that stuck out to me is that it tries to sink instructions in certain "easy" scenarios. That doesn't fit a graph rewriting system.

The sinking in instcombine is not always a win. We hit a case recently where instcombine made a hot loop in a benchmark significantly slower by moving an address-space cast instruction into a loop as part of its gep-of-addrspacecast to addrspacecast-of-gep transformation. On our architecture, this translates to an extra instruction in the hot loop for the address-space cast *and* means that we then can’t use the complex addressing modes for the loads and stores, because we have an address-space cast in between the GEP and the memory op. We end up increasing the number of instructions in the loop by around 20%, with a corresponding decrease in performance.

My somewhat long-winded point is that I’m not convinced that InstCombine is doing sufficient reasoning when it moves instructions between basic blocks to determine that the moves actually make sense. In this benchmark, this one misoptimisation offset all of the other gains from InstCombine, and simply disabling it gave us better code.

I don’t know how many other examples of this there are, but it would probably be good to have an InstCombine that ran on single basic blocks (or sequences with trivial flow control) and did folding that was always a win, and something separate that’s more path-aware (and probably doesn’t run at O1).

David
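In IR terms, the shape of the problem is roughly the following (a hypothetical sketch, not the actual benchmark; the names, types, and address spaces are invented). Before the combine, the cast is loop-invariant and sits outside the loop:

  entry:
    %p = addrspacecast i32 addrspace(1)* %base to i32*
    br label %loop

  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %addr = getelementptr inbounds i32, i32* %p, i64 %i
    %v = load i32, i32* %addr
    ; ... use %v, compute %i.next, branch back to %loop ...

After the gep-of-addrspacecast to addrspacecast-of-gep rewrite, the replacement instructions are created at the GEP's position inside the loop, so the cast now executes on every iteration and sits between the GEP and the load:

  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %gep = getelementptr inbounds i32, i32 addrspace(1)* %base, i64 %i
    %addr = addrspacecast i32 addrspace(1)* %gep to i32*
    %v = load i32, i32* %addr
    ; ... as before ...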
Craig Topper via llvm-dev
2017-Sep-12 15:31 UTC
[llvm-dev] InstCombine, graph rewriting, Equality saturation
This sounds less like the explicit sinking code, where InstCombine tries to analyze users and just moves already existing instructions. That code shouldn't be able to move into loops, because it checks that the successor it's moving into only has a single predecessor. I think this case is a consequence of InstCombine creating new instructions in the basic block that the parent instruction of the transform lives in.

~Craig
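A rough sketch of the kind of case that explicit sinking targets (hypothetical IR; the actual conditions in TryToSinkInstruction are stricter than this illustration suggests):

  entry:
    %sum = add i32 %a, %b              ; single use, and that use is in %then
    br i1 %cond, label %then, label %exit

  then:                                ; sole predecessor is %entry
    %r = mul i32 %sum, 3
    br label %exit

  exit:
    ; %sum is not used here

Because %then's only predecessor is %entry, sinking the add into %then saves work on the %cond-false path and cannot push the add into a loop body.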
Davide Italiano via llvm-dev
2017-Sep-12 18:15 UTC
[llvm-dev] InstCombine, graph rewriting, Equality saturation
On Sep 12, 2017 1:34 AM, "David Chisnall via llvm-dev" <llvm-dev at lists.llvm.org> wrote:

> My somewhat long-winded point is that I’m not convinced that InstCombine is doing sufficient reasoning when it moves instructions between basic blocks to determine that the moves actually make sense.

My somewhat long-winded point is instead that IC shouldn't even bother trying to sink. I understand it may eventually expose other optimizations, but it doesn't seem to be worth the added complexity. As you say, the heuristic is pretty simple (fold instructions into blocks with a single predecessor, IIRC, but don't quote me on that). There are cases where it regresses, and I'm not surprised, as sinking is not done after a proper analysis (as, e.g., the ones GVNSink or LICM do). You could probably catch some cases by teaching IC about profile metadata when taking sink decisions, but that sounds like adding even more madness.

--
Davide
Craig Topper via llvm-dev
2017-Sep-12 18:37 UTC
[llvm-dev] InstCombine, graph rewriting, Equality saturation
There's the explicit sinking code in the main worklist processing loop around TryToSinkInstruction. I don't believe that can move code into a loop, since it can only move into a successor block with no other predecessors. So I don't believe that's what happened in the address-space cast case.

Then there's accidental sinking that occurs when InstCombine looks at two or more instructions, some of them in another basic block, and combines them into two or more new instructions. In that case all of the combined instructions will be created in the block that contains the instruction that started the combine, since that's going to be the insert location for the IRBuilder. I think that's what happened in the address-space cast case.

I don't know how to fix this without qualifying every transform that produces multiple instructions with a check that all the original instructions started in the same basic block.

~Craig