David Chisnall via llvm-dev
2017-Sep-12 08:33 UTC
[llvm-dev] InstCombine, graph rewriting, Equality saturation
On 11 Sep 2017, at 21:32, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> As a data point for this, I did a quick walkthrough of the top-level of instcombine, and one thing that stuck out to me is that it tries to sink instructions in certain "easy" scenarios. That doesn't fit a graph rewriting system.

The sinking in instcombine is not always a win. We hit a case recently where instcombine made a hot loop in a benchmark significantly slower by moving an address-space cast instruction into a loop as part of its gep-of-addrspacecast to addrspacecast-of-gep transformation. On our architecture, this translates to an extra instruction in the hot loop for the address-space cast *and* means that we then can’t use the complex addressing modes for the loads and stores, because we have an address-space cast in between the GEP and the memory op. We end up increasing the number of instructions in the loop by around 20%, with a corresponding decrease in performance.

My somewhat long-winded point is that I’m not convinced that InstCombine is doing sufficient reasoning when it moves instructions between basic blocks to determine that the moves actually make sense. In this benchmark, this one misoptimisation offset all of the other gains from InstCombine, and simply disabling it gave us better code.

I don’t know how many other examples of this there are, but it would probably be good to have an InstCombine that ran on single basic blocks (or sequences with trivial flow control) and did folding that was always a win, and something separate that’s more path-aware (and probably doesn’t run at O1).

David
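In IR terms, the shape of the problem is roughly the following (a hypothetical sketch, not the actual benchmark; the names, types, and address spaces are invented). Before the combine, the cast is loop-invariant and sits outside the loop:

  entry:
    %p = addrspacecast i32 addrspace(1)* %base to i32*
    br label %loop

  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %addr = getelementptr inbounds i32, i32* %p, i64 %i
    %v = load i32, i32* %addr
    ; ... use %v, compute %i.next, branch back to %loop ...

After the gep-of-addrspacecast to addrspacecast-of-gep rewrite, the replacement instructions are created at the GEP's position inside the loop, so the cast now executes on every iteration and sits between the GEP and the load:

  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %gep = getelementptr inbounds i32, i32 addrspace(1)* %base, i64 %i
    %addr = addrspacecast i32 addrspace(1)* %gep to i32*
    %v = load i32, i32* %addr
    ; ... as before ...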
Craig Topper via llvm-dev
2017-Sep-12 15:31 UTC
[llvm-dev] InstCombine, graph rewriting, Equality saturation
This sounds less like the explicit sinking code, where InstCombine tries to analyze users and just moves already existing instructions. That code shouldn't be able to move into loops, because it checks that the successor it's moving into only has a single predecessor. I think this case is a consequence of InstCombine creating new instructions in the basic block that the parent instruction of the transform lives in.

~Craig
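A rough sketch of the kind of case that explicit sinking targets (hypothetical IR; the actual conditions in TryToSinkInstruction are stricter than this illustration suggests):

  entry:
    %sum = add i32 %a, %b              ; single use, and that use is in %then
    br i1 %cond, label %then, label %exit

  then:                                ; sole predecessor is %entry
    %r = mul i32 %sum, 3
    br label %exit

  exit:
    ; %sum is not used here

Because %then's only predecessor is %entry, sinking the add into %then saves work on the %cond-false path and cannot push the add into a loop body.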
Davide Italiano via llvm-dev
2017-Sep-12 18:15 UTC
[llvm-dev] InstCombine, graph rewriting, Equality saturation
On Sep 12, 2017 1:34 AM, "David Chisnall via llvm-dev" <llvm-dev at lists.llvm.org> wrote:

> My somewhat long-winded point is that I’m not convinced that InstCombine is doing sufficient reasoning when it moves instructions between basic blocks to determine that the moves actually make sense.

My somewhat long-winded point is instead that IC shouldn't even bother trying to sink. I understand it may eventually expose other optimizations, but it doesn't seem to be worth the added complexity. As you say, the heuristic is pretty simple (fold instructions into blocks with a single predecessor, IIRC, but don't quote me on that). There are cases where it regresses, and I'm not surprised, as sinking is not done after a proper analysis (as, e.g., the ones GVNSink or LICM do). You could probably catch some cases by teaching IC about profile metadata when taking sink decisions, but that sounds like adding even more madness.

--
Davide
Craig Topper via llvm-dev
2017-Sep-12 18:37 UTC
[llvm-dev] InstCombine, graph rewriting, Equality saturation
There's the explicit sinking code in the main worklist processing loop around TryToSinkInstruction. I don't believe that can move code into a loop, since it can only move into a successor block with no other predecessors. So I don't believe that's what happened in the address-space cast case.

Then there's accidental sinking that occurs when InstCombine looks at two or more instructions, some of them in another basic block, and combines them into two or more new instructions. In that case all of the combined instructions will be created in the block that contains the instruction that started the combine, since that's going to be the insert location for the IRBuilder. I think that's what happened in the address-space cast case.

I don't know how to fix this without qualifying every transform that produces multiple instructions with a check that all the original instructions started in the same basic block.

~Craig