Chandler Carruth
2014-Jan-22 11:39 UTC
[LLVMdev] Why should we have the LoopPass and LoopPassManager? Can we get rid of this complexity?
On Wed, Jan 22, 2014 at 1:01 AM, Andrew Trick <atrick at apple.com> wrote:> On Jan 22, 2014, at 12:44 AM, Chandler Carruth <chandlerc at gmail.com> > wrote: > > > On Wed, Jan 22, 2014 at 12:33 AM, Andrew Trick <atrick at apple.com> wrote: > >> > There appear to be two chunks of "functionality" provided by loop >> passes: >> > >> > 1) A worklist of loops to process. This is very rarely used: >> > 1.1) LoopSimplify and LoopUnswitch add loops to the queue. >> >> I’m making this up without much thought, but we may benefit from >> iterative loop transforms like Rotate -> LICM -> Unswitch -> Rotate -> >> LICM. We might need to come up with a preferred alternative: we can either >> continue to convert transforms into a utilities, or we can invent new pass >> manager tricks. e.g. iterate over a set of function passes or selectively >> run a pass on “dirty” regions. > > > This is a really good point. Owen pointed it out to me as well in another > guise: when we unroll a loop, we need to re-run a bunch of the passes, but > re-running them when we *haven't* successfully unrolled a loop is a total > waste. > > I'd like to think more about this, so a simpler option: what if we *just* > extract LoopSimplify and LCSSA? If all the LoopPasses preserve these, then > them being function passes would be fine. This would allow us to at least > *start* writing function passes over loops (like the LoopVectorizer) rather > than being forced to phrase them as LoopPasses. > > I think I could implement this simpler option right away, and that would > unblock a lot of our efforts around unrolling, vectorizing, and PGO. > > > Great. LoopSimplify can be used as a utility to cleanup transformed loops > if needed. >This would really only work with pretty specialized logic in the LoopVectorizer and LSR to select only the cleanup aspects of LoopSimplify that they need. And it wouldn't even have any utility in LSR. Unfortunately, the loop pass manager makes doing this incrementally hard. The loop vectorizer runs on the inner most loop, makes in not-simplified. We then pop out to the outer loop and verify everything. The now-function-pass LoopSimplify fails its verification because it verifies *all* of the loops in the function, even though the inner loops aren't going to be revisited. My suggestion is to disable the whole-function verification in LoopSimplify while we're fixing this. We can instead have LCSSA verify the specific loop it is processing which will give us the same safety guarantees. Does that sound OK? That will let me do the following: 1) Hoist LoopSimplify to a function pass 2) Hoist LCSSA to a function pass 3) Make the LoopVectorizer a function pass 4) Make IVUsers a utility of LSR, run on-demand for the loop that LSR is processing (no pass manager involvement) 5) Make LSR a function pass As a consequence of this, we will actually fix the long-standing issue of LoopSimplify vs. ScalarEvolution, and cut the number of runs of LoopSimplify and LCSSA by half (!!!) so I think this is worth doing. I have a patch that does #1 already, but wanted to check that you're OK weakening the verification. Otherwise, I have to do 1, 2, 3, and 5 in a single commit, or teach the LoopVectorizer and LSR to preserve LoopSimplify... Yuck. -Chandler -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140122/19ec6b23/attachment.html>
Chandler Carruth
2014-Jan-22 12:01 UTC
[LLVMdev] Why should we have the LoopPass and LoopPassManager? Can we get rid of this complexity?
On Wed, Jan 22, 2014 at 3:39 AM, Chandler Carruth <chandlerc at gmail.com>wrote:> I have a patch that does #1 already, but wanted to check that you're OK > weakening the verification. Otherwise, I have to do 1, 2, 3, and 5 in a > single commit, or teach the LoopVectorizer and LSR to preserve > LoopSimplify... Yuck.This patch appears to cause slight changes in three test cases: LLVM :: Transforms/IndVarSimplify/lftr-reuse.ll LLVM :: Transforms/LoopSimplify/ashr-crash.ll LLVM :: Transforms/LoopStrengthReduce/2011-12-19-PostincQuadratic.ll Looking at lftr-reuse.ll, we successfully hoist the 'icmp slt' into the 'outer:' block as the comment says would be nice (because the outer loop is simplified now, the test is checking for unsimplified). The LSR failure is just that the loop basic blocks have different names (loopexit instead of preheader). The ashr-crash.ll case is minutely interesting -- we fail to hoist the comparison that the test wants hoisted into the entry block. My suspicion is that getting this to hoist with the heavily reduced pipeline used is problematic, as the test seems more geared to tickle SCEV bugs than test important optimization invariants. All of the other regression tests pass, so this looks pretty good to me. Let me know! I can mail the patch tomorrow if it helps. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140122/94d5eb58/attachment.html>
Andrew Trick
2014-Jan-22 20:02 UTC
[LLVMdev] Why should we have the LoopPass and LoopPassManager? Can we get rid of this complexity?
On Jan 22, 2014, at 3:39 AM, Chandler Carruth <chandlerc at gmail.com> wrote:> > On Wed, Jan 22, 2014 at 1:01 AM, Andrew Trick <atrick at apple.com> wrote: > On Jan 22, 2014, at 12:44 AM, Chandler Carruth <chandlerc at gmail.com> wrote: > >> >> On Wed, Jan 22, 2014 at 12:33 AM, Andrew Trick <atrick at apple.com> wrote: >> > There appear to be two chunks of "functionality" provided by loop passes: >> > >> > 1) A worklist of loops to process. This is very rarely used: >> > 1.1) LoopSimplify and LoopUnswitch add loops to the queue. >> >> I’m making this up without much thought, but we may benefit from iterative loop transforms like Rotate -> LICM -> Unswitch -> Rotate -> LICM. We might need to come up with a preferred alternative: we can either continue to convert transforms into a utilities, or we can invent new pass manager tricks. e.g. iterate over a set of function passes or selectively run a pass on “dirty” regions. >> >> This is a really good point. Owen pointed it out to me as well in another guise: when we unroll a loop, we need to re-run a bunch of the passes, but re-running them when we *haven't* successfully unrolled a loop is a total waste. >> >> I'd like to think more about this, so a simpler option: what if we *just* extract LoopSimplify and LCSSA? If all the LoopPasses preserve these, then them being function passes would be fine. This would allow us to at least *start* writing function passes over loops (like the LoopVectorizer) rather than being forced to phrase them as LoopPasses. >> >> I think I could implement this simpler option right away, and that would unblock a lot of our efforts around unrolling, vectorizing, and PGO. > > Great. LoopSimplify can be used as a utility to cleanup transformed loops if needed. > > This would really only work with pretty specialized logic in the LoopVectorizer and LSR to select only the cleanup aspects of LoopSimplify that they need. And it wouldn't even have any utility in LSR. > > Unfortunately, the loop pass manager makes doing this incrementally hard. The loop vectorizer runs on the inner most loop, makes in not-simplified. We then pop out to the outer loop and verify everything. The now-function-pass LoopSimplify fails its verification because it verifies *all* of the loops in the function, even though the inner loops aren't going to be revisited. > > My suggestion is to disable the whole-function verification in LoopSimplify while we're fixing this. We can instead have LCSSA verify the specific loop it is processing which will give us the same safety guarantees. Does that sound OK?Yes, in general we cannot rely on LoopSimplify for correctness since it’s not always possible to put loops in canonical form. Passes should check isSimplifiedForm, so it should only be a question of missed optimization.> That will let me do the following: > > 1) Hoist LoopSimplify to a function pass > 2) Hoist LCSSA to a function pass > 3) Make the LoopVectorizer a function pass > 4) Make IVUsers a utility of LSR, run on-demand for the loop that LSR is processing (no pass manager involvement) > 5) Make LSR a function passGreat.> As a consequence of this, we will actually fix the long-standing issue of LoopSimplify vs. ScalarEvolution, and cut the number of runs of LoopSimplify and LCSSA by half (!!!) so I think this is worth doing.This was a really annoying issue that I spent a lot of time on in the past. So thanks.> I have a patch that does #1 already, but wanted to check that you're OK weakening the verification. Otherwise, I have to do 1, 2, 3, and 5 in a single commit, or teach the LoopVectorizer and LSR to preserve LoopSimplify... Yuck.Yes. Thank you. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140122/86b25f3f/attachment.html>
Andrew Trick
2014-Jan-22 20:09 UTC
[LLVMdev] Why should we have the LoopPass and LoopPassManager? Can we get rid of this complexity?
On Jan 22, 2014, at 4:01 AM, Chandler Carruth <chandlerc at gmail.com> wrote:> > On Wed, Jan 22, 2014 at 3:39 AM, Chandler Carruth <chandlerc at gmail.com> wrote: > I have a patch that does #1 already, but wanted to check that you're OK weakening the verification. Otherwise, I have to do 1, 2, 3, and 5 in a single commit, or teach the LoopVectorizer and LSR to preserve LoopSimplify... Yuck. > > This patch appears to cause slight changes in three test cases: > > LLVM :: Transforms/IndVarSimplify/lftr-reuse.ll > LLVM :: Transforms/LoopSimplify/ashr-crash.ll > LLVM :: Transforms/LoopStrengthReduce/2011-12-19-PostincQuadratic.ll > > Looking at lftr-reuse.ll, we successfully hoist the 'icmp slt' into the 'outer:' block as the comment says would be nice (because the outer loop is simplified now, the test is checking for unsimplified). The LSR failure is just that the loop basic blocks have different names (loopexit instead of preheader). > > The ashr-crash.ll case is minutely interesting -- we fail to hoist the comparison that the test wants hoisted into the entry block. My suspicion is that getting this to hoist with the heavily reduced pipeline used is problematic, as the test seems more geared to tickle SCEV bugs than test important optimization invariants.This looks like an example of the kind of order-of-loop-transform problems that I spent a staggering amount of time debugging. So the underlying problem should go away. That said, it also looks like an example where we benefit from applying multiple passes to the inner loop before optimizing the outer loop. If you want to file a PR and disable it for now that’s cool. -Andy> All of the other regression tests pass, so this looks pretty good to me. Let me know! I can mail the patch tomorrow if it helps.-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140122/ee6f3e79/attachment.html>