Hal Finkel
2013-Jun-24 23:24 UTC
[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
----- Original Message -----
> On Jun 24, 2013, at 11:01 AM, Arnold Schwaighofer
> <aschwaighofer at apple.com> wrote:
>
> > Hi,
> >
> > I wanted to start a discussion about the following issue since I am
> > not sure what to do about it:
> >
> > The loop-vectorizer has the potential to make code quite a bit
> > bigger (esp. in cases where we don’t know the loop count or whether
> > pointers alias).
> > Chandler has observed this in snappy where we have a simple memory
> > copying loop (that may overlap).
> >
> > We vectorize this loop and then this loop gets inlined into a
> > function and prevents this function from getting inlined again,
> > causing a significant(?) degradation.
> >
> > https://code.google.com/p/snappy/source/browse/trunk/snappy.cc#99
> >
> > We have seen some good performance benefits from vectorizing such
> > loops. So not vectorizing them is not really a good option I think.
> >
> > In
> > <http://llvm.org/viewvc/llvm-project?view=revision&revision=184698>
> > Chandler introduced a flag so that we can run the vectorizer after
> > all CG passes. This would prevent the inliner from seeing the
> > vectorized code.
> >
> > I see some potential issues:
> >
> > * We run a loop pass later again with the associated compile time
> > cost (?)
>
> I want to move loop opts that depend on target info later, outside of
> CGSCC: definitely indvars, vectorize/partial unroll. That way we
> only inline canonical code and have a clean break between canonical
> and lowering passes. Hopefully inlining heuristics will be adequate
> without first running these passes. For the most part, I think it's
> as simple as inlining first with high-level heuristics, then
> lowering later.
>
> > * The vectorizer relies on cleanup passes to run afterwards: dce,
> > instsimplify, simplifycfg, maybe some form of redundancy elimination.
> > If we run later we have to run those passes again, increasing compile
> > time, OR
> > we have to duplicate them in the loop vectorizer, increasing its
> > complexity.
>
> We'll have to handle this case-by-case as we gradually move passes
> around. But the general idea is that lowering passes like the
> vectorizer should clean up after themselves as much as feasible
> (whereas canonicalization passes should not need to). We should be
> developing utilities to clean up redundancies incrementally. A value
> number utility would make sense. Of course, if a very light-weight
> pass can simply be rescheduled to run again to do the cleanup then
> we don't need a cleanup utility.
>
> > * The vectorizer would like SCEV analysis to be as precise as
> > possible: one reason is dependency checks that want to know that
> > expressions cannot wrap (AddRec expressions to be more specific).
> > At the moment, indvars will remove those flags in some cases, which
> > currently is not a problem because SCEV analysis still has the old
> > values cached (except in the case that Chandler mentioned to me on
> > IRC where we inline a function, in which case that info is lost). My
> > understanding of this is that this is not really something we can
> > fix easily because of the way that SCEV works
> > (unique-ifying/commoning expressions and thereby dropping the
> > flags).
> > A potential solution would be to move indvars to later. The question
> > is: do other loop passes which simplify IR depend on indvars? Andy,
> > what is your take on this?
>
> Indvars should ideally preserve NSW flags whenever possible. However,
> we don't want to rely on SCEV to preserve them. SCEV expressions are
> implicitly reassociated and uniqued in a flow-insensitive universe
> independent of the def-use chain of values. SCEV simply can't
> represent the flags in most cases. I think the only flag that makes
> sense in SCEV is the no-wrap flag on a recurrence (that's
> independent of signed/unsigned overflow).

Why can't SCEV keep a flow sensitive (effectively per-BB) map of expressions and their original flags (and perhaps cached deduced flags)? It seems like this problem is solvable within SCEV.

-Hal

> As long as indvars does not rely on SCEVExpander it should be able to
> preserve the flags. Unfortunately, it still uses SCEVExpander in a
> few places. LinearFunctionTestReplace is one that should simply be
> moved into LSR instead. For the couple of other cases, we'll just have
> to work on alternative implementations that don't drop flags, but I
> think it's worth doing.
>
> That said, we should try not to rely on NSW at all unless clearly
> necessary. It introduces nasty complexity that needs to be well
> justified. E.g., in the vectorized loop preheader we should
> explicitly check for wrapping and only try to optimize those checks
> using NSW if we have data that indicates it's really necessary.
>
> -Andy
>
> > The benefit of vectorizing later is that we would have more context
> > at the inlined call site. And it would solve the problem of the
> > inliner seeing vectorized code.
> >
> > What do you all think?
> >
> > On Jun 24, 2013, at 2:21 AM, Chandler Carruth
> > <chandlerc at gmail.com> wrote:
> >
> > > Adding this based on a discussion with Arnold and it seems at least
> > > worth having this flag for us to both run some experiments to see if
> > > this strategy is workable. It may solve some of the regressions seen
> > > with the loop vectorizer.

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
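For readers following the thread, here is a minimal sketch of the kind of explicit wrap check Andy suggests placing in the vectorized loop preheader instead of relying on NSW. It is not taken from LLVM; the names `inductionStaysInRange`, `selectLoopVersion`, and `LoopVersion` are hypothetical, and it assumes a 32-bit induction variable with a constant step and a known trip count.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical illustration, not LLVM code.
// Returns true if start, start+step, ..., start+step*(tripCount-1)
// all fit in int32_t, i.e. the induction variable never wraps.
bool inductionStaysInRange(int32_t start, int32_t step, uint32_t tripCount) {
    if (tripCount == 0)
        return true;  // the loop body never runs, so nothing can wrap
    // Widen to 64 bits: |step| <= 2^31 and tripCount-1 < 2^32, so the
    // product and the sum below cannot overflow int64_t.
    int64_t last = static_cast<int64_t>(start) +
                   static_cast<int64_t>(step) *
                       static_cast<int64_t>(tripCount - 1);
    // The sequence is monotone, so checking the last value is enough.
    return last >= std::numeric_limits<int32_t>::min() &&
           last <= std::numeric_limits<int32_t>::max();
}

enum class LoopVersion { Vector, Scalar };

// The guard a preheader could evaluate once before entering the loop:
// take the vector loop only when the check proves there is no wrap.
LoopVersion selectLoopVersion(int32_t start, int32_t step, uint32_t tripCount) {
    return inductionStaysInRange(start, step, tripCount) ? LoopVersion::Vector
                                                         : LoopVersion::Scalar;
}
```

The trade-off discussed in the thread shows up directly: the guard costs a few instructions and a scalar fallback loop (more code size), but it does not depend on any flag surviving earlier passes.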
Andrew Trick
2013-Jun-25 01:21 UTC
[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
On Jun 24, 2013, at 4:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>> Indvars should ideally preserve NSW flags whenever possible. However,
>> we don't want to rely on SCEV to preserve them. SCEV expressions are
>> implicitly reassociated and uniqued in a flow-insensitive universe
>> independent of the def-use chain of values. SCEV simply can't
>> represent the flags in most cases. I think the only flag that makes
>> sense in SCEV is the no-wrap flag on a recurrence (that's
>> independent of signed/unsigned overflow).
>
> Why can't SCEV keep a flow sensitive (effectively per-BB) map of expressions and their original flags (and perhaps cached deduced flags)? It seems like this problem is solvable within SCEV.
>
> -Hal

I think you would have to track all the uses of each expression...

You can turn SCEV into an IR and inherit all the representational problems inherent in llvm IR. Or you can use it as an analysis that efficiently represents only the mathematical properties of an expression and is simple to work with. I think it's fantastic as an analysis but really stinks at being an IR.

-Andy
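To illustrate the point about uniquing, here is a toy model of a flow-insensitive expression pool. It is a sketch under stated assumptions, not how ScalarEvolution is implemented: `ExprPool`, `intern`, and the string-based keys are hypothetical. It only shows why one unique node shared by every use can keep nothing more than the flags that all uses agree on.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical flag bits for this toy model.
enum Flags : unsigned { None = 0, NUW = 1, NSW = 2 };

// Toy expression pool: nodes are uniqued by (opcode, lhs, rhs) only.
// Flags are deliberately NOT part of the key, so two IR instructions
// that compute the same expression share one node.
struct ExprPool {
    using Key = std::tuple<std::string, std::string, std::string>;
    std::map<Key, unsigned> flagsFor;

    // Registers one IR-level use of the expression and returns the flags
    // that remain valid for the shared node afterwards.
    unsigned intern(const std::string& op, const std::string& lhs,
                    const std::string& rhs, unsigned flags) {
        Key key{op, lhs, rhs};
        auto it = flagsFor.find(key);
        if (it == flagsFor.end()) {
            flagsFor.emplace(key, flags);
            return flags;
        }
        // The node has no notion of "which use": keep only the flags
        // that hold for every use seen so far.
        it->second &= flags;
        return it->second;
    }
};
```

In this model, interning `i + 5` first with `NSW` and then without any flag leaves the shared node with no flags at all, which is the loss of information the vectorizer runs into.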
Hal Finkel
2013-Jun-25 13:14 UTC
[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
----- Original Message -----
> On Jun 24, 2013, at 4:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> > > Indvars should ideally preserve NSW flags whenever possible. However,
> > > we don't want to rely on SCEV to preserve them. SCEV expressions are
> > > implicitly reassociated and uniqued in a flow-insensitive universe
> > > independent of the def-use chain of values. SCEV simply can't
> > > represent the flags in most cases. I think the only flag that makes
> > > sense in SCEV is the no-wrap flag on a recurrence (that's
> > > independent of signed/unsigned overflow).
> >
> > Why can't SCEV keep a flow sensitive (effectively per-BB) map of
> > expressions and their original flags (and perhaps cached deduced
> > flags)? It seems like this problem is solvable within SCEV.
> >
> > -Hal
>
> I think you would have to track all the uses of each expression...

Yes, but SCEVs are flow insensitive, normalized, and uniqued, and we only need to keep track of the original flagged instructions (specifically the parent basic blocks). Combined with some caching of subsequent flags for derived expressions, I think this may not even be too expensive.

> You can turn SCEV into an IR and inherit all the representational
> problems inherent in llvm IR. Or you can use it as an analysis that
> efficiently represents only the mathematical properties of an
> expression and is simple to work with. I think it's fantastic as an
> analysis but really stinks at being an IR.

Agreed. Unfortunately, we do need to keep track of wrapping properties in order to generate correct code in all cases (assuming we don't want to be unduly pessimistic). I don't think that enhancing SE to deal with this will automatically drag in all of the messiness of a full IR (especially if we can just keep track of the necessary flag information 'on the side'). I also don't like the idea of writing mini-SEs all over the place to work around problems in SE.

Regardless, I take it that you feel that I'm off-base in wanting SE to deal with nsw in some general sense, but I don't see why. As you've pointed out to me, at least some SE computations assume nsw even when perhaps they shouldn't (like the backedge-taken counts). That, combined with the fact that SE is dropping the flags, causing problems for downstream passes, seems like a good motivation for tracking them within SE.

A more general question: Currently, when we have (i +(nsw) 5) in SE, etc. implicitly assumes that (i + 25) also does not wrap, correct? If that's right, then while this seems to follow the C model, I don't think that it is specified in the language reference.

Thanks again,
Hal

> -Andy

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
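The 'on the side' flag tracking proposed above could look roughly like the following sketch. It is an illustration only, not an LLVM API: `FlagSideTable`, the integer expression and block ids, and the `idom` array are all hypothetical. It also assumes, as a simplification, that a flag recorded on an instruction in block A may be used at block B whenever A dominates B.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical side table kept next to a uniqued expression pool.
struct FlagSideTable {
    // idom[b] is the immediate dominator of block b; the entry block has -1.
    std::vector<int> idom;
    // (expression id, block id of the flagged instruction) -> flag bits.
    std::map<std::pair<int, int>, unsigned> recorded;

    // True if block a dominates block b (walks up the dominator tree).
    bool dominates(int a, int b) const {
        for (int cur = b; cur != -1; cur = idom[cur])
            if (cur == a)
                return true;
        return false;
    }

    // Remember that an instruction in blockId computing exprId had `flags`.
    void record(int exprId, int blockId, unsigned flags) {
        recorded[std::make_pair(exprId, blockId)] |= flags;
    }

    // Flags usable for exprId at blockId: the union of flags recorded in
    // blocks that dominate the query point (assumption stated above).
    unsigned query(int exprId, int blockId) const {
        unsigned result = 0;
        auto it = recorded.lower_bound(std::make_pair(exprId, 0));
        for (; it != recorded.end() && it->first.first == exprId; ++it)
            if (dominates(it->first.second, blockId))
                result |= it->second;
        return result;
    }
};
```

The expression pool itself stays flow-insensitive; only the query carries a program point. Whether the bookkeeping (and invalidation when instructions move or are deleted) stays cheap enough is exactly the open question in the exchange above.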