Hal Finkel
2013-Jun-25 13:14 UTC
[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
----- Original Message -----> > > > On Jun 24, 2013, at 4:24 PM, Hal Finkel < hfinkel at anl.gov > wrote: > > > > > Indvars should ideally preserve NSW flags whenever possible. However, > we don't want to rely on SCEV to preserve them. SCEV expressions are > implicitly reassociated and uniqued in a flow-insensitive universe > independent of the def-use chain of values. SCEV simply can't > represent the flags in most cases. I think the only flag that makes > sense in SCEV is the no-wrap flag on a recurrence (that's > independent of signed/unsigned overflow). > > Why can't SCEV keep a flow sensitive (effectively per-BB) map of > expressions and their original flags (and perhaps cached deduced > flags)? It seems like this problem is solvable within SCEV. > > -Hal > > I think you would have to track all the uses of each expression...Yes, but SCEV's are flow insensitive, normalized, and uniqued, and we only need to keep track of the original flagged instructions (specifically the parent basic blocks). Combined with some caching of subsequent flags for derived expressions I think this may even be not too expensive.> > > > You can turn SCEV into an IR and inherit all the representational > problems inherent in llvm IR. Or you can use it as an analysis that > efficiently reprents only the mathematical properties of an > expression and is simple to work with. I think it's fantastic as an > analysis but really stinks at being an IR.Agreed. Unfortunately, we do need to keep track of wrapping properties in order to generate correct code in all cases (assuming we don't want to be unduly pessimistic). I don't think that enhancing SE to deal with this will automatically drag in all of the messiness of a full IR (especially if we can just keep track of the necessary flag information 'on the side'). I also don't like the idea of writing mini-SEs all over the place to work around problems in SE. Regardless, I take it that you feel that I'm off-base in wanting SE to deal with nsw in some general sense, but I don't see why. As you've pointed out to me, at least some SE computations assume nsw even when perhaps they shouldn't (like the backedge-taken counts). That combined with the fact that SE is dropping the flags, causing problems for downstream passes, seems like a good motivation for tracking them within SE. A more general question: Currently, when we have (i +(nsw) 5) in SE, etc. implicitly assumes that (i + 25) also does not wrap, correct? If that's right, then while this seems to follow the C model, I don't think that it is specified in the language reference. Thanks again, Hal> > > -Andy-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Andrew Trick
2013-Jun-26 15:14 UTC
[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
Sent from my iPhone... On Jun 25, 2013, at 8:14 AM, Hal Finkel <hfinkel at anl.gov> wrote:> ----- Original Message ----- >> >> >> >> On Jun 24, 2013, at 4:24 PM, Hal Finkel < hfinkel at anl.gov > wrote: >> >> >> >> >> Indvars should ideally preserve NSW flags whenever possible. However, >> we don't want to rely on SCEV to preserve them. SCEV expressions are >> implicitly reassociated and uniqued in a flow-insensitive universe >> independent of the def-use chain of values. SCEV simply can't >> represent the flags in most cases. I think the only flag that makes >> sense in SCEV is the no-wrap flag on a recurrence (that's >> independent of signed/unsigned overflow). >> >> Why can't SCEV keep a flow sensitive (effectively per-BB) map of >> expressions and their original flags (and perhaps cached deduced >> flags)? It seems like this problem is solvable within SCEV. >> >> -Hal >> >> I think you would have to track all the uses of each expression... > > Yes, but SCEV's are flow insensitive, normalized, and uniqued, and we only need to keep track of the original flagged instructions (specifically the parent basic blocks). Combined with some caching of subsequent flags for derived expressions I think this may even be not too expensive.Clients of SCEV need SCEV to Value mappings. But do we really need to preserve it across passes? Invalidation is a nasty problem. We already have issues with SCEVUnknown invalidation.>> You can turn SCEV into an IR and inherit all the representational >> problems inherent in llvm IR. Or you can use it as an analysis that >> efficiently reprents only the mathematical properties of an >> expression and is simple to work with. I think it's fantastic as an >> analysis but really stinks at being an IR. > > Agreed. Unfortunately, we do need to keep track of wrapping properties in order to generate correct code in all cases (assuming we don't want to be unduly pessimistic). I don't think that enhancing SE to deal with this will automatically drag in all of the messiness of a full IR (especially if we can just keep track of the necessary flag information 'on the side'). I also don't like the idea of writing mini-SEs all over the place to work around problems in SE. > > Regardless, I take it that you feel that I'm off-base in wanting SE to deal with nsw in some general sense, but I don't see why. As you've pointed out to me, at least some SE computations assume nsw even when perhaps they shouldn't (like the backedge-taken counts). That combined with the fact that SE is dropping the flags, causing problems for downstream passes, seems like a good motivation for tracking them within SE.SE "drops" flags internally because we don't have a safe way to express them. But why is Indvars dropping the IR flags? We should: 1) determine if its safe for Indvars to preserve nsw in the cases we miss now 2) if not we may want a loop level analysis to preserve certain facts like trip count (hopefully not needed) 3) defer parts of Indvars that are destructive or even reorganize loop passes. E.g.linear function test replace could as late as LSR.> A more general question: Currently, when we have (i +(nsw) 5) in SE, etc. implicitly assumes that (i + 25) also does not wrap, correct? If that's right, then while this seems to follow the C model, I don't think that it is specified in the language reference.Good point. I'm sure we make that assumption in places. Andy> Thanks again, > Hal > >> >> >> -Andy > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory
Arnold Schwaighofer
2013-Jun-26 16:14 UTC
[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
On Jun 26, 2013, at 10:14 AM, Andrew Trick <atrick at apple.com> wrote:> On Jun 25, 2013, at 8:14 AM, Hal Finkel <hfinkel at anl.gov> wrote: > >> ----- Original Message ----- > SE "drops" flags internally because we don't have a safe way to express them. But why is Indvars dropping the IR flags? We should: > 1) determine if its safe for Indvars to preserve nsw in the cases we miss now > 2) if not we may want a loop level analysis to preserve certain facts like trip count (hopefully not needed) > 3) defer parts of Indvars that are destructive or even reorganize loop passes. E.g.linear function test replace could as late as LSR.So that I remember when I come back to work on this let me give an example why currently just moving loop-vectorize would result in us missing some opportunities (without fixing stuff). Note, that I am not saying that this is not fixable: For example, by doing some additional analysis on the accesses - we do know the loop bounds after all in the example below - so we can just ask SCEV whether “2*60+1 < max_i32” in the loop vectorizer. In the case where we don’t statically know that this is true we could emit runtime checks. Maybe we can also fix the widen'ing of induction variables that IndVars is performing that is dropping the flags. I just want want to point out that when we do our experiments now - without fixing this - this is a source of potential lost opportunities. Having said that here is an example of this issue that we will run into right now if we just move the vectorizer to later: struct { int A[128]; int B[128]; } S; void test() { for (int i = 0; i < 60; ++i) { S.A[2*i] = S.A[2*i+1]; } } Currently, if we run “clang -O3 -debug-only=loop-vectorize" The cached SCEV info that we get tells us that the two accesses don’t wrap (the vectorizer needs this information to know it is safe to perform vectorization): LV: Src Scev: {(4 + @S),+,8}<nsw><%for.body>Sink Scev: {@S,+,8}<nsw><%for.body>(Induction step: 2) However the IR coming into the vectorizer looks somewhat like the following: ==target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx10.8.0" %struct.anon = type { [128 x i32], [128 x i32] } @S = common global %struct.anon zeroinitializer, align 4 ; Function Attrs: nounwind ssp uwtable define void @test() #0 { entry: br label %for.body for.body: ; preds = %entry, %for.body %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %0 = shl nsw i64 %indvars.iv, 1 %1 = or i64 %0, 1 %arrayidx = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %1 %2 = load i32* %arrayidx, align 4, !tbaa !0 %arrayidx3 = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %0 store i32 %2, i32* %arrayidx3, align 4, !tbaa !0 %indvars.iv.next = add i64 %indvars.iv, 1 %lftr.wideiv = trunc i64 %indvars.iv.next to i32 %exitcond = icmp ne i32 %lftr.wideiv, 60 br i1 %exitcond, label %for.body, label %for.end for.end: ; preds = %for.body ret void } attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } !0 = metadata !{metadata !"int", metadata !1} !1 = metadata !{metadata !"omnipotent char", metadata !2} !2 = metadata !{metadata !"Simple C/C++ TBAA”} == If we run opt -loop-vectorize < IR.ll we will not vectorize the loop because SCEV now does not deduce that the accesses don’t wrap: LV: {(4 + @S),+,8}<%for.body>Sink Scev: {@S,+,8}<%for.body>(Induction step: 0) The point where we loose the flag is in Indvars when we widen the type from i32->i64: *** IR Dump After Loop-Closed SSA Form Pass *** for.body: ; preds = %entry, %for.body %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ] %mul = shl nsw i32 %i.08, 1 %add7 = or i32 %mul, 1 %idxprom = sext i32 %add7 to i64 %arrayidx = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %idxprom %0 = load i32* %arrayidx, align 4, !tbaa !0 %idxprom2 = sext i32 %mul to i64 %arrayidx3 = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %idxprom2 store i32 %0, i32* %arrayidx3, align 4, !tbaa !0 %inc = add nsw i32 %i.08, 1 <<< // Flag present. %cmp = icmp slt i32 %inc, 60 br i1 %cmp, label %for.body, label %for.end Dump just before LFTR: define void @test(i32 %N) #0 { entry: br label %for.body for.body: ; preds = %entry, %for.body %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ] %0 = shl nsw i64 %indvars.iv, 1 %mul = shl nsw i32 %i.08, 1 %1 = or i64 %0, 1 %add7 = or i32 %mul, 1 %idxprom = sext i32 %add7 to i64 %arrayidx = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %1 %2 = load i32* %arrayidx, align 4, !tbaa !0 %idxprom2 = sext i32 %mul to i64 %arrayidx3 = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %0 store i32 %2, i32* %arrayidx3, align 4, !tbaa !0 %indvars.iv.next = add i64 %indvars.iv, 1 <<< // LOST %inc = add nsw i32 %i.08, 1 %3 = trunc i64 %indvars.iv.next to i32 %cmp = icmp slt i32 %3, 60 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void } *** IR Dump After Induction Variable Simplification *** for.body: ; preds = %entry, %for.body %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %0 = shl nsw i64 %indvars.iv, 1 %1 = or i64 %0, 1 %arrayidx = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %1 %2 = load i32* %arrayidx, align 4, !tbaa !0 %arrayidx3 = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %0 store i32 %2, i32* %arrayidx3, align 4, !tbaa !0 %indvars.iv.next = add i64 %indvars.iv, 1 <<< // HERE %lftr.wideiv = trunc i64 %indvars.iv.next to i32 %exitcond = icmp ne i32 %lftr.wideiv, 60 br i1 %exitcond, label %for.body, label %for.end
Seemingly Similar Threads
- [LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
- [LLVMdev] Scalar Evolution and Loop Trip Count.
- [LLVMdev] alias analysis on llvm internal globals
- [LLVMdev] SimplifyIndVar looses nsw flags
- loop unrolling introduces conditional branch