thr3ads.net - llvm dev - [LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its [Jun 2013]

If this information is useful, please help other people find it:
Share via:

Hal Finkel

2013-Jun-25 13:14 UTC

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

----- Original Message -----> 
> 
> 
> On Jun 24, 2013, at 4:24 PM, Hal Finkel < hfinkel at anl.gov > wrote:
> 
> 
> 
> 
> Indvars should ideally preserve NSW flags whenever possible. However,
> we don't want to rely on SCEV to preserve them. SCEV expressions are
> implicitly reassociated and uniqued in a flow-insensitive universe
> independent of the def-use chain of values. SCEV simply can't
> represent the flags in most cases. I think the only flag that makes
> sense in SCEV is the no-wrap flag on a recurrence (that's
> independent of signed/unsigned overflow).
> 
> Why can't SCEV keep a flow sensitive (effectively per-BB) map of
> expressions and their original flags (and perhaps cached deduced
> flags)? It seems like this problem is solvable within SCEV.
> 
> -Hal
> 
> I think you would have to track all the uses of each expression...
Yes, but SCEV's are flow insensitive, normalized, and uniqued, and we only
need to keep track of the original flagged instructions (specifically the parent
basic blocks). Combined with some caching of subsequent flags for derived
expressions I think this may even be not too expensive.
> 
> 
> 
> You can turn SCEV into an IR and inherit all the representational
> problems inherent in llvm IR. Or you can use it as an analysis that
> efficiently reprents only the mathematical properties of an
> expression and is simple to work with. I think it's fantastic as an
> analysis but really stinks at being an IR.
Agreed. Unfortunately, we do need to keep track of wrapping properties in order
to generate correct code in all cases (assuming we don't want to be unduly
pessimistic). I don't think that enhancing SE to deal with this will
automatically drag in all of the messiness of a full IR (especially if we can
just keep track of the necessary flag information 'on the side'). I also
don't like the idea of writing mini-SEs all over the place to work around
problems in SE.

Regardless, I take it that you feel that I'm off-base in wanting SE to deal
with nsw in some general sense, but I don't see why. As you've pointed
out to me, at least some SE computations assume nsw even when perhaps they
shouldn't (like the backedge-taken counts). That combined with the fact that
SE is dropping the flags, causing problems for downstream passes, seems like a
good motivation for tracking them within SE.

A more general question: Currently, when we have (i +(nsw) 5) in SE, etc.
implicitly assumes that (i + 25) also does not wrap, correct? If that's
right, then while this seems to follow the C model, I don't think that it is
specified in the language reference.

Thanks again,
Hal
> 
> 
> -Andy
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Andrew Trick

2013-Jun-26 15:14 UTC

head link

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

Sent from my iPhone...

On Jun 25, 2013, at 8:14 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> 
>> 
>> 
>> On Jun 24, 2013, at 4:24 PM, Hal Finkel < hfinkel at anl.gov >
wrote:
>> 
>> 
>> 
>> 
>> Indvars should ideally preserve NSW flags whenever possible. However,
>> we don't want to rely on SCEV to preserve them. SCEV expressions
are
>> implicitly reassociated and uniqued in a flow-insensitive universe
>> independent of the def-use chain of values. SCEV simply can't
>> represent the flags in most cases. I think the only flag that makes
>> sense in SCEV is the no-wrap flag on a recurrence (that's
>> independent of signed/unsigned overflow).
>> 
>> Why can't SCEV keep a flow sensitive (effectively per-BB) map of
>> expressions and their original flags (and perhaps cached deduced
>> flags)? It seems like this problem is solvable within SCEV.
>> 
>> -Hal
>> 
>> I think you would have to track all the uses of each expression...
> 
> Yes, but SCEV's are flow insensitive, normalized, and uniqued, and we
only need to keep track of the original flagged instructions (specifically the
parent basic blocks). Combined with some caching of subsequent flags for derived
expressions I think this may even be not too expensive.
Clients of SCEV need SCEV to Value mappings. But do we really need to preserve
it across passes?  Invalidation is a nasty problem. We already have issues with
SCEVUnknown invalidation.
>> You can turn SCEV into an IR and inherit all the representational
>> problems inherent in llvm IR. Or you can use it as an analysis that
>> efficiently reprents only the mathematical properties of an
>> expression and is simple to work with. I think it's fantastic as an
>> analysis but really stinks at being an IR.
> 
> Agreed. Unfortunately, we do need to keep track of wrapping properties in
order to generate correct code in all cases (assuming we don't want to be
unduly pessimistic). I don't think that enhancing SE to deal with this will
automatically drag in all of the messiness of a full IR (especially if we can
just keep track of the necessary flag information 'on the side'). I also
don't like the idea of writing mini-SEs all over the place to work around
problems in SE.
> 
> Regardless, I take it that you feel that I'm off-base in wanting SE to
deal with nsw in some general sense, but I don't see why. As you've
pointed out to me, at least some SE computations assume nsw even when perhaps
they shouldn't (like the backedge-taken counts). That combined with the fact
that SE is dropping the flags, causing problems for downstream passes, seems
like a good motivation for tracking them within SE.
SE "drops" flags internally because we don't have a safe way to
express them. But why is Indvars dropping the IR flags? We should:
1) determine if its safe for Indvars to preserve nsw in the cases we miss now   
2) if not we may want a loop level analysis to preserve certain facts like trip
count (hopefully not needed)
3) defer parts of Indvars that are destructive or even reorganize loop passes.
E.g.linear function test replace could as late as LSR.
> A more general question: Currently, when we have (i +(nsw) 5) in SE, etc.
implicitly assumes that (i + 25) also does not wrap, correct? If that's
right, then while this seems to follow the C model, I don't think that it is
specified in the language reference.
Good point. I'm sure we make that assumption in places. 
Andy
> Thanks again,
> Hal
> 
>> 
>> 
>> -Andy
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory

Arnold Schwaighofer

2013-Jun-26 16:14 UTC

head link

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

On Jun 26, 2013, at 10:14 AM, Andrew Trick <atrick at apple.com>
wrote:> On Jun 25, 2013, at 8:14 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
>> ----- Original Message -----
> SE "drops" flags internally because we don't have a safe way
to express them. But why is Indvars dropping the IR flags? We should:
> 1) determine if its safe for Indvars to preserve nsw in the cases we miss
now
> 2) if not we may want a loop level analysis to preserve certain facts like
trip count (hopefully not needed)
> 3) defer parts of Indvars that are destructive or even reorganize loop
passes. E.g.linear function test replace could as late as LSR.

So that I remember when I come back to work on this let me give an example why
currently just moving loop-vectorize would result in us missing some
opportunities (without fixing stuff). Note, that I am not saying that this is
not fixable:
For example, by doing some additional analysis on the accesses - we do know the
loop bounds after all in the example below - so we can just ask SCEV whether
“2*60+1 < max_i32” in the loop vectorizer. In the case where we don’t
statically know that this is true we could emit runtime checks. Maybe we can
also fix the widen'ing of induction variables that IndVars is performing
that is dropping the flags.

I just want want to point out that when we do our experiments now - without
fixing this - this is a source of potential lost opportunities.

Having said that here is an example of this issue that we will run into right
now if we just move the vectorizer to later:

struct {
  int A[128];
  int B[128];
} S;

void test() {
  for (int i = 0; i < 60; ++i) {
    S.A[2*i] = S.A[2*i+1];
  }
}

Currently, if we run “clang -O3 -debug-only=loop-vectorize"

The cached SCEV info that we get tells us that the two accesses don’t wrap (the
vectorizer needs this information to know it is safe to perform vectorization):

LV: Src Scev: {(4 + @S),+,8}<nsw><%for.body>Sink Scev:
{@S,+,8}<nsw><%for.body>(Induction step: 2)

However the IR coming into the vectorizer looks somewhat like the following:


==target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.8.0"
%struct.anon = type { [128 x i32], [128 x i32] }

@S = common global %struct.anon zeroinitializer, align 4

; Function Attrs: nounwind ssp uwtable
define void @test() #0 {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %0 = shl nsw i64 %indvars.iv, 1
  %1 = or i64 %0, 1
  %arrayidx = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %1
  %2 = load i32* %arrayidx, align 4, !tbaa !0
  %arrayidx3 = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %0
  store i32 %2, i32* %arrayidx3, align 4, !tbaa !0
  %indvars.iv.next = add i64 %indvars.iv, 1
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp ne i32 %lftr.wideiv, 60
  br i1 %exitcond, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

attributes #0 = { nounwind ssp uwtable
"less-precise-fpmad"="false"
"no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"="true"
"no-infs-fp-math"="false"
"no-nans-fp-math"="false"
"unsafe-fp-math"="false"
"use-soft-float"="false" }

!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA”}
==
If we run opt -loop-vectorize < IR.ll we will not vectorize the loop because
SCEV now does not deduce that the accesses don’t wrap:

LV: {(4 + @S),+,8}<%for.body>Sink Scev:
{@S,+,8}<%for.body>(Induction step: 0)



The point where we loose the flag is in Indvars when we widen the type from
i32->i64:

*** IR Dump After Loop-Closed SSA Form Pass ***
for.body:                                         ; preds = %entry, %for.body
  %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %mul = shl nsw i32 %i.08, 1
  %add7 = or i32 %mul, 1
  %idxprom = sext i32 %add7 to i64
  %arrayidx = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64
%idxprom
  %0 = load i32* %arrayidx, align 4, !tbaa !0
  %idxprom2 = sext i32 %mul to i64
  %arrayidx3 = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64
%idxprom2
  store i32 %0, i32* %arrayidx3, align 4, !tbaa !0
  %inc = add nsw i32 %i.08, 1   <<< // Flag present.
  %cmp = icmp slt i32 %inc, 60
  br i1 %cmp, label %for.body, label %for.end

Dump just before LFTR:

define void @test(i32 %N) #0 {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %0 = shl nsw i64 %indvars.iv, 1
  %mul = shl nsw i32 %i.08, 1
  %1 = or i64 %0, 1
  %add7 = or i32 %mul, 1
  %idxprom = sext i32 %add7 to i64
  %arrayidx = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %1
  %2 = load i32* %arrayidx, align 4, !tbaa !0
  %idxprom2 = sext i32 %mul to i64
  %arrayidx3 = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %0
  store i32 %2, i32* %arrayidx3, align 4, !tbaa !0
  %indvars.iv.next = add i64 %indvars.iv, 1  <<< // LOST
  %inc = add nsw i32 %i.08, 1
  %3 = trunc i64 %indvars.iv.next to i32
  %cmp = icmp slt i32 %3, 60
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

*** IR Dump After Induction Variable Simplification ***
for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %0 = shl nsw i64 %indvars.iv, 1
  %1 = or i64 %0, 1
  %arrayidx = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %1
  %2 = load i32* %arrayidx, align 4, !tbaa !0
  %arrayidx3 = getelementptr inbounds %struct.anon* @S, i64 0, i32 0, i64 %0
  store i32 %2, i32* %arrayidx3, align 4, !tbaa !0
  %indvars.iv.next = add i64 %indvars.iv, 1 <<< // HERE
  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
  %exitcond = icmp ne i32 %lftr.wideiv, 60
  br i1 %exitcond, label %for.body, label %for.end

Seemingly Similar Threads

Search for more maybe matching threads

llvm dev - Jun 2013 - [LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

[LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its

Seemingly Similar Threads