thr3ads.net - llvm dev - [LLVMdev] Question about NoWrap flag for SCEVAddRecExpr [Jun 2015]

If this information is useful, please help other people find it:
Share via:

Tong Chen

2015-Jun-10 13:21 UTC

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

I am testing vectorization on the following test case:

float x[1024], y[1024];

void myloop1() {
        for (long int k = 0; k < 512; k++) {
           x[2*k] = x[2*k]+y[k];
        }
}

Vectorization failed due to "unsafe dependent memory operation". I
traced
the LoopAccessAnalysis.cpp and found the reason is the NoWrapFlag for 
SCEVAddRecExpr is not set and consequently the dependence distant became 
unknown. 

Can anyone familiar with ScalarRevolution tell me whether this is an 
expected behavior or a bug? 

Tong
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150610/7949fbac/attachment.html>

Sanjoy Das

2015-Jun-10 20:29 UTC

head link

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

[+CC Andy]
> Can anyone familiar with ScalarRevolution tell me whether this is an
> expected behavior or a bug?
Assuming you're talking about 2*k, this is a bug.  ScalarEvolution
should be able to prove that {0,+,4} is <nsw> and <nuw>.

Part of the problem is that in this case ScalarEvolution does not try
to prove that {0,+,4} is <nsw> when the expression is constructed
(since proving that has non-trivial cost) [1].  To get ScalarEvolution
to try to prove that {0,+,4} has no-wrap properties, the client needs
to construct a sign-extend expression of {0,+,4}.  SCEV will try to
change a sext of an add-rec to an add-rec of sexts and try to prove
no-wrap in the process [2].

To easily do this from IR, you can just add a dummy sext instruction
(like in the IR fragment below) and run
'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV
won't dce the unused instruction):

  ; ModuleID = '<stdin>'
  target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-apple-macosx10.10.0"

  @x = common global [1024 x float] zeroinitializer, align 16
  @y = common global [1024 x float] zeroinitializer, align 16

  ; Function Attrs: nounwind ssp uwtable
  define void @myloop1() {
  bb:
    br label %bb2

  bb1:                                              ; preds = %bb2
    ret void

  bb2:                                              ; preds = %bb2, %bb
    %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ]
    %tmp = shl nsw i64 %k.01, 1
    %dummy_sext = sext i64 %tmp to i128
    %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x,
i64 0, i64 %tmp
    %tmp4 = load float, float* %tmp3, align 16, !tbaa !2
    %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y,
i64 0, i64 %k.01
    %tmp6 = load float, float* %tmp5, align 8, !tbaa !2
    %tmp7 = fadd float %tmp4, %tmp6
    store float %tmp7, float* %tmp3, align 16, !tbaa !2
    %tmp8 = or i64 %k.01, 1
    %tmp9 = shl nsw i64 %tmp8, 1
    %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]*
@x, i64 0, i64 %tmp9
    %tmp11 = load float, float* %tmp10, align 8, !tbaa !2
    %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]*
@y, i64 0, i64 %tmp8
    %tmp13 = load float, float* %tmp12, align 4, !tbaa !2
    %tmp14 = fadd float %tmp11, %tmp13
    store float %tmp14, float* %tmp10, align 8, !tbaa !2
    %tmp15 = add nsw i64 %k.01, 2
    %exitcond.1 = icmp eq i64 %tmp15, 512
    br i1 %exitcond.1, label %bb1, label %bb2
  }

  !0 = !{i32 1, !"PIC Level", i32 2}
  !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build
239114)"}
  !2 = !{!3, !3, i64 0}
  !3 = !{!"float", !4, i64 0}
  !4 = !{!"omnipotent char", !5, i64 0}
  !5 = !{!"Simple C/C++ TBAA"}


However, adding a dummy sext only proves <nuw> for {0,+,4} and not
<nsw>.  The problem is that when constructing sext({0,+,4}) SCEV
realizes that since {0,+,4} is always positive, sext({0,+,4}) =zext({0,+,4});
and to change a zext of an add-rec to an add-rec of
zexts, SCEV only needs to prove <nuw> and not <nsw>.


I wonder if it makes sense to add a hook to SCEV that gets it to try
as hard as it can to prove <nuw> / <nsw> for specific add
recurrences.



[1]: It will try to prove nuw and nsw in cases where it is "easy", but
     not in this specific case.

[2]: So a worthwhile project is to have the vectorizer construct sign
     extend expressions of add recurrences that it really cares about
     proving no-wrap of and then check the flags on the
     SCEVAddRecExpr.  It may consume too much compile time, so there's
     a tricky trade-off here.


-- Sanjoy

Gerolf Hoflehner

2015-Jun-10 23:29 UTC

head link

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

Hi Sanjoy,

Interesting. I took a closer look at SCEV when I saw Tong’s example. I’m curious
about "[1]: It will try to prove nuw and nsw in cases where it is
"easy", but not in this specific case.” Could you describe “easy” and
“difficult” algorithmically?

Thanks
Gerolf
> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> 
> [+CC Andy]
> 
>> Can anyone familiar with ScalarRevolution tell me whether this is an
>> expected behavior or a bug?
> 
> Assuming you're talking about 2*k, this is a bug.  ScalarEvolution
> should be able to prove that {0,+,4} is <nsw> and <nuw>.
> 
> Part of the problem is that in this case ScalarEvolution does not try
> to prove that {0,+,4} is <nsw> when the expression is constructed
> (since proving that has non-trivial cost) [1].  To get ScalarEvolution
> to try to prove that {0,+,4} has no-wrap properties, the client needs
> to construct a sign-extend expression of {0,+,4}.  SCEV will try to
> change a sext of an add-rec to an add-rec of sexts and try to prove
> no-wrap in the process [2].
> 
> To easily do this from IR, you can just add a dummy sext instruction
> (like in the IR fragment below) and run
> 'opt -analyze -scalar-evolution -scalar-evolution' (just running
SCEV
> won't dce the unused instruction):
> 
>  ; ModuleID = '<stdin>'
>  target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>  target triple = "x86_64-apple-macosx10.10.0"
> 
>  @x = common global [1024 x float] zeroinitializer, align 16
>  @y = common global [1024 x float] zeroinitializer, align 16
> 
>  ; Function Attrs: nounwind ssp uwtable
>  define void @myloop1() {
>  bb:
>    br label %bb2
> 
>  bb1:                                              ; preds = %bb2
>    ret void
> 
>  bb2:                                              ; preds = %bb2, %bb
>    %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ]
>    %tmp = shl nsw i64 %k.01, 1
>    %dummy_sext = sext i64 %tmp to i128
>    %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x,
> i64 0, i64 %tmp
>    %tmp4 = load float, float* %tmp3, align 16, !tbaa !2
>    %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y,
> i64 0, i64 %k.01
>    %tmp6 = load float, float* %tmp5, align 8, !tbaa !2
>    %tmp7 = fadd float %tmp4, %tmp6
>    store float %tmp7, float* %tmp3, align 16, !tbaa !2
>    %tmp8 = or i64 %k.01, 1
>    %tmp9 = shl nsw i64 %tmp8, 1
>    %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]*
> @x, i64 0, i64 %tmp9
>    %tmp11 = load float, float* %tmp10, align 8, !tbaa !2
>    %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]*
> @y, i64 0, i64 %tmp8
>    %tmp13 = load float, float* %tmp12, align 4, !tbaa !2
>    %tmp14 = fadd float %tmp11, %tmp13
>    store float %tmp14, float* %tmp10, align 8, !tbaa !2
>    %tmp15 = add nsw i64 %k.01, 2
>    %exitcond.1 = icmp eq i64 %tmp15, 512
>    br i1 %exitcond.1, label %bb1, label %bb2
>  }
> 
>  !0 = !{i32 1, !"PIC Level", i32 2}
>  !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build
239114)"}
>  !2 = !{!3, !3, i64 0}
>  !3 = !{!"float", !4, i64 0}
>  !4 = !{!"omnipotent char", !5, i64 0}
>  !5 = !{!"Simple C/C++ TBAA"}
> 
> 
> However, adding a dummy sext only proves <nuw> for {0,+,4} and not
> <nsw>.  The problem is that when constructing sext({0,+,4}) SCEV
> realizes that since {0,+,4} is always positive, sext({0,+,4}) =>
zext({0,+,4}); and to change a zext of an add-rec to an add-rec of
> zexts, SCEV only needs to prove <nuw> and not <nsw>.
> 
> 
> I wonder if it makes sense to add a hook to SCEV that gets it to try
> as hard as it can to prove <nuw> / <nsw> for specific add
recurrences.
> 
> 
> 
> [1]: It will try to prove nuw and nsw in cases where it is
"easy", but
>     not in this specific case.
> 
> [2]: So a worthwhile project is to have the vectorizer construct sign
>     extend expressions of add recurrences that it really cares about
>     proving no-wrap of and then check the flags on the
>     SCEVAddRecExpr.  It may consume too much compile time, so there's
>     a tricky trade-off here.
> 
> 
> -- Sanjoy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Adam Nemet

2015-Jun-11 00:29 UTC

head link

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

[+Arnold]
> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> 
> [+CC Andy]
> 
>> Can anyone familiar with ScalarRevolution tell me whether this is an
>> expected behavior or a bug?
> 
> Assuming you're talking about 2*k, this is a bug.  ScalarEvolution
> should be able to prove that {0,+,4} is <nsw> and <nuw>.
I also find it surprising that the inbounds gep does not allow us to prove nuw
of the pointer here.  LAA has logic for this.

Not to mention that we’re trying to figure out the distance of x[2*k] against
*itself* which should be zero regardless of wrapping.

A probably more interesting case is x[2*k] = x[2*k+2] + … which would require
non-wrapping.

Looks like we only consider an inbounds gep non-wrapping if it uses a unit
stride.  I am not sure why…

Adam
> 
> Part of the problem is that in this case ScalarEvolution does not try
> to prove that {0,+,4} is <nsw> when the expression is constructed
> (since proving that has non-trivial cost) [1].  To get ScalarEvolution
> to try to prove that {0,+,4} has no-wrap properties, the client needs
> to construct a sign-extend expression of {0,+,4}.  SCEV will try to
> change a sext of an add-rec to an add-rec of sexts and try to prove
> no-wrap in the process [2].
> 
> To easily do this from IR, you can just add a dummy sext instruction
> (like in the IR fragment below) and run
> 'opt -analyze -scalar-evolution -scalar-evolution' (just running
SCEV
> won't dce the unused instruction):
> 
>  ; ModuleID = '<stdin>'
>  target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>  target triple = "x86_64-apple-macosx10.10.0"
> 
>  @x = common global [1024 x float] zeroinitializer, align 16
>  @y = common global [1024 x float] zeroinitializer, align 16
> 
>  ; Function Attrs: nounwind ssp uwtable
>  define void @myloop1() {
>  bb:
>    br label %bb2
> 
>  bb1:                                              ; preds = %bb2
>    ret void
> 
>  bb2:                                              ; preds = %bb2, %bb
>    %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ]
>    %tmp = shl nsw i64 %k.01, 1
>    %dummy_sext = sext i64 %tmp to i128
>    %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x,
> i64 0, i64 %tmp
>    %tmp4 = load float, float* %tmp3, align 16, !tbaa !2
>    %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y,
> i64 0, i64 %k.01
>    %tmp6 = load float, float* %tmp5, align 8, !tbaa !2
>    %tmp7 = fadd float %tmp4, %tmp6
>    store float %tmp7, float* %tmp3, align 16, !tbaa !2
>    %tmp8 = or i64 %k.01, 1
>    %tmp9 = shl nsw i64 %tmp8, 1
>    %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]*
> @x, i64 0, i64 %tmp9
>    %tmp11 = load float, float* %tmp10, align 8, !tbaa !2
>    %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]*
> @y, i64 0, i64 %tmp8
>    %tmp13 = load float, float* %tmp12, align 4, !tbaa !2
>    %tmp14 = fadd float %tmp11, %tmp13
>    store float %tmp14, float* %tmp10, align 8, !tbaa !2
>    %tmp15 = add nsw i64 %k.01, 2
>    %exitcond.1 = icmp eq i64 %tmp15, 512
>    br i1 %exitcond.1, label %bb1, label %bb2
>  }
> 
>  !0 = !{i32 1, !"PIC Level", i32 2}
>  !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build
239114)"}
>  !2 = !{!3, !3, i64 0}
>  !3 = !{!"float", !4, i64 0}
>  !4 = !{!"omnipotent char", !5, i64 0}
>  !5 = !{!"Simple C/C++ TBAA"}
> 
> 
> However, adding a dummy sext only proves <nuw> for {0,+,4} and not
> <nsw>.  The problem is that when constructing sext({0,+,4}) SCEV
> realizes that since {0,+,4} is always positive, sext({0,+,4}) =>
zext({0,+,4}); and to change a zext of an add-rec to an add-rec of
> zexts, SCEV only needs to prove <nuw> and not <nsw>.
> 
> 
> I wonder if it makes sense to add a hook to SCEV that gets it to try
> as hard as it can to prove <nuw> / <nsw> for specific add
recurrences.
> 
> 
> 
> [1]: It will try to prove nuw and nsw in cases where it is
"easy", but
>     not in this specific case.
> 
> [2]: So a worthwhile project is to have the vectorizer construct sign
>     extend expressions of add recurrences that it really cares about
>     proving no-wrap of and then check the flags on the
>     SCEVAddRecExpr.  It may consume too much compile time, so there's
>     a tricky trade-off here.
> 
> 
> -- Sanjoy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

llvm dev - Jun 2015 - [LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr