[+Arnold]> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > [+CC Andy] > >> Can anyone familiar with ScalarRevolution tell me whether this is an >> expected behavior or a bug? > > Assuming you're talking about 2*k, this is a bug. ScalarEvolution > should be able to prove that {0,+,4} is <nsw> and <nuw>.I also find it surprising that the inbounds gep does not allow us to prove nuw of the pointer here. LAA has logic for this. Not to mention that we’re trying to figure out the distance of x[2*k] against *itself* which should be zero regardless of wrapping. A probably more interesting case is x[2*k] = x[2*k+2] + … which would require non-wrapping. Looks like we only consider an inbounds gep non-wrapping if it uses a unit stride. I am not sure why… Adam> > Part of the problem is that in this case ScalarEvolution does not try > to prove that {0,+,4} is <nsw> when the expression is constructed > (since proving that has non-trivial cost) [1]. To get ScalarEvolution > to try to prove that {0,+,4} has no-wrap properties, the client needs > to construct a sign-extend expression of {0,+,4}. SCEV will try to > change a sext of an add-rec to an add-rec of sexts and try to prove > no-wrap in the process [2]. > > To easily do this from IR, you can just add a dummy sext instruction > (like in the IR fragment below) and run > 'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV > won't dce the unused instruction): > > ; ModuleID = '<stdin>' > target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" > target triple = "x86_64-apple-macosx10.10.0" > > @x = common global [1024 x float] zeroinitializer, align 16 > @y = common global [1024 x float] zeroinitializer, align 16 > > ; Function Attrs: nounwind ssp uwtable > define void @myloop1() { > bb: > br label %bb2 > > bb1: ; preds = %bb2 > ret void > > bb2: ; preds = %bb2, %bb > %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ] > %tmp = shl nsw i64 %k.01, 1 > %dummy_sext = sext i64 %tmp to i128 > %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x, > i64 0, i64 %tmp > %tmp4 = load float, float* %tmp3, align 16, !tbaa !2 > %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y, > i64 0, i64 %k.01 > %tmp6 = load float, float* %tmp5, align 8, !tbaa !2 > %tmp7 = fadd float %tmp4, %tmp6 > store float %tmp7, float* %tmp3, align 16, !tbaa !2 > %tmp8 = or i64 %k.01, 1 > %tmp9 = shl nsw i64 %tmp8, 1 > %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]* > @x, i64 0, i64 %tmp9 > %tmp11 = load float, float* %tmp10, align 8, !tbaa !2 > %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]* > @y, i64 0, i64 %tmp8 > %tmp13 = load float, float* %tmp12, align 4, !tbaa !2 > %tmp14 = fadd float %tmp11, %tmp13 > store float %tmp14, float* %tmp10, align 8, !tbaa !2 > %tmp15 = add nsw i64 %k.01, 2 > %exitcond.1 = icmp eq i64 %tmp15, 512 > br i1 %exitcond.1, label %bb1, label %bb2 > } > > !0 = !{i32 1, !"PIC Level", i32 2} > !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"} > !2 = !{!3, !3, i64 0} > !3 = !{!"float", !4, i64 0} > !4 = !{!"omnipotent char", !5, i64 0} > !5 = !{!"Simple C/C++ TBAA"} > > > However, adding a dummy sext only proves <nuw> for {0,+,4} and not > <nsw>. The problem is that when constructing sext({0,+,4}) SCEV > realizes that since {0,+,4} is always positive, sext({0,+,4}) => zext({0,+,4}); and to change a zext of an add-rec to an add-rec of > zexts, SCEV only needs to prove <nuw> and not <nsw>. > > > I wonder if it makes sense to add a hook to SCEV that gets it to try > as hard as it can to prove <nuw> / <nsw> for specific add recurrences. > > > > [1]: It will try to prove nuw and nsw in cases where it is "easy", but > not in this specific case. > > [2]: So a worthwhile project is to have the vectorizer construct sign > extend expressions of add recurrences that it really cares about > proving no-wrap of and then check the flags on the > SCEVAddRecExpr. It may consume too much compile time, so there's > a tricky trade-off here. > > > -- Sanjoy > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
I'm not sure if inbounds can be used to prove <nuw>. If an object %OBJ is allocated at address -1 then "gep inbounds %OBJ 1" is not poison, but the underlying computation unsigned-overflows. It may be possible to use inbounds to derive <nsw> but that too is tricky since inbounds does not directly imply anything about no-wrap as far as I can tell. On Wed, Jun 10, 2015 at 5:29 PM, Adam Nemet <anemet at apple.com> wrote:> [+Arnold] > >> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >> >> [+CC Andy] >> >>> Can anyone familiar with ScalarRevolution tell me whether this is an >>> expected behavior or a bug? >> >> Assuming you're talking about 2*k, this is a bug. ScalarEvolution >> should be able to prove that {0,+,4} is <nsw> and <nuw>. > > I also find it surprising that the inbounds gep does not allow us to prove nuw of the pointer here. LAA has logic for this. > > Not to mention that we’re trying to figure out the distance of x[2*k] against *itself* which should be zero regardless of wrapping. > > A probably more interesting case is x[2*k] = x[2*k+2] + … which would require non-wrapping. > > Looks like we only consider an inbounds gep non-wrapping if it uses a unit stride. I am not sure why… > > Adam > >> >> Part of the problem is that in this case ScalarEvolution does not try >> to prove that {0,+,4} is <nsw> when the expression is constructed >> (since proving that has non-trivial cost) [1]. To get ScalarEvolution >> to try to prove that {0,+,4} has no-wrap properties, the client needs >> to construct a sign-extend expression of {0,+,4}. SCEV will try to >> change a sext of an add-rec to an add-rec of sexts and try to prove >> no-wrap in the process [2]. >> >> To easily do this from IR, you can just add a dummy sext instruction >> (like in the IR fragment below) and run >> 'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV >> won't dce the unused instruction): >> >> ; ModuleID = '<stdin>' >> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" >> target triple = "x86_64-apple-macosx10.10.0" >> >> @x = common global [1024 x float] zeroinitializer, align 16 >> @y = common global [1024 x float] zeroinitializer, align 16 >> >> ; Function Attrs: nounwind ssp uwtable >> define void @myloop1() { >> bb: >> br label %bb2 >> >> bb1: ; preds = %bb2 >> ret void >> >> bb2: ; preds = %bb2, %bb >> %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ] >> %tmp = shl nsw i64 %k.01, 1 >> %dummy_sext = sext i64 %tmp to i128 >> %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x, >> i64 0, i64 %tmp >> %tmp4 = load float, float* %tmp3, align 16, !tbaa !2 >> %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y, >> i64 0, i64 %k.01 >> %tmp6 = load float, float* %tmp5, align 8, !tbaa !2 >> %tmp7 = fadd float %tmp4, %tmp6 >> store float %tmp7, float* %tmp3, align 16, !tbaa !2 >> %tmp8 = or i64 %k.01, 1 >> %tmp9 = shl nsw i64 %tmp8, 1 >> %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]* >> @x, i64 0, i64 %tmp9 >> %tmp11 = load float, float* %tmp10, align 8, !tbaa !2 >> %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]* >> @y, i64 0, i64 %tmp8 >> %tmp13 = load float, float* %tmp12, align 4, !tbaa !2 >> %tmp14 = fadd float %tmp11, %tmp13 >> store float %tmp14, float* %tmp10, align 8, !tbaa !2 >> %tmp15 = add nsw i64 %k.01, 2 >> %exitcond.1 = icmp eq i64 %tmp15, 512 >> br i1 %exitcond.1, label %bb1, label %bb2 >> } >> >> !0 = !{i32 1, !"PIC Level", i32 2} >> !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"} >> !2 = !{!3, !3, i64 0} >> !3 = !{!"float", !4, i64 0} >> !4 = !{!"omnipotent char", !5, i64 0} >> !5 = !{!"Simple C/C++ TBAA"} >> >> >> However, adding a dummy sext only proves <nuw> for {0,+,4} and not >> <nsw>. The problem is that when constructing sext({0,+,4}) SCEV >> realizes that since {0,+,4} is always positive, sext({0,+,4}) =>> zext({0,+,4}); and to change a zext of an add-rec to an add-rec of >> zexts, SCEV only needs to prove <nuw> and not <nsw>. >> >> >> I wonder if it makes sense to add a hook to SCEV that gets it to try >> as hard as it can to prove <nuw> / <nsw> for specific add recurrences. >> >> >> >> [1]: It will try to prove nuw and nsw in cases where it is "easy", but >> not in this specific case. >> >> [2]: So a worthwhile project is to have the vectorizer construct sign >> extend expressions of add recurrences that it really cares about >> proving no-wrap of and then check the flags on the >> SCEVAddRecExpr. It may consume too much compile time, so there's >> a tricky trade-off here. >> >> >> -- Sanjoy >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Inbounds does not make any statements about the behavior of a sub expression involved in an index to the gep addr = index = 2*k // this can very well wrap gep addr, ..., index // addr +index is inbounds There is a slight of hands that we do in the vectorizer with unit strides where we rely on contiguous adress space. Sent from my iPhone> On Jun 10, 2015, at 5:29 PM, Adam Nemet <anemet at apple.com> wrote: > > [+Arnold] > >> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >> >> [+CC Andy] >> >>> Can anyone familiar with ScalarRevolution tell me whether this is an >>> expected behavior or a bug? >> >> Assuming you're talking about 2*k, this is a bug. ScalarEvolution >> should be able to prove that {0,+,4} is <nsw> and <nuw>. > > I also find it surprising that the inbounds gep does not allow us to prove nuw of the pointer here. LAA has logic for this. > > Not to mention that we’re trying to figure out the distance of x[2*k] against *itself* which should be zero regardless of wrapping. > > A probably more interesting case is x[2*k] = x[2*k+2] + … which would require non-wrapping. > > Looks like we only consider an inbounds gep non-wrapping if it uses a unit stride. I am not sure why… > > Adam > >> >> Part of the problem is that in this case ScalarEvolution does not try >> to prove that {0,+,4} is <nsw> when the expression is constructed >> (since proving that has non-trivial cost) [1]. To get ScalarEvolution >> to try to prove that {0,+,4} has no-wrap properties, the client needs >> to construct a sign-extend expression of {0,+,4}. SCEV will try to >> change a sext of an add-rec to an add-rec of sexts and try to prove >> no-wrap in the process [2]. >> >> To easily do this from IR, you can just add a dummy sext instruction >> (like in the IR fragment below) and run >> 'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV >> won't dce the unused instruction): >> >> ; ModuleID = '<stdin>' >> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" >> target triple = "x86_64-apple-macosx10.10.0" >> >> @x = common global [1024 x float] zeroinitializer, align 16 >> @y = common global [1024 x float] zeroinitializer, align 16 >> >> ; Function Attrs: nounwind ssp uwtable >> define void @myloop1() { >> bb: >> br label %bb2 >> >> bb1: ; preds = %bb2 >> ret void >> >> bb2: ; preds = %bb2, %bb >> %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ] >> %tmp = shl nsw i64 %k.01, 1 >> %dummy_sext = sext i64 %tmp to i128 >> %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x, >> i64 0, i64 %tmp >> %tmp4 = load float, float* %tmp3, align 16, !tbaa !2 >> %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y, >> i64 0, i64 %k.01 >> %tmp6 = load float, float* %tmp5, align 8, !tbaa !2 >> %tmp7 = fadd float %tmp4, %tmp6 >> store float %tmp7, float* %tmp3, align 16, !tbaa !2 >> %tmp8 = or i64 %k.01, 1 >> %tmp9 = shl nsw i64 %tmp8, 1 >> %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]* >> @x, i64 0, i64 %tmp9 >> %tmp11 = load float, float* %tmp10, align 8, !tbaa !2 >> %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]* >> @y, i64 0, i64 %tmp8 >> %tmp13 = load float, float* %tmp12, align 4, !tbaa !2 >> %tmp14 = fadd float %tmp11, %tmp13 >> store float %tmp14, float* %tmp10, align 8, !tbaa !2 >> %tmp15 = add nsw i64 %k.01, 2 >> %exitcond.1 = icmp eq i64 %tmp15, 512 >> br i1 %exitcond.1, label %bb1, label %bb2 >> } >> >> !0 = !{i32 1, !"PIC Level", i32 2} >> !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"} >> !2 = !{!3, !3, i64 0} >> !3 = !{!"float", !4, i64 0} >> !4 = !{!"omnipotent char", !5, i64 0} >> !5 = !{!"Simple C/C++ TBAA"} >> >> >> However, adding a dummy sext only proves <nuw> for {0,+,4} and not >> <nsw>. The problem is that when constructing sext({0,+,4}) SCEV >> realizes that since {0,+,4} is always positive, sext({0,+,4}) =>> zext({0,+,4}); and to change a zext of an add-rec to an add-rec of >> zexts, SCEV only needs to prove <nuw> and not <nsw>. >> >> >> I wonder if it makes sense to add a hook to SCEV that gets it to try >> as hard as it can to prove <nuw> / <nsw> for specific add recurrences. >> >> >> >> [1]: It will try to prove nuw and nsw in cases where it is "easy", but >> not in this specific case. >> >> [2]: So a worthwhile project is to have the vectorizer construct sign >> extend expressions of add recurrences that it really cares about >> proving no-wrap of and then check the flags on the >> SCEVAddRecExpr. It may consume too much compile time, so there's >> a tricky trade-off here. >> >> >> -- Sanjoy >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
> On Jun 10, 2015, at 6:17 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: > > I'm not sure if inbounds can be used to prove <nuw>. If an object > %OBJ is allocated at address -1 then "gep inbounds %OBJ 1" is not > poison, but the underlying computation unsigned-overflows.I think that this should yield poison per langref because the signed arithmetic implied by the indices should be performed with infinite precision. Base is treated as unsigned so 0xff…ff + 1 would be 0x100…00 which would be out of bounds (sort of). See also this which I think is even more explicit: http://llvm.org/docs/GetElementPtr.html#what-happens-if-a-gep-computation-overflows <http://llvm.org/docs/GetElementPtr.html#what-happens-if-a-gep-computation-overflows> Adam> It may be possible to use inbounds to derive <nsw> but that too is > tricky since inbounds does not directly imply anything about no-wrap > as far as I can tell. > > On Wed, Jun 10, 2015 at 5:29 PM, Adam Nemet <anemet at apple.com> wrote: >> [+Arnold] >> >>> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >>> >>> [+CC Andy] >>> >>>> Can anyone familiar with ScalarRevolution tell me whether this is an >>>> expected behavior or a bug? >>> >>> Assuming you're talking about 2*k, this is a bug. ScalarEvolution >>> should be able to prove that {0,+,4} is <nsw> and <nuw>. >> >> I also find it surprising that the inbounds gep does not allow us to prove nuw of the pointer here. LAA has logic for this. >> >> Not to mention that we’re trying to figure out the distance of x[2*k] against *itself* which should be zero regardless of wrapping. >> >> A probably more interesting case is x[2*k] = x[2*k+2] + … which would require non-wrapping. >> >> Looks like we only consider an inbounds gep non-wrapping if it uses a unit stride. I am not sure why… >> >> Adam >> >>> >>> Part of the problem is that in this case ScalarEvolution does not try >>> to prove that {0,+,4} is <nsw> when the expression is constructed >>> (since proving that has non-trivial cost) [1]. To get ScalarEvolution >>> to try to prove that {0,+,4} has no-wrap properties, the client needs >>> to construct a sign-extend expression of {0,+,4}. SCEV will try to >>> change a sext of an add-rec to an add-rec of sexts and try to prove >>> no-wrap in the process [2]. >>> >>> To easily do this from IR, you can just add a dummy sext instruction >>> (like in the IR fragment below) and run >>> 'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV >>> won't dce the unused instruction): >>> >>> ; ModuleID = '<stdin>' >>> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" >>> target triple = "x86_64-apple-macosx10.10.0" >>> >>> @x = common global [1024 x float] zeroinitializer, align 16 >>> @y = common global [1024 x float] zeroinitializer, align 16 >>> >>> ; Function Attrs: nounwind ssp uwtable >>> define void @myloop1() { >>> bb: >>> br label %bb2 >>> >>> bb1: ; preds = %bb2 >>> ret void >>> >>> bb2: ; preds = %bb2, %bb >>> %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ] >>> %tmp = shl nsw i64 %k.01, 1 >>> %dummy_sext = sext i64 %tmp to i128 >>> %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x, >>> i64 0, i64 %tmp >>> %tmp4 = load float, float* %tmp3, align 16, !tbaa !2 >>> %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y, >>> i64 0, i64 %k.01 >>> %tmp6 = load float, float* %tmp5, align 8, !tbaa !2 >>> %tmp7 = fadd float %tmp4, %tmp6 >>> store float %tmp7, float* %tmp3, align 16, !tbaa !2 >>> %tmp8 = or i64 %k.01, 1 >>> %tmp9 = shl nsw i64 %tmp8, 1 >>> %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]* >>> @x, i64 0, i64 %tmp9 >>> %tmp11 = load float, float* %tmp10, align 8, !tbaa !2 >>> %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]* >>> @y, i64 0, i64 %tmp8 >>> %tmp13 = load float, float* %tmp12, align 4, !tbaa !2 >>> %tmp14 = fadd float %tmp11, %tmp13 >>> store float %tmp14, float* %tmp10, align 8, !tbaa !2 >>> %tmp15 = add nsw i64 %k.01, 2 >>> %exitcond.1 = icmp eq i64 %tmp15, 512 >>> br i1 %exitcond.1, label %bb1, label %bb2 >>> } >>> >>> !0 = !{i32 1, !"PIC Level", i32 2} >>> !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"} >>> !2 = !{!3, !3, i64 0} >>> !3 = !{!"float", !4, i64 0} >>> !4 = !{!"omnipotent char", !5, i64 0} >>> !5 = !{!"Simple C/C++ TBAA"} >>> >>> >>> However, adding a dummy sext only proves <nuw> for {0,+,4} and not >>> <nsw>. The problem is that when constructing sext({0,+,4}) SCEV >>> realizes that since {0,+,4} is always positive, sext({0,+,4}) =>>> zext({0,+,4}); and to change a zext of an add-rec to an add-rec of >>> zexts, SCEV only needs to prove <nuw> and not <nsw>. >>> >>> >>> I wonder if it makes sense to add a hook to SCEV that gets it to try >>> as hard as it can to prove <nuw> / <nsw> for specific add recurrences. >>> >>> >>> >>> [1]: It will try to prove nuw and nsw in cases where it is "easy", but >>> not in this specific case. >>> >>> [2]: So a worthwhile project is to have the vectorizer construct sign >>> extend expressions of add recurrences that it really cares about >>> proving no-wrap of and then check the flags on the >>> SCEVAddRecExpr. It may consume too much compile time, so there's >>> a tricky trade-off here. >>> >>> >>> -- Sanjoy >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150610/fa42e095/attachment.html>
> On Jun 10, 2015, at 6:19 PM, Arnold <aschwaighofer at apple.com> wrote: > > Inbounds does not make any statements about the behavior of a sub expression involved in an index to the gep > > addr = > index = 2*k // this can very well wrap > gep addr, ..., index // addr +index is inbounds… and since 2*k is part of the SCEV ({x,+,2}), the SCEV may wrap. Got it, thanks. That said, we should still be able to analyze these, I think: 1. {x,+,2} vs {x,+,2} There is obviously a dep-distance 0 between them regardless of wrapping. 2. {x+8,+,2} vs {x,+,2} Because of the known small-constant difference, this is either a dep-distance of 8 or 8-Size_of_addr_space. Thus as long as this is normally a forward dep of a small known distance (and with wrapping huge-distance backward dep) we should be able to vectorize this. What do you think? Adam> > There is a slight of hands that we do in the vectorizer with unit strides where we rely on contiguous adress space. > > > > Sent from my iPhone > >> On Jun 10, 2015, at 5:29 PM, Adam Nemet <anemet at apple.com> wrote: >> >> [+Arnold] >> >>> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote: >>> >>> [+CC Andy] >>> >>>> Can anyone familiar with ScalarRevolution tell me whether this is an >>>> expected behavior or a bug? >>> >>> Assuming you're talking about 2*k, this is a bug. ScalarEvolution >>> should be able to prove that {0,+,4} is <nsw> and <nuw>. >> >> I also find it surprising that the inbounds gep does not allow us to prove nuw of the pointer here. LAA has logic for this. >> >> Not to mention that we’re trying to figure out the distance of x[2*k] against *itself* which should be zero regardless of wrapping. >> >> A probably more interesting case is x[2*k] = x[2*k+2] + … which would require non-wrapping. >> >> Looks like we only consider an inbounds gep non-wrapping if it uses a unit stride. I am not sure why… >> >> Adam >> >>> >>> Part of the problem is that in this case ScalarEvolution does not try >>> to prove that {0,+,4} is <nsw> when the expression is constructed >>> (since proving that has non-trivial cost) [1]. To get ScalarEvolution >>> to try to prove that {0,+,4} has no-wrap properties, the client needs >>> to construct a sign-extend expression of {0,+,4}. SCEV will try to >>> change a sext of an add-rec to an add-rec of sexts and try to prove >>> no-wrap in the process [2]. >>> >>> To easily do this from IR, you can just add a dummy sext instruction >>> (like in the IR fragment below) and run >>> 'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV >>> won't dce the unused instruction): >>> >>> ; ModuleID = '<stdin>' >>> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" >>> target triple = "x86_64-apple-macosx10.10.0" >>> >>> @x = common global [1024 x float] zeroinitializer, align 16 >>> @y = common global [1024 x float] zeroinitializer, align 16 >>> >>> ; Function Attrs: nounwind ssp uwtable >>> define void @myloop1() { >>> bb: >>> br label %bb2 >>> >>> bb1: ; preds = %bb2 >>> ret void >>> >>> bb2: ; preds = %bb2, %bb >>> %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ] >>> %tmp = shl nsw i64 %k.01, 1 >>> %dummy_sext = sext i64 %tmp to i128 >>> %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x, >>> i64 0, i64 %tmp >>> %tmp4 = load float, float* %tmp3, align 16, !tbaa !2 >>> %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y, >>> i64 0, i64 %k.01 >>> %tmp6 = load float, float* %tmp5, align 8, !tbaa !2 >>> %tmp7 = fadd float %tmp4, %tmp6 >>> store float %tmp7, float* %tmp3, align 16, !tbaa !2 >>> %tmp8 = or i64 %k.01, 1 >>> %tmp9 = shl nsw i64 %tmp8, 1 >>> %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]* >>> @x, i64 0, i64 %tmp9 >>> %tmp11 = load float, float* %tmp10, align 8, !tbaa !2 >>> %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]* >>> @y, i64 0, i64 %tmp8 >>> %tmp13 = load float, float* %tmp12, align 4, !tbaa !2 >>> %tmp14 = fadd float %tmp11, %tmp13 >>> store float %tmp14, float* %tmp10, align 8, !tbaa !2 >>> %tmp15 = add nsw i64 %k.01, 2 >>> %exitcond.1 = icmp eq i64 %tmp15, 512 >>> br i1 %exitcond.1, label %bb1, label %bb2 >>> } >>> >>> !0 = !{i32 1, !"PIC Level", i32 2} >>> !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"} >>> !2 = !{!3, !3, i64 0} >>> !3 = !{!"float", !4, i64 0} >>> !4 = !{!"omnipotent char", !5, i64 0} >>> !5 = !{!"Simple C/C++ TBAA"} >>> >>> >>> However, adding a dummy sext only proves <nuw> for {0,+,4} and not >>> <nsw>. The problem is that when constructing sext({0,+,4}) SCEV >>> realizes that since {0,+,4} is always positive, sext({0,+,4}) =>>> zext({0,+,4}); and to change a zext of an add-rec to an add-rec of >>> zexts, SCEV only needs to prove <nuw> and not <nsw>. >>> >>> >>> I wonder if it makes sense to add a hook to SCEV that gets it to try >>> as hard as it can to prove <nuw> / <nsw> for specific add recurrences. >>> >>> >>> >>> [1]: It will try to prove nuw and nsw in cases where it is "easy", but >>> not in this specific case. >>> >>> [2]: So a worthwhile project is to have the vectorizer construct sign >>> extend expressions of add recurrences that it really cares about >>> proving no-wrap of and then check the flags on the >>> SCEVAddRecExpr. It may consume too much compile time, so there's >>> a tricky trade-off here. >>> >>> >>> -- Sanjoy >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>
Possibly Parallel Threads
- [LLVMdev] Question about NoWrap flag for SCEVAddRecExpr
- [LLVMdev] Question about NoWrap flag for SCEVAddRecExpr
- [LLVMdev] Question about NoWrap flag for SCEVAddRecExpr
- [LLVMdev] Question about NoWrap flag for SCEVAddRecExpr
- [LLVMdev] Missed optimization opportunity with piecewise load shift-or'd together?