Caballero, Diego via llvm-dev
2017-Nov-14 17:16 UTC
[llvm-dev] [SCEV][ScalarEvolution] SE limitation impacting LV
Hi! I would appreciate some feedback from someone with experience in SCEV/SE. D39346 tries to fix an issue in LV (PR34965) that exposes a limitation in SCEV/SE. The best solution to the LV issue might not be a fix at SCEV/SE level but we may want to report/address SCEV/SE limitation as well. For the snippet below, LV expects SE to return a SCEVAddRecExpr for %21. However, SE returns ((4 * (zext i32 (2 + %18) to i64))<nuw><nsw> + undef)<nsw>, probably because it's not able to deduce that %bc.resume.val1 = %bc.resume.val + 1. You can find further information in D39346 and the full test in PR34965 ("IR_after_createVectorizedLoopSkeleton" attachment). I could open a new bug against SCEV/SE if a fix or extension is feasible at that level. Thanks, Diego ---- .outer: ; preds = %2, %0 ... %.ph2 = phi i32 [ 62, %2 ], [ 110, %0 ] %5 = add i32 %.ph2, 1 ... vector.ph: ; preds = %.outer ... %ind.end = add i32 %5, %n.vec %ind.end2 = add i32 %.ph2, %n.vec ... middle.block: ; preds = %vector.body.latch ... scalar.ph: ; preds = %middle.block, %.outer %bc.resume.val = phi i32 [ %ind.end, %middle.block ], [ %5, %.outer ] %bc.resume.val1 = phi i32 [ %ind.end2, %middle.block ], [ %.ph2, %.outer ] br label %16 ; <label>:16: ; preds = %16, %scalar.ph %17 = phi i32 [ %bc.resume.val, %scalar.ph ], [ %23, %16 ] %18 = phi i32 [ %bc.resume.val1, %scalar.ph ], [ %17, %16 ] %19 = add i32 %18, 2 %20 = zext i32 %19 to i64 %21 = getelementptr inbounds i32, i32 addrspace(1)* undef, i64 %20 %22 = ashr i32 undef, %4 store i32 %22, i32 addrspace(1)* %21, align 4 %23 = add i32 %17, 1 %24 = icmp sgt i32 %23, 61 br i1 %24, label %._crit_edge.loopexit.preheader, label %16 -------------- next part -------------- An HTML attachment was scrubbed... URL: <lists.llvm.org/pipermail/llvm-dev/attachments/20171114/e6657172/attachment.html>
Sanjoy Das via llvm-dev
2017-Nov-22 22:58 UTC
[llvm-dev] [SCEV][ScalarEvolution] SE limitation impacting LV
Hi, SCEV is fairly conservative around PHI nodes that aren't recurrences and aren't obviously equivalent to a min-max branch-phi idiom. Is that the limitation you're running into here? -- Sanjoy On Tue, Nov 14, 2017 at 9:16 AM, Caballero, Diego via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Hi! > > > > I would appreciate some feedback from someone with experience in SCEV/SE. > D39346 tries to fix an issue in LV (PR34965) that exposes a limitation in > SCEV/SE. The best solution to the LV issue might not be a fix at SCEV/SE > level but we may want to report/address SCEV/SE limitation as well. > > > > For the snippet below, LV expects SE to return a SCEVAddRecExpr for %21. > However, SE returns ((4 * (zext i32 (2 + %18) to i64))<nuw><nsw> + > undef)<nsw>, probably because it’s not able to deduce that %bc.resume.val1 > = %bc.resume.val + 1. You can find further information in D39346 and the > full test in PR34965 (“IR_after_createVectorizedLoopSkeleton” attachment). > > > > I could open a new bug against SCEV/SE if a fix or extension is feasible at > that level. > > > > Thanks, > > Diego > > > > ---- > > > > .outer: ; preds = %2, %0 > > … > > %.ph2 = phi i32 [ 62, %2 ], [ 110, %0 ] > > %5 = add i32 %.ph2, 1 > > … > > > > vector.ph: ; preds = %.outer > > … > > %ind.end = add i32 %5, %n.vec > > %ind.end2 = add i32 %.ph2, %n.vec > > … > > > > middle.block: ; preds > %vector.body.latch > > … > > > > scalar.ph: ; preds = %middle.block, > %.outer > > %bc.resume.val = phi i32 [ %ind.end, %middle.block ], [ %5, %.outer ] > > %bc.resume.val1 = phi i32 [ %ind.end2, %middle.block ], [ %.ph2, %.outer ] > > br label %16 > > > > ; <label>:16: ; preds = %16, %scalar.ph > > %17 = phi i32 [ %bc.resume.val, %scalar.ph ], [ %23, %16 ] > > %18 = phi i32 [ %bc.resume.val1, %scalar.ph ], [ %17, %16 ] > > %19 = add i32 %18, 2 > > %20 = zext i32 %19 to i64 > > %21 = getelementptr inbounds i32, i32 addrspace(1)* undef, i64 %20 > > %22 = ashr i32 undef, %4 > > store i32 %22, i32 addrspace(1)* %21, align 4 > > %23 = add i32 %17, 1 > > %24 = icmp sgt i32 %23, 61 > > br i1 %24, label %._crit_edge.loopexit.preheader, label %16 > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Caballero, Diego via llvm-dev
2017-Nov-22 23:43 UTC
[llvm-dev] [SCEV][ScalarEvolution] SE limitation impacting LV
Thanks for the feedback, Sanjoy. > SCEV is fairly conservative around PHI nodes that aren't recurrences and aren't obviously equivalent to a min-max branch-phi idiom. Is that the limitation you're running into here? Yes, that's exactly the problem. The problematic PHI nodes (%bc.resume.val and %bc.resume.val1) aren't either recurrences or related to min-max idioms. I have a tentative fix for the LV bug and I just wanted to know if this is something that should also be fixed at SCEV level to report a bug accordingly. Your previous comment sounds like "this is a well-known limitation in SCEV" so maybe there is nothing new to report. Thanks, Diego -----Original Message----- From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com] Sent: Wednesday, November 22, 2017 2:58 PM To: Caballero, Diego <diego.caballero at intel.com> Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] [SCEV][ScalarEvolution] SE limitation impacting LV Hi, SCEV is fairly conservative around PHI nodes that aren't recurrences and aren't obviously equivalent to a min-max branch-phi idiom. Is that the limitation you're running into here? -- Sanjoy On Tue, Nov 14, 2017 at 9:16 AM, Caballero, Diego via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Hi! > > > > I would appreciate some feedback from someone with experience in SCEV/SE. > D39346 tries to fix an issue in LV (PR34965) that exposes a limitation > in SCEV/SE. The best solution to the LV issue might not be a fix at > SCEV/SE level but we may want to report/address SCEV/SE limitation as well. > > > > For the snippet below, LV expects SE to return a SCEVAddRecExpr for %21. > However, SE returns ((4 * (zext i32 (2 + %18) to i64))<nuw><nsw> + > undef)<nsw>, probably because it’s not able to deduce that > %bc.resume.val1 = %bc.resume.val + 1. You can find further information > in D39346 and the full test in PR34965 (“IR_after_createVectorizedLoopSkeleton” attachment). > > > > I could open a new bug against SCEV/SE if a fix or extension is > feasible at that level. > > > > Thanks, > > Diego > > > > ---- > > > > .outer: ; preds = %2, %0 > > … > > %.ph2 = phi i32 [ 62, %2 ], [ 110, %0 ] > > %5 = add i32 %.ph2, 1 > > … > > > > vector.ph: ; preds = %.outer > > … > > %ind.end = add i32 %5, %n.vec > > %ind.end2 = add i32 %.ph2, %n.vec > > … > > > > middle.block: ; preds > %vector.body.latch > > … > > > > scalar.ph: ; preds = %middle.block, > %.outer > > %bc.resume.val = phi i32 [ %ind.end, %middle.block ], [ %5, %.outer > ] > > %bc.resume.val1 = phi i32 [ %ind.end2, %middle.block ], [ %.ph2, > %.outer ] > > br label %16 > > > > ; <label>:16: ; preds = %16, %scalar.ph > > %17 = phi i32 [ %bc.resume.val, %scalar.ph ], [ %23, %16 ] > > %18 = phi i32 [ %bc.resume.val1, %scalar.ph ], [ %17, %16 ] > > %19 = add i32 %18, 2 > > %20 = zext i32 %19 to i64 > > %21 = getelementptr inbounds i32, i32 addrspace(1)* undef, i64 %20 > > %22 = ashr i32 undef, %4 > > store i32 %22, i32 addrspace(1)* %21, align 4 > > %23 = add i32 %17, 1 > > %24 = icmp sgt i32 %23, 61 > > br i1 %24, label %._crit_edge.loopexit.preheader, label %16 > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Apparently Analagous Threads
- [SCEV][ScalarEvolution] SE limitation impacting LV
- [LV] [ScalarEvolution] Feedback on bug 34965 - After r311849 Loop Vectorizer crashes with "The instruction should be scalarized"
- RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
- RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
- RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)