Jie He via llvm-dev
2021-May-12 02:41 UTC
[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
Yes, but the current deopt lowering implementation generates statepoint IR, which currently supports only X86-64, as mentioned in the GC documentation in LLVM.

IRCE doesn't rely on a GCed language; I remembered that wrongly. But it isn't smart right now: it can't handle bounds checks as well as Java's RCE did.

On Tue, 11 May 2021 at 23:04, Philip Reames <listmail at philipreames.com> wrote:

> This is incorrect.
>
> IRCE's current sole known user happens to be a compiler for a GCed
> language, but there is no (intentional) dependence on that fact. It should
> work on arbitrary IR.
>
> Loop predication (the form in IndVars) triggers for arbitrary IR. The
> separate pass depends on semantics of guards which is related to deopt
> semantics, but *not* GC.
>
> Philip
>
> On 5/11/21 7:17 AM, Jie He wrote:
>
> as I know, current IRCE implementation relies on some preconditions. it's
> intended to language runtime with garbage collection, not for loop
> vectorization.
> the same is true for loop predication, which is also helpful for
> eliminating condition check within a loop.
>
> Jie He
> B.R
>
> On Tue, 11 May 2021 at 20:50, Jingu Kang via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Philip,
>>
>> I have extended your suggestion slightly more as below.
>>
>>                          newbound1 = min(n, c)
>>                          newbound2 = max(n, c)
>> while (iv < n) {         while (iv < newbound1) {
>>   A                        A
>>   if (iv < c)              B
>>     B                      C
>>   C                      }
>> }                        iv = newbound1
>>                          while (iv < newbound2) {
>>                            A
>>                            C
>>                          }
>>
>> I have implemented a simple pass to split bound of loop, which has
>> conditional branch with IV, as above example.
>> https://reviews.llvm.org/D102234 It is initial version. If possible,
>> please review it.
>>
>> Thanks
>> JinGu Kang
>>
>> *From:* Jingu Kang <Jingu.Kang at arm.com>
>> *Sent:* 04 May 2021 12:45
>> *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang <Jingu.Kang at arm.com>
>> *Cc:* llvm-dev at lists.llvm.org
>> *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding something similar
>> in the pipeline of new pass manager
>>
>> Philip, I appreciate your kind comments.
>>
>> >In this example, forming the full pre/main/post loop structure of IRCE
>> >is overkill. Instead, we could simply restrict the loop bounds in the
>> >following manner:
>> >
>> >loop.ph:
>> >  ;; Warning: pseudo code, might have edge conditions wrong
>> >  %c = icmp sgt %iv, %n
>> >  %min = umax(%n, %a)
>> >  br i1 %c, label %exit, label %loop.ph
>> >
>> >loop.ph.split:
>> >  br label %loop
>> >
>> >loop:
>> >  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph ]
>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>> >  %val = load i64, i64* %src.arrayidx
>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>> >  store i64 %val, i64* %dst.arrayidx
>> >  %inc = add nuw nsw i64 %iv, 1
>> >  %cond = icmp eq i64 %inc, %min
>> >  br i1 %cond, label %exit, label %loop
>> >
>> >exit:
>> >  ret void
>> >}
>> >
>> >I'm not quite sure what to call this transform, but it's not IRCE. If
>> >this example is actually general enough to cover your use cases, it's
>> >going to be a lot easier to judge profitability on than the general form
>> >of iteration set splitting
>>
>> I agree with you. If the llvm community is ok to accept above approach as
>> a pass or a part of a certain pass, I would be happy to implement it
>> because I am aiming to handle this case with llvm upstream.
>>
>> >Another way to frame this special case might be to recognize the
>> >conditional block can be inverted into an early exit.
>> >(Reasoning: %iv is strictly increasing, condition is monotonic, path if
>> >not taken has no observable effect) Consider:
>> >
>> >loop.ph:
>> >  br label %loop
>> >
>> >loop:
>> >  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
>> >  %cmp = icmp sge i64 %iv, %a
>> >  br i1 %cmp, label %exit, label %for.inc
>> >
>> >for.inc:
>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>> >  %val = load i64, i64* %src.arrayidx
>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>> >  store i64 %val, i64* %dst.arrayidx
>> >  %inc = add nuw nsw i64 %iv, 1
>> >  %cond = icmp eq i64 %inc, %n
>> >  br i1 %cond, label %exit, label %loop
>> >
>> >exit:
>> >  ret void
>> >}
>> >
>> >Once that's done, the multiple exit vectorization work should vectorize
>> >this loop. Thinking about it, I really like this variant.
>>
>> I have not looked at the multiple exit vectorization work yet but it
>> looks we could consider the inverted condition as early exit’s condition.
>>
>> >The costing here seems quite off. I have not looked at how the
>> >vectorize costs predicated loads on hardware without predication, but
>> >needing to scalarize a conditional VF-times and form a vector again does
>> >not have a cost of 3 million. This could definitely be improved.
>>
>> I agree with you.
>>
>> Additionally, if possible, I would like to suggest to enable or add
>> transformations in order to help vectorization. For example, as removing
>> conditional branch inside loop, we could split a loop with dependency,
>> which blocks vectorization, into vectorizable loop and non-vectorizable one
>> using transformations like loop distribution. I am not sure why these
>> features have not been enabled as default on pass manager but it would make
>> more loops vectorizable.
>>
>> Thanks
>> JinGu Kang
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Best Regards
> He Jie 何杰

--
Best Regards
He Jie 何杰
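
The loop-bound splitting quoted in this message can be checked with a small executable model. The following is only a sketch in Python, not the code from D102234: `run_original`, `run_split`, and the trace of `("A", iv)` tuples are hypothetical names introduced here for illustration. It also makes one deliberate assumption that differs from the quoted pseudo code: the second loop is bounded by `n` rather than by `max(n, c)`, because with `max(n, c)` as written the case `c > n` would run extra A/C iterations for `iv` in `[n, c)`.

```python
def run_original(start, n, c):
    """Original loop: the check `iv < c` is evaluated on every iteration."""
    trace = []
    iv = start
    while iv < n:
        trace.append(("A", iv))
        if iv < c:
            trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    return trace


def run_split(start, n, c):
    """Split form: two loops, neither of which tests `iv < c` in its body."""
    trace = []
    iv = start
    newbound1 = min(n, c)
    # First loop: iv < newbound1 implies iv < c, so B runs unconditionally.
    while iv < newbound1:
        trace.append(("A", iv))
        trace.append(("B", iv))
        trace.append(("C", iv))
        iv += 1
    # Second loop: iv >= min(n, c) here. If the loop runs at all then iv < n,
    # which forces iv >= c, so B is never needed.
    # Assumption: the bound is n, not max(n, c) as in the quoted pseudo code.
    while iv < n:
        trace.append(("A", iv))
        trace.append(("C", iv))
        iv += 1
    return trace
```

Neither loop body in `run_split` contains a branch on the induction variable, which is the property the vectorizer discussion in this thread is after. Comparing the two traces over a small grid of `start`, `n`, and `c` values is a cheap way to look for edge cases such as `c > n` or `start >= c`.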
Philip Reames via llvm-dev
2021-May-12 18:45 UTC
[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
On 5/11/21 7:41 PM, Jie He wrote:
> yes, but current lowering deopt implementation would generate a
> statepoint IR which currently only supports X86-64, as mentioned in GC
> documentation in LLVM.

I believe this is supported on at least AArch64 if memory serves.

> IRCE doesn't rely on GCed language, I remember wrong. but it's not
> smart right now, can't handle bounds check well like java RCE did.

Er, I think you're either misunderstanding or need to clarify your point. IRCE does exactly the standard pre/main/post loop technique which was used in C2 back in the day. LoopPred does the widening transformation. Do you have a particular case in mind?

> On Tue, 11 May 2021 at 23:04, Philip Reames <listmail at philipreames.com> wrote:
>
>     This is incorrect.
>
>     IRCE's current sole known user happens to be a compiler for a GCed
>     language, but there is no (intentional) dependence on that fact.
>     It should work on arbitrary IR.
>
>     Loop predication (the form in IndVars) triggers for arbitrary IR.
>     The separate pass depends on semantics of guards which is related
>     to deopt semantics, but *not* GC.
>
>     Philip
>
>     On 5/11/21 7:17 AM, Jie He wrote:
>>     as I know, current IRCE implementation relies on some
>>     preconditions. it's intended to language runtime with garbage
>>     collection, not for loop vectorization.
>>     the same is true for loop predication, which is also helpful for
>>     eliminating condition check within a loop.
>>
>>     Jie He
>>     B.R
>>
>>     On Tue, 11 May 2021 at 20:50, Jingu Kang via llvm-dev
>>     <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Philip,
>>
>> I have extended your suggestion slightly more as below.
>>
>>                          newbound1 = min(n, c)
>>                          newbound2 = max(n, c)
>> while (iv < n) {         while (iv < newbound1) {
>>   A                        A
>>   if (iv < c)              B
>>     B                      C
>>   C                      }
>> }                        iv = newbound1
>>                          while (iv < newbound2) {
>>                            A
>>                            C
>>                          }
>>
>> I have implemented a simple pass to split bound of loop,
>> which has conditional branch with IV, as above example.
>> https://reviews.llvm.org/D102234 It is initial version. If
>> possible, please review it.
>>
>> Thanks
>> JinGu Kang
>>
>> *From:* Jingu Kang <Jingu.Kang at arm.com>
>> *Sent:* 04 May 2021 12:45
>> *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang <Jingu.Kang at arm.com>
>> *Cc:* llvm-dev at lists.llvm.org
>> *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding
>> something similar in the pipeline of new pass manager
>>
>> Philip, I appreciate your kind comments.
>>
>> >In this example, forming the full pre/main/post loop structure of IRCE is overkill.
>> >Instead, we could simply restrict the loop bounds in the following manner:
>> >
>> >loop.ph:
>> >  ;; Warning: pseudo code, might have edge conditions wrong
>> >  %c = icmp sgt %iv, %n
>> >  %min = umax(%n, %a)
>> >  br i1 %c, label %exit, label %loop.ph
>> >
>> >loop.ph.split:
>> >  br label %loop
>> >
>> >loop:
>> >  %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph ]
>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>> >  %val = load i64, i64* %src.arrayidx
>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>> >  store i64 %val, i64* %dst.arrayidx
>> >  %inc = add nuw nsw i64 %iv, 1
>> >  %cond = icmp eq i64 %inc, %min
>> >  br i1 %cond, label %exit, label %loop
>> >
>> >exit:
>> >  ret void
>> >}
>> >
>> >I'm not quite sure what to call this transform, but it's not IRCE. If
>> >this example is actually general enough to cover your use cases, it's
>> >going to be a lot easier to judge profitability on than the general form
>> >of iteration set splitting
>>
>> I agree with you. If the llvm community is ok to accept above
>> approach as a pass or a part of a certain pass, I would be
>> happy to implement it because I am aiming to handle this case
>> with llvm upstream.
>>
>> >Another way to frame this special case might be to recognize the
>> >conditional block can be inverted into an early exit.
>> >(Reasoning: %iv is strictly increasing, condition is monotonic, path if
>> >not taken has no observable effect) Consider:
>> >
>> >loop.ph:
>> >  br label %loop
>> >
>> >loop:
>> >  %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ]
>> >  %cmp = icmp sge i64 %iv, %a
>> >  br i1 %cmp, label %exit, label %for.inc
>> >
>> >for.inc:
>> >  %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
>> >  %val = load i64, i64* %src.arrayidx
>> >  %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
>> >  store i64 %val, i64* %dst.arrayidx
>> >  %inc = add nuw nsw i64 %iv, 1
>> >  %cond = icmp eq i64 %inc, %n
>> >  br i1 %cond, label %exit, label %loop
>> >
>> >exit:
>> >  ret void
>> >}
>> >
>> >Once that's done, the multiple exit vectorization work should vectorize
>> >this loop. Thinking about it, I really like this variant.
>>
>> I have not looked at the multiple exit vectorization work
>> yet but it looks we could consider the inverted condition as
>> early exit’s condition.
>>
>> >The costing here seems quite off. I have not looked at how the
>> >vectorize costs predicated loads on hardware without predication, but
>> >needing to scalarize a conditional VF-times and form a vector again does
>> >not have a cost of 3 million. This could definitely be improved.
>>
>> I agree with you.
>>
>> Additionally, if possible, I would like to suggest to enable
>> or add transformations in order to help vectorization. For
>> example, as removing conditional branch inside loop, we could
>> split a loop with dependency, which blocks vectorization,
>> into vectorizable loop and non-vectorizable one using
>> transformations like loop distribution. I am not sure why
>> these features have not been enabled as default on pass
>> manager but it would make more loops vectorizable.
>>
>> Thanks
>> JinGu Kang
>>
>> --
>> Best Regards
>> He Jie 何杰
>
> --
> Best Regards
> He Jie 何杰
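
The pre/main/post loop technique mentioned in this reply can be illustrated with a small model. This is a sketch of the general idea in Python, not of LLVM's IRCE implementation: `checked_sum` and `split_sum` are hypothetical names, and the side exit taken when a range check fails is modeled, as an assumption, by returning the failing index (in a real runtime this might be an exception or a deoptimization).

```python
def checked_sum(a, begin, end):
    """Reference loop: range-check every index, stop at the first failure."""
    total = 0
    for i in range(begin, end):
        if not (0 <= i < len(a)):       # range check on every iteration
            return total, i             # side exit: report the failing index
        total += a[i]
    return total, None


def split_sum(a, begin, end):
    """Same result, with the iteration space split into pre/main/post loops."""
    # Sub-range of [begin, end) in which 0 <= i < len(a) is known to hold.
    main_lo = min(max(begin, 0), end)
    main_hi = max(min(end, len(a)), main_lo)
    total = 0
    for i in range(begin, main_lo):     # pre loop: range check kept
        if not (0 <= i < len(a)):
            return total, i
        total += a[i]
    for i in range(main_lo, main_hi):   # main loop: no range check
        total += a[i]
    for i in range(main_hi, end):       # post loop: range check kept
        if not (0 <= i < len(a)):
            return total, i
        total += a[i]
    return total, None
```

The main loop carries no check, so it is the part a vectorizer could handle, while the pre and post loops keep the original checked semantics for the iterations outside the safe range. Calling both functions on the same inputs, for example `split_sum([1, 2, 3], 1, 5)` and `checked_sum([1, 2, 3], 1, 5)`, should give identical results.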