Jie He via llvm-dev
2021-May-11 14:17 UTC
[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
as I know, current IRCE implementation relies on some preconditions. it's intended to language runtime with garbage collection, not for loop vectorization. the same is true for loop predication, which is also helpful for eliminating condition check within a loop. Jie He B.R On Tue, 11 May 2021 at 20:50, Jingu Kang via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hi Philip, > > > > I have extended your suggestion slightly more as below. > > > > newbound1 = min(n, c) > > newbound2 = max(n, c) > > while (iv < n) { while(iv < newbound1) { > > A A > > if (iv < c) B > > B C > > C } > > } iv = newbound1 > > while (iv < newbound2) { > > A > > C > > } > > > > I have implemented a simple pass to split bound of loop, which has > conditional branch with IV, as above example. > https://reviews.llvm.org/D102234 It is initial version. If possible, > please review it. > > > > Thanks > > JinGu Kang > > > > *From:* Jingu Kang <Jingu.Kang at arm.com> > *Sent:* 04 May 2021 12:45 > *To:* Philip Reames <listmail at philipreames.com>; Jingu Kang < > Jingu.Kang at arm.com> > *Cc:* llvm-dev at lists.llvm.org > *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding something similar > in the pipeline of new pass manager > > > > Philip, I appreciate your kind comments. > > >In this example, forming the full pre/main/post loop structure of IRCE is > overkill. Instead, we could simply restrict the loop bounds in the > following manner: > > >loop.ph: > > > ;; Warning: psuedo code, might have edge conditions wrong > > > %c = icmp sgt %iv, %n > > > %min = umax(%n, %a) > > > br i1 %c, label %exit, label %loop.ph > > > > > >loop.ph.split: > > > br label %loop > > > > > >loop: > > > %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph ] > > > %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv > > > %val = load i64, i64* %src.arrayidx > > > %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv > > > store i64 %val, i64* %dst.arrayidx > > > %inc = add nuw nsw i64 %iv, 1 > > > %cond = icmp eq i64 %inc, %min > > > br i1 %cond, label %exit, label %loop > > > > > >exit: > > > ret void > > >} > > > > > >I'm not quite sure what to call this transform, but it's not IRCE. If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting > > > > I agree with you. If the llvm community is ok to accept above approach as > a pass or a part of a certain pass, I would be happy to implement it > because I am aiming to handle this case with llvm upstream. > > > > >Another way to frame this special case might be to recognize the > conditional block can be inverted into an early exit. (Reasoning: %iv is > strictly increasing, condition is monotonic, path if not taken has no > observable effect) Consider: > > >loop.ph: > > > br label %loop > > > > > >loop: > > > %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph ] > > > %cmp = icmp sge i64 %iv, %a > > > br i1 %cmp, label %exit, label %for.inc > > > > > >for.inc: > > > %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv > > > %val = load i64, i64* %src.arrayidx > > > %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv > > > store i64 %val, i64* %dst.arrayidx > > > %inc = add nuw nsw i64 %iv, 1 > > > %cond = icmp eq i64 %inc, %n > > > br i1 %cond, label %exit, label %loop > > > > > >exit: > > > ret void > > >} > > >Once that's done, the multiple exit vectorization work should vectorize > this loop. Thinking about it, I really like this variant. > > I have not looked at the multiple exit vectorization work yet but it > looks we could consider the inverted condition as early exit’s condition. > > >The costing here seems quite off. I have not looked at how the vectorize > costs predicated loads on hardware without predication, but needing to > scalarize a conditional VF-times and form a vector again does not have a > cost of 3 million. This could definitely be improved. > > I agree with you. > > > > Additionally, if possible, I would like to suggest to enable or add > transformations in order to help vectorization. For example, as removing > conditional branch inside loop, we could split a loop with dependency, > which blocks vectorization, into vectorizable loop and non-vectorizable one > using transformations like loop distribution. I am not sure why these > features have not been enabled as default on pass manager but it would make > more loops vectorizable. > > > > Thanks > > JinGu Kang > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Best Regards He Jie 何杰 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210511/af917cf8/attachment.html>
Philip Reames via llvm-dev
2021-May-11 15:04 UTC
[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
This is incorrect. IRCE's current sole known user happens to be a compiler for a GCed language, but there is no (intentional) dependence on that fact. It should work on arbitrary IR. Loop predication (the form in IndVars) triggers for arbitrary IR. The separate pass depends on semantics of guards which is related to deopt semantics, but *not* GC. Philip On 5/11/21 7:17 AM, Jie He wrote:> as I know, current IRCE implementation relies on some preconditions. > it's intended to language runtime with garbage collection, not for > loop vectorization. > the same is true for loop predication, which is also helpful for > eliminating condition check within a loop. > > Jie He > B.R > > On Tue, 11 May 2021 at 20:50, Jingu Kang via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > Hi Philip, > > I have extended your suggestion slightly more as below. > > newbound1 = min(n, c) > > newbound2 = max(n, c) > > while (iv < n) { while(iv < newbound1) { > > A A > > if (iv < c) B > > B C > > C } > > } iv = newbound1 > > while (iv < newbound2) { > > A > > C > > } > > I have implemented a simple pass to split bound of loop, which has > conditional branch with IV, as above example. > https://reviews.llvm.org/D102234 > <https://reviews.llvm.org/D102234> It is initial version. If > possible, please review it. > > Thanks > > JinGu Kang > > *From:* Jingu Kang <Jingu.Kang at arm.com <mailto:Jingu.Kang at arm.com>> > *Sent:* 04 May 2021 12:45 > *To:* Philip Reames <listmail at philipreames.com > <mailto:listmail at philipreames.com>>; Jingu Kang > <Jingu.Kang at arm.com <mailto:Jingu.Kang at arm.com>> > *Cc:* llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > *Subject:* RE: [llvm-dev] Enabling IRCE pass or Adding something > similar in the pipeline of new pass manager > > Philip, I appreciate your kind comments. > > >In this example, forming the full pre/main/post loop structure of IRCE is overkill. Instead, we > could simply restrict the loop bounds in the following manner: > > >loop.ph <http://loop.ph>: > > > ;; Warning: psuedo code, might have edge conditions wrong > > > %c = icmp sgt %iv, %n > > > %min = umax(%n, %a) > > > br i1 %c, label %exit, label %loop.ph <http://loop.ph> > > > > > >loop.ph.split: > > > br label %loop > > > > > >loop: > > > %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph <http://loop.ph> ] > > > %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv > > > %val = load i64, i64* %src.arrayidx > > > %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv > > > store i64 %val, i64* %dst.arrayidx > > > %inc = add nuw nsw i64 %iv, 1 > > > %cond = icmp eq i64 %inc, %min > > > br i1 %cond, label %exit, label %loop > > > > > >exit: > > > ret void > > >} > > > > > >I'm not quite sure what to call this transform, but it's not IRCE. If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splitting > > I agree with you. If the llvm community is ok to accept above > approach as a pass or a part of a certain pass, I would be happy > to implement it because I am aiming to handle this case with llvm > upstream. > > >Another way to frame this special case might be to recognize the conditional block can be inverted > into an early exit. (Reasoning: %iv is strictly increasing, > condition is monotonic, path if not taken has no observable > effect) Consider: > > >loop.ph <http://loop.ph>: > > > br label %loop > > > > > >loop: > > > %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph <http://loop.ph> ] > > > %cmp = icmp sge i64 %iv, %a > > > br i1 %cmp, label %exit, label %for.inc > > > > > >for.inc: > > > %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv > > > %val = load i64, i64* %src.arrayidx > > > %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv > > > store i64 %val, i64* %dst.arrayidx > > > %inc = add nuw nsw i64 %iv, 1 > > > %cond = icmp eq i64 %inc, %n > > > br i1 %cond, label %exit, label %loop > > > > > >exit: > > > ret void > > >} > > >Once that's done, the multiple exit vectorization work should vectorize this loop. Thinking about it, > I really like this variant. > > I have not looked at the multiple exit vectorization work yet but > it looks we could consider the inverted condition as early exit’s > condition. > > >The costing here seems quite off. I have not looked at how the vectorize costs predicated loads on > hardware without predication, but needing to scalarize a > conditional VF-times and form a vector again does not have a cost > of 3 million. This could definitely be improved. > > I agree with you. > > Additionally, if possible, I would like to suggest to enable or > add transformations in order to help vectorization. For example, > as removing conditional branch inside loop, we could split a loop > with dependency, which blocks vectorization, into vectorizable > loop and non-vectorizable one using transformations like loop > distribution. I am not sure why these features have not been > enabled as default on pass manager but it would make more loops > vectorizable. > > Thanks > > JinGu Kang > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > > > -- > Best Regards > He Jie 何杰-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210511/0be6d9a8/attachment.html>