Jingu Kang via llvm-dev
2021-May-04 11:44 UTC
[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
Philip, I appreciate your kind comments.>In this example, forming the full pre/main/post loop structure of IRCE is overkill. Instead, we could simply restrict the loop bounds in the following manner:>loop.ph<http://loop.ph>:> ;; Warning: psuedo code, might have edge conditions wrong> %c = icmp sgt %iv, %n> %min = umax(%n, %a)> br i1 %c, label %exit, label %loop.ph<http://loop.ph>>>loop.ph.split:> br label %loop>>loop:> %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph<http://loop.ph> ]> %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv> %val = load i64, i64* %src.arrayidx> %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv> store i64 %val, i64* %dst.arrayidx> %inc = add nuw nsw i64 %iv, 1> %cond = icmp eq i64 %inc, %min> br i1 %cond, label %exit, label %loop>>exit:> ret void>}>>I'm not quite sure what to call this transform, but it's not IRCE. If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splittingI agree with you. If the llvm community is ok to accept above approach as a pass or a part of a certain pass, I would be happy to implement it because I am aiming to handle this case with llvm upstream.>Another way to frame this special case might be to recognize the conditional block can be inverted into an early exit. (Reasoning: %iv is strictly increasing, condition is monotonic, path if not taken has no observable effect) Consider:>loop.ph<http://loop.ph>:> br label %loop>>loop:> %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph<http://loop.ph> ]> %cmp = icmp sge i64 %iv, %a> br i1 %cmp, label %exit, label %for.inc>>for.inc:> %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv> %val = load i64, i64* %src.arrayidx> %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv> store i64 %val, i64* %dst.arrayidx> %inc = add nuw nsw i64 %iv, 1> %cond = icmp eq i64 %inc, %n> br i1 %cond, label %exit, label %loop>>exit:> ret void>}>Once that's done, the multiple exit vectorization work should vectorize this loop. Thinking about it, I really like this variant.I have not looked at the multiple exit vectorization work yet but it looks we could consider the inverted condition as early exit’s condition.>The costing here seems quite off. I have not looked at how the vectorize costs predicated loads on hardware without predication, but needing to scalarize a conditional VF-times and form a vector again does not have a cost of 3 million. This could definitely be improved.I agree with you. Additionally, if possible, I would like to suggest to enable or add transformations in order to help vectorization. For example, as removing conditional branch inside loop, we could split a loop with dependency, which blocks vectorization, into vectorizable loop and non-vectorizable one using transformations like loop distribution. I am not sure why these features have not been enabled as default on pass manager but it would make more loops vectorizable. Thanks JinGu Kang -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210504/8d7322e6/attachment.html>
Jingu Kang via llvm-dev
2021-May-11 12:49 UTC
[llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager
Hi Philip, I have extended your suggestion slightly more as below. newbound1 = min(n, c) newbound2 = max(n, c) while (iv < n) { while(iv < newbound1) { A A if (iv < c) B B C C } } iv = newbound1 while (iv < newbound2) { A C } I have implemented a simple pass to split bound of loop, which has conditional branch with IV, as above example. https://reviews.llvm.org/D102234 It is initial version. If possible, please review it. Thanks JinGu Kang From: Jingu Kang <Jingu.Kang at arm.com> Sent: 04 May 2021 12:45 To: Philip Reames <listmail at philipreames.com>; Jingu Kang <Jingu.Kang at arm.com> Cc: llvm-dev at lists.llvm.org Subject: RE: [llvm-dev] Enabling IRCE pass or Adding something similar in the pipeline of new pass manager Philip, I appreciate your kind comments.>In this example, forming the full pre/main/post loop structure of IRCE is overkill. Instead, we could simply restrict the loop bounds in the following manner:>loop.ph<http://loop.ph>:> ;; Warning: psuedo code, might have edge conditions wrong> %c = icmp sgt %iv, %n> %min = umax(%n, %a)> br i1 %c, label %exit, label %loop.ph<http://loop.ph>>>loop.ph.split:> br label %loop>>loop:> %iv = phi i64 [ %inc, %loop ], [ 1, %loop.ph<http://loop.ph> ]> %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv> %val = load i64, i64* %src.arrayidx> %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv> store i64 %val, i64* %dst.arrayidx> %inc = add nuw nsw i64 %iv, 1> %cond = icmp eq i64 %inc, %min> br i1 %cond, label %exit, label %loop>>exit:> ret void>}>>I'm not quite sure what to call this transform, but it's not IRCE. If this example is actually general enough to cover your use cases, it's going to be a lot easier to judge profitability on than the general form of iteration set splittingI agree with you. If the llvm community is ok to accept above approach as a pass or a part of a certain pass, I would be happy to implement it because I am aiming to handle this case with llvm upstream.>Another way to frame this special case might be to recognize the conditional block can be inverted into an early exit. (Reasoning: %iv is strictly increasing, condition is monotonic, path if not taken has no observable effect) Consider:>loop.ph<http://loop.ph>:> br label %loop>>loop:> %iv = phi i64 [ %inc, %for.inc ], [ 1, %loop.ph<http://loop.ph> ]> %cmp = icmp sge i64 %iv, %a> br i1 %cmp, label %exit, label %for.inc>>for.inc:> %src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv> %val = load i64, i64* %src.arrayidx> %dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv> store i64 %val, i64* %dst.arrayidx> %inc = add nuw nsw i64 %iv, 1> %cond = icmp eq i64 %inc, %n> br i1 %cond, label %exit, label %loop>>exit:> ret void>}>Once that's done, the multiple exit vectorization work should vectorize this loop. Thinking about it, I really like this variant.I have not looked at the multiple exit vectorization work yet but it looks we could consider the inverted condition as early exit’s condition.>The costing here seems quite off. I have not looked at how the vectorize costs predicated loads on hardware without predication, but needing to scalarize a conditional VF-times and form a vector again does not have a cost of 3 million. This could definitely be improved.I agree with you. Additionally, if possible, I would like to suggest to enable or add transformations in order to help vectorization. For example, as removing conditional branch inside loop, we could split a loop with dependency, which blocks vectorization, into vectorizable loop and non-vectorizable one using transformations like loop distribution. I am not sure why these features have not been enabled as default on pass manager but it would make more loops vectorizable. Thanks JinGu Kang -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210511/562e884e/attachment.html>