Joan Lluch via llvm-dev
2019-Sep-21 12:06 UTC
[llvm-dev] CFG simplification question, and preservation of branching in the original code
Hi all,

For my custom architecture, I want to relax the CFG simplification pass, and other passes that replace conditional branches.

I found that the replacement of conditional jumps by "select" or other instructions is often too aggressive. This causes inefficient code for my target, because in most cases a branch would just be cheaper.

For example, consider the following C code:

    long test (long a, long b)
    {
      int neg = 0;
      long res;

      if (a < 0)
      {
        a = -a;
        neg = 1;
      }

      res = a*b;

      if (neg)
        res = -res;

      return res;
    }

This code can obviously be simplified in C, but please just consider it as an example to show the point.

The code above gets compiled like this (-Oz flag):

    ; Function Attrs: minsize norecurse nounwind optsize readnone
    define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
    entry:
      %cmp = icmp slt i32 %a, 0
      %sub = sub nsw i32 0, %a
      %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
      %mul = mul nsw i32 %a.addr.0, %b
      %sub2 = sub nsw i32 0, %mul
      %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
      ret i32 %res.0
    }

All branching was removed and replaced by 'select' instructions. Unfortunately, 32-bit operations are expensive on my architecture, and in most cases it would be desirable to just keep the original branches, which are relatively cheap. The case above could be converted back to branching by the backend, but in the general case this is not always practical and misses other optimisation opportunities.

I tried setting 'phi-node-folding-threshold' to 1 or even 0, and this definitely improves the situation in many cases, but Clang still creates instances of 'select' instructions, which are detrimental to my target. I am unsure where they are created, as I believe that the simplifycfg pass no longer creates them.

So the question is: are there any other hooks in Clang, or custom code that I can implement, to make the creation of 'select' instructions less aggressive and preserve the branches of the original C code?
Thanks,
John
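To make explicit what the two `select` instructions in the -Oz output compute, here is a minimal C sketch (not from the original thread) that places the original branchy function next to a hand translation of the IR. The names `test_branchy`, `test_select`, and `check_equivalence` are hypothetical. The point is that the select form evaluates both negations on every call, which is the extra 32-bit work the message is concerned about. Note that `-a` overflows for `LONG_MIN` in both forms (the IR's `nsw` flags encode the same no-overflow assumption), so the sample values avoid it.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: the C function from the message above. */
long test_branchy(long a, long b)
{
    int neg = 0;
    long res;

    if (a < 0) {
        a = -a;
        neg = 1;
    }
    res = a * b;
    if (neg)
        res = -res;
    return res;
}

/* Select form: a line-by-line hand translation of the -Oz IR.
   Both negations are computed unconditionally, as %sub and %sub2 are;
   each `select` becomes a conditional expression. */
long test_select(long a, long b)
{
    int cmp = a < 0;           /* %cmp      = icmp slt i32 %a, 0        */
    long sub = -a;             /* %sub      = sub nsw i32 0, %a         */
    long a0 = cmp ? sub : a;   /* %a.addr.0 = select i1 %cmp, ...       */
    long mul = a0 * b;         /* %mul      = mul nsw i32 %a.addr.0, %b */
    long sub2 = -mul;          /* %sub2     = sub nsw i32 0, %mul       */
    return cmp ? sub2 : mul;   /* %res.0    = select i1 %cmp, ...       */
}

/* Compares both forms on a small grid of operands (kept small so that
   neither the negations nor the product can overflow) and returns the
   number of pairs checked; assert() aborts on the first mismatch. */
int check_equivalence(void)
{
    static const long vals[] = { -7, -1, 0, 3, 12 };
    const size_t n = sizeof vals / sizeof vals[0];
    int checked = 0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            assert(test_branchy(vals[i], vals[j]) == test_select(vals[i], vals[j]));
            checked++;
        }
    return checked;
}
```

Both functions return the same values; they differ only in shape. The select form always executes both negations plus the two selects, while the branchy form executes the negations only when `a < 0`, which is the trade-off that matters on a target with cheap branches and expensive wide arithmetic.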
Joan Lluch via llvm-dev
2019-Sep-21 12:23 UTC
Sorry, this was intended for the cfe-dev mailing list. I mistakenly placed it here.

John
Danila Malyutin via llvm-dev
2019-Sep-23 14:40 UTC
Hi Joan,

One knob you might want to adjust is TargetTransformInfo::getUserCost, which is used by ComputeSpeculationCost in SimplifyCFG to determine whether it's cheap to speculate some instruction or not.

--
Danila
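The interaction between 'phi-node-folding-threshold' and the cost hook Danila mentions can be pictured with a simplified model, sketched in C below. This is not LLVM's code: it assumes that SimplifyCFG speculates a conditional block only when the summed per-instruction cost stays within a budget of threshold × TCC_Basic, and it uses a hypothetical `cost_fn` callback as a stand-in for TargetTransformInfo::getUserCost. The cost constants mirror LLVM's TCC_Free (0), TCC_Basic (1) and TCC_Expensive (4); everything else (`may_speculate`, `default_cost`, `wide_ops_expensive`) is made up for illustration.

```c
#include <stddef.h>

/* Values mirror LLVM's TargetTransformInfo::TargetCostConstants. */
enum { TCC_FREE = 0, TCC_BASIC = 1, TCC_EXPENSIVE = 4 };

/* Hypothetical stand-in for a per-instruction cost hook such as
   TargetTransformInfo::getUserCost: maps an opcode name to a cost. */
typedef int (*cost_fn)(const char *opcode);

/* Default-like target: every simple arithmetic op costs TCC_BASIC. */
int default_cost(const char *opcode)
{
    (void)opcode;
    return TCC_BASIC;
}

/* A target like the one in this thread, where 32-bit ops are costly,
   might report every such op as TCC_EXPENSIVE. */
int wide_ops_expensive(const char *opcode)
{
    (void)opcode;
    return TCC_EXPENSIVE;
}

/* Simplified model: speculate (and later fold into a select) only if the
   instructions of the conditional block fit within a budget of
   threshold * TCC_BASIC. Returns 1 to speculate, 0 to keep the branch. */
int may_speculate(const char **ops, size_t n, unsigned threshold, cost_fn cost)
{
    int budget = (int)threshold * TCC_BASIC;

    for (size_t i = 0; i < n; i++) {
        int c = cost(ops[i]);
        if (c > budget)
            return 0;   /* too expensive: keep the conditional branch */
        budget -= c;
    }
    return 1;
}
```

For the single `sub` in `if (a < 0) { a = -a; ... }`, this model speculates with a threshold of 2 and the default cost, but keeps the branch either when the threshold is 0 or when the cost hook reports the operation as expensive. The design point is that a target cost hook can make that decision per instruction, without changing a global option. The model says nothing about where the remaining selects reported in the original message come from.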