Joan Lluch via llvm-dev
2019-Sep-21 12:06 UTC
[llvm-dev] CFG simplification question, and preservation of branching in the original code
Hi all,
For my custom architecture, I want to relax the CFG simplification pass and
other passes that replace conditional branches.
I found that the replacement of conditional jumps with "select" or other
instructions is often too aggressive. This produces inefficient code for my
target, because in most cases a branch would simply be cheaper.
For example, consider the following C code:
long test (long a, long b)
{
    int neg = 0;
    long res;

    if (a < 0)
    {
        a = -a;
        neg = 1;
    }

    res = a*b;

    if (neg)
        res = -res;

    return res;
}
This code could obviously be simplified in C, but please just consider it an
example to illustrate the point.
With -Oz, the code above compiles to the following:
; Function Attrs: minsize norecurse nounwind optsize readnone
define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
entry:
  %cmp = icmp slt i32 %a, 0
  %sub = sub nsw i32 0, %a
  %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
  %mul = mul nsw i32 %a.addr.0, %b
  %sub2 = sub nsw i32 0, %mul
  %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
  ret i32 %res.0
}
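For readers less used to LLVM IR, the C sketch below transliterates the -Oz output back into C. Each 'select' is written as a conditional expression, and the result is checked against the original function. It assumes a target where 'long' is 32 bits, so int32_t stands in for it. The names test_select and agree_on_range are hypothetical helpers, not part of the original post. Inputs are kept small because negating INT32_MIN would overflow; the nsw flags in the IR make that case poison.

    #include <stdint.h>

    /* Original function, with int32_t standing in for a 32-bit long. */
    int32_t test(int32_t a, int32_t b)
    {
        int neg = 0;
        int32_t res;

        if (a < 0) {
            a = -a;
            neg = 1;
        }
        res = a * b;
        if (neg)
            res = -res;
        return res;
    }

    /* Hypothetical line-by-line transliteration of the -Oz IR: both
       negations are computed unconditionally and each select becomes ?: */
    int32_t test_select(int32_t a, int32_t b)
    {
        int cmp = a < 0;              /* %cmp      = icmp slt %a, 0        */
        int32_t sub = -a;             /* %sub      = sub nsw 0, %a         */
        int32_t a0 = cmp ? sub : a;   /* %a.addr.0 = select %cmp, %sub, %a */
        int32_t mul = a0 * b;         /* %mul      = mul nsw %a.addr.0, %b */
        int32_t sub2 = -mul;          /* %sub2     = sub nsw 0, %mul       */
        return cmp ? sub2 : mul;      /* %res.0    = select %cmp, ...      */
    }

    /* Returns 1 if test, test_select and the plain product a*b agree for
       every a, b in [lo, hi]; keep the range small so nothing overflows. */
    int agree_on_range(int32_t lo, int32_t hi)
    {
        for (int32_t a = lo; a <= hi; a++)
            for (int32_t b = lo; b <= hi; b++) {
                int32_t r = test(a, b);
                if (r != test_select(a, b) || r != a * b)
                    return 0;
            }
        return 1;
    }

A small driver such as "return agree_on_range(-100, 100) ? 0 : 1;" in main exercises it. The comparison against a*b shows the simplification alluded to above: the two negations cancel. A C compiler is of course free to turn the ?: expressions back into either branches or selects, so this only illustrates the semantics, not the generated code.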
All branching was removed and replaced by 'select' instructions. Unfortunately,
32-bit operations are expensive on my architecture, and in most cases it would
be preferable to keep the original branches, which are relatively cheap. The
case above could be converted back to branches by the backend, but in the
general case this is not always practical and misses other optimisation
opportunities.
I tried setting 'phi-node-folding-threshold' to 1 or even 0. This definitely
improves the situation in many cases, but Clang still creates 'select'
instructions, which are detrimental to my target. I am unsure where they are
created, as I believe the simplifycfg pass no longer creates them.
So the question is: are there any other hooks in Clang, or custom code that I
can implement, to reduce the creation of 'select' instructions and instead
preserve the branches of the original C code?
Thanks,
John
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190921/92993003/attachment.html>
Joan Lluch via llvm-dev
2019-Sep-21 12:23 UTC
[llvm-dev] CFG simplification question, and preservation of branching in the original code
Sorry, this was intended for the cfe-dev mailing list. I mistakenly posted it here.
John
Danila Malyutin via llvm-dev
2019-Sep-23 14:40 UTC
[llvm-dev] CFG simplification question, and preservation of branching in the original code
Hi Joan,
One knob you might want to adjust is TargetTransformInfo::getUserCost, which is
used by ComputeSpeculationCost in SimplifyCFG to determine whether it is cheap to
speculate a given instruction.
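To make the interaction between a folding threshold and per-instruction costs concrete, here is a toy model in C of a budget-style check. Each instruction that would be hoisted has a cost. Folding to a select is allowed only if the total stays within threshold × TCC_Basic.

Everything here is an assumption for illustration:

* The enum values mirror the TCC_Free / TCC_Basic / TCC_Expensive cost levels of TargetTransformInfo.
* toy_user_cost, may_speculate and the opcode list are hypothetical names.
* The real logic in SimplifyCFG.cpp, including ComputeSpeculationCost, differs in detail.

    #include <stddef.h>

    /* Cost levels modelled on TargetTransformInfo's TCC_* constants. */
    enum { TCC_FREE = 0, TCC_BASIC = 1, TCC_EXPENSIVE = 4 };

    /* Hypothetical opcodes for the instructions that would be hoisted. */
    enum toy_opcode { OP_BITCAST, OP_SUB32, OP_MUL32 };

    /* Toy stand-in for a target's getUserCost override: a target where
       32-bit arithmetic is slow can report it as expensive. */
    static unsigned toy_user_cost(enum toy_opcode op, int wide_ops_expensive)
    {
        switch (op) {
        case OP_BITCAST:
            return TCC_FREE;
        case OP_SUB32:
        case OP_MUL32:
            return wide_ops_expensive ? TCC_EXPENSIVE : TCC_BASIC;
        }
        return TCC_BASIC;
    }

    /* Returns 1 if the n instructions fit in a budget of
       threshold * TCC_BASIC, i.e. the branch could become selects. */
    int may_speculate(const enum toy_opcode *ops, size_t n,
                      unsigned threshold, int wide_ops_expensive)
    {
        unsigned budget = threshold * TCC_BASIC;
        unsigned total = 0;

        for (size_t i = 0; i < n; i++) {
            total += toy_user_cost(ops[i], wide_ops_expensive);
            if (total > budget)
                return 0;   /* too costly: keep the branch */
        }
        return 1;
    }

In this model, a threshold of 0 still lets zero-cost instructions through. By contrast, raising the reported cost of 32-bit operations blocks speculation of them even at a nonzero threshold.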
--
Danila