thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] CFG simplification question, and preservation of branching in the original code [Sep 2019]

Sanjay Patel via llvm-dev

2019-Sep-25 14:00 UTC

[llvm-dev] [cfe-dev] CFG simplification question, and preservation of branching in the original code

Changing the order of the checks in CodeGenPrepare::optimizeSelectInst()
sounds good to me.

But you may need to go further for optimum performance. For example, we may
be canonicalizing math/logic IR patterns into 'select' such as in the
recent:
https://reviews.llvm.org/D67799

So if you want those to become ALU ops again rather than branches, then you
need to do the transform later in the backend. That is, you want to let
DAGCombiner run its set of transforms on 'select' nodes.

On Wed, Sep 25, 2019 at 4:03 AM Joan Lluch via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Hi Craig,
>
> Thank you for your reply. I have started looking at “CodeGenPrepare” and I
> assume you refer to CodeGenPrepare::optimizeSelectInst. I will try to
> play a bit with that, possibly later today. At first glance, it looks to me
> that for targets that do not support ‘select’ at all, the fact that the
> function exits early for ‘OptSize’ can be detrimental, because this will
> just leave ALL existing selects in the code anyway. As said, I will try to
> play with that later, but right now it looks to me that maybe we should
> check for TLI->isSelectSupported earlier in the function, to give more
> opportunities to such targets without explicit ‘select’ support?
>
> Thanks
>
> John
>
>
> On 25 Sep 2019, at 08:59, Craig Topper <craig.topper at gmail.com> wrote:
>
> There is code in CodeGenPrepare.cpp that can turn selects into branches
> that tries to account for multiple selects sharing the same condition. It
> doesn't look like either AVR or MSP430 enable that code though.
>
> ~Craig
>
>
> On Tue, Sep 24, 2019 at 11:27 PM Joan Lluch via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Hi Roman,
>>
>> Thank you for your reply. I understand your point. I just want to add
>> something to clarify my original post in relation to your reply.
>>
>> There are already implemented 8-bit and 16-bit backends, namely the AVR
>> and the MSP430, which already “aggressively convert selects into
>> branches”, and which already benefit (as they are) from setting
>> ‘phi-node-folding-threshold’ to 1 or zero. This is because otherwise Clang
>> will generate several selects depending on the same “icmp”. These backends
>> are unable to optimise that, and they just create a comparison and a
>> conditional branch for every “select” in the IR code, even though the
>> original C code was already written in a much better way. So the resulting
>> effect is the presence of redundant comparisons and branches in the final
>> code, to the detriment of generated code quality.
>>
>> The above gets improved by setting ‘phi-node-folding-threshold’ to 1
>> because some of these extra ‘selects’ are no longer there, so the backend
>> stops generating redundant code.
>>
>> John.
>>
>>
>>
>>
>> > On 21 Sep 2019, at 14:48, Roman Lebedev <lebedev.ri at gmail.com> wrote:
>> >
>> > On Sat, Sep 21, 2019 at 3:18 PM Joan Lluch via cfe-dev
>> > <cfe-dev at lists.llvm.org> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> For my custom architecture, I want to relax the CFG simplification
>> >> pass, and any other passes replacing conditional branches.
>> >>
>> >> I found that the replacement of conditional branches by “select” and
>> >> other instructions is often too aggressive, and this causes inefficient
>> >> code for my target, as in most cases branches would be cheaper.
>> >>
>> >> For example, consider the following C code:
>> >>
>> >> long test (long a, long b)
>> >> {
>> >>  int neg = 0;
>> >>  long res;
>> >>
>> >>  if (a < 0)
>> >>  {
>> >>    a = -a;
>> >>    neg = 1;
>> >>  }
>> >>
>> >>  res = a*b;
>> >>
>> >>  if (neg)
>> >>    res = -res;
>> >>
>> >>  return res;
>> >> }
>> >>
>> >>
>> >> This code can be simplified in C, but it’s just an example to show the
>> >> point.
>> >>
>> >> The code above gets compiled like this (-Oz flag):
>> >>
>> >> ; Function Attrs: minsize norecurse nounwind optsize readnone
>> >> define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
>> >> entry:
>> >>  %cmp = icmp slt i32 %a, 0
>> >>  %sub = sub nsw i32 0, %a
>> >>  %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
>> >>  %mul = mul nsw i32 %a.addr.0, %b
>> >>  %sub2 = sub nsw i32 0, %mul
>> >>  %res.0 = select i1 %cmp, i32 %sub2, i32 %mul
>> >>  ret i32 %res.0
>> >> }
>> >>
>> >>
>> >> All branching was removed and replaced by ‘select’ instructions. For
>> >> my architecture, it would be desirable to keep the original branches in
>> >> most cases, because even simple 32-bit operations are too expensive to
>> >> execute speculatively, and branches are cheap.
>> >>
>> >> Setting ‘phi-node-folding-threshold’ to 1 or even 0 (instead of the
>> >> default 2) definitely improves the situation in many cases, but Clang
>> >> still creates many instances of ‘select’ instructions, which are
>> >> detrimental to my target. I am unsure about where they are created, as I
>> >> believe that the simplifycfg pass no longer creates them.
>> > You definitely can't ban LLVM passes/clang from creating selects.
>> >
>> >> So the question is: Are there any other hooks in clang, or custom code
>> >> that I can implement, to relax the creation of ‘select’ instructions and
>> >> make it preserve branches in the original C code?
>> > I think this is backwards.
>> > Sure, you could maybe disable most of the folds that produce selects.
>> > That may be good for final codegen, but will also affect other passes
>> > since not everything deals with 2-node PHI as well as with selects.
>> >
>> > But, what happens if you still get the select-y IR?
>> > Doesn't matter how, could be hand-written.
>> >
>> > I think you might want to instead aggressively convert selects into
>> > branches in backend.
>> >
>> >> Thanks,
>> >>
>> >> John
>> > Roman
>> >

Joan Lluch via llvm-dev

2019-Sep-29 12:35 UTC

Hi Sanjay,

Actually, CodeGenPrepare::optimizeSelectInst is not doing the best it could
in some circumstances. The case of ‘OptSize’ for targets not supporting
select was already mentioned as detrimental.

For targets that actually have selects, but where branches are cheap and
generally profitable, particularly for expensive operators, the
optimizeSelectInst function does not do well either. The function tries to
identify consecutive selects with the same condition in order to avoid
duplicate branches, which is ok, but then this effort is discarded in
isFormingBranchFromSelectProfitable because the identified condition is used
more than once (by the said two consecutive selects, of course). This defeats
the whole purpose of checking for them and results in poor codegen.
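
To make the "consecutive selects with the same condition" case concrete, here is a minimal C sketch. It is not LLVM code, and the names (pair, two_selects, one_branch, check_same_condition) are made up for illustration. two_selects has the shape of two adjacent selects on one comparison; one_branch is what a single compare-and-branch covering both would compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the two values produced by the two selects. */
struct pair { int32_t lo; int32_t hi; };

/* Two adjacent selects that depend on the same comparison. */
static struct pair two_selects(int32_t a, int32_t x, int32_t y)
{
    int c = a < 0;
    struct pair p;
    p.lo = c ? x : y;   /* first select  */
    p.hi = c ? y : x;   /* second select */
    return p;
}

/* The same computation with one comparison and one branch. */
static struct pair one_branch(int32_t a, int32_t x, int32_t y)
{
    struct pair p;
    if (a < 0) { p.lo = x; p.hi = y; }
    else       { p.lo = y; p.hi = x; }
    return p;
}

/* Both forms must agree for every input. */
static void check_same_condition(void)
{
    const int32_t samples[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        struct pair s = two_selects(samples[i], 10, 20);
        struct pair b = one_branch(samples[i], 10, 20);
        assert(s.lo == b.lo && s.hi == b.hi);
    }
}
```

A backend that lowers each select separately would emit the comparison and the branch twice, which is the redundancy described above.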

Yet another issue is that Clang attempts to replace ‘selects’ in the source
code with supposedly optimised code that is not ok for all targets. One
example is this:

long test (long a, long b)
{
  int neg = 0;
  long res;

  if (a < 0)
  {
    a = -a;
    neg = 1;
  }

  if (b < 0)
  {
    b = -b;
    neg = !neg;
  }

  res = a*b; //(unsigned long)a / (unsigned long)b;  // will call __udivsi3

  if (neg)
    res = -res;

  return res;
}


This gets compiled into

; Function Attrs: norecurse nounwind readnone
define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
entry:
  %cmp = icmp slt i32 %a, 0
  %sub = sub nsw i32 0, %a
  %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
  %a.lobit = lshr i32 %a, 31
  %0 = trunc i32 %a.lobit to i16
  %cmp1 = icmp slt i32 %b, 0
  br i1 %cmp1, label %if.then2, label %if.end4

if.then2:                                         ; preds = %entry
  %sub3 = sub nsw i32 0, %b
  %1 = xor i16 %0, 1
  br label %if.end4

if.end4:                                          ; preds = %if.then2, %entry
  %b.addr.0 = phi i32 [ %sub3, %if.then2 ], [ %b, %entry ]
  %neg.1 = phi i16 [ %1, %if.then2 ], [ %0, %entry ]
  %mul = mul nsw i32 %b.addr.0, %a.addr.0
  %tobool5 = icmp eq i16 %neg.1, 0
  %sub7 = sub nsw i32 0, %mul
  %res.0 = select i1 %tobool5, i32 %mul, i32 %sub7
  ret i32 %res.0
}

The offending part here is this: %a.lobit = lshr i32 %a, 31. Instead of just
creating a “select” instruction, as the original code suggested with the
if (a < 0) { neg = 1; } statement, the front-end produces an lshr, which is
very expensive on small architectures and makes it very difficult for the
backend to fold it back into an actual select (or branch). In my opinion, the
original C code should have produced a “select”, giving the backend the
opportunity to optimise it if required. I think that the frontend should
perform only target-independent optimisations.
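
For reference, the two forms of the sign flag can be written side by side in C. This is only a sketch with made-up function names (neg_by_compare, neg_by_shift, check_sign_flag), assuming a 32-bit value as in the IR above. It shows that the shift computes the same flag as the comparison, so the rewrite is correct. Whether it is cheaper depends on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the source expresses: a flag that is 1 when a is negative. */
static int neg_by_compare(int32_t a)
{
    return a < 0 ? 1 : 0;
}

/* What the IR contains: a logical shift right by 31 of the 32-bit value,
 * corresponding to "lshr i32 %a, 31". */
static int neg_by_shift(int32_t a)
{
    return (int)((uint32_t)a >> 31);
}

/* The two forms agree on boundary values and a few ordinary ones. */
static void check_sign_flag(void)
{
    const int32_t samples[] = { INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(neg_by_compare(samples[i]) == neg_by_shift(samples[i]));
}
```

On a target without a multi-bit shifter, the shift form may need many single-bit shift steps, while the comparison form only has to look at the sign of the most significant byte or word.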

I posted before my view that LLVM is clearly designed to satisfy big boys such
as the x86 and ARM targets. This means that, unfortunately, it makes too many
general assumptions about what’s cheap, without providing enough hooks to cancel
arbitrary optimisations. As I am implementing backends for 8 or 16 bit targets,
I find myself doing a lot of work just to reverse optimisations that should not
have been applied in the first place. My example above is an instance of a code
mutation performed by the frontend that is not desirable. Existing 8 and 16 bit
trunk targets (particularly the MSP430 and the AVR) are also negatively affected
by the excessively liberal use of shifts by LLVM.

The CodeGenPrepare::optimizeSelectInst function needs some changes to respect
targets with no selects, and targets that may want to avoid expensive
speculative executions.
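
The reordering of checks discussed in this thread can be pictured with a toy model in C. This is not the LLVM source: select_ctx, convert_current and convert_proposed are hypothetical names, the three booleans are simplified stand-ins for the OptSize test, TLI->isSelectSupported and isFormingBranchFromSelectProfitable, and the exact way the real function combines them is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inputs to the decision. */
struct select_ctx {
    bool opt_size;          /* function is being optimised for size        */
    bool select_supported;  /* stand-in for TLI->isSelectSupported(...)    */
    bool branch_profitable; /* stand-in for the profitability heuristic    */
};

/* Order as described in the thread: the OptSize early exit comes first,
 * so a target without select support keeps all its selects at -Os/-Oz. */
static bool convert_current(const struct select_ctx *c)
{
    if (c->opt_size)
        return false;
    if (c->select_supported && !c->branch_profitable)
        return false;
    return true;
}

/* Suggested order: ask about select support before anything else. */
static bool convert_proposed(const struct select_ctx *c)
{
    if (!c->select_supported)
        return true;
    if (c->opt_size)
        return false;
    return c->branch_profitable;
}
```

In this model the two orders differ only when opt_size is set and the target has no select support, which is exactly the case raised above.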

John



Roman Lebedev via llvm-dev

2019-Sep-29 13:57 UTC

On Sun, Sep 29, 2019 at 3:35 PM Joan Lluch via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Hi Sanjay,
>
> Actually, the CodeGenPrepare::optimizeSelectInst is not doing the best it
> could do in some circumstances: The case of “OptSize" for targets not
> supporting Select was already mentioned to be detrimental.
>
> For targets that actually have selects, but branches are cheap and
> generally profitable, particularly for expensive operators, the
> optimizeSelectInst function does not do good either. The function tries to
> identify consecutive selects with the same condition in order to avoid
> duplicate branches, which is ok, but then this effort is discarded in
> isFormingBranchFromSelectProfitable because the identified condition is used
> more than once (on the said two consecutive selects, of course), which
> defeats the whole purpose of checking for them, resulting in poor codegen.
>
> Yet another issue is that Clang attempts to replace ‘selects’ in the source
> code, by supposedly optimised code that is not ok for all targets. One
> example is this:

LLVM, not clang.

> long test (long a, long b)
> {
>   int neg = 0;
>   long res;
>
>   if (a < 0)
>   {
>     a = -a;
>     neg = 1;
>   }
>
>   if (b < 0)
>   {
>     b = -b;
>     neg = !neg;
>   }
>
>   res = a*b; //(unsigned long)a / (unsigned long)b;  // will call __udivsi3
>
>   if (neg)
>     res = -res;
>
>   return res;
> }
>
>
> This gets compiled into
>
> ; Function Attrs: norecurse nounwind readnone
> define dso_local i32 @test(i32 %a, i32 %b) local_unnamed_addr #0 {
> entry:
>   %cmp = icmp slt i32 %a, 0
>   %sub = sub nsw i32 0, %a
>   %a.addr.0 = select i1 %cmp, i32 %sub, i32 %a
>   %a.lobit = lshr i32 %a, 31
>   %0 = trunc i32 %a.lobit to i16
>   %cmp1 = icmp slt i32 %b, 0
>   br i1 %cmp1, label %if.then2, label %if.end4
>
> if.then2:                                         ; preds = %entry
>   %sub3 = sub nsw i32 0, %b
>   %1 = xor i16 %0, 1
>   br label %if.end4
>
> if.end4:                                          ; preds = %if.then2, %entry
>   %b.addr.0 = phi i32 [ %sub3, %if.then2 ], [ %b, %entry ]
>   %neg.1 = phi i16 [ %1, %if.then2 ], [ %0, %entry ]
>   %mul = mul nsw i32 %b.addr.0, %a.addr.0
>   %tobool5 = icmp eq i16 %neg.1, 0
>   %sub7 = sub nsw i32 0, %mul
>   %res.0 = select i1 %tobool5, i32 %mul, i32 %sub7
>   ret i32 %res.0
> }
>
> The offending part here is this:  %a.lobit = lshr i32 %a, 31 . Instead of
> just creating a “select” instruction, as the original code suggested with
> the if (a < 0) { neg = 1;} statements, the front-end produces a lshr which
> is very expensive for small architectures, and makes it very difficult for
> the backend to fold it again into an actual select (or branch). In my
> opinion, the original C code should have produced a “select” and give the
> backend the opportunity to optimise it if required. I think that the
> frontend should perform only target independent optimisations.

You didn't specify how you compile that code.
We could also get: https://godbolt.org/z/B-5lj1
Which can actually be folded further to just
  long test(long a, long b) {
    return a * b;
  }
Is "test" actually an implementation of a 64-bit-wide multiplication
compiler-rt builtin?
Then I'd think the main problem is that it is being optimized in the
first place, you could end up with endless recursion...
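
As a quick sanity check of that fold, here is a small C sketch. It assumes 32-bit values as in the quoted IR, and uses unsigned arithmetic so that negation and multiplication wrap instead of overflowing. The function names (test_sign_magnitude, test_folded, check_fold) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The sign-and-magnitude version from the quoted example, on wrapping
 * 32-bit unsigned values. The top bit plays the role of "a < 0". */
static uint32_t test_sign_magnitude(uint32_t a, uint32_t b)
{
    int neg = 0;
    uint32_t res;

    if (a & UINT32_C(0x80000000)) { a = 0u - a; neg = 1; }
    if (b & UINT32_C(0x80000000)) { b = 0u - b; neg = !neg; }

    res = a * b;
    if (neg)
        res = 0u - res;
    return res;
}

/* The folded form: a plain wrapping multiplication. */
static uint32_t test_folded(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Compare both forms over every pair of sample values. */
static void check_fold(void)
{
    const uint32_t s[] = { 0u, 1u, 2u, 7u, 12345u, 0x7fffffffu,
                           0x80000000u, 0xdeadbeefu, 0xfffffffeu, 0xffffffffu };
    const size_t n = sizeof s / sizeof s[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(test_sign_magnitude(s[i], s[j]) == test_folded(s[i], s[j]));
}
```

The two agree because negating an operand and then negating the product cancel out modulo 2^32, so the sign handling adds nothing to the result.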
> I posted before my view that LLVM is clearly designed to satisfy big boys
> such as the x86 and ARM targets. This means that, unfortunately, it makes
> too many general assumptions about what’s cheap, without providing enough
> hooks to cancel arbitrary optimisations. As I am implementing backends for
> 8 or 16 bit targets, I find myself doing a lot of work just to reverse
> optimisations that should have not been applied in the first place. My
> example above is an instance of a code mutation performed by the frontend
> that is not desirable. Existing 8 and 16 bit trunk targets (particularly the
> MSP430 and the AVR) are also negatively affected by the excessively liberal
> use of shifts by LLVM.
>
> The CodeGenPrepare::optimizeSelectInst function needs some changes to
> respect targets with no selects, and targets that may want to avoid
> expensive speculative executions.
>
> John

Roman