thr3ads.net - llvm dev - [llvm-dev] Optimizing Compare instruction selection [Jun 2019]

If this information is useful, please help other people find it:
Share via:

Joan Lluch via llvm-dev

2019-Jun-05 15:24 UTC

[llvm-dev] Optimizing Compare instruction selection

Hi Eli,

Thanks again for your reply. 

I am unsure about implementing the getCrossCopyRegClass for my target. My target
does not support or allow moves to and from the SR. The SR exists because it has
implicit involvement in some instructions, but it is opaque to the assembler and
to the user as a register. I mean, there are no instructions to directly move or
read it, or even access it directly.  So I think that implementing the
getCrossCopyRegClass -to tell LLVM that it can be copied to another register
class- is against the actual target features (unless I am missing the actual
purpose of the function)

I ended implementing target Branch and Select instructions that have a glue
input, and another pair that have a GR register as input. Both map to the same
machine instructions through Pattern matching in TargetInstrInfo.td. This is
what I mean, for the case of the Conditional Branch instruction :

let Uses = [SR] in {
  def BRCC  : T2ccbr< "br", "br"> ;
}

def : Pat<(CPU74brcc_g bb:$a, imm:$cc), (BRCC jmptarget:$a, imm:$cc, 0)>;
def : Pat<(CPU74brcc_i bb:$a, imm:$cc, GR16:$ins), (BRCC jmptarget:$a,
imm:$cc, GR16:$ins)>;
  
Then, in MyTargetISelDAGToDAG I create ‘CPU74brcc_g' instructions if they
are glued to a ‘compare’ instruction, or ‘CPU74brcc_i'  instructions if they
link to the result of an arithmetic instruction. During instruction selection
these are replaced by actual, physical, BRCC instructions. This solution has
proved to work fine without having to explicitly involve the SR. Of course, the
SR is still implicitly involved by the correct definition of ‘Uses’ and ‘Defs’
on the corresponding instructions.

Joan Lluch

> On 3 Jun 2019, at 21:00, Eli Friedman <efriedma at quicinc.com>
wrote:
> 
> Have you implemented TargetRegisterInfo::getCrossCopyRegClass?
>  
> -Eli
>  
> From: Joan Lluch <joan.lluch at icloud.com> 
> Sent: Sunday, June 2, 2019 2:39 AM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] Optimizing Compare instruction selection
>  
> Hi Eli,
>  
> Thank  you very much for your response. 
>  
> In fact, I had already tried the X86 approach before, i.e explicitly using
the status register. This is the approach that appeals more to me. I left it
parked because it also produced some problems (but I left it commented out). So
I have now re-lived the code, and it works fine in most cases, but there’s a
particular case that causes LLVM to stop with an assertion. Apparently LLVM
tries to use the SR (Status Register) register as a general purpose one, which
is not allowed in my architecture. The problem exists even before I added the
referred optimisations. The code that causes the problem is this:
>  
> int doSmth( int y );
>  
> int test( int y )
> {
>   int neg = y < 0;
>   if ( neg )
>     y = - y;
>   
>   int rv = doSmth( y );
>   
>   return neg ? - rv : rv;
> }
>  
> Apparently, LLVM attempts to physically use the result of a CMP instruction
through a function call by storing it on a temporary register. This is found
before the doSmth function call,
>  
>   t30: i16 = CMPkr16 t4, TargetConstant:i16<0>
>     t36: ch,glue = CopyToReg t0, Register:i16 $sr, t30
>   t32: i16 = NEGSETCC TargetConstant:i16<4>, t36:1
> 
> 
> And this is generated after the call
>  
>       t35: ch,glue = CopyToReg t0, Register:i16 $sr, t30
>     t31: i16 = SELCC t19, t18, TargetConstant:i16<4>, t35:1
>   t21: ch,glue = CopyToReg t18:1, Register:i16 $r0, t31
>  
> NEGSETCC and SELCC are genuine instructions of my target architecture, they
use the SR along with operands to produce a result.
>  
> As you can see ‘t30’ is used both before and after the function call, which
is what I think that causes trouble. In particular, the assertion that I get is
this:
>  
> Assertion failed: (RegClass->isAllocatable() && "Virtual
register RegClass must be allocatable."), function createVirtualRegister,
file /Users/joan/LLVM+CLANG/llvm-7.0.1.src/lib/CodeGen/MachineRegisterInfo.cpp,
line 170.
>  
> If I remove the ‘isAllocatable=0’ setting on the SR register, then LLVM
will try to spill the SR by calling ‘storeRegToStackSlot’, which then I stop
with my own assertion because the SR can’t be spilled.
>  
> If I then remove also the ‘let CopCost = -1’ for the SR register, then LLVM
attempts to move it to a general purpose register by calling ‘copyPhysRegs’,
which again I stop with an assertion because it’s not legal in my architecture.
>  
> So, I’m stuck as well with this approach.
>  
> Any ideas, on what I can try?
>  
> I think, that at this point I need to stop LLVM from attempting to reuse
the result of the CMP instruction before and after the function call, and
instead make it generate a new cmp instruction after the call. But how do I
achieve that?
>  
> (maybe a difficulty is that I am Custom lowering ’’SELECT_CC’ and ‘BR_CC’,
and Expanding ’SELECT’ and ‘BRCOND’, instead of the opposite? Does this make a
difference?)
>  
> Thanks
>  
> Joan Lluch
> 
> Tel: 620 28 45 13
>  
> On 2 Jun 2019, at 03:23, Eli Friedman <efriedma at quicinc.com
<mailto:efriedma at quicinc.com>> wrote:
>  
> There are basically two possible approaches here.
>  
> One approach is to just wait until after isel to handle this: check for an
addition instruction immediately followed by an cmp; then erase the cmp. See,
for example, ARMBaseInstrInfo::optimizeCompareInstr.
>  
> Another, you make a dedicated target-specific ISel node that produces two
results: essentially, one is the result of the arithmetic, and the other is the
status register.  See, for example, X86TargetLowering::LowerBRCOND.
>  
> I don't think you'd want to try to do anything in Select(); I mean,
you could try to pattern-match (cmp x, (add x, y)) or whatever, but you'd
probably run into trouble if the add has multiple uses.
>  
> -Eli
>  
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org
<mailto:llvm-dev-bounces at lists.llvm.org>> on behalf of Joan Lluch
via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>
> Sent: Saturday, June 1, 2019 10:09 AM
> To: llvm-dev
> Subject: [EXT] [llvm-dev] Optimizing Compare instruction selection
>  
> I attempt to optimize the use of the ‘CMP’ instruction on my architecture
by removing the instruction instances where the Status Register already had the
correct status flags.
>  
> The cmp instruction in my architecture is the typical one that compares two
registers, or a register with an immediate, and sets the Status Flags
accordingly. I implemented my ‘cmp’  instruction in LLVM by custom lowering the
SETCC, SELECT_CC and BR_CC in the usual way, mostly following the ARM
implementation. In particular the target cmp instruction is created and glued to
a target conditional instruction, such as a conditional branch, or a conditional
select.
>  
> The generated code may look like this:
>  
> add R0, R1, R2
> cmp R0, #0
> breq Label
>  
> This works fine, but now I would want to go one step further:
>  
> As part of the SETCC and SELECT_CC lowering, I can determine if the
conditions are met in my architecture to avoid generating a ‘cmp’ instruction at
all. For example, if the left operand of the SETCC is an ‘addition', the
right operand is constant zero, and the condition is 'ISD::CondCode::SETEQ,
I can safely assume that the Status Register in my architecture already has the
right condition just before the ‘cmp’, which has been set by the target ‘add’
instruction. So at this point I want to avoid generating the ‘cmp’ because it is
redundant.
>  
> My goal is to replace the assembly code shown above, by just this:
>  
> add R0, R1, R2
> breq Label
>  
> The ‘cmp’ was redundant in this case because the add instruction already
set the Z (zero) flag
>  
> I tried several things, but I got stuck in some way in all of them. For
example, an approach that I tried is to have a ‘dummy_cmp’ instruction, that I
generate instead of the regular ‘cmp’ when the comparison is redundant. The
‘dummy_cmp’ must be removed at a latter stage, ideally during the selDAGToDAG
phase, but I failed to do so.
>  
> - So my first question is. How do I remove the ‘dummy_cmp’ node in the
MyTargetISelDAGtoDAG::Select() function?
>  
> Another approach that I tried, which I regard as preferable, is to directly
avoid the creation of a ‘cmp’ instruction during ISelLowering. Instead of ‘cmp’,
use the LHS operand of the SETCC operand if it mets the conditions described
above. The problem that I found with this approach is that the ‘cmp’ is just a
‘glue’ to the target branch or select instruction, however, the LHS is an actual
operand (for example an ‘add’ node), so I need to create a ‘glue’ for it, in
order to be attached to the target branch instruction.
>  
> - My second question is therefore, how do I create a ‘glue’ for an existing
‘add’ node that I can attach to the target conditional instruction as a
replacement of the ‘cmp’ instruction?
>  
>  
> Joan Lluch
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190605/2b578a08/attachment.html>

Eli Friedman via llvm-dev

2019-Jun-05 23:43 UTC

head link

[llvm-dev] Optimizing Compare instruction selection

You’re allowed to return “nullptr” from getCrossCopyRegClass, and the scheduler
will try to do the right thing, possibly cloning the instruction that produces
the flag.

-Eli

From: Joan Lluch <joan.lluch at icloud.com>
Sent: Wednesday, June 5, 2019 8:25 AM
To: Eli Friedman <efriedma at quicinc.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] Optimizing Compare instruction selection

Hi Eli,

Thanks again for your reply.

I am unsure about implementing the getCrossCopyRegClass for my target. My target
does not support or allow moves to and from the SR. The SR exists because it has
implicit involvement in some instructions, but it is opaque to the assembler and
to the user as a register. I mean, there are no instructions to directly move or
read it, or even access it directly.  So I think that implementing the
getCrossCopyRegClass -to tell LLVM that it can be copied to another register
class- is against the actual target features (unless I am missing the actual
purpose of the function)

I ended implementing target Branch and Select instructions that have a glue
input, and another pair that have a GR register as input. Both map to the same
machine instructions through Pattern matching in TargetInstrInfo.td. This is
what I mean, for the case of the Conditional Branch instruction :

let Uses = [SR] in {
  def BRCC  : T2ccbr< "br", "br"> ;
}

def : Pat<(CPU74brcc_g bb:$a, imm:$cc), (BRCC jmptarget:$a, imm:$cc, 0)>;
def : Pat<(CPU74brcc_i bb:$a, imm:$cc, GR16:$ins), (BRCC jmptarget:$a,
imm:$cc, GR16:$ins)>;

Then, in MyTargetISelDAGToDAG I create ‘CPU74brcc_g' instructions if they
are glued to a ‘compare’ instruction, or ‘CPU74brcc_i'  instructions if they
link to the result of an arithmetic instruction. During instruction selection
these are replaced by actual, physical, BRCC instructions. This solution has
proved to work fine without having to explicitly involve the SR. Of course, the
SR is still implicitly involved by the correct definition of ‘Uses’ and ‘Defs’
on the corresponding instructions.

Joan Lluch

On 3 Jun 2019, at 21:00, Eli Friedman <efriedma at
quicinc.com<mailto:efriedma at quicinc.com>> wrote:

Have you implemented TargetRegisterInfo::getCrossCopyRegClass?

-Eli

From: Joan Lluch <joan.lluch at icloud.com<mailto:joan.lluch at
icloud.com>>
Sent: Sunday, June 2, 2019 2:39 AM
To: Eli Friedman <efriedma at quicinc.com<mailto:efriedma at
quicinc.com>>
Cc: llvm-dev <llvm-dev at lists.llvm.org<mailto:llvm-dev at
lists.llvm.org>>
Subject: [EXT] Re: [llvm-dev] Optimizing Compare instruction selection

Hi Eli,

Thank  you very much for your response.

In fact, I had already tried the X86 approach before, i.e explicitly using the
status register. This is the approach that appeals more to me. I left it parked
because it also produced some problems (but I left it commented out). So I have
now re-lived the code, and it works fine in most cases, but there’s a particular
case that causes LLVM to stop with an assertion. Apparently LLVM tries to use
the SR (Status Register) register as a general purpose one, which is not allowed
in my architecture. The problem exists even before I added the referred
optimisations. The code that causes the problem is this:

int doSmth( int y );

int test( int y )
{
  int neg = y < 0;
  if ( neg )
    y = - y;

  int rv = doSmth( y );

  return neg ? - rv : rv;
}

Apparently, LLVM attempts to physically use the result of a CMP instruction
through a function call by storing it on a temporary register. This is found
before the doSmth function call,

  t30: i16 = CMPkr16 t4, TargetConstant:i16<0>
    t36: ch,glue = CopyToReg t0, Register:i16 $sr, t30
  t32: i16 = NEGSETCC TargetConstant:i16<4>, t36:1

And this is generated after the call

      t35: ch,glue = CopyToReg t0, Register:i16 $sr, t30
    t31: i16 = SELCC t19, t18, TargetConstant:i16<4>, t35:1
  t21: ch,glue = CopyToReg t18:1, Register:i16 $r0, t31

NEGSETCC and SELCC are genuine instructions of my target architecture, they use
the SR along with operands to produce a result.

As you can see ‘t30’ is used both before and after the function call, which is
what I think that causes trouble. In particular, the assertion that I get is
this:

Assertion failed: (RegClass->isAllocatable() && "Virtual
register RegClass must be allocatable."), function createVirtualRegister,
file /Users/joan/LLVM+CLANG/llvm-7.0.1.src/lib/CodeGen/MachineRegisterInfo.cpp,
line 170.

If I remove the ‘isAllocatable=0’ setting on the SR register, then LLVM will try
to spill the SR by calling ‘storeRegToStackSlot’, which then I stop with my own
assertion because the SR can’t be spilled.

If I then remove also the ‘let CopCost = -1’ for the SR register, then LLVM
attempts to move it to a general purpose register by calling ‘copyPhysRegs’,
which again I stop with an assertion because it’s not legal in my architecture.

So, I’m stuck as well with this approach.

Any ideas, on what I can try?

I think, that at this point I need to stop LLVM from attempting to reuse the
result of the CMP instruction before and after the function call, and instead
make it generate a new cmp instruction after the call. But how do I achieve
that?

(maybe a difficulty is that I am Custom lowering ’’SELECT_CC’ and ‘BR_CC’, and
Expanding ’SELECT’ and ‘BRCOND’, instead of the opposite? Does this make a
difference?)

Thanks

Joan Lluch

Tel: 620 28 45 13

On 2 Jun 2019, at 03:23, Eli Friedman <efriedma at
quicinc.com<mailto:efriedma at quicinc.com>> wrote:

There are basically two possible approaches here.

One approach is to just wait until after isel to handle this: check for an
addition instruction immediately followed by an cmp; then erase the cmp. See,
for example, ARMBaseInstrInfo::optimizeCompareInstr.

Another, you make a dedicated target-specific ISel node that produces two
results: essentially, one is the result of the arithmetic, and the other is the
status register.  See, for example, X86TargetLowering::LowerBRCOND.

I don't think you'd want to try to do anything in Select(); I mean, you
could try to pattern-match (cmp x, (add x, y)) or whatever, but you'd
probably run into trouble if the add has multiple uses.

-Eli

________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org<mailto:llvm-dev-bounces
at lists.llvm.org>> on behalf of Joan Lluch via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Sent: Saturday, June 1, 2019 10:09 AM
To: llvm-dev
Subject: [EXT] [llvm-dev] Optimizing Compare instruction selection

I attempt to optimize the use of the ‘CMP’ instruction on my architecture by
removing the instruction instances where the Status Register already had the
correct status flags.

The cmp instruction in my architecture is the typical one that compares two
registers, or a register with an immediate, and sets the Status Flags
accordingly. I implemented my ‘cmp’  instruction in LLVM by custom lowering the
SETCC, SELECT_CC and BR_CC in the usual way, mostly following the ARM
implementation. In particular the target cmp instruction is created and glued to
a target conditional instruction, such as a conditional branch, or a conditional
select.

The generated code may look like this:

add R0, R1, R2
cmp R0, #0
breq Label

This works fine, but now I would want to go one step further:

As part of the SETCC and SELECT_CC lowering, I can determine if the conditions
are met in my architecture to avoid generating a ‘cmp’ instruction at all. For
example, if the left operand of the SETCC is an ‘addition', the right
operand is constant zero, and the condition is 'ISD::CondCode::SETEQ, I can
safely assume that the Status Register in my architecture already has the right
condition just before the ‘cmp’, which has been set by the target ‘add’
instruction. So at this point I want to avoid generating the ‘cmp’ because it is
redundant.

My goal is to replace the assembly code shown above, by just this:

add R0, R1, R2
breq Label

The ‘cmp’ was redundant in this case because the add instruction already set the
Z (zero) flag

I tried several things, but I got stuck in some way in all of them. For example,
an approach that I tried is to have a ‘dummy_cmp’ instruction, that I generate
instead of the regular ‘cmp’ when the comparison is redundant. The ‘dummy_cmp’
must be removed at a latter stage, ideally during the selDAGToDAG phase, but I
failed to do so.

- So my first question is. How do I remove the ‘dummy_cmp’ node in the
MyTargetISelDAGtoDAG::Select() function?

Another approach that I tried, which I regard as preferable, is to directly
avoid the creation of a ‘cmp’ instruction during ISelLowering. Instead of ‘cmp’,
use the LHS operand of the SETCC operand if it mets the conditions described
above. The problem that I found with this approach is that the ‘cmp’ is just a
‘glue’ to the target branch or select instruction, however, the LHS is an actual
operand (for example an ‘add’ node), so I need to create a ‘glue’ for it, in
order to be attached to the target branch instruction.

- My second question is therefore, how do I create a ‘glue’ for an existing
‘add’ node that I can attach to the target conditional instruction as a
replacement of the ‘cmp’ instruction?

Joan Lluch

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190605/267b3fbd/attachment.html>

Joan Lluch via llvm-dev

2019-Jun-06 21:29 UTC

head link

[llvm-dev] Optimizing Compare instruction selection

Hi Eli,

I already got it working in the way I described, but thanks for the suggestion.
I will try this as well and if it works I may chose it as it appears to require
less custom code.

Joan Lluch

> On 6 Jun 2019, at 01:43, Eli Friedman <efriedma at quicinc.com>
wrote:
> 
> You’re allowed to return “nullptr” from getCrossCopyRegClass, and the
scheduler will try to do the right thing, possibly cloning the instruction that
produces the flag.
>  
> -Eli
>  
> From: Joan Lluch <joan.lluch at icloud.com> 
> Sent: Wednesday, June 5, 2019 8:25 AM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] Optimizing Compare instruction selection
>  
> Hi Eli,
>  
> Thanks again for your reply. 
>  
> I am unsure about implementing the getCrossCopyRegClass for my target. My
target does not support or allow moves to and from the SR. The SR exists because
it has implicit involvement in some instructions, but it is opaque to the
assembler and to the user as a register. I mean, there are no instructions to
directly move or read it, or even access it directly.  So I think that
implementing the getCrossCopyRegClass -to tell LLVM that it can be copied to
another register class- is against the actual target features (unless I am
missing the actual purpose of the function)
>  
> I ended implementing target Branch and Select instructions that have a glue
input, and another pair that have a GR register as input. Both map to the same
machine instructions through Pattern matching in TargetInstrInfo.td. This is
what I mean, for the case of the Conditional Branch instruction :
>  
> let Uses = [SR] in {
>   def BRCC  : T2ccbr< "br", "br"> ;
> }
>  
> def : Pat<(CPU74brcc_g bb:$a, imm:$cc), (BRCC jmptarget:$a, imm:$cc,
0)>;
> def : Pat<(CPU74brcc_i bb:$a, imm:$cc, GR16:$ins), (BRCC jmptarget:$a,
imm:$cc, GR16:$ins)>;
>   
> Then, in MyTargetISelDAGToDAG I create ‘CPU74brcc_g' instructions if
they are glued to a ‘compare’ instruction, or ‘CPU74brcc_i'  instructions if
they link to the result of an arithmetic instruction. During instruction
selection these are replaced by actual, physical, BRCC instructions. This
solution has proved to work fine without having to explicitly involve the SR. Of
course, the SR is still implicitly involved by the correct definition of ‘Uses’
and ‘Defs’ on the corresponding instructions.
>  
> Joan Lluch
>  
>  
> On 3 Jun 2019, at 21:00, Eli Friedman <efriedma at quicinc.com
<mailto:efriedma at quicinc.com>> wrote:
>  
> Have you implemented TargetRegisterInfo::getCrossCopyRegClass?
>  
> -Eli
>  
> From: Joan Lluch <joan.lluch at icloud.com <mailto:joan.lluch at
icloud.com>>
> Sent: Sunday, June 2, 2019 2:39 AM
> To: Eli Friedman <efriedma at quicinc.com <mailto:efriedma at
quicinc.com>>
> Cc: llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>
> Subject: [EXT] Re: [llvm-dev] Optimizing Compare instruction selection
>  
> Hi Eli,
>  
> Thank  you very much for your response. 
>  
> In fact, I had already tried the X86 approach before, i.e explicitly using
the status register. This is the approach that appeals more to me. I left it
parked because it also produced some problems (but I left it commented out). So
I have now re-lived the code, and it works fine in most cases, but there’s a
particular case that causes LLVM to stop with an assertion. Apparently LLVM
tries to use the SR (Status Register) register as a general purpose one, which
is not allowed in my architecture. The problem exists even before I added the
referred optimisations. The code that causes the problem is this:
>  
> int doSmth( int y );
>  
> int test( int y )
> {
>   int neg = y < 0;
>   if ( neg )
>     y = - y;
>   
>   int rv = doSmth( y );
>   
>   return neg ? - rv : rv;
> }
>  
> Apparently, LLVM attempts to physically use the result of a CMP instruction
through a function call by storing it on a temporary register. This is found
before the doSmth function call,
>  
>   t30: i16 = CMPkr16 t4, TargetConstant:i16<0>
>     t36: ch,glue = CopyToReg t0, Register:i16 $sr, t30
>   t32: i16 = NEGSETCC TargetConstant:i16<4>, t36:1
> 
> 
> 
> And this is generated after the call
>  
>       t35: ch,glue = CopyToReg t0, Register:i16 $sr, t30
>     t31: i16 = SELCC t19, t18, TargetConstant:i16<4>, t35:1
>   t21: ch,glue = CopyToReg t18:1, Register:i16 $r0, t31
>  
> NEGSETCC and SELCC are genuine instructions of my target architecture, they
use the SR along with operands to produce a result.
>  
> As you can see ‘t30’ is used both before and after the function call, which
is what I think that causes trouble. In particular, the assertion that I get is
this:
>  
> Assertion failed: (RegClass->isAllocatable() && "Virtual
register RegClass must be allocatable."), function createVirtualRegister,
file /Users/joan/LLVM+CLANG/llvm-7.0.1.src/lib/CodeGen/MachineRegisterInfo.cpp,
line 170.
>  
> If I remove the ‘isAllocatable=0’ setting on the SR register, then LLVM
will try to spill the SR by calling ‘storeRegToStackSlot’, which then I stop
with my own assertion because the SR can’t be spilled.
>  
> If I then remove also the ‘let CopCost = -1’ for the SR register, then LLVM
attempts to move it to a general purpose register by calling ‘copyPhysRegs’,
which again I stop with an assertion because it’s not legal in my architecture.
>  
> So, I’m stuck as well with this approach.
>  
> Any ideas, on what I can try?
>  
> I think, that at this point I need to stop LLVM from attempting to reuse
the result of the CMP instruction before and after the function call, and
instead make it generate a new cmp instruction after the call. But how do I
achieve that?
>  
> (maybe a difficulty is that I am Custom lowering ’’SELECT_CC’ and ‘BR_CC’,
and Expanding ’SELECT’ and ‘BRCOND’, instead of the opposite? Does this make a
difference?)
>  
> Thanks
>  
> Joan Lluch
> 
> Tel: 620 28 45 13
>  
> On 2 Jun 2019, at 03:23, Eli Friedman <efriedma at quicinc.com
<mailto:efriedma at quicinc.com>> wrote:
>  
> There are basically two possible approaches here.
>  
> One approach is to just wait until after isel to handle this: check for an
addition instruction immediately followed by an cmp; then erase the cmp. See,
for example, ARMBaseInstrInfo::optimizeCompareInstr.
>  
> Another, you make a dedicated target-specific ISel node that produces two
results: essentially, one is the result of the arithmetic, and the other is the
status register.  See, for example, X86TargetLowering::LowerBRCOND.
>  
> I don't think you'd want to try to do anything in Select(); I mean,
you could try to pattern-match (cmp x, (add x, y)) or whatever, but you'd
probably run into trouble if the add has multiple uses.
>  
> -Eli
>  
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org
<mailto:llvm-dev-bounces at lists.llvm.org>> on behalf of Joan Lluch
via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at
lists.llvm.org>>
> Sent: Saturday, June 1, 2019 10:09 AM
> To: llvm-dev
> Subject: [EXT] [llvm-dev] Optimizing Compare instruction selection
>  
> I attempt to optimize the use of the ‘CMP’ instruction on my architecture
by removing the instruction instances where the Status Register already had the
correct status flags.
>  
> The cmp instruction in my architecture is the typical one that compares two
registers, or a register with an immediate, and sets the Status Flags
accordingly. I implemented my ‘cmp’  instruction in LLVM by custom lowering the
SETCC, SELECT_CC and BR_CC in the usual way, mostly following the ARM
implementation. In particular the target cmp instruction is created and glued to
a target conditional instruction, such as a conditional branch, or a conditional
select.
>  
> The generated code may look like this:
>  
> add R0, R1, R2
> cmp R0, #0
> breq Label
>  
> This works fine, but now I would want to go one step further:
>  
> As part of the SETCC and SELECT_CC lowering, I can determine if the
conditions are met in my architecture to avoid generating a ‘cmp’ instruction at
all. For example, if the left operand of the SETCC is an ‘addition', the
right operand is constant zero, and the condition is 'ISD::CondCode::SETEQ,
I can safely assume that the Status Register in my architecture already has the
right condition just before the ‘cmp’, which has been set by the target ‘add’
instruction. So at this point I want to avoid generating the ‘cmp’ because it is
redundant.
>  
> My goal is to replace the assembly code shown above, by just this:
>  
> add R0, R1, R2
> breq Label
>  
> The ‘cmp’ was redundant in this case because the add instruction already
set the Z (zero) flag
>  
> I tried several things, but I got stuck in some way in all of them. For
example, an approach that I tried is to have a ‘dummy_cmp’ instruction, that I
generate instead of the regular ‘cmp’ when the comparison is redundant. The
‘dummy_cmp’ must be removed at a latter stage, ideally during the selDAGToDAG
phase, but I failed to do so.
>  
> - So my first question is. How do I remove the ‘dummy_cmp’ node in the
MyTargetISelDAGtoDAG::Select() function?
>  
> Another approach that I tried, which I regard as preferable, is to directly
avoid the creation of a ‘cmp’ instruction during ISelLowering. Instead of ‘cmp’,
use the LHS operand of the SETCC operand if it mets the conditions described
above. The problem that I found with this approach is that the ‘cmp’ is just a
‘glue’ to the target branch or select instruction, however, the LHS is an actual
operand (for example an ‘add’ node), so I need to create a ‘glue’ for it, in
order to be attached to the target branch instruction.
>  
> - My second question is therefore, how do I create a ‘glue’ for an existing
‘add’ node that I can attach to the target conditional instruction as a
replacement of the ‘cmp’ instruction?
>  
>  
> Joan Lluch
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190606/ddc77ba1/attachment.html>

llvm dev - Jun 2019 - Optimizing Compare instruction selection

[llvm-dev] Optimizing Compare instruction selection

[llvm-dev] Optimizing Compare instruction selection

[llvm-dev] Optimizing Compare instruction selection