thr3ads.net - llvm dev - [llvm-dev] [CodeGen] PeepholeOptimizer: optimizing condition dependent instrunctions [Mar 2016]

Evgeny Astigeevich via llvm-dev

2016-Mar-10 00:18 UTC

[llvm-dev] [CodeGen] PeepholeOptimizer: optimizing condition dependent instrunctions

Hi Quentin,

Yes, the code allows connected instructions to be processed. However, the
instruction immediately after the one currently being processed must never be
erased, because that invalidates the iterator.

I've been fixing a bug in AArch64InstrInfo::optimizeCompareInstr:
instructions are converted into their S form, but it is not checked that they
produce the same flags as the CMP. The bug exists upstream as well.

Together with the fix I want to add some peephole rules for the combinations
CMP+BRC and CMP+SEL. In the context of optimizeCmpInstr I have all the
information about CmpInstr. I simply walk down and check whether each
instruction that uses AArch64::NZCV can be replaced with a simpler version.
At the end I delete CmpInstr. This approach contradicts the PeepholeOptimizer
design, because BRC and SEL must be processed in their corresponding functions.

Yes, 'analyzeCompare' is cheap, but in optimizeCondBranch and in
optimizeSelect we need to walk up to find the instruction defining the
condition flags. In the case of BRC, the CMP should not be far from it, but I
am not sure about SEL. Also, when BRC is replaced with BR, the CMP can be
removed (BTW, processing of the instructions below BRC can then be stopped).
I don't know if there are any restrictions on the instructions below BRC.
Anyway, I don't expect many of them.

In the case of CMP+SEL we cannot remove the CMP after simplifying SEL, because
there can be other SEL instructions using the flags from the CMP.

> I have to admit I don't see the concern with the instruction being
> condition dependent; we don't want to call optimizeCondBranch :).
> I believe I missed your point.

I missed your point too :) I think it's always good to get rid of
CondBranch.
We have cases like:

SUBS Wd, Wn, 0
B.LO

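Because SUBS with a zero immediate always sets C to 1, B.LO (taken only when
C is 0) is never taken and always falls through. So the conditional branch can
be replaced with an unconditional branch to the fall-through block.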
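
A minimal sketch of why this holds, modelling only the carry flag of a 32-bit
SUBS. The helper names are made up for illustration and are not LLVM APIs; the
sketch assumes the architectural rule that a subtraction sets C when no borrow
occurs.

```cpp
#include <cassert>
#include <cstdint>

// Carry flag after `SUBS Wd, Wn, #Imm`: C is set when no borrow occurs,
// i.e. when Wn >= Imm as unsigned values. (Hypothetical helper.)
inline bool carryAfterSubs(std::uint32_t Wn, std::uint32_t Imm) {
  return Wn >= Imm;
}

// B.LO is taken when C == 0.
inline bool branchLoTaken(bool C) { return !C; }

// Returns true if B.LO falls through after `SUBS Wd, Wn, #0` for every
// sampled value of Wn.
inline bool loAlwaysFallsThroughAfterSubsZero() {
  const std::uint32_t Samples[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u,
                                   0xFFFFFFFFu};
  for (std::uint32_t Wn : Samples) {
    if (branchLoTaken(carryAfterSubs(Wn, 0)))
      return false;
  }
  return true;
}
```

With a non-zero immediate the branch can go either way (for example, Wn = 1
and Imm = 2 clears C), so the rule only applies to the zero case.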

Thanks,
Evgeny

From: Quentin Colombet [mailto:qcolombet at apple.com]
Sent: 09 March 2016 18:04
To: Evgeny Astigeevich
Cc: llvm-dev at lists.llvm.org; nd
Subject: Re: [llvm-dev] [CodeGen] PeepholeOptimizer: optimizing condition
dependent instrunctions

Hi Evgeny,

On Mar 9, 2016, at 6:28 AM, Evgeny Astigeevich via llvm-dev
<llvm-dev at lists.llvm.org> wrote:

Hi,

I find it's quite strange how condition dependent instructions are processed
in PeepholeOptimizer::runOnMachineFunction:

      if ((isUncoalescableCopy(*MI) &&
           optimizeUncoalescableCopy(MI, LocalMIs)) ||
          (MI->isCompare() && optimizeCmpInstr(MI, &MBB)) ||
          (MI->isSelect() && optimizeSelect(MI, LocalMIs))) {
        // MI is deleted.
        LocalMIs.erase(MI);
        Changed = true;
        continue;
      }

      if (MI->isConditionalBranch() && optimizeCondBranch(MI)) {
        Changed = true;
        continue;
      }

CmpInstr, SelectInstr and CondBranch are processed separately. It's assumed
that CmpInstr and SelectInstr are deleted but CondBranch is not.
In fact, CmpInstr is always connected to a SelectInstr, a CondBranch, or both.
So if such a connection exists, it should be processed as a whole.

This code allows you to do that, unless I am mistaken.

For example, there are cases when CMP+BRC can be replaced by BR. The same is
true for CMP+SEL.

I believe this should be done in optimizeCondBranch and optimizeSelect,
respectively.

The main problem I have is that I have to find the corresponding CmpInstr and
repeat its analysis in both optimizeSelect and optimizeCondBranch.

OK, so basically your concern is that we may call analyzeCompare twice. Is that
it?
This function is probably cheap, so I wouldn't be too concerned about that.
If I turn out to be wrong, then yes, we can think of a better mechanism.

Any thoughts on why it is implemented this way?

The idea of the peephole optimizer is a top-down approach with greedily applied
optimizations.

I have to admit I don't see the concern with the instruction being condition
dependent; we don't want to call optimizeCondBranch :).
I believe I missed your point.

Cheers,
-Quentin

Kind regards,
Evgeny Astigeevich
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Quentin Colombet via llvm-dev

2016-Mar-10 00:55 UTC


Hi Evgeny,
> On Mar 9, 2016, at 4:18 PM, Evgeny Astigeevich
> <Evgeny.Astigeevich at arm.com> wrote:
>
> Hi Quentin,
>
> Yes, the code allows connected instructions to be processed. However, the
> instruction immediately after the one currently being processed must never
> be erased, because that invalidates the iterator.
Indeed.
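
A small sketch of the iterator hazard, using std::list as a stand-in for the
instruction list. ToyInstr, ToyBlock, eraseCompares and the "NOP" opcode are
hypothetical names chosen for illustration, not LLVM types. The loop advances
the iterator before processing the current element, so erasing the current
element is safe, while erasing the next one requires re-seating the iterator.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct ToyInstr {
  std::string Opcode;
};

using ToyBlock = std::list<ToyInstr>;

// Erases every "CMP". If AlsoEraseFollowingNop is set, a "NOP" directly
// after a "CMP" is erased too. Returns the number of erased instructions.
inline unsigned eraseCompares(ToyBlock &Block, bool AlsoEraseFollowingNop) {
  unsigned Erased = 0;
  for (ToyBlock::iterator Next = Block.begin(), End = Block.end();
       Next != End;) {
    ToyBlock::iterator Cur = Next;
    ++Next; // Cur may be erased below, so step past it now.

    if (Cur->Opcode != "CMP")
      continue;

    if (AlsoEraseFollowingNop && Next != End && Next->Opcode == "NOP") {
      // Next points at the element being erased. Re-seat it with the
      // iterator returned by erase; otherwise it would dangle.
      Next = Block.erase(Next);
      ++Erased;
    }
    Block.erase(Cur); // Safe: only iterators to Cur are invalidated.
    ++Erased;
  }
  return Erased;
}
```

Dropping the `Next = Block.erase(Next)` assignment and calling
`Block.erase(Next)` alone would leave the loop iterator pointing at a destroyed
node, which is the situation described above.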
> I’ve been fixing a bug in AArch64InstrInfo::optimizeCompareInstr:
> instructions are converted into their S form, but it is not checked that
> they produce the same flags as the CMP. The bug exists upstream as well.
Could you file a PR or just push the patch :).
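
To illustrate the kind of mismatch described in the quoted paragraph, here is
a sketch that compares the flags of `SUBS Wd, Wa, Wb` with those of
`SUB Wd, Wa, Wb` followed by `CMP Wd, #0`. It is only a model of the NZCV
semantics of a 32-bit subtraction; the helper names are hypothetical, and it
does not reproduce the actual upstream code or its fix.

```cpp
#include <cassert>
#include <cstdint>

struct Flags {
  bool N, Z, C, V;
};

// Flags produced by the 32-bit subtraction A - B (as SUBS or CMP would).
inline Flags subsFlags(std::uint32_t A, std::uint32_t B) {
  const std::uint32_t R = A - B;
  const bool SignA = (A >> 31) != 0;
  const bool SignB = (B >> 31) != 0;
  const bool SignR = (R >> 31) != 0;
  Flags F;
  F.N = SignR;
  F.Z = R == 0;
  F.C = A >= B; // No borrow.
  // Signed overflow: operand signs differ and the result sign differs
  // from the sign of the minuend.
  F.V = (SignA != SignB) && (SignR != SignA);
  return F;
}

// Flags after `SUB Wd, A, B ; CMP Wd, #0`.
inline Flags subThenCmpZeroFlags(std::uint32_t A, std::uint32_t B) {
  return subsFlags(A - B, 0);
}

// N and Z always agree between the two sequences.
inline bool nzAgree(std::uint32_t A, std::uint32_t B) {
  const Flags S = subsFlags(A, B), C = subThenCmpZeroFlags(A, B);
  return S.N == C.N && S.Z == C.Z;
}

// C and V may differ, so conditions reading them are not interchangeable.
inline bool cvAgree(std::uint32_t A, std::uint32_t B) {
  const Flags S = subsFlags(A, B), C = subThenCmpZeroFlags(A, B);
  return S.C == C.C && S.V == C.V;
}
```

For A = 0 and B = 1 the SUBS form clears C while the CMP form sets it, so a
user such as B.LO behaves differently; conditions that read only N and Z (EQ,
NE, MI, PL) are unaffected in this model.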

> Together with the fix I want to add some peephole rules for the
> combinations CMP+BRC and CMP+SEL. In the context of optimizeCmpInstr I have
> all the information about CmpInstr. I simply walk down and check whether
> each instruction that uses AArch64::NZCV can be replaced with a simpler
> version. At the end I delete CmpInstr. This approach contradicts the
> PeepholeOptimizer design, because BRC and SEL must be processed in their
> corresponding functions.
OK, I get your concern: basically you want to handle CMP+BRC and CMP+SEL inside
optimizeCmpInstr instead of in optimizeSelect and optimizeCondBranch, so that
you don’t do the analysis twice.

Historically the peephole optimizer processes patterns bottom-up (use to
def). The rationale is that we only have one def but we may have several uses.
In other words, it is easy to replace a use after you prove it is correct, but
what you want is top-down (def->use), and in that case you need some extra
checks (the potential other uses) to prove that the def can be optimized.

The bottom line, I believe this is not done this way because it is not
peephole-ish in terms of complexity.
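
The asymmetry between the two directions can be sketched on a toy SSA-like
IR. Everything below (ToyInstr, ToyFunction, uniqueDef, countUsers, defIsDead)
is a hypothetical model for illustration and not the LLVM API: going from a
use to its def inspects a single instruction, while dropping a def requires
knowing that every user has been dealt with.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SSA-like toy IR; none of these names are LLVM APIs.
struct ToyInstr {
  int Def;               // Virtual register defined, or -1 for none.
  std::vector<int> Uses; // Virtual registers read.
};

using ToyFunction = std::vector<ToyInstr>;

// Use -> def: a virtual register in SSA form has exactly one definition,
// so a use has a single instruction to inspect.
inline const ToyInstr *uniqueDef(const ToyFunction &F, int VReg) {
  for (const ToyInstr &I : F)
    if (I.Def == VReg)
      return &I;
  return nullptr;
}

// Def -> use: the same register may be read by any number of instructions.
inline std::size_t countUsers(const ToyFunction &F, int VReg) {
  std::size_t N = 0;
  for (const ToyInstr &I : F) {
    for (int U : I.Uses) {
      if (U == VReg) {
        ++N;
        break; // Count each using instruction once.
      }
    }
  }
  return N;
}

// A def may only be dropped once every one of its users has been rewritten.
inline bool defIsDead(const ToyFunction &F, int VReg,
                      std::size_t RewrittenUsers) {
  assert(RewrittenUsers <= countUsers(F, VReg));
  return countUsers(F, VReg) == RewrittenUsers;
}
```

Rewriting one user out of two leaves the def alive, which is the extra check
a def->use transformation has to pay for.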

> Yes, ‘analyzeCompare’ is cheap, but in optimizeCondBranch and in
> optimizeSelect we need to walk up to find the instruction defining the
> condition flags.
Going up is generally cheap, we just ask for the unique definition of the vreg.
I believe in your case it is not cheap because you are tracking a physical reg
and not a vreg.
Is that the problem?
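
For a physical register such as the flags register there is no unique
definition to ask for, so the defining instruction has to be found by walking
backwards. A minimal sketch of such a walk within one basic block follows;
ToyInstr, ToyBlock, NoDef and findFlagsDef are hypothetical names, and the
model only tracks whether an instruction reads or writes the flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an instruction's effect on the flags register.
struct ToyInstr {
  bool DefinesFlags;
  bool ReadsFlags;
};

using ToyBlock = std::vector<ToyInstr>;

constexpr std::ptrdiff_t NoDef = -1;

// Walks backwards from the flag user at UseIdx and returns the index of the
// nearest earlier instruction that writes the flags, or NoDef if the flags
// are not defined in this block before the user.
inline std::ptrdiff_t findFlagsDef(const ToyBlock &Block, std::size_t UseIdx) {
  assert(UseIdx < Block.size() && Block[UseIdx].ReadsFlags);
  for (std::size_t I = UseIdx; I-- > 0;) {
    if (Block[I].DefinesFlags)
      return static_cast<std::ptrdiff_t>(I);
  }
  return NoDef;
}
```

The cost of this walk grows with the distance between the user and the
defining instruction, unlike a single lookup of a virtual register's def.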
> In the case of BRC, the CMP should not be far from it, but I am not sure
> about SEL. Also, when BRC is replaced with BR, the CMP can be removed (BTW,
> processing of the instructions below BRC can then be stopped). I don’t know
> if there are any restrictions on the instructions below BRC.
You should have only terminators at the end of the BB.
You may have another branch though.

> Anyway, I don’t expect many of them. In the case of CMP+SEL we cannot
> remove the CMP after simplifying SEL, because there can be other SEL
> instructions using the flags from the CMP.
This is what I meant when I said that defs need more checks. That’s why
optimizeSelect seems a good fit for that.
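
The extra check on the def side can be sketched as a forward scan from the
compare: it may only be erased if nothing reads its flags before they are
redefined. ToyInstr, ToyBlock and canEraseCompare are hypothetical names, and
the FlagsLiveOut parameter is an assumption standing in for whatever liveness
information is available at the end of the block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an instruction's effect on the flags register.
struct ToyInstr {
  bool DefinesFlags;
  bool ReadsFlags;
};

using ToyBlock = std::vector<ToyInstr>;

// Returns true if the compare at CmpIdx has no remaining reader of its
// flags. A simplified SEL is modelled by clearing its ReadsFlags bit.
inline bool canEraseCompare(const ToyBlock &Block, std::size_t CmpIdx,
                            bool FlagsLiveOut) {
  assert(CmpIdx < Block.size() && Block[CmpIdx].DefinesFlags);
  for (std::size_t I = CmpIdx + 1; I < Block.size(); ++I) {
    // Check the read first: an instruction may read the old flags and
    // then write new ones.
    if (Block[I].ReadsFlags)
      return false;
    if (Block[I].DefinesFlags)
      return true; // Later readers see the new definition.
  }
  return !FlagsLiveOut;
}
```

With one CMP followed by two SELs, simplifying only the first SEL still
leaves a reader, so the CMP has to stay until the second one is handled too.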
>  
> > I have to admit I don’t see the concern with the instruction being
> > condition dependent; we don’t want to call optimizeCondBranch :).
> > I believe I missed your point.
>
> I missed your point too :) I think it’s always good to get rid of
> CondBranch.
I was talking about the code in the peephole optimizer :).
Like:
if isCondBranch then optimizeCondBranch
We don’t want an unconditional call to optimizeCondBranch, i.e.,
optimizeCondBranch expects a conditional branch as its argument.

Cheers,
-Quentin

Evgeny Astigeevich via llvm-dev

2016-Mar-10 10:42 UTC


Hi Quentin,

Yes, in the case of physical registers the SSA rules do not hold, so
getUniqueVRegDef cannot be used.

Thank you for clarifying things.
I now have a better understanding of how it works.

I need to split the code among these functions and cover it with tests. After
that I will submit it for review.

Thanks,
Evgeny

