thr3ads.net - llvm dev - [llvm-dev] llvm combines "ADD frameindex, constant" to OR [Mar 2019]

If this information is useful, please help other people find it:
Share via:

Kazushi Marukawa via llvm-dev

2019-Mar-13 00:24 UTC

[llvm-dev] llvm combines "ADD frameindex, constant" to OR

Hi all,

I've been working on a backend of our architecture and noticed llvm performs
following combining although one of operands is FrameIndex.

    Combining: t114: i64 = add FrameIndex:i64<0>, Constant:i64<56>
    Creating new node: t121: i64 = or FrameIndex:i64<0>,
Constant:i64<56>
     ... into: t121: i64 = or FrameIndex:i64<0>, Constant:i64<56>

This caused problem if frame pointer points 0x60000038 at run time.


I checked DAGCombiner::visitADD.  It folds ADD to OR by following code
without considering about FrameIndex.  This haveNoCommonBitsSet says
it's safe since FrameIndex(0) is 0.

      // fold (a+b) -> (a|b) iff a and b share no bits.
      if ((!LegalOperations || TLI.isOperationLegal(ISD::OR, VT)) &&
          DAG.haveNoCommonBitsSet(N0, N1))
        return DAG.getNode(ISD::OR, DL, VT, N0, N1);

I checked visitADD more and find that it performs some kind of undo
like bellow if the input is (+ (+ FI <const>) <const>).

        // Undo the add -> or combine to merge constant offsets from a frame
index.
        if (N0.getOpcode() == ISD::OR &&
            isa<FrameIndexSDNode>(N0.getOperand(0)) &&
            isa<ConstantSDNode>(N0.getOperand(1)) &&
            DAG.haveNoCommonBitsSet(N0.getOperand(0), N0.getOperand(1))) {
          SDValue Add0 = DAG.getNode(ISD::ADD, DL, VT, N1, N0.getOperand(1));
          return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), Add0);
        }

I think this code may work fine on an architecture using FI + A + B.

However, our architecture seems to use only FI + A and this ADD is
converted to OR permanently.  As a result, generated code use OR and
it causes run time error depends on the address of frame pointer.

I also checked visitOR, but I cannot find any undoing there.


My question is:

  1. Is there any way to control this to avoid OR folding,
     like TargetLowering::preferNotCombineToOrFrameIndex?
     (or should we add new?)
  2. How to optimize FrameIndex?  I desire DAGCombiner use alignment
     information on data, stack frame, or something.

Best Regards,
-- Kazushi

Tim Northover via llvm-dev

2019-Mar-13 10:09 UTC

head link

[llvm-dev] llvm combines "ADD frameindex, constant" to OR

Hi,

On Wed, 13 Mar 2019 at 00:25, Kazushi Marukawa via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>
> Hi all,
>
> I've been working on a backend of our architecture and noticed llvm
performs
> following combining although one of operands is FrameIndex.
>
>     Combining: t114: i64 = add FrameIndex:i64<0>,
Constant:i64<56>
>     Creating new node: t121: i64 = or FrameIndex:i64<0>,
Constant:i64<56>
>      ... into: t121: i64 = or FrameIndex:i64<0>,
Constant:i64<56>
>
> This caused problem if frame pointer points 0x60000038 at run time.
>
>
> I checked DAGCombiner::visitADD.  It folds ADD to OR by following code
> without considering about FrameIndex.  This haveNoCommonBitsSet says
> it's safe since FrameIndex(0) is 0.
This sounds like your fundamental problem to me. The generic code path
is computeKnownBits -> computeKnownBitsForFrameIndex ->
InferPtrAlignment. If your stack slot can legitimately end up at an
offset only aligned to 0x8 then that's going wrong somewhere and you
should probably work out why.

I don't really have the details to say for sure, but unless you've
overridden computeKnownBitsForFrameIndex a reasonably likely scenario
is that the alloca for that local variable requested 16-bit alignment
but your sp itself is only routinely aligned to 8 bytes (or less) and
you actually need to realign sp manually in XYZFrameLowering to
support that variable.

The ISD::OR thing can be annoying, but shouldn't affect correctness.
>         // Undo the add -> or combine to merge constant offsets from a
frame index.
>         if (N0.getOpcode() == ISD::OR &&
>             isa<FrameIndexSDNode>(N0.getOperand(0)) &&
>             isa<ConstantSDNode>(N0.getOperand(1)) &&
>             DAG.haveNoCommonBitsSet(N0.getOperand(0), N0.getOperand(1))) {
>           SDValue Add0 = DAG.getNode(ISD::ADD, DL, VT, N1,
N0.getOperand(1));
>           return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), Add0);
>         }
>
> I think this code may work fine on an architecture using FI + A + B.
It's target independent, and again shouldn't affect correctness. The
entire transformation is (FI | A) + B -> (FI + (A+B)). Useful when it
triggers, but not much else.
>   1. Is there any way to control this to avoid OR folding,
>      like TargetLowering::preferNotCombineToOrFrameIndex?
>      (or should we add new?)
There isn't as far as I know. I've heard it justified as
canonicalization before, and not been entirely convinced myself. But
it's generally pretty easy to cope with in the backend because
SelectionDAG::isBaseWithConstantOffset  deals with it correctly.
>   2. How to optimize FrameIndex?  I desire DAGCombiner use alignment
>      information on data, stack frame, or something.
As I said above, I think it should already use that and something has
gone wrong there in your case.

Cheers.

Tim.

Kazushi Marukawa via llvm-dev

2019-Mar-13 23:30 UTC

head link

[llvm-dev] llvm combines "ADD frameindex, constant" to OR

Hi Tim,

Thank you very much for explaining the computeKnownBitsForFrameIndex
and related functions.  I even don't notice them.  As you say, I think
my backend has fundamental problems at that area.  I'll check it out.
> >   2. How to optimize FrameIndex?  I desire DAGCombiner use alignment
> >      information on data, stack frame, or something.
> 
> As I said above, I think it should already use that and something has
> gone wrong there in your case.
That's true.  The computeKnownBitsForFrameIndex and InferPtrAlignment
already use that.  It won't work for my case, though.

For example, a vector sized data like v8sd says 64 bytes alignment in
InferPtrAlignment, although I allocate v8sd using 8 bytes alignment on
stack.  This differences have caused the problem.  I'll check details.

Best Regards,
-- Kazushi
> -----Original Message-----
> From: Tim Northover [mailto:t.p.northover at gmail.com]
> Sent: Wednesday, March 13, 2019 7:09 PM
> To: Marukawa Kazushi(丸川 一志) <kaz-marukawa at xr.jp.nec.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] llvm combines "ADD frameindex, constant"
to OR
> 
> Hi,
> 
> On Wed, 13 Mar 2019 at 00:25, Kazushi Marukawa via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > Hi all,
> >
> > I've been working on a backend of our architecture and noticed
llvm performs
> > following combining although one of operands is FrameIndex.
> >
> >     Combining: t114: i64 = add FrameIndex:i64<0>,
Constant:i64<56>
> >     Creating new node: t121: i64 = or FrameIndex:i64<0>,
Constant:i64<56>
> >      ... into: t121: i64 = or FrameIndex:i64<0>,
Constant:i64<56>
> >
> > This caused problem if frame pointer points 0x60000038 at run time.
> >
> >
> > I checked DAGCombiner::visitADD.  It folds ADD to OR by following code
> > without considering about FrameIndex.  This haveNoCommonBitsSet says
> > it's safe since FrameIndex(0) is 0.
> 
> This sounds like your fundamental problem to me. The generic code path
> is computeKnownBits -> computeKnownBitsForFrameIndex ->
> InferPtrAlignment. If your stack slot can legitimately end up at an
> offset only aligned to 0x8 then that's going wrong somewhere and you
> should probably work out why.
> 
> I don't really have the details to say for sure, but unless you've
> overridden computeKnownBitsForFrameIndex a reasonably likely scenario
> is that the alloca for that local variable requested 16-bit alignment
> but your sp itself is only routinely aligned to 8 bytes (or less) and
> you actually need to realign sp manually in XYZFrameLowering to
> support that variable.
> 
> The ISD::OR thing can be annoying, but shouldn't affect correctness.
> 
> >         // Undo the add -> or combine to merge constant offsets
from a frame index.
> >         if (N0.getOpcode() == ISD::OR &&
> >             isa<FrameIndexSDNode>(N0.getOperand(0)) &&
> >             isa<ConstantSDNode>(N0.getOperand(1)) &&
> >             DAG.haveNoCommonBitsSet(N0.getOperand(0),
N0.getOperand(1))) {
> >           SDValue Add0 = DAG.getNode(ISD::ADD, DL, VT, N1,
N0.getOperand(1));
> >           return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0),
Add0);
> >         }
> >
> > I think this code may work fine on an architecture using FI + A + B.
> 
> It's target independent, and again shouldn't affect correctness.
The
> entire transformation is (FI | A) + B -> (FI + (A+B)). Useful when it
> triggers, but not much else.
> 
> >   1. Is there any way to control this to avoid OR folding,
> >      like TargetLowering::preferNotCombineToOrFrameIndex?
> >      (or should we add new?)
> 
> There isn't as far as I know. I've heard it justified as
> canonicalization before, and not been entirely convinced myself. But
> it's generally pretty easy to cope with in the backend because
> SelectionDAG::isBaseWithConstantOffset  deals with it correctly.
> 
> >   2. How to optimize FrameIndex?  I desire DAGCombiner use alignment
> >      information on data, stack frame, or something.
> 
> As I said above, I think it should already use that and something has
> gone wrong there in your case.
> 
> Cheers.
> 
> Tim.

llvm dev - Mar 2019 - llvm combines "ADD frameindex, constant" to OR

[llvm-dev] llvm combines "ADD frameindex, constant" to OR

[llvm-dev] llvm combines "ADD frameindex, constant" to OR

[llvm-dev] llvm combines "ADD frameindex, constant" to OR