thr3ads.net - llvm dev - [llvm-dev] Different SelectionDAGs for same CPU [Jan 2019]

Josh Sharp via llvm-dev

2019-Jan-22 04:54 UTC

[llvm-dev] Different SelectionDAGs for same CPU

Hi,
I used two different compilers to compile the same IR for the same custom target.
The LLVM IR code is:

define i32 @_Z9test_mathv() #0 {
  %a = alloca i32, align 4
  %1 = load i32, i32* %a, align 4
  ret i32 %1
}

Before instruction selection, the Selection DAGs are the same:

Optimized legalized selection DAG: %bb.0 '_Z9test_mathv:'
SelectionDAG has 7 nodes:
  t0: ch = EntryToken
    t4: i32,ch = load<(dereferenceable load 4 from %ir.a)> t0, FrameIndex:i32<0>, undef:i32
  t6: ch,glue = CopyToReg t0, Register:i32 $r4, t4
  t7: ch = UISD::Ret t6, Register:i32 $r4, t6:1


But after instruction selection, one DAG has one more node than the other:

compiler 1
===== Instruction selection ends:
Selected selection DAG: %bb.0 '_Z9test_mathv:'
SelectionDAG has 8 nodes:
  t0: ch = EntryToken
      t1: i32 = add TargetFrameIndex:i32<0>, TargetConstant:i32<0>
    t4: i32,ch = LDWI<Mem:(dereferenceable load 4 from %ir.a)> t1, t0
  t6: ch,glue = CopyToReg t0, Register:i32 $r4, t4
  t7: ch = JLR Register:i32 $r4, t6, t6:1


compiler 2
===== Instruction selection ends:
Selected selection DAG: BB#0 '_Z9test_mathv:'
SelectionDAG has 7 nodes:
  t0: ch = EntryToken
    t4: i32,ch = LDWI<Mem:LD4[%a](dereferenceable)> TargetFrameIndex:i32<0>, TargetConstant:i32<0>, t0
  t6: ch,glue = CopyToReg t0, Register:i32 %$r4, t4
  t7: ch = JLR Register:i32 %$r4, t6, t6:1

In the first case, node t1 is a separate node whereas in the second case, t1 is
inside t4. What difference in implementation could explain this difference in
behavior? Where in the code should I look into?
(Note that "LDWI" is an instruction that adds a register and an immediate
and loads the memory content at the resulting address into a register.)

Thanks.

Tim Northover via llvm-dev

2019-Jan-22 07:52 UTC

Hi Josh,

On Tue, 22 Jan 2019 at 04:54, Josh Sharp via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> In the first case, node t1 is a separate node whereas in the second case,
> t1 is inside t4. What difference in implementation could explain this
> difference in behavior?

The second compiler looks like someone has added extra code to fold a
stack address calculation into the load operation that accesses the
variable.

> Where in the code should I look into?

It could be implemented in a couple of places. Most likely is that
XYZInstrInfo.td (or some related TableGen file) defines a
ComplexPattern that is used by the LDWI instruction definition. That
ComplexPattern tells pattern matching to call a specific function in
XYZISelDAGToDAG.cpp when deciding what to use for the LDWI operands.
That C++ function is probably what looks for a FrameIndex node and
has been taught that it can be folded into the load.
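
As a rough illustration of what such a function does (this is not the code of either compiler in this thread), here is a standalone toy model in C++. The names Node, BaseOffset and selectAddr, and the foldFrameIndex switch, are all hypothetical. A real target would operate on SDValue operands inside its ISelDAGToDAG class and would typically build the folded operands with CurDAG->getTargetFrameIndex and CurDAG->getTargetConstant. With folding enabled, a bare FrameIndex address is matched directly as base = frame index, offset = 0, which resembles the LDWI operands printed by compiler 2. With folding disabled the match fails, so the address would have to be selected by some other pattern as a separate node, like t1 in compiler 1.

```cpp
#include <cstdint>

// Toy stand-ins for SelectionDAG node kinds (hypothetical, not LLVM types).
enum class Kind { FrameIndex, Register, Constant, Add };

struct Node {
  Kind kind;
  std::int64_t value;  // frame index, register number, or constant value
  const Node *lhs;     // operands, used only by Add
  const Node *rhs;
};

// The two operands a base+immediate load such as LDWI would receive.
struct BaseOffset {
  Kind baseKind;
  std::int64_t base;
  std::int64_t offset;
};

// Hypothetical analogue of the C++ hook named by a ComplexPattern.
// Returns true and fills `out` when `addr` can be expressed as
// base + immediate, so the load can absorb the address computation.
bool selectAddr(const Node &addr, bool foldFrameIndex, BaseOffset &out) {
  // Case 1: the address is a stack slot itself. Folding means using the
  // frame index as the base with an offset of zero.
  if (addr.kind == Kind::FrameIndex) {
    if (!foldFrameIndex)
      return false;  // leave it to be selected as its own node
    out = BaseOffset{Kind::FrameIndex, addr.value, 0};
    return true;
  }

  // Case 2: (add base, constant) with a base we know how to encode.
  if (addr.kind == Kind::Add && addr.lhs && addr.rhs &&
      addr.rhs->kind == Kind::Constant) {
    const Node &base = *addr.lhs;
    bool baseOk = base.kind == Kind::Register ||
                  (base.kind == Kind::FrameIndex && foldFrameIndex);
    if (baseOk) {
      out = BaseOffset{base.kind, base.value, addr.rhs->value};
      return true;
    }
  }
  return false;  // not foldable; some other pattern must handle it
}
```

In this model, "teaching" the function about FrameIndex just means adding the first case: calling selectAddr on a FrameIndex node with foldFrameIndex set to true succeeds with offset 0, while the same call with false reports no match.
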

If you just grep the target's code for FrameIndex or frameindex you
should find it pretty quickly though, even if they used some other
method. There don't tend to be many uses of that particular node.

Cheers.

Tim.

Josh Sharp via llvm-dev

2019-Jan-26 00:15 UTC

Hi Tim,
> That C++ function is probably what looks for a FrameIndex node and
> has been taught that it can be folded into the load.

How do you teach a function that a node can be folded into an instruction?
