Josh Sharp via llvm-dev
2019-Jan-22 04:54 UTC
[llvm-dev] Different SelectionDAGs for same CPU
Hi,

I used 2 different compilers to compile the same IR for the same custom target. The LLVM IR code is

    define i32 @_Z9test_mathv() #0 {
      %a = alloca i32, align 4
      %1 = load i32, i32* %a, align 4
      ret i32 %1
    }

Before instruction selection, the Selection DAGs are the same:

    Optimized legalized selection DAG: %bb.0 '_Z9test_mathv:'
    SelectionDAG has 7 nodes:
      t0: ch = EntryToken
      t4: i32,ch = load<(dereferenceable load 4 from %ir.a)> t0, FrameIndex:i32<0>, undef:i32
      t6: ch,glue = CopyToReg t0, Register:i32 $r4, t4
      t7: ch = UISD::Ret t6, Register:i32 $r4, t6:1

But after it, one has 1 more node than the other:

    compiler 1
    =====
    Instruction selection ends:
    Selected selection DAG: %bb.0 '_Z9test_mathv:'
    SelectionDAG has 8 nodes:
      t0: ch = EntryToken
      t1: i32 = add TargetFrameIndex:i32<0>, TargetConstant:i32<0>
      t4: i32,ch = LDWI<Mem:(dereferenceable load 4 from %ir.a)> t1, t0
      t6: ch,glue = CopyToReg t0, Register:i32 $r4, t4
      t7: ch = JLR Register:i32 $r4, t6, t6:1

    compiler 2
    =====
    Instruction selection ends:
    Selected selection DAG: BB#0 '_Z9test_mathv:'
    SelectionDAG has 7 nodes:
      t0: ch = EntryToken
      t4: i32,ch = LDWI<Mem:LD4[%a](dereferenceable)> TargetFrameIndex:i32<0>, TargetConstant:i32<0>, t0
      t6: ch,glue = CopyToReg t0, Register:i32 %$r4, t4
      t7: ch = JLR Register:i32 %$r4, t6, t6:1

In the first case, node t1 is a separate node whereas in the second case, t1 is inside t4. What difference in implementation could explain this difference in behavior? Where in the code should I look?

(Note that "LDWI" is an instruction that adds a register and an immediate and loads the memory content at the resulting address into a register.)

Thanks.
Tim Northover via llvm-dev
2019-Jan-22 07:52 UTC
[llvm-dev] Different SelectionDAGs for same CPU
Hi Josh,

On Tue, 22 Jan 2019 at 04:54, Josh Sharp via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> In the first case, node t1 is a separate node whereas in the second case, t1 is inside t4. What difference in implementation could explain this difference in behavior?

The second compiler looks like someone has added extra code to fold a stack address calculation into the load operation that accesses the variable.

> Where in the code should I look into?

It could be implemented in a couple of places. Most likely is that XYZInstrInfo.td (or some related TableGen file) defines a ComplexPattern that is used by the LDWI instruction definition. That ComplexPattern tells pattern matching to call a specific function in XYZISelDAGToDAG.cpp when deciding what to use for the LDWI operands. That C++ function is probably what looks for a FrameIndex node and has been taught that it can be folded into the load.

If you just grep the target's code for FrameIndex or frameindex you should find it pretty quickly though, even if they used some other method. There don't tend to be many uses of that particular node.

Cheers.

Tim.
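For illustration, the folding described above can be modeled outside LLVM. The following is a minimal, self-contained C++ sketch, not code from either compiler or from LLVM itself: `Node`, `Opcode`, `AddrOperands`, `fitsImm`, `selectAddr` and `demo` are hypothetical names, and the signed 16-bit immediate range is an assumption (the thread does not state the width of LDWI's immediate). In a real backend the corresponding function would take `SDValue` arguments and would typically build its results with `CurDAG->getTargetFrameIndex` and `CurDAG->getTargetConstant`.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for SelectionDAG concepts. None of these are LLVM types.
enum class Opcode { FrameIndex, TargetFrameIndex, Constant, Add, Register };

struct Node {
  Opcode Op;
  int64_t Value;    // frame index, constant value, or register number
  const Node *LHS;  // operands, used only by Add
  const Node *RHS;
};

// The two operands a "register + immediate" load receives after selection.
struct AddrOperands {
  Opcode BaseOp;
  int64_t BaseValue;
  int64_t Offset;
};

// Assumption: the load's immediate field is a signed 16-bit value.
bool fitsImm(int64_t V) { return V >= -32768 && V <= 32767; }

// Hypothetical analogue of the function a ComplexPattern would name.
// It decides which (base, offset) pair the load should use for Addr.
bool selectAddr(const Node &Addr, AddrOperands &Out) {
  // Case 1: a bare stack slot. Fold it straight into the load as
  // (TargetFrameIndex, 0) so that no separate address node is needed.
  if (Addr.Op == Opcode::FrameIndex) {
    Out = {Opcode::TargetFrameIndex, Addr.Value, 0};
    return true;
  }
  // Case 2: stack slot plus a small constant. Fold both parts.
  if (Addr.Op == Opcode::Add && Addr.LHS && Addr.RHS &&
      Addr.LHS->Op == Opcode::FrameIndex &&
      Addr.RHS->Op == Opcode::Constant && fitsImm(Addr.RHS->Value)) {
    Out = {Opcode::TargetFrameIndex, Addr.LHS->Value, Addr.RHS->Value};
    return true;
  }
  // Fallback: use the address as computed elsewhere, with offset 0.
  Out = {Addr.Op, Addr.Value, 0};
  return true;
}

// Usage mirroring the thread's example: a load from FrameIndex<0>.
void demo() {
  Node FI{Opcode::FrameIndex, 0, nullptr, nullptr};
  AddrOperands Ops{};
  bool Matched = selectAddr(FI, Ops);
  assert(Matched && Ops.BaseOp == Opcode::TargetFrameIndex && Ops.Offset == 0);
  (void)Matched;
}
```

In this model, removing the first branch makes a bare FrameIndex fall through to the generic case, so the address would have to be selected as its own node. That would be consistent with the extra t1 node in compiler 1's output, though the thread does not confirm that this is what that compiler does.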
Josh Sharp via llvm-dev
2019-Jan-26 00:15 UTC
[llvm-dev] Different SelectionDAGs for same CPU
Hi Tim,

> That C++ function is probably what looks for a FrameIndex node and
> has been taught that it can be folded into the load.

How do you teach a function that a node can be folded into an instruction?