Nagurne, James via llvm-dev
2020-Aug-07 22:57 UTC
[llvm-dev] Branches which return values in SelectionDAG
Hi all,
I am working on modeling an instruction similar to SystemZ's 'BRCT',
which takes a register, decrements it, and branches if the register is nonzero.
I saw that the LLVM backend for SystemZ generates the instruction in a
MachineFunctionPass as part of a pass intended to eliminate or combine compares.
I then looked at ARM, where it uses the HardwareLoops pass first, and then a
combine that occurs in the ARM ISel stage. It replaces branch instructions with
special 'WLS' and 'LE' nodes that are custom selected into
t2WhileLoopStart and t2LoopEnd pseudo instructions with isBranch and
isTerminator set. These pseudo instructions are finalized in a later
MachineFunctionPass.
I had originally intended to use the HardwareLoops pass to do most of the
initial transformation and bookkeeping, allowing me to utilize the generated
intrinsics in my own pass to further transform and customize the loop.
What I found out, however, is that I don't know enough about the
SelectionDAG to know if this is possible.
Trying to combine the two concepts (Value-returning branches and handling them
in the selection DAG), I wrote my backend to generate:
header:
  %InitialVal = N
body:
  %IndVar = PHI(%InitialVal, %header, %DecVal, %body)
  ...
  %MultipleReturns = call {i32, i1} compare_and_maybe_decrement(%IndVar, 1)
  %DecVal = extract {i32, i1} %MultipleReturns 0
  %Cond = extract {i32, i1} %MultipleReturns 1
  br %Cond, body, exit
exit:
  ...
Then, I attempted to combine the intrinsic, extractions, and branch together in
the SelectionDAG.
What I found, however, is that this concept, which seems fine in the LLVM IR, is
not fine in the DAG.
Specifically, there is a CopyToReg in the DAG that occurs between the intrinsic
and the branch that saves off %DecVal. I presume it's there because the
value is leaving the DAG (to be copied from in the next iteration). With the
branch node returning that value instead, it seems like there's no legal
location in which to place this necessary CopyToReg. If you order it after the
fused branch, I believe it's illegal because it's logically incorrect
(only copy if we're terminating the loop?). If you order it before, I
don't think the DAG makes sense anymore:
t1 = CopyToReg %1, t2   ; Copying a value before it's defined???
t2 = Target::BR_DEC ...
Indeed, I get the abort "Operand not processed?" for the CopyToReg
when I tried it, indicating something was amiss.
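The ordering problem described above can be sketched as a dependency cycle. This is a toy model only, not SelectionDAG's actual data structures: the node names and the `schedule` helper are hypothetical, and it assumes that the terminator must be chained after every CopyToReg in the block while CopyToReg must follow the node that produces its value.

```python
from collections import deque


def schedule(deps):
    """Return a valid node order, or None if the dependencies form a cycle.

    deps maps each node to the set of nodes that must come before it.
    """
    remaining = {node: set(before) for node, before in deps.items()}
    ready = deque(sorted(n for n, before in remaining.items() if not before))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for other, before in remaining.items():
            if node in before:
                before.remove(node)
                if not before:
                    ready.append(other)
    # Any node left unscheduled is part of (or blocked by) a cycle.
    return order if len(order) == len(remaining) else None


# Unfused form: the intrinsic produces the value, CopyToReg saves it,
# and the branch is chained last.
UNFUSED = {
    "intrinsic": set(),
    "CopyToReg": {"intrinsic"},
    "br": {"intrinsic", "CopyToReg"},
}

# Fused form: CopyToReg needs the value produced by BR_DEC, but BR_DEC
# (the terminator) must come after CopyToReg.
FUSED = {
    "CopyToReg": {"BR_DEC"},
    "BR_DEC": {"CopyToReg"},
}

if __name__ == "__main__":
    print("unfused:", schedule(UNFUSED))
    print("fused:  ", schedule(FUSED))
```

Under these assumptions the unfused graph has a valid order, while the fused graph has none, which is consistent with the fused node failing somewhere in DAG processing.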
I'm more than willing to provide more context, such as DAG dumps, if people
have ideas; I just didn't want to fill this email with debug output.
Is what I'm doing possible? Or does it make sense to keep the special and
separate compare_and_maybe_decrement operation until after selection is finished
so that I can fuse using MachineInstrs instead?
Thanks for any help!
J.B. Nagurne
Code Generation
Texas Instruments
Sjoerd Meijer via llvm-dev
2020-Aug-08 07:36 UTC
[llvm-dev] Branches which return values in SelectionDAG
I don't know about SystemZ's instructions, but what you described
sounds exactly like a hardware loop construct to me. That's why I am
wondering why the hardware loop pass (and some friends) isn't working
for you; that wasn't entirely clear to me. After the HardwareLoops pass we
have something like this:
  @set.loop.iterations
hwloop:
  ..
  @loop.decrement.reg
  icmp
  br hwloop
For ARM we indeed then have something like this using pseudos after isel:
  t2DoLoopStart
hwloop:
  ..
  $lr = t2LoopDec $lr
  t2LoopEnd $lr, ...
  tB %bb.2, ...
These pseudos decrement the register holding the hwloop counter, which
is then consumed by the branch instruction. This seems to match the semantics
that you described: "which takes a register, decrements it, and branches if the
register is nonzero", unless I'm missing something of course... Very late in the
optimisation pipeline we have an ARM hardware loop pass that converts this into
a hwloop, and we are left with something like this:
  $lr = DLS $r2
hwloop:
  ..
  $lr = LE $lr
And while I think the semantics of our LE instruction are slightly different,
I don't think it matters (again, unless I'm missing something).
Sorry for not answering your actual isel question. Can't answer that without
digging into it, perhaps someone else can.
Cheers,
Sjoerd.
Nagurne, James via llvm-dev
2020-Aug-08 19:41 UTC
[llvm-dev] Branches which return values in SelectionDAG
I need to do fixups after the HardwareLoops pass because of the operation order.
Instead of "if (--x) goto body;", my instruction is "if (x) { x--; goto body; }".
Thus, I have to model this conditional decrement operation (as well as fix up
the starting value, because the loop would otherwise iterate an extra time).
So, in the simplest terms, I'd have:
%reg = phi(N, %nextreg)
...
%cond = icmp ne, %reg, 0
%nextreg = loop.conditional.decrement.reg %cond, %reg, 1
br %cond, body, exit
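The difference between the two orderings can be sketched with a small simulation. This is only an illustrative model: the function names are hypothetical, `conditional_decrement` is an assumed reading of loop.conditional.decrement.reg (decrement only when the condition holds), and the starting count is assumed to be at least 1.

```python
def conditional_decrement(cond, reg, step):
    """Assumed semantics: subtract step only when cond is true."""
    return reg - step if cond else reg


def body_runs_decrement_then_test(n):
    """Model of `if (--x) goto body;` for n >= 1."""
    reg = n
    runs = 0
    while True:
        runs += 1          # the loop body executes once per trip
        reg -= 1           # always decrement first
        if reg == 0:       # then test the new value
            return runs


def body_runs_test_then_decrement(n):
    """Model of `if (x) { x--; goto body; }` for n >= 0."""
    reg = n
    runs = 0
    while True:
        runs += 1
        cond = reg != 0    # test the old value
        reg = conditional_decrement(cond, reg, 1)
        if not cond:
            return runs


if __name__ == "__main__":
    n = 4
    print(body_runs_decrement_then_test(n))      # trips with the BRCT-style order
    print(body_runs_test_then_decrement(n))      # one extra trip
    print(body_runs_test_then_decrement(n - 1))  # adjusted starting value
```

In this model the test-then-decrement form runs the body one more time for the same starting value, so starting it from n - 1 gives the same trip count as the decrement-then-test form. That n - 1 adjustment is an inference from the description above, not something stated in the thread.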
I could (and probably will) take ARM's path here and match multiple pseudo
instructions that get combined later, unless someone knows a way to avoid the
CopyToReg problem caused by a terminator generating a value.
In summary, though, I took ARM's implementation one step further and tried
to combine everything into a single branch-like node, and that's where the
issues lie.
J.B. Nagurne
Code Generation
Texas Instruments