Alex Susu via llvm-dev
2017-Mar-07 06:12 UTC
[llvm-dev] Specifying conditional blocks for the back end
Hello.

Because I am seeing optimizations (DCE, out-of-order scheduling) that break the
semantics of the list of instructions lowered in ISelLowering from the LLVM VSELECT
instruction, and because these bad transformations happen even before scheduling, in
later I-sel subpasses, I am trying to fix the problem by lowering VSELECT to a single
pseudo-instruction and only LATER expanding it into a list of instructions, using bundles
and maybe also PredicateInstruction(), which is also used in IfConversion.cpp.

More exactly, I am trying to use a pseudo-instruction that gets expanded into a
sequence of 4 MachineInstr, namely:

    // These 4 instructions replace the pseudo-instruction I use for LLVM's VSELECT
    R31 = OR srcVselectFalse, srcVselectFalse
    WHEREEQ
      R31 = OR srcVselectTrue, srcVselectTrue
    ENDWHERE

I plan to do this as early as possible, in a pass registered in addInstSelector(),
which normally gets executed immediately after the first scheduling phase.
If anybody sees a problem with this, please let me know.

I think it is OK to specify empty semantics (an empty DAG pattern in TableGen) for my
WHEREEQ/ENDWHERE instructions delimiting the predicated/conditional block.
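
To make the intended semantics of the 4-instruction expansion concrete, here is a
minimal sketch in plain C++. It is a toy lane-by-lane model, not LLVM or Connex code:
the names VReg, Mask, maskedOrCopy and expandVSelect are hypothetical, and the model
assumes that a WHEREEQ block simply restricts writes to the lanes where the predicate
holds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (assumption): a vector register is one int per lane.
using VReg = std::vector<int>;
using Mask = std::vector<bool>;

// Models "dst = OR src, src" under an active-lane mask:
// only lanes whose mask bit is set are written.
void maskedOrCopy(VReg &dst, const VReg &src, const Mask &active) {
  assert(dst.size() == src.size() && src.size() == active.size());
  for (std::size_t i = 0; i < dst.size(); ++i)
    if (active[i])
      dst[i] = src[i] | src[i];
}

// Models the expansion of VSELECT pred, vTrue, vFalse into two writes of R31.
VReg expandVSelect(const Mask &pred, const VReg &vTrue, const VReg &vFalse) {
  const Mask allLanes(pred.size(), true);
  VReg r31(pred.size(), 0);
  maskedOrCopy(r31, vFalse, allLanes);  // R31 = OR srcVselectFalse, srcVselectFalse
  // WHEREEQ: only lanes where the predicate holds stay active
  maskedOrCopy(r31, vTrue, pred);       // R31 = OR srcVselectTrue, srcVselectTrue
  // ENDWHERE: all lanes are active again
  return r31;
}
```

In this model the second write is a partial update of R31, so the first write is not
dead even though both instructions define the same register.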

Eli, thank you for the pointers. The ARM Thumb2 "it" instruction is very interesting,
maybe even unique among mainstream processors, handling predicated execution of 2
contiguous blocks of instructions; I found some specs for it at
https://community.arm.com/processors/b/blog/posts/condition-codes-3-conditional-execution-in-thumb-2.
This instruction is quite similar to my conditional-block instructions WHERExy/ENDWHERE
(xy can be EQ, LT, CRY).

Thank you,
Alex

On 3/3/2017 8:59 PM, Friedman, Eli wrote:
> On 3/2/2017 7:07 PM, Alex Susu via llvm-dev wrote:
>> Hello.
>> For my back end for the Connex SIMD research processor I want to implement
>> conditional blocks (I guess the better term is predicated blocks). Predicated blocks are
>> bordered by two instructions WHEREEQ (or WHERELT, etc) and ENDWHERE.
>> For example, the following code executes the instructions inside the WHERE block
>> only for the lanes where R0 == R1:
>>     EQ R0, R1;
>>     WHEREEQ
>>       vector_asm_instr1;
>>       ...
>>       vector_asm_instrk;
>>     ENDWHERE
>>
>> I was able to generate at instruction selection such a block by writing custom C++
>> selection code, but I don't know how can I inform the back end that the instructions
>> inside the WHERE block get executed conditionally, not always.
>> This matters it seems only for optimization levels in llc -O1/2/3, but not for O0.
>> For levels of optimization O1/2/3, I experienced cases where the WHEREEQ and ENDWHERE
>> instructions were simply removed and the vector_asm_instr1..k became executed
>> unconditionally, etc - and this is NOT good.
>>
>> Could you please tell me how can I inform the back end that the instructions inside
>> my WHERE blocks get executed conditionally, not always.
>
> There's some existing infrastructure in the backend for predication; see
> lib/CodeGen/IfConversion.cpp (and the target hooks PredicateInstruction etc.). For
> forming blocks, you might want to follow what the ARM backend does for Thumb2; see
> Thumb2ITBlockPass.cpp .
>
> -Eli

Alex Susu via llvm-dev
2017-Mar-12 00:36 UTC
[llvm-dev] Specifying conditional blocks for the back end
Hello.

I wanted to tell you that I managed to codegen the LLVM VSELECT instruction correctly
by following the steps described below.
Can somebody help me with the problems with the PredicateInstruction() method that I
describe at point 3 below? Although I managed to avoid using PredicateInstruction(), I am
curious why it doesn't work.

To codegen the LLVM VSELECT instruction correctly (I will be very explicit, so bear
with me if you have similar issues):
- 1. I declare in TableGen an instruction WHERE_EQ (I assume without loss of
generality that VSELECT has a seteq predicate), which will implement VSELECT in terms
of my processor's WHERE blocks.
- 2. In ISelLowering::Lower() I replace the VSELECT with WHERE_EQ. (Note that
previously I was generating the entire list of MachineSDNode instructions equivalent to
VSELECT in ISelLowering::Lower(), but the scheduler and the DCE (Dead Code Elimination)
pass were messing up the order of the instructions, resulting in incorrect semantics.)
Note that I give WHERE_EQ the SDNode operands of VSELECT as inputs, so that I can
access them later in the PassCreateWhereBlocks pass mentioned below.
- 3. I registered a pass PassCreateWhereBlocks in addInstSelector() in
[Target]TargetMachine.cpp, which gets executed immediately after instruction selection
followed by a first scheduling phase.

Even if I try to predicate the instructions inside the WHERE block in
PassCreateWhereBlocks, the PredicateInstruction() method fails by returning false, which
means it did not add a predicated flag to the instructions I wanted it to. This results,
as I said before, in incorrect program optimizations such as useful instructions being
removed, because the compiler does not understand that the code in my WHERE blocks is
predicated (conditional), so it assumes it is always executed. As a side note, I see that
the ARM and SystemZ back ends override the PredicateInstruction() method, but their code
is a bit complex and I did not try hard to understand how they manage to predicate their
instructions, e.g., for the ARM Thumb2 "it" instruction. Are there any links documenting
their work?

Therefore I started using bundles instead of making predicated instructions. As far as
I can see, DCE cannot be performed inside bundled instructions (see also
http://llvm.org/docs/doxygen/html/DeadMachineInstructionElim_8cpp_source.html
which does NOT handle bundles; this implies that it does not look at the instructions
inside a bundle and can only see the "header" instruction of a bundle. Therefore, I
believe it is safe to bundle instructions to avoid DCE, as long as we can infer that the
"header" instruction of the bundle is never going to be DCE-ed). Using bundles also
prevents the scheduler from changing the order of the bundled instructions. To create the
bundle I use MIBundleBuilder, since calling the finalizeBundle() method directly in this
pass (PassCreateWhereBlocks) results in an error like "llc:
/llvm/lib/CodeGen/MachineInstrBundle.cpp:149: void
llvm::finalizeBundle(llvm::MachineBasicBlock&,
llvm::MachineBasicBlock::instr_iterator,
llvm::MachineBasicBlock::instr_iterator): Assertion
`TargetRegisterInfo::isPhysicalRegister(Reg)' failed."

So for VSELECT pred, Vreg_true, Vreg_false I create an equivalent sequence of
MachineInstr:

    // pred is computed before
    R31 = OR Rfalse, Rfalse // copy Rfalse to R31
    WHERE_EQ
      R31 = OR Rtrue, Rtrue // copy Rtrue to R31
    ENDWHERE

Note that I use a physical register (R31, a vector register; I also reserve this
register in [Target]RegisterInfo::getReservedRegs(), to avoid an error that
MachineVerifier.cpp sometimes reported, like "Bad machine code: Using an undefined
physical register"). I cannot use a virtual register instead of R31 in
PassCreateWhereBlocks (and ISelLowering::Lower()), since I need to assign to it twice
(for both the then and else branches of the VSELECT instruction) and virtual registers
follow the SSA single-assignment rule (so I get the following error when assigning twice
to a virtual register: <<MachineRegisterInfo.cpp:339 [...] "getVRegDef assumes a single
definition or no definition"' failed.>>). I also tried, without success, using
MachineRegisterInfo::leaveSSA() to avoid this single-assignment problem, but then
other passes like MachineLICM give an error in llc like <<MachineLICM.cpp:409: [...]
Assertion `TargetRegisterInfo::isPhysicalRegister(Reg) && "Not expecting virtual
register!"' failed.>>, because MachineRegisterInfo::isSSA() returns false, which makes
the pass assume that register allocation has finished and that only physical registers
remain, which unfortunately is NOT the case.
- 4. I also register a pass PassFinalizeBundles in the addPreSched2() method of
[Target]TargetMachine.cpp and call finalizeBundle() on the instruction bundle I created
earlier in PassCreateWhereBlocks, because I want to avoid later errors like
<</llvm/lib/CodeGen/PostRASchedulerList.cpp:357: virtual bool
{anonymous}::PostRAScheduler::runOnMachineFunction(llvm::MachineFunction&): Assertion
`Count == 0 && "Instruction count mismatch!"' failed.>> (IIRC).
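
The effect described at point 3 can be illustrated with a toy dead-code pass in plain
C++. This is only a sketch of one plausible mechanism, not LLVM's
DeadMachineInstructionElim: ToyInstr, runToyDCE, makeVSelectBlock and demo are
hypothetical names, registers are plain integers (31 for R31, 1 for Rfalse, 2 for
Rtrue), and the WHERE_EQ/ENDWHERE markers are assumed never to be removed. A backward
liveness walk drops any write whose result is overwritten before it is read, unless the
overwriting instruction is marked as a partial (predicated) write.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction (assumption): at most one defined register, -1 for none.
struct ToyInstr {
  std::string name;
  int def;                // register written, or -1
  std::vector<int> uses;  // registers read
  bool predicated;        // true if the write only updates some lanes
};

// Backward liveness walk: drop defs that are never read afterwards.
std::vector<ToyInstr> runToyDCE(const std::vector<ToyInstr> &block,
                                std::set<int> live) {
  std::vector<ToyInstr> kept;
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    const bool hasDef = it->def >= 0;
    if (hasDef && live.count(it->def) == 0)
      continue;                 // result is never read: treated as dead
    if (hasDef && !it->predicated)
      live.erase(it->def);      // a full write kills the previous value
    live.insert(it->uses.begin(), it->uses.end());
    kept.push_back(*it);
  }
  std::reverse(kept.begin(), kept.end());
  return kept;
}

// The VSELECT expansion, with or without the predicated flag on the
// write inside the WHERE block.
std::vector<ToyInstr> makeVSelectBlock(bool markPredicated) {
  return {{"OR_false", 31, {1}, false},
          {"WHERE_EQ", -1, {}, false},
          {"OR_true", 31, {2}, markPredicated},
          {"ENDWHERE", -1, {}, false}};
}

// Usage: without the flag the first write of R31 is removed, with it
// all four instructions survive (R31 is live at the end of the block).
void demo() {
  assert(runToyDCE(makeVSelectBlock(false), {31}).size() == 3);
  assert(runToyDCE(makeVSelectBlock(true), {31}).size() == 4);
}
```

Bundling sidesteps the same problem differently: in this model it would correspond to
treating the four instructions as one opaque instruction that defines R31 and reads both
Rfalse and Rtrue.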
Best regards,
Alex
Friedman, Eli via llvm-dev
2017-Mar-13 17:29 UTC
[llvm-dev] Specifying conditional blocks for the back end
On 3/11/2017 4:36 PM, Alex Susu via llvm-dev wrote:
> Hello.
> I wanted to tell you that I managed to codegen correctly the LLVM
> VSELECT instruction by doing the steps described below.
> Can somebody help me with the problems with the
> PredicateInstruction() method I describe below at point 3? Although I
> managed to avoid using PredicateInstruction(), I am curious why it
> doesn't work.
>
> To codegen correctly the LLVM VSELECT instruction (I will be very
> explicit, so bear with me if you have similar issues):
> - 1. I declare in TableGen an instruction WHERE_EQ (I assume
> without loss of generality that VSELECT has a seteq predicate), which
> will implement the VSELECT in terms of my processor's WHERE blocks.
> - 2. in ISelLowering::Lower() I replace the VSELECT with
> WHERE_EQ. (note that before I was generating the entire list of
> MachineSDNode instructions equivalent to VSELECT in
> ISelLowering::Lower(), but the scheduler and the DCE (Dead Code
> Elimination) pass were messing up the order of instructions resulting
> in incorrect semantics). Note that I give to WHERE_EQ as inputs the
> SDNode operands of VSELECT, in order to be able to access them later
> in the PassCreateWhereBlocks pass mentioned below;
>
> - 3. I registered a pass PassCreateWhereBlocks in
> addInstSelector() in [Target]TargetMachine.cpp, which gets executed
> immediately after instruction selection followed by a first scheduling
> phase.
> Even if I predicate in PassCreateWhereBlocks the instructions
> inside the WHERE block, the method PredicateInstruction() fails by
> returning false, which means the method did not add a predicated flag
> to the instructions I wanted to.

PredicateInstruction is a virtual method, and the default implementation always returns
false; your target is supposed to override it.

> This results, as I said before, in incorrect program optimizations
> such as useful instructions being removed, because the compiler does
> not understand that code in my WHERE blocks are predicated
> (conditional), so it assumes they are always being executed. As a side
> note, I see the ARM and SystemZ back ends are overriding the
> PredicateInstruction() method, but their code is a bit complex and I
> did not bother much to understand how they manage to predicate their
> instructions e.g., for ARM Thumb2 "it" instruction - are there some
> links documenting their work?

Thumb2 models its predicated instructions the same way as non-Thumb ARM does until very
late in the backend. Basically, the predicate is just an operand of the MachineInstr.
But it's a bit simpler because we don't predicate instructions until after register
allocation.

> Therefore I started using bundles instead of making predicated
> instructions - as far as I can see DCE cannot be performed inside
> bundled instructions (see also
> http://llvm.org/docs/doxygen/html/DeadMachineInstructionElim_8cpp_source.html
> which does NOT treat bundles, which implies it is not looking at the
> instruction inside a bundle and can only see the "header" instruction
> of a bundle; therefore, I believe it is safe to bundle instructions to
> avoid DCE as long as at least we can infer the "header" instruction of
> the bundle is not going to be ever DCE-ed). Using bundles also avoids
> that the scheduler changes the order of the bundled instructions. To
> create the bundle I use MIBundleBuilder, since using directly in this
> pass (PassCreateWhereBlocks) the finalizeBundle() method results in an
> error like "llc: /llvm/lib/CodeGen/MachineInstrBundle.cpp:149: void
> llvm::finalizeBundle(llvm::MachineBasicBlock&,
> llvm::MachineBasicBlock::instr_iterator,
> llvm::MachineBasicBlock::instr_iterator): Assertion
> `TargetRegisterInfo::isPhysicalRegister(Reg)' failed."
> So I create for VSELECT pred, Vreg_true, Vreg_false an
> equivalent sequence of MachineInstr:
>     // pred is computed before
>     R31 = OR Rfalse, Rfalse // copy Rfalse to R31
>     WHERE_EQ
>       R31 = OR Rtrue, Rtrue // copy Rtrue to R31
>     ENDWHERE
>
> Note that I create a physical register (R31, a vector
> register; I also reserve this register in
> [Target]RegisterInfo::getReservedRegs(), to avoid an error which
> sometimes happened due to MachineVerifier.cpp like "Bad machine code:
> Using an undefined physical register"). I cannot use instead of R31 a
> virtual register in PassCreateWhereBlocks (and ISelLowering::Lower())
> since I need to assign to it twice (for both the then and else
> branches of the VSELECT instruction) and virtual registers follow the
> SSA rule of single-assignment (so I get the following error if
> assigning twice to a virtual register: <<MachineRegisterInfo.cpp:339
> [...] "getVRegDef assumes a single definition or no definition"'
> failed.>>). Also I tried without success using
> MachineRegisterInfo::leaveSSA() to avoid this problem with
> single-assignment, but then other passes like MachineLICM will give an
> error in llc like <<MachineLICM.cpp:409: [...] Assertion
> `TargetRegisterInfo::isPhysicalRegister(Reg) && "Not expecting virtual
> register!"' failed.>>, because MachineRegisterInfo::isSSA() returns
> false, which makes the pass assume that register allocation has
> finished and we have only physical registers, which unfortunately is
> NOT the case.

The right way to model this in SSA form would be something like this:

    Rresult1 = OR Rfalse, Rfalse
    Rresult2 = WHERE_EQ_OR flags, Rresult1, Rtrue, Rtrue

You then tie the two virtual registers together so the register allocator knows they
have to be allocated to the same physical register (something like
`let Constraints = "$Rresult1 = $Rresult2"` in TableGen).

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project
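
The SSA form suggested above can be read as a pure function with a pass-through
operand. The following plain C++ sketch models that reading lane by lane; it is not LLVM
code, the names Vec, orSelf, whereEqOr and selectSSA are hypothetical, and it assumes
that WHERE_EQ_OR writes `a | b` on lanes where the flag is set and keeps the tied input
on the other lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (assumption): a vector register is one int per lane.
using Vec = std::vector<int>;

// Rresult1 = OR src, src (a plain copy on every lane).
Vec orSelf(const Vec &src) {
  Vec out(src.size());
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = src[i] | src[i];
  return out;
}

// Rresult2 = WHERE_EQ_OR flags, passthru, a, b
// passthru plays the role of the tied input: lanes whose flag is clear
// keep its value, lanes whose flag is set receive a | b.
Vec whereEqOr(const std::vector<bool> &flags, const Vec &passthru,
              const Vec &a, const Vec &b) {
  assert(flags.size() == passthru.size() && a.size() == passthru.size() &&
         b.size() == passthru.size());
  Vec out = passthru;
  for (std::size_t i = 0; i < out.size(); ++i)
    if (flags[i])
      out[i] = a[i] | b[i];
  return out;
}

// The two-instruction SSA sequence: every value is defined exactly once.
Vec selectSSA(const std::vector<bool> &flags, const Vec &rTrue,
              const Vec &rFalse) {
  const Vec rResult1 = orSelf(rFalse);
  const Vec rResult2 = whereEqOr(flags, rResult1, rTrue, rTrue);
  return rResult2;
}
```

Because the old value enters the second instruction as an explicit operand, the first
definition has a visible use and each virtual register is assigned once; the tie
constraint then only tells the register allocator to place both results in the same
physical register.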