Matt Arsenault via llvm-dev
2018-Dec-20 05:25 UTC
[llvm-dev] RegBankSelect complex value mappings
Hi,

I’m looking at RegBankSelect’s partially implemented support for deciding to split a value between multiple registers, and I’m wondering if it’s actually intended to solve the problem I’m trying to use it for. RegisterBankInfo.h has this example mapping table:

  /// E.g.,
  /// Let say we have a 32-bit add and a <2 x 32-bit> vadd. We
  /// can expand the
  /// <2 x 32-bit> add into 2 x 32-bit add.
  ///
  /// Currently the TableGen-like file would look like:
  /// \code
  /// PartialMapping[] = {
  /// /*32-bit add*/ {0, 32, GPR},
  /// /*2x32-bit add*/ {0, 32, GPR}, {0, 32, GPR}, // <-- Same entry 3x
  /// /*<2x32-bit> vadd*/ {0, 64, VPR}
  /// }; // PartialMapping duplicated.
  ///
  /// ValueMapping[] {
  /// /*plain 32-bit add*/ {&PartialMapping[0], 1},
  /// /*expanded vadd on 2xadd*/ {&PartialMapping[1], 2},
  /// /*plain <2x32-bit> vadd*/ {&PartialMapping[3], 1}
  /// };

This looks almost like the problem I want to solve for AMDGPU. There are 2 main register banks. On the SALU, some 64-bit operations are available that only exist as 32-bit operations on the VALU. For example, if not all of the input operands are in the scalar bank, a 64-bit AND needs to be split into two 32-bit ANDs. It’s illegal to copy from the vector bank to the scalar bank, since these don’t mean what vector and scalar mean on other targets.

The current code seems very operand-centric and computes costs based only on copies. Decomposing the operation into 2 pieces requires rewriting the entire instruction, not just copying from one offending operand. Is this intended to handle this kind of case, or do I need to introduce a separate register-bank-aware legalizer pass?

-Matt
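To make the mapping table above concrete, here is a small, self-contained C++ toy model. It is loosely based on the idea behind RegisterBankInfo's PartialMapping and ValueMapping (a value is described as a list of pieces, each covering a bit range in some bank), but it is NOT the real LLVM API: the `Bank` enum, the table contents, and the `coversValue` helper are hypothetical. It assumes each piece records its starting bit, so in this sketch the high half of a split 64-bit value starts at bit 32.

```cpp
#include <cassert>

// Hypothetical stand-ins for AMDGPU's two main banks (not LLVM's RegisterBank).
enum class Bank { SGPR, VGPR };

// One piece of a value: bits [StartIdx, StartIdx + Length) live in RegBank.
struct PartialMapping {
  unsigned StartIdx;
  unsigned Length;
  Bank RegBank;
};

// A value is described by NumBreakDowns consecutive PartialMappings.
struct ValueMapping {
  const PartialMapping *BreakDown;
  unsigned NumBreakDowns;
};

// Returns true if the pieces tile bits [0, Width) in order, with no gaps
// and no overlaps.
bool coversValue(const ValueMapping &VM, unsigned Width) {
  unsigned Next = 0;
  for (unsigned I = 0; I < VM.NumBreakDowns; ++I) {
    if (VM.BreakDown[I].StartIdx != Next)
      return false;
    Next += VM.BreakDown[I].Length;
  }
  return Next == Width;
}

// Hypothetical table for the operands of a 64-bit G_AND.
static const PartialMapping PartialMappings[] = {
    /*s64 value for a SALU and*/        {0, 64, Bank::SGPR},
    /*s64 value split for VALU ands*/   {0, 32, Bank::VGPR}, {32, 32, Bank::VGPR},
};

static const ValueMapping ValueMappings[] = {
    /*whole value in SGPRs*/   {&PartialMappings[0], 1},
    /*two 32-bit VGPR halves*/ {&PartialMappings[1], 2},
};
```

In this model, choosing the second ValueMapping for an operand is what "splitting a value between multiple registers" means; the real pass additionally has to produce the instructions that create and consume those pieces.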
Quentin Colombet via llvm-dev
2018-Dec-21 00:15 UTC
Hi Matt,

Your use case definitely falls within what RegBankSelect was meant to solve. That said, the support you need is not implemented because we didn't have use cases to test the code against.

Regarding the cost: if the mapping produces more than 1 partial value, RegBankSelect::getRepairCost currently says this is too expensive, and this is where you need to patch the pass to add a target hook that computes the cost of using instructions to decompose the value.

On Wed, Dec 19, 2018 at 21:25, Matt Arsenault <arsenm2 at gmail.com> wrote:
> The current code seems very operand centric and computes costs only based on copies.
> Decomposing the operation into 2 pieces requires rewriting the entire instruction, not just copying from one offending operand.

So that covers the cost of the copy part. The cost of rewriting the instruction completely is captured by InstructionMapping::getCost. The idea of InstructionMapping::getCost is to reflect the cost of transforming the current instruction into the instruction we get after applying this mapping. Then the RepairCost accounts for the cost of "bringing" every operand to the right place for this mapping, using a copy or some target-specific sequence.

As with the cost computation, the target-specific sequences are not implemented, but they should happen in RegBankSelect::repairReg. Right now, it asserts that the number of breakdowns is 1, but the code to decompose the operand belongs there.

Finally, the rewriting of the current instruction is supposed to happen in RegisterBankInfo::applyMapping.

If you have an example (.mir) that you can share, we can work together to make this happen.

Cheers,
-Quentin
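The cost split described above (instruction cost plus per-operand repair cost, with multi-piece values currently rejected) can be sketched with a toy model. This is a minimal sketch under stated assumptions, not RegBankSelect's actual code: `CandidateMapping`, `OperandRepair`, `totalCost`, and the two cost callbacks are hypothetical names, a plain copy is assumed to cost 1, and `freeSplit` stands in for the kind of target hook suggested above, returning 0 for a split so that only the extra instruction counts.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Sentinel for "this mapping cannot be used".
constexpr uint64_t Impossible = std::numeric_limits<uint64_t>::max();

// Hypothetical per-operand repair description.
struct OperandRepair {
  bool NeedsRepair;       // false: operand already has the expected bank/shape
  bool CopyIsLegal;       // e.g. a VGPR -> SGPR copy is illegal on AMDGPU
  unsigned NumBreakDowns; // 1: repaired with a copy; >1: value must be split
};

// Hypothetical candidate mapping for one instruction.
struct CandidateMapping {
  uint64_t InstrCost;                  // plays the role of InstructionMapping::getCost
  std::vector<OperandRepair> Operands; // each may add a repair cost
};

// Hypothetical hook type: cost of decomposing a value into N pieces.
using BreakDownCostFn = uint64_t (*)(unsigned NumBreakDowns);

// Models the current behaviour: any split is "too expensive".
uint64_t tooExpensive(unsigned) { return Impossible; }

// Models a target hook where unmerge/merge of halves is free.
uint64_t freeSplit(unsigned) { return 0; }

// Total cost = instruction cost + sum of operand repair costs.
uint64_t totalCost(const CandidateMapping &M, BreakDownCostFn BreakDownCost) {
  uint64_t Cost = M.InstrCost;
  for (const OperandRepair &Op : M.Operands) {
    if (!Op.NeedsRepair)
      continue;
    uint64_t Repair;
    if (Op.NumBreakDowns > 1)
      Repair = BreakDownCost(Op.NumBreakDowns);
    else
      Repair = Op.CopyIsLegal ? 1 : Impossible; // one copy, if allowed
    if (Repair == Impossible)
      return Impossible;
    Cost += Repair;
  }
  return Cost;
}
```

For a 64-bit AND whose inputs live in VGPRs, a SALU mapping is impossible (it would need an illegal VGPR to SGPR copy), while a mapping into two 32-bit VALU ANDs costs 2 once the hook prices the split at 0, and stays impossible without such a hook.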
Matt Arsenault via llvm-dev
2018-Dec-21 07:51 UTC
> On Dec 21, 2018, at 11:15 AM, Quentin Colombet <quentin.colombet at gmail.com> wrote:
>
> Regarding the cost, if the mapping produces more than 1 partial value,
> right now RegBankSelect::getRepairCost will say this is too expensive
> and this is actually where you need to patch the pass to add a target
> hook to compute something that would use instruction to decompose the
> value.

Yes, this is what happens with the greedy mode. With the fast mode I get a little further.

> Like the cost computation, the target specific sequences are not
> implemented, but should happen in RegBankSelect::repairReg.

This seems to contradict the comment on repairReg?

  /// \note The caller is supposed to do the rewriting of op if need be.
  /// I.e., Reg = op ... => <NewRegs> = NewOp …

> If you have an example (.mir) that you can share, we can work together
> to make this happen.

The simplest case is this one, where only one register bank is involved. The cost of the unmerge and merge should be 0; the only real cost comes from the fact that it is now 2 operations.
  ---
  name: and_i64_vv
  legalized: true
  body: |
    bb.0:
      liveins: $vgpr0_vgpr1, $vgpr2_vgpr3
      ; Should turn into something like this, although the G_MERGE_VALUES and G_UNMERGE_VALUES can be optimized out:
      ; %0:vgpr(s64) = COPY $vgpr0_vgpr1
      ; %1:vgpr(s64) = COPY $vgpr2_vgpr3
      ; %2:vgpr(s32), %3:vgpr(s32) = G_UNMERGE_VALUES %0
      ; %4:vgpr(s32), %5:vgpr(s32) = G_UNMERGE_VALUES %1
      ; %6:vgpr(s32) = G_AND %2, %4
      ; %7:vgpr(s32) = G_AND %3, %5
      ; %8:vgpr(s64) = G_MERGE_VALUES %6, %7
      %0:_(s64) = COPY $vgpr0_vgpr1
      %1:_(s64) = COPY $vgpr2_vgpr3
      %2:_(s64) = G_AND %0, %1
  …

Part of my confusion about the operand focus is the use of RepairPts. In this case the inputs %0 and %1 have already been trivially assigned, but I kind of expected them to show up here as something to handle, if that makes sense.

-Matt
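The expected output in the and_i64_vv example relies on a simple identity: a 64-bit AND equals two 32-bit ANDs on the matching low and high halves of the inputs, merged back together. Here is a small C++ sketch of that identity. The helpers `unmerge64`, `merge64`, and `splitAnd64` are hypothetical scalar stand-ins for G_UNMERGE_VALUES, G_MERGE_VALUES, and the rewritten sequence, not LLVM APIs; they follow the GlobalISel convention that the first piece of an unmerge/merge is the lowest-order part.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Stand-in for G_UNMERGE_VALUES on s64: first result is the low 32 bits,
// second result is the high 32 bits.
std::pair<uint32_t, uint32_t> unmerge64(uint64_t V) {
  return {static_cast<uint32_t>(V), static_cast<uint32_t>(V >> 32)};
}

// Stand-in for G_MERGE_VALUES into s64: first operand becomes the low bits.
uint64_t merge64(uint32_t Lo, uint32_t Hi) {
  return static_cast<uint64_t>(Lo) | (static_cast<uint64_t>(Hi) << 32);
}

// The split form of a 64-bit AND: two 32-bit ANDs on matching halves.
uint64_t splitAnd64(uint64_t A, uint64_t B) {
  std::pair<uint32_t, uint32_t> APieces = unmerge64(A); // %2, %3
  std::pair<uint32_t, uint32_t> BPieces = unmerge64(B); // %4, %5
  uint32_t Lo = APieces.first & BPieces.first;          // %6 = G_AND %2, %4
  uint32_t Hi = APieces.second & BPieces.second;        // %7 = G_AND %3, %5
  return merge64(Lo, Hi);                               // %8 = G_MERGE_VALUES %6, %7
}
```

Pairing the halves any other way (for example, ANDing both halves of the same input) would not compute the original 64-bit AND.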