Jay Foad via llvm-dev
2021-Mar-27 08:56 UTC
[llvm-dev] globalisel: cross-bank constant propagation?
Hi Nicolai! For simplicity our regbankselect says that all operands of VALU instructions have to go in vgprs. Moving some of them into sgprs is left as an optimisation for a later pass. As you know there are limits on //how many// operands of a VALU instruction can be sgprs or constants, which are not simple to express in terms of alternative operand mappings. Thanks, Jay. On Sat, 27 Mar 2021 at 07:58, Nicolai Hähnle <nhaehnle at gmail.com> wrote:> > Hi Jay, > > Where does the copy of a constant from SGPR to VGPR come from? Naively, I'd think that better GMIR would be > > %3:sgpr(s32) = G_CONSTANT i32 5 > ... > %8:vgpr(s64) = G_LSHR %7, %3(s32) > > Cheers, > Nicolai > > On Fri, Mar 26, 2021 at 3:43 PM Jay Foad via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hi, >> >> While experimenting with alternative ways of lowering funnel shifts on >> AMDGPU I have run into a case where this GMIR: >> >> %0:vgpr(s32) = COPY $vgpr0 >> %1:vgpr(s32) = COPY $vgpr1 >> %2:sgpr_64 = COPY $sgpr30_sgpr31 >> %3:sgpr(s32) = G_CONSTANT i32 5 >> %7:vgpr(s64) = G_MERGE_VALUES %1(s32), %0(s32) >> %9:vgpr(s32) = COPY %3(s32) >> %8:vgpr(s64) = G_LSHR %7, %9(s32) >> %4:vgpr(s32) = G_TRUNC %8(s64) >> $vgpr0 = COPY %4(s32) >> %5:ccr_sgpr_64 = COPY %2 >> S_SETPC_B64_return %5, implicit $vgpr0 >> >> does not get matched by this selection pattern: >> >> def : GCNPat<(i32 (trunc (srl i64:$src0, (i32 ShiftAmt32Imm:$src1)))), >> (V_ALIGNBIT_B32_e64 (i32 (EXTRACT_SUBREG VReg_64:$src0, sub1)), >> (i32 (EXTRACT_SUBREG VReg_64:$src0, sub0)), >> VSrc_b32:$src1)>; >> >> because the COPY of G_CONSTANT 5 from sgpr to vgpr gets in the way of >> matching the (i32 ShiftAmt32Imm:$src1). >> >> What's a good way of fixing this? I know there has been some >> discussion before about whether the generated matcher should look >> through copies, either automatically or only when the tablegen pattern >> is augmented with some kind of "please look through copies here" node. >> >> As an alternative, would it make sense to run some cross-bank constant >> propagation after regbankselect? It seems pretty obvious to me that %9 >> above should be simplified to: >> >> %9:vgpr(s32) = G_CONSTANT i32 5 >> >> so long as it's legal, which we can easily check, can't we? >> >> Thanks, >> Jay. >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > -- > Lerne, wie die Welt wirklich ist, > aber vergiss niemals, wie sie sein sollte.
Nicolai Hähnle via llvm-dev
2021-Mar-27 13:40 UTC
[llvm-dev] globalisel: cross-bank constant propagation?
Hi Jay, On Sat, Mar 27, 2021 at 9:56 AM Jay Foad <jay.foad at gmail.com> wrote:> For simplicity our regbankselect says that all operands of VALU > instructions have to go in vgprs. Moving some of them into sgprs is > left as an optimisation for a later pass. As you know there are limits > on //how many// operands of a VALU instruction can be sgprs or > constants, which are not simple to express in terms of alternative > operand mappings. >Okay, that makes sense: the initial register bank selection is biased to generate legal code. It sounds likely that similar issues of obstructive COPYs will appear in other patterns as well, right? For example, if you have original code of the form (x * (-y)), where x is divergent and y is uniform, it seems likely you'd get something like (typed completely off the top of my head): t:sgpr = G_FNEG y:sgpr t':vgpr = COPY t:sgpr r:vgpr = G_FMUL x:vgpt, t':vgpr ... and then an instruction selection pattern that pulls the negation into a source modifier would fail due to the COPY? The point is that this is not just an issue for constants, so a constant-specific pass seems the wrong way to go. The remaining options I see right now are: (1) Allow pattern matching to look through COPYs. (2) Eliminate those COPYs before instruction selection. Not having thought about this very long, the relative merits seem to be: - (2) results in smaller GMIR and a simpler pattern matcher, which is better for compile time - (2) requires legalization after(?) instruction selection (constant bus limit etc.), which is going to be a pain in the existing CodeGen framework Neither option makes me happy :) I guess I would go with (2) if this was a greenfield development, but (1) fits better in the existing LLVM code base. Cheers, Nicolai Thanks,> Jay. > > On Sat, 27 Mar 2021 at 07:58, Nicolai Hähnle <nhaehnle at gmail.com> wrote: > > > > Hi Jay, > > > > Where does the copy of a constant from SGPR to VGPR come from? Naively, > I'd think that better GMIR would be > > > > %3:sgpr(s32) = G_CONSTANT i32 5 > > ... > > %8:vgpr(s64) = G_LSHR %7, %3(s32) > > > > Cheers, > > Nicolai > > > > On Fri, Mar 26, 2021 at 3:43 PM Jay Foad via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> > >> Hi, > >> > >> While experimenting with alternative ways of lowering funnel shifts on > >> AMDGPU I have run into a case where this GMIR: > >> > >> %0:vgpr(s32) = COPY $vgpr0 > >> %1:vgpr(s32) = COPY $vgpr1 > >> %2:sgpr_64 = COPY $sgpr30_sgpr31 > >> %3:sgpr(s32) = G_CONSTANT i32 5 > >> %7:vgpr(s64) = G_MERGE_VALUES %1(s32), %0(s32) > >> %9:vgpr(s32) = COPY %3(s32) > >> %8:vgpr(s64) = G_LSHR %7, %9(s32) > >> %4:vgpr(s32) = G_TRUNC %8(s64) > >> $vgpr0 = COPY %4(s32) > >> %5:ccr_sgpr_64 = COPY %2 > >> S_SETPC_B64_return %5, implicit $vgpr0 > >> > >> does not get matched by this selection pattern: > >> > >> def : GCNPat<(i32 (trunc (srl i64:$src0, (i32 ShiftAmt32Imm:$src1)))), > >> (V_ALIGNBIT_B32_e64 (i32 (EXTRACT_SUBREG VReg_64:$src0, > sub1)), > >> (i32 (EXTRACT_SUBREG VReg_64:$src0, sub0)), > >> VSrc_b32:$src1)>; > >> > >> because the COPY of G_CONSTANT 5 from sgpr to vgpr gets in the way of > >> matching the (i32 ShiftAmt32Imm:$src1). > >> > >> What's a good way of fixing this? I know there has been some > >> discussion before about whether the generated matcher should look > >> through copies, either automatically or only when the tablegen pattern > >> is augmented with some kind of "please look through copies here" node. > >> > >> As an alternative, would it make sense to run some cross-bank constant > >> propagation after regbankselect? It seems pretty obvious to me that %9 > >> above should be simplified to: > >> > >> %9:vgpr(s32) = G_CONSTANT i32 5 > >> > >> so long as it's legal, which we can easily check, can't we? > >> > >> Thanks, > >> Jay. > >> _______________________________________________ > >> LLVM Developers mailing list > >> llvm-dev at lists.llvm.org > >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > > > > -- > > Lerne, wie die Welt wirklich ist, > > aber vergiss niemals, wie sie sein sollte. >-- Lerne, wie die Welt wirklich ist, aber vergiss niemals, wie sie sein sollte. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210327/3a573342/attachment.html>
Matt Arsenault via llvm-dev
2021-Mar-29 13:04 UTC
[llvm-dev] globalisel: cross-bank constant propagation?
> On Mar 27, 2021, at 04:56, Jay Foad via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hi Nicolai! > > For simplicity our regbankselect says that all operands of VALU > instructions have to go in vgprs. Moving some of them into sgprs is > left as an optimisation for a later pass. As you know there are limits > on //how many// operands of a VALU instruction can be sgprs or > constants, which are not simple to express in terms of alternative > operand mappings. > > Thanks, > Jay.There are 2 issues: 1. Current RegBankSelect does not consider the uses when selecting the bank. This is a general missing optimization 2. For the AMDGPU case, I think we should have a post-regbankselect combiner for this. It’s often better to materialize constants for each bank I don’t think we actually want to have to look through copies, and the places we do are just working around the status quo. The folding SGPR/constants into instructions should be a new and improved version of SIFoldOperands. I think optimizing this is beyond the scope of what RegBankSelect and selection patterns. Far too much code would need to be taught to respect and preserve the constant bus limitation otherwise, so that’s why everything uses VGPRs. -Matt