Tobias Klausmann
2017-Aug-12 19:33 UTC
[Nouveau] [PATCH] nvc0/ir: propagate immediates to CALL input MOVs
On using builtin functions we have to move the input to registers $0 and $1, if one of the input value is an immediate, we fail to propagate the immediate: ... mov u32 $r477 0x00000003 (0) ... mov u32 $r0 %r473 (0) mov u32 $r1 $r477 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd remove in that case: ... mov u32 $r0 %r473 (0) mov u32 $r1 0x00000003 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... Shaderdb stats: total instructions in shared programs : 4893460 -> 4893324 (-0.00%) total gprs used in shared programs : 582972 -> 582881 (-0.02%) total local used in shared programs : 17960 -> 17960 (0.00%) local gpr inst bytes helped 0 91 112 112 hurt 0 0 0 0 Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index c8f0701572..861d08af24 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i) int builtin; bld.setPosition(i, false); - bld.mkMovToReg(0, i->getSrc(0)); - bld.mkMovToReg(1, i->getSrc(1)); + + // Generate movs to the input regs for the call we want to generate + for (int s = 0; i->srcExists(s); ++s) { + Instruction *ld = i->getSrc(s)->getInsn(); + ImmediateValue imm; + // check if we are moving an immediate, propagate it in that case + if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) || + !ld->src(0).getImmediate(imm)) + bld.mkMovToReg(s, i->getSrc(s)); + else { + bld.mkMovToReg(s, ld->getSrc(0)); + // Clear the src, to make code elimination possible here before we + // delete the instruction i later + i->setSrc(s, NULL); + if (ld->getDef(0)->refCount() == 0) + delete_Instruction(prog, ld); + } + } + switch (i->dType) { case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break; case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break; -- 2.14.0
Ilia Mirkin
2017-Aug-12 20:20 UTC
[Nouveau] [PATCH] nvc0/ir: propagate immediates to CALL input MOVs
On Sat, Aug 12, 2017 at 3:33 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote:> On using builtin functions we have to move the input to registers $0 and $1, if > one of the input value is an immediate, we fail to propagate the immediate: > > ... > mov u32 $r477 0x00000003 (0) > ... > mov u32 $r0 %r473 (0) > mov u32 $r1 $r477 (0) > call abs BUILTIN:0 (0) > mov u32 %r495 $r1 (0) > ... > > With this patch the immediate is propagated, potentially causing the first MOV > to be superfluous, which we'd remove in that case: > > ... > > mov u32 $r0 %r473 (0) > mov u32 $r1 0x00000003 (0) > call abs BUILTIN:0 (0) > mov u32 %r495 $r1 (0) > ... > > Shaderdb stats: > total instructions in shared programs : 4893460 -> 4893324 (-0.00%) > total gprs used in shared programs : 582972 -> 582881 (-0.02%) > total local used in shared programs : 17960 -> 17960 (0.00%) > > local gpr inst bytes > helped 0 91 112 112 > hurt 0 0 0 0 > > Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> > --- > .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++++++++++++++++++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > index c8f0701572..861d08af24 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i) > int builtin; > > bld.setPosition(i, false); > - bld.mkMovToReg(0, i->getSrc(0)); > - bld.mkMovToReg(1, i->getSrc(1)); > + > + // Generate movs to the input regs for the call we want to generate > + for (int s = 0; i->srcExists(s); ++s) { > + Instruction *ld = i->getSrc(s)->getInsn(); > + ImmediateValue imm; > + // check if we are moving an immediate, propagate it in that case > + if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) || > + !ld->src(0).getImmediate(imm))At this point you don't even have to use getImmediate - you can just look at ld->src(0).getFile() == FILE_IMMEDIATE. Normally you'd just do i->src(s).getImmediate(imm) and moved on with life. But you kinda need the ld here, which is annoying. Perhaps you can just drop the manual deletion of the op here... which would let you do the much simpler thing. Can you see if there's any effect from that?> + bld.mkMovToReg(s, i->getSrc(s)); > + else { > + bld.mkMovToReg(s, ld->getSrc(0)); > + // Clear the src, to make code elimination possible here before we > + // delete the instruction i later > + i->setSrc(s, NULL);i gets deleted later on. move the deletion of ld after that happens?> + if (ld->getDef(0)->refCount() == 0)ld->isDead()> + delete_Instruction(prog, ld); > + } > + } > + > switch (i->dType) { > case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break; > case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break; > -- > 2.14.0 > > _______________________________________________ > Nouveau mailing list > Nouveau at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau
Tobias Klausmann
2017-Aug-12 21:05 UTC
[Nouveau] [PATCH] nvc0/ir: propagate immediates to CALL input MOVs
On 8/12/17 10:20 PM, Ilia Mirkin wrote:> On Sat, Aug 12, 2017 at 3:33 PM, Tobias Klausmann > <tobias.johannes.klausmann at mni.thm.de> wrote: >> On using builtin functions we have to move the input to registers $0 and $1, if >> one of the input value is an immediate, we fail to propagate the immediate: >> >> ... >> mov u32 $r477 0x00000003 (0) >> ... >> mov u32 $r0 %r473 (0) >> mov u32 $r1 $r477 (0) >> call abs BUILTIN:0 (0) >> mov u32 %r495 $r1 (0) >> ... >> >> With this patch the immediate is propagated, potentially causing the first MOV >> to be superfluous, which we'd remove in that case: >> >> ... >> >> mov u32 $r0 %r473 (0) >> mov u32 $r1 0x00000003 (0) >> call abs BUILTIN:0 (0) >> mov u32 %r495 $r1 (0) >> ... >> >> Shaderdb stats: >> total instructions in shared programs : 4893460 -> 4893324 (-0.00%) >> total gprs used in shared programs : 582972 -> 582881 (-0.02%) >> total local used in shared programs : 17960 -> 17960 (0.00%) >> >> local gpr inst bytes >> helped 0 91 112 112 >> hurt 0 0 0 0 >> >> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> >> --- >> .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++++++++++++++++++-- >> 1 file changed, 19 insertions(+), 2 deletions(-) >> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> index c8f0701572..861d08af24 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i) >> int builtin; >> >> bld.setPosition(i, false); >> - bld.mkMovToReg(0, i->getSrc(0)); >> - bld.mkMovToReg(1, i->getSrc(1)); >> + >> + // Generate movs to the input regs for the call we want to generate >> + for (int s = 0; i->srcExists(s); ++s) { >> + Instruction *ld = i->getSrc(s)->getInsn(); >> + ImmediateValue imm; >> + // check if we are moving an immediate, propagate it in that case >> + if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) || >> + !ld->src(0).getImmediate(imm)) > At this point you don't even have to use getImmediate - you can just > look at ld->src(0).getFile() == FILE_IMMEDIATE.That was actually the fallback to the before non-working getImmediate() and is easily doable if favored.> > Normally you'd just do i->src(s).getImmediate(imm) and moved on with > life. But you kinda need the ld here, which is annoying. Perhaps you > can just drop the manual deletion of the op here... which would let > you do the much simpler thing. Can you see if there's any effect from > that?Will do!>> + bld.mkMovToReg(s, i->getSrc(s)); >> + else { >> + bld.mkMovToReg(s, ld->getSrc(0)); >> + // Clear the src, to make code elimination possible here before we >> + // delete the instruction i later >> + i->setSrc(s, NULL); > i gets deleted later on. move the deletion of ld after that happens?this would cause more indirection (saving of the insn for later), not sure if that makes the code more readable or if this clear is more straight forward. As you see i'd go with the clear, but if you really want i can add the extra save and delete.> >> + if (ld->getDef(0)->refCount() == 0) > ld->isDead()ok> >> + delete_Instruction(prog, ld); >> + } >> + } >> + >> switch (i->dType) { >> case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break; >> case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break; >> -- >> 2.14.0 >> >> _______________________________________________ >> Nouveau mailing list >> Nouveau at lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/nouveau
Tobias Klausmann
2017-Aug-13 01:02 UTC
[Nouveau] [PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs
On using builtin functions we have to move the input to registers $0 and $1, if one of the input value is an immediate, we fail to propagate the immediate: ... mov u32 $r477 0x00000003 (0) ... mov u32 $r0 %r473 (0) mov u32 $r1 $r477 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd remove in that case: ... mov u32 $r0 %r473 (0) mov u32 $r1 0x00000003 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... Shaderdb stats: total instructions in shared programs : 4893460 -> 4893324 (-0.00%) total gprs used in shared programs : 582972 -> 582881 (-0.02%) total local used in shared programs : 17960 -> 17960 (0.00%) local gpr inst bytes helped 0 91 112 112 hurt 0 0 0 0 v2: implement some changes proposed by imirkin, the manual deletion of the dead mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call to division function") as the potentially dead mov is unlinked properly, causing later passes to not notice the mov op at all and thus not cleaning it up. That makes up a big chunk of the regression the above commit caused. Keep the deletion of the op where it is, deleting it later unnecessarily blows up size of the change. Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index c8f0701572..7243b1d2e4 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -47,8 +47,25 @@ NVC0LegalizeSSA::handleDIV(Instruction *i) int builtin; bld.setPosition(i, false); - bld.mkMovToReg(0, i->getSrc(0)); - bld.mkMovToReg(1, i->getSrc(1)); + + // Generate movs to the input regs for the call we want to generate + for (int s = 0; i->srcExists(s); ++s) { + Instruction *ld = i->getSrc(s)->getInsn(); + assert(ld->getSrc(0) != NULL); + // check if we are moving an immediate, propagate it in that case + if (!ld || ld->fixed || (ld->op != OP_LOAD && ld->op != OP_MOV) || + !(ld->src(0).getFile() == FILE_IMMEDIATE)) + bld.mkMovToReg(s, i->getSrc(s)); + else { + bld.mkMovToReg(s, ld->getSrc(0)); + // Clear the src, to make code elimination possible here before we + // delete the instruction i later + i->setSrc(s, NULL); + if (ld->isDead()) + delete_Instruction(prog, ld); + } + } + switch (i->dType) { case TYPE_U32: builtin = NVC0_BUILTIN_DIV_U32; break; case TYPE_S32: builtin = NVC0_BUILTIN_DIV_S32; break; -- 2.14.0
Apparently Analagous Threads
- [PATCH] nvc0/ir: propagate immediates to CALL input MOVs
- [PATCH] nvc0/ir: propagate immediates to CALL input MOVs
- [PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case
- [PATCH mesa v2 1/2] nouveau: codegen: Use FILE_MEMORY_BUFFER for buffers
- [PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax