Tobias Klausmann
2016-Oct-02 17:58 UTC
[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
Previously we'd end up with an unnecessary mov for the thirs immediate value. total instructions in shared programs : 851881 -> 851864 (-0.00%) total gprs used in shared programs : 110295 -> 110295 (0.00%) total local used in shared programs : 1020 -> 1020 (0.00%) local gpr inst bytes helped 0 0 17 17 hurt 0 0 0 0 Suggested-by: Karol Herbst <nouveau at karolherbst.de> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9875738..8bb5cf9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) break; case OP_MAD: if (imm0.isInteger(0)) { + ImmediateValue imm1; i->setSrc(0, i->getSrc(2)); i->src(0).mod = i->src(2).mod; i->setSrc(1, NULL); i->setSrc(2, NULL); - i->op = i->src(0).mod.getOp(); - if (i->op != OP_CVT) - i->src(0).mod = 0; + if (i->src(0).getImmediate(imm1)) { + bld.setPosition(i, false); + newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), + i->dType); + delete_Instruction(prog, i); + } + else { + i->op = i->src(0).mod.getOp(); + if (i->op != OP_CVT) + i->src(0).mod = 0; + } } else if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && (imm0.isInteger(1) || imm0.isInteger(-1))) { -- 2.10.0
Ilia Mirkin
2016-Oct-02 18:03 UTC
[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote:> Previously we'd end up with an unnecessary mov for the thirs immediate value. > > total instructions in shared programs : 851881 -> 851864 (-0.00%) > total gprs used in shared programs : 110295 -> 110295 (0.00%) > total local used in shared programs : 1020 -> 1020 (0.00%) > > local gpr inst bytes > helped 0 0 17 17 > hurt 0 0 0 0 > > Suggested-by: Karol Herbst <nouveau at karolherbst.de> > Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> > --- > src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > index 9875738..8bb5cf9 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) > break; > case OP_MAD: > if (imm0.isInteger(0)) { > + ImmediateValue imm1; > i->setSrc(0, i->getSrc(2)); > i->src(0).mod = i->src(2).mod; > i->setSrc(1, NULL); > i->setSrc(2, NULL); > - i->op = i->src(0).mod.getOp(); > - if (i->op != OP_CVT) > - i->src(0).mod = 0; > + if (i->src(0).getImmediate(imm1)) { > + bld.setPosition(i, false); > + newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), > + i->dType); > + delete_Instruction(prog, i);What's an example of a situation where this helps? It shouldn't matter, the mov's should get cleaned up. [Clearly 17 shaders disagree...] Is this just a side-effect of the fact that we don't run the opts to a fixed point?> + } > + else { > + i->op = i->src(0).mod.getOp(); > + if (i->op != OP_CVT) > + i->src(0).mod = 0; > + } > } else > if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && > (imm0.isInteger(1) || imm0.isInteger(-1))) { > -- > 2.10.0 > > _______________________________________________ > Nouveau mailing list > Nouveau at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau
Tobias Klausmann
2016-Oct-02 18:24 UTC
[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:03, Ilia Mirkin wrote:> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann > <tobias.johannes.klausmann at mni.thm.de> wrote: >> Previously we'd end up with an unnecessary mov for the thirs immediate value. >> >> total instructions in shared programs : 851881 -> 851864 (-0.00%) >> total gprs used in shared programs : 110295 -> 110295 (0.00%) >> total local used in shared programs : 1020 -> 1020 (0.00%) >> >> local gpr inst bytes >> helped 0 0 17 17 >> hurt 0 0 0 0 >> >> Suggested-by: Karol Herbst <nouveau at karolherbst.de> >> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> >> --- >> src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ++++++++++++--- >> 1 file changed, 12 insertions(+), 3 deletions(-) >> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> index 9875738..8bb5cf9 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) >> break; >> case OP_MAD: >> if (imm0.isInteger(0)) { >> + ImmediateValue imm1; >> i->setSrc(0, i->getSrc(2)); >> i->src(0).mod = i->src(2).mod; >> i->setSrc(1, NULL); >> i->setSrc(2, NULL); >> - i->op = i->src(0).mod.getOp(); >> - if (i->op != OP_CVT) >> - i->src(0).mod = 0; >> + if (i->src(0).getImmediate(imm1)) { >> + bld.setPosition(i, false); >> + newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), >> + i->dType); >> + delete_Instruction(prog, i); > What's an example of a situation where this helps? It shouldn't > matter, the mov's should get cleaned up. [Clearly 17 shaders > disagree...] Is this just a side-effect of the fact that we don't run > the opts to a fixed point?It is a second mov that causes a problem for later folding in the imm, here output of a testshader[1]: 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x00000000 (0) 4: mov u32 %r39 0x3f800000 (0) 5: mad f32 %r40 %r37 %r38 %r39 (0) 6: mad f32 %r44 %r37 %r38 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: add f32 %r54 %r31 %r44 (0) 9: add f32 %r57 %r56 %r44 (0) Constantfolding... MAIN:-1 () BB:0 (14 instructions) - df = { } -> BB:1 (tree) 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x00000000 (0) 4: mov u32 %r39 0x3f800000 (0) 5: mov f32 %r40 %r39 (0) 6: mov f32 %r44 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: mov f32 %r54 %r31 (0) 9: mov f32 %r57 %r56 (0) The outcome: 0: ld u32 $r2 c0[0x0] (8) 1: mov u32 $r0 0x3f800000 (8) 2: add ftz f32 $r0 $r2 $r0 (8) 3: mov f32 $r3 $r1 (8) 4: mov u32 $r1 $r2 (8) 5: export b128 # o[0x0] $r0q (8) With patch: 0: ld u32 $r2 c0[0x0] (8) 1: add ftz f32 $r0 $r2 1.000000 (8) 2: mov f32 $r3 $r1 (8) 3: mov u32 $r1 $r2 (8) 4: export b128 # o[0x0] $r0q (8) [1]: VERT PROPERTY NEXT_SHADER FRAG DCL OUT[0], GENERIC[0] DCL CONST[0] DCL TEMP[0..1], LOCAL IMM[0] FLT32 { 0.0078, -1.0000, 0.0000, 0.5000} IMM[1] FLT32 { 1.0000, 0.0000, 65535.0000, 0.0100} 0: MOV TEMP[0].xyz, CONST[0].xxxx 39: MAD TEMP[1], CONST[20].xxxx, IMM[1].yyyy, IMM[1].xyyy 41: ADD TEMP[1], TEMP[0], TEMP[1] 208: MOV OUT[0], TEMP[1] 211: END> >> + } >> + else { >> + i->op = i->src(0).mod.getOp(); >> + if (i->op != OP_CVT) >> + i->src(0).mod = 0; >> + } >> } else >> if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && >> (imm0.isInteger(1) || imm0.isInteger(-1))) { >> -- >> 2.10.0 >> >> _______________________________________________ >> Nouveau mailing list >> Nouveau at lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/nouveau
Possibly Parallel Threads
- [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
- [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
- [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
- [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
- [PATCH v2] nv50/ir: optimize shl(a, 0) to a