Tobias Klausmann
2016-Oct-02 18:24 UTC
[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:03, Ilia Mirkin wrote:> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann > <tobias.johannes.klausmann at mni.thm.de> wrote: >> Previously we'd end up with an unnecessary mov for the thirs immediate value. >> >> total instructions in shared programs : 851881 -> 851864 (-0.00%) >> total gprs used in shared programs : 110295 -> 110295 (0.00%) >> total local used in shared programs : 1020 -> 1020 (0.00%) >> >> local gpr inst bytes >> helped 0 0 17 17 >> hurt 0 0 0 0 >> >> Suggested-by: Karol Herbst <nouveau at karolherbst.de> >> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> >> --- >> src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ++++++++++++--- >> 1 file changed, 12 insertions(+), 3 deletions(-) >> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> index 9875738..8bb5cf9 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) >> break; >> case OP_MAD: >> if (imm0.isInteger(0)) { >> + ImmediateValue imm1; >> i->setSrc(0, i->getSrc(2)); >> i->src(0).mod = i->src(2).mod; >> i->setSrc(1, NULL); >> i->setSrc(2, NULL); >> - i->op = i->src(0).mod.getOp(); >> - if (i->op != OP_CVT) >> - i->src(0).mod = 0; >> + if (i->src(0).getImmediate(imm1)) { >> + bld.setPosition(i, false); >> + newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), >> + i->dType); >> + delete_Instruction(prog, i); > What's an example of a situation where this helps? It shouldn't > matter, the mov's should get cleaned up. [Clearly 17 shaders > disagree...] Is this just a side-effect of the fact that we don't run > the opts to a fixed point?It is a second mov that causes a problem for later folding in the imm, here output of a testshader[1]: 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x00000000 (0) 4: mov u32 %r39 0x3f800000 (0) 5: mad f32 %r40 %r37 %r38 %r39 (0) 6: mad f32 %r44 %r37 %r38 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: add f32 %r54 %r31 %r44 (0) 9: add f32 %r57 %r56 %r44 (0) Constantfolding... MAIN:-1 () BB:0 (14 instructions) - df = { } -> BB:1 (tree) 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x00000000 (0) 4: mov u32 %r39 0x3f800000 (0) 5: mov f32 %r40 %r39 (0) 6: mov f32 %r44 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: mov f32 %r54 %r31 (0) 9: mov f32 %r57 %r56 (0) The outcome: 0: ld u32 $r2 c0[0x0] (8) 1: mov u32 $r0 0x3f800000 (8) 2: add ftz f32 $r0 $r2 $r0 (8) 3: mov f32 $r3 $r1 (8) 4: mov u32 $r1 $r2 (8) 5: export b128 # o[0x0] $r0q (8) With patch: 0: ld u32 $r2 c0[0x0] (8) 1: add ftz f32 $r0 $r2 1.000000 (8) 2: mov f32 $r3 $r1 (8) 3: mov u32 $r1 $r2 (8) 4: export b128 # o[0x0] $r0q (8) [1]: VERT PROPERTY NEXT_SHADER FRAG DCL OUT[0], GENERIC[0] DCL CONST[0] DCL TEMP[0..1], LOCAL IMM[0] FLT32 { 0.0078, -1.0000, 0.0000, 0.5000} IMM[1] FLT32 { 1.0000, 0.0000, 65535.0000, 0.0100} 0: MOV TEMP[0].xyz, CONST[0].xxxx 39: MAD TEMP[1], CONST[20].xxxx, IMM[1].yyyy, IMM[1].xyyy 41: ADD TEMP[1], TEMP[0], TEMP[1] 208: MOV OUT[0], TEMP[1] 211: END> >> + } >> + else { >> + i->op = i->src(0).mod.getOp(); >> + if (i->op != OP_CVT) >> + i->src(0).mod = 0; >> + } >> } else >> if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && >> (imm0.isInteger(1) || imm0.isInteger(-1))) { >> -- >> 2.10.0 >> >> _______________________________________________ >> Nouveau mailing list >> Nouveau at lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/nouveau
Ilia Mirkin
2016-Oct-02 18:26 UTC
[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
That's very odd. LoadPropagation should have picked that up even in its current form. Should try to figure out why it didn't and that is likely to "fix" a *lot* more situations. On Sun, Oct 2, 2016 at 2:24 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote:> > > On 02.10.2016 20:03, Ilia Mirkin wrote: >> >> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann >> <tobias.johannes.klausmann at mni.thm.de> wrote: >>> >>> Previously we'd end up with an unnecessary mov for the thirs immediate >>> value. >>> >>> total instructions in shared programs : 851881 -> 851864 (-0.00%) >>> total gprs used in shared programs : 110295 -> 110295 (0.00%) >>> total local used in shared programs : 1020 -> 1020 (0.00%) >>> >>> local gpr inst bytes >>> helped 0 0 17 17 >>> hurt 0 0 0 0 >>> >>> Suggested-by: Karol Herbst <nouveau at karolherbst.de> >>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> >>> --- >>> src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 >>> ++++++++++++--- >>> 1 file changed, 12 insertions(+), 3 deletions(-) >>> >>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>> index 9875738..8bb5cf9 100644 >>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, >>> ImmediateValue &imm0, int s) >>> break; >>> case OP_MAD: >>> if (imm0.isInteger(0)) { >>> + ImmediateValue imm1; >>> i->setSrc(0, i->getSrc(2)); >>> i->src(0).mod = i->src(2).mod; >>> i->setSrc(1, NULL); >>> i->setSrc(2, NULL); >>> - i->op = i->src(0).mod.getOp(); >>> - if (i->op != OP_CVT) >>> - i->src(0).mod = 0; >>> + if (i->src(0).getImmediate(imm1)) { >>> + bld.setPosition(i, false); >>> + newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), >>> + i->dType); >>> + delete_Instruction(prog, i); >> >> What's an example of a situation where this helps? It shouldn't >> matter, the mov's should get cleaned up. [Clearly 17 shaders >> disagree...] Is this just a side-effect of the fact that we don't run >> the opts to a fixed point? > > > It is a second mov that causes a problem for later folding in the imm, here > output of a testshader[1]: > > 0: nop u32 %r56 (0) > 1: ld u32 %r31 c0[0x0] (0) > 2: ld u32 %r37 c0[0x140] (0) > 3: mov u32 %r38 0x00000000 (0) > 4: mov u32 %r39 0x3f800000 (0) > 5: mad f32 %r40 %r37 %r38 %r39 (0) > 6: mad f32 %r44 %r37 %r38 %r38 (0) > 7: add f32 %r53 %r31 %r40 (0) > 8: add f32 %r54 %r31 %r44 (0) > 9: add f32 %r57 %r56 %r44 (0) > > Constantfolding... > > MAIN:-1 () > BB:0 (14 instructions) - df = { } > -> BB:1 (tree) > 0: nop u32 %r56 (0) > 1: ld u32 %r31 c0[0x0] (0) > 2: ld u32 %r37 c0[0x140] (0) > 3: mov u32 %r38 0x00000000 (0) > 4: mov u32 %r39 0x3f800000 (0) > 5: mov f32 %r40 %r39 (0) > 6: mov f32 %r44 %r38 (0) > 7: add f32 %r53 %r31 %r40 (0) > 8: mov f32 %r54 %r31 (0) > 9: mov f32 %r57 %r56 (0) > > > The outcome: > 0: ld u32 $r2 c0[0x0] (8) > 1: mov u32 $r0 0x3f800000 (8) > 2: add ftz f32 $r0 $r2 $r0 (8) > 3: mov f32 $r3 $r1 (8) > 4: mov u32 $r1 $r2 (8) > 5: export b128 # o[0x0] $r0q (8) > > With patch: > 0: ld u32 $r2 c0[0x0] (8) > 1: add ftz f32 $r0 $r2 1.000000 (8) > 2: mov f32 $r3 $r1 (8) > 3: mov u32 $r1 $r2 (8) > 4: export b128 # o[0x0] $r0q (8) > > > [1]: > VERT > PROPERTY NEXT_SHADER FRAG > DCL OUT[0], GENERIC[0] > DCL CONST[0] > DCL TEMP[0..1], LOCAL > IMM[0] FLT32 { 0.0078, -1.0000, 0.0000, 0.5000} > IMM[1] FLT32 { 1.0000, 0.0000, 65535.0000, 0.0100} > 0: MOV TEMP[0].xyz, CONST[0].xxxx > 39: MAD TEMP[1], CONST[20].xxxx, IMM[1].yyyy, IMM[1].xyyy > 41: ADD TEMP[1], TEMP[0], TEMP[1] > 208: MOV OUT[0], TEMP[1] > 211: END > > > > >> >>> + } >>> + else { >>> + i->op = i->src(0).mod.getOp(); >>> + if (i->op != OP_CVT) >>> + i->src(0).mod = 0; >>> + } >>> } else >>> if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && >>> (imm0.isInteger(1) || imm0.isInteger(-1))) { >>> -- >>> 2.10.0 >>> >>> _______________________________________________ >>> Nouveau mailing list >>> Nouveau at lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/nouveau > >
Tobias Klausmann
2016-Oct-02 18:43 UTC
[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:26, Ilia Mirkin wrote:> That's very odd. LoadPropagation should have picked that up even in > its current form. Should try to figure out why it didn't and that is > likely to "fix" a *lot* more situations.Actually i was coming from an, given really constrained, addition to the LoadPropagation pass, where i was told to fix it within OP_MAD :/> On Sun, Oct 2, 2016 at 2:24 PM, Tobias Klausmann > <tobias.johannes.klausmann at mni.thm.de> wrote: >> >> On 02.10.2016 20:03, Ilia Mirkin wrote: >>> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann >>> <tobias.johannes.klausmann at mni.thm.de> wrote: >>>> Previously we'd end up with an unnecessary mov for the thirs immediate >>>> value. >>>> >>>> total instructions in shared programs : 851881 -> 851864 (-0.00%) >>>> total gprs used in shared programs : 110295 -> 110295 (0.00%) >>>> total local used in shared programs : 1020 -> 1020 (0.00%) >>>> >>>> local gpr inst bytes >>>> helped 0 0 17 17 >>>> hurt 0 0 0 0 >>>> >>>> Suggested-by: Karol Herbst <nouveau at karolherbst.de> >>>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> >>>> --- >>>> src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 >>>> ++++++++++++--- >>>> 1 file changed, 12 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>>> index 9875738..8bb5cf9 100644 >>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>>> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, >>>> ImmediateValue &imm0, int s) >>>> break; >>>> case OP_MAD: >>>> if (imm0.isInteger(0)) { >>>> + ImmediateValue imm1; >>>> i->setSrc(0, i->getSrc(2)); >>>> i->src(0).mod = i->src(2).mod; >>>> i->setSrc(1, NULL); >>>> i->setSrc(2, NULL); >>>> - i->op = i->src(0).mod.getOp(); >>>> - if (i->op != OP_CVT) >>>> - i->src(0).mod = 0; >>>> + if (i->src(0).getImmediate(imm1)) { >>>> + bld.setPosition(i, false); >>>> + newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), >>>> + i->dType); >>>> + delete_Instruction(prog, i); >>> What's an example of a situation where this helps? It shouldn't >>> matter, the mov's should get cleaned up. [Clearly 17 shaders >>> disagree...] Is this just a side-effect of the fact that we don't run >>> the opts to a fixed point? >> >> It is a second mov that causes a problem for later folding in the imm, here >> output of a testshader[1]: >> >> 0: nop u32 %r56 (0) >> 1: ld u32 %r31 c0[0x0] (0) >> 2: ld u32 %r37 c0[0x140] (0) >> 3: mov u32 %r38 0x00000000 (0) >> 4: mov u32 %r39 0x3f800000 (0) >> 5: mad f32 %r40 %r37 %r38 %r39 (0) >> 6: mad f32 %r44 %r37 %r38 %r38 (0) >> 7: add f32 %r53 %r31 %r40 (0) >> 8: add f32 %r54 %r31 %r44 (0) >> 9: add f32 %r57 %r56 %r44 (0) >> >> Constantfolding... >> >> MAIN:-1 () >> BB:0 (14 instructions) - df = { } >> -> BB:1 (tree) >> 0: nop u32 %r56 (0) >> 1: ld u32 %r31 c0[0x0] (0) >> 2: ld u32 %r37 c0[0x140] (0) >> 3: mov u32 %r38 0x00000000 (0) >> 4: mov u32 %r39 0x3f800000 (0) >> 5: mov f32 %r40 %r39 (0) >> 6: mov f32 %r44 %r38 (0) >> 7: add f32 %r53 %r31 %r40 (0) >> 8: mov f32 %r54 %r31 (0) >> 9: mov f32 %r57 %r56 (0) >> >> >> The outcome: >> 0: ld u32 $r2 c0[0x0] (8) >> 1: mov u32 $r0 0x3f800000 (8) >> 2: add ftz f32 $r0 $r2 $r0 (8) >> 3: mov f32 $r3 $r1 (8) >> 4: mov u32 $r1 $r2 (8) >> 5: export b128 # o[0x0] $r0q (8) >> >> With patch: >> 0: ld u32 $r2 c0[0x0] (8) >> 1: add ftz f32 $r0 $r2 1.000000 (8) >> 2: mov f32 $r3 $r1 (8) >> 3: mov u32 $r1 $r2 (8) >> 4: export b128 # o[0x0] $r0q (8) >> >> >> [1]: >> VERT >> PROPERTY NEXT_SHADER FRAG >> DCL OUT[0], GENERIC[0] >> DCL CONST[0] >> DCL TEMP[0..1], LOCAL >> IMM[0] FLT32 { 0.0078, -1.0000, 0.0000, 0.5000} >> IMM[1] FLT32 { 1.0000, 0.0000, 65535.0000, 0.0100} >> 0: MOV TEMP[0].xyz, CONST[0].xxxx >> 39: MAD TEMP[1], CONST[20].xxxx, IMM[1].yyyy, IMM[1].xyyy >> 41: ADD TEMP[1], TEMP[0], TEMP[1] >> 208: MOV OUT[0], TEMP[1] >> 211: END >> >> >> >> >>>> + } >>>> + else { >>>> + i->op = i->src(0).mod.getOp(); >>>> + if (i->op != OP_CVT) >>>> + i->src(0).mod = 0; >>>> + } >>>> } else >>>> if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && >>>> (imm0.isInteger(1) || imm0.isInteger(-1))) { >>>> -- >>>> 2.10.0 >>>> >>>> _______________________________________________ >>>> Nouveau mailing list >>>> Nouveau at lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/nouveau >>
Apparently Analagous Threads
- [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
- [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
- [LLVMdev] Question on X86 backend
- [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
- New winetricks 20080522: added directx9, wmp10, mono19; lots of bug fixes, nicer gui