Tobias Klausmann

2016-Oct-02 18:24 UTC

[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

On 02.10.2016 20:03, Ilia Mirkin wrote:
> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
> <tobias.johannes.klausmann at mni.thm.de> wrote:
>> Previously we'd end up with an unnecessary mov for the third immediate value.
>>
>> total instructions in shared programs : 851881 -> 851864 (-0.00%)
>> total gprs used in shared programs    : 110295 -> 110295 (0.00%)
>> total local used in shared programs   : 1020 -> 1020 (0.00%)
>>
>>                  local        gpr       inst      bytes
>>      helped           0           0          17          17
>>        hurt           0           0           0           0
>>
>> Suggested-by: Karol Herbst <nouveau at karolherbst.de>
>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
>> ---
>>   src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> index 9875738..8bb5cf9 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
>>         break;
>>      case OP_MAD:
>>         if (imm0.isInteger(0)) {
>> +         ImmediateValue imm1;
>>            i->setSrc(0, i->getSrc(2));
>>            i->src(0).mod = i->src(2).mod;
>>            i->setSrc(1, NULL);
>>            i->setSrc(2, NULL);
>> -         i->op = i->src(0).mod.getOp();
>> -         if (i->op != OP_CVT)
>> -            i->src(0).mod = 0;
>> +         if (i->src(0).getImmediate(imm1)) {
>> +            bld.setPosition(i, false);
>> +            newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
>> +                             i->dType);
>> +            delete_Instruction(prog, i);
> What's an example of a situation where this helps? It shouldn't
> matter, the mov's should get cleaned up. [Clearly 17 shaders
> disagree...] Is this just a side-effect of the fact that we don't run
> the opts to a fixed point?
It is the second mov that causes a problem for later folding of the
immediate; here is the output of a test shader [1]:

   0: nop u32 %r56 (0)
   1: ld  u32 %r31 c0[0x0] (0)
   2: ld  u32 %r37 c0[0x140] (0)
   3: mov u32 %r38 0x00000000 (0)
   4: mov u32 %r39 0x3f800000 (0)
   5: mad f32 %r40 %r37 %r38 %r39 (0)
   6: mad f32 %r44 %r37 %r38 %r38 (0)
   7: add f32 %r53 %r31 %r40 (0)
   8: add f32 %r54 %r31 %r44 (0)
   9: add f32 %r57 %r56 %r44 (0)

Constantfolding...

MAIN:-1 ()
BB:0 (14 instructions) - df = { }
  -> BB:1 (tree)
   0: nop u32 %r56 (0)
   1: ld  u32 %r31 c0[0x0] (0)
   2: ld  u32 %r37 c0[0x140] (0)
   3: mov u32 %r38 0x00000000 (0)
   4: mov u32 %r39 0x3f800000 (0)
   5: mov f32 %r40 %r39 (0)
   6: mov f32 %r44 %r38 (0)
   7: add f32 %r53 %r31 %r40 (0)
   8: mov f32 %r54 %r31 (0)
   9: mov f32 %r57 %r56 (0)
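
As an illustration of the rewrite shown in the dump above, here is a minimal sketch of the fold on a toy IR. It is not the nv50_ir code: `Operand`, `Insn`, `ImmDefs`, `getImmediate` and `foldMadZero` are hypothetical names made up for this example, and source modifiers and floating-point corner cases are ignored. With `propagateImm` off it models the pre-patch result (`mov %r40 %r39`); with it on, the literal is written into the mov directly, which is what the patch is after.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Toy stand-ins for the real IR; none of these names exist in Mesa.
enum class Op { Mad, Mov };

struct Operand {
    bool isImm;           // true: `value` is a literal, false: a register number
    std::uint32_t value;
};

struct Insn {
    Op op;
    int def;              // destination register
    Operand src[3];       // mad uses all three sources, mov only src[0]
};

// Registers that are defined by `mov reg, literal`, mapped to that literal.
using ImmDefs = std::unordered_map<std::uint32_t, std::uint32_t>;

// Look through a register to the literal it was loaded from, if there is one.
std::optional<std::uint32_t> getImmediate(const Operand &o, const ImmDefs &defs) {
    if (o.isImm)
        return o.value;
    auto it = defs.find(o.value);
    if (it == defs.end())
        return std::nullopt;
    return it->second;
}

// Fold `mad d, a, b, c` into `mov d, c` when a or b is a known zero.
// Returns nothing when neither factor is a known zero.
std::optional<Insn> foldMadZero(const Insn &mad, const ImmDefs &defs,
                                bool propagateImm) {
    assert(mad.op == Op::Mad);
    const auto a = getImmediate(mad.src[0], defs);
    const auto b = getImmediate(mad.src[1], defs);
    const bool zeroFactor = (a && *a == 0) || (b && *b == 0);
    if (!zeroFactor)
        return std::nullopt;

    // Pre-patch behaviour: keep whatever operand the third source was.
    Insn mov{Op::Mov, mad.def, {mad.src[2], Operand{}, Operand{}}};
    if (propagateImm) {
        // Patched behaviour: if the third source is a literal behind a mov,
        // use the literal itself so no extra register copy is left behind.
        if (const auto c = getImmediate(mad.src[2], defs))
            mov.src[0] = Operand{true, *c};
    }
    return mov;
}
```

Feeding it instruction 5 from the dump (`mad %r40 %r37 %r38 %r39` with `%r38 = 0` and `%r39 = 0x3f800000`) gives a mov from `%r39` without propagation and a mov of `0x3f800000` with it.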


The outcome without the patch:
   0: ld  u32 $r2 c0[0x0] (8)
   1: mov u32 $r0 0x3f800000 (8)
   2: add ftz f32 $r0 $r2 $r0 (8)
   3: mov f32 $r3 $r1 (8)
   4: mov u32 $r1 $r2 (8)
   5: export b128 # o[0x0] $r0q (8)

With patch:
   0: ld  u32 $r2 c0[0x0] (8)
   1: add ftz f32 $r0 $r2 1.000000 (8)
   2: mov f32 $r3 $r1 (8)
   3: mov u32 $r1 $r2 (8)
   4: export b128 # o[0x0] $r0q (8)
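
The two dumps carry the same constant: `0x3f800000` is the IEEE-754 single-precision bit pattern of `1.000000`, which is what shows up as the immediate operand of the add once the mov is gone. A small sketch to check that, with `bitsToFloat` and `checkDumpConstants` being names invented for this example:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret 32 raw bits as an IEEE-754 single-precision float.
float bitsToFloat(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);  // well-defined way to copy the bits
    return f;
}

// The two literals that appear in the IR dump of the test shader.
void checkDumpConstants() {
    assert(bitsToFloat(0x3f800000u) == 1.0f);  // %r39
    assert(bitsToFloat(0x00000000u) == 0.0f);  // %r38
}
```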


[1]:
VERT
PROPERTY NEXT_SHADER FRAG
DCL OUT[0], GENERIC[0]
DCL CONST[0]
DCL TEMP[0..1], LOCAL
IMM[0] FLT32 {    0.0078,    -1.0000,     0.0000,     0.5000}
IMM[1] FLT32 {    1.0000,     0.0000, 65535.0000,     0.0100}
   0: MOV TEMP[0].xyz, CONST[0].xxxx
  39: MAD TEMP[1], CONST[20].xxxx, IMM[1].yyyy, IMM[1].xyyy
  41: ADD TEMP[1], TEMP[0], TEMP[1]
208: MOV OUT[0], TEMP[1]
211: END


>
>> +         }
>> +         else {
>> +            i->op = i->src(0).mod.getOp();
>> +            if (i->op != OP_CVT)
>> +               i->src(0).mod = 0;
>> +         }
>>         } else
>>         if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
>>             (imm0.isInteger(1) || imm0.isInteger(-1))) {
>> --
>> 2.10.0
>>
>> _______________________________________________
>> Nouveau mailing list
>> Nouveau at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/nouveau

Ilia Mirkin

2016-Oct-02 18:26 UTC

That's very odd. LoadPropagation should have picked that up even in
its current form. Should try to figure out why it didn't and that is
likely to "fix" a *lot* more situations.
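
The thread does not establish why LoadPropagation missed this case, so the following is only an illustration of the shape seen in the dump, where the add reads `%r40`, which is a mov of `%r39`, which is a mov of `0x3f800000`. It is a toy model, not Mesa code: `MovDef`, `Defs`, `resolveImm` and the `maxDepth` parameter are all assumptions made for this sketch. It contrasts a lookup that only accepts a direct `mov reg, imm` with one that follows a chain of movs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Toy model: each register has at most one defining mov. Hypothetical types.
struct MovDef {
    bool fromImm;          // true: mov reg, literal; false: mov reg, otherReg
    std::uint32_t value;   // literal bits, or the source register number
};

using Defs = std::unordered_map<std::uint32_t, MovDef>;

// Follow mov chains starting at `reg` until a literal is found.
// maxDepth == 1 only accepts a direct `mov reg, literal`.
std::optional<std::uint32_t> resolveImm(std::uint32_t reg, const Defs &defs,
                                        int maxDepth) {
    assert(maxDepth >= 0);
    for (int depth = 0; depth < maxDepth; ++depth) {
        auto it = defs.find(reg);
        if (it == defs.end())
            return std::nullopt;       // not defined by a mov at all
        if (it->second.fromImm)
            return it->second.value;   // reached the literal
        reg = it->second.value;        // step through `mov reg, otherReg`
    }
    return std::nullopt;               // chain longer than we are willing to follow
}
```

With `%r39 = mov 0x3f800000` and `%r40 = mov %r39`, a depth of 1 finds nothing for `%r40`, while a depth of 2 reaches the literal.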


Tobias Klausmann

2016-Oct-02 18:43 UTC

On 02.10.2016 20:26, Ilia Mirkin wrote:
> That's very odd. LoadPropagation should have picked that up even in
> its current form. Should try to figure out why it didn't and that is
> likely to "fix" a *lot* more situations.
Actually, I was coming from an (admittedly really constrained) addition to the
LoadPropagation pass, where I was told to fix it within OP_MAD instead :/
