thr3ads.net - Nouveau - [Nouveau] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Ilia Mirkin

2015-Aug-19 01:49 UTC

[Nouveau] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 66 +++++++++++++++-------
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3841c33..b0e74f0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1023,27 +1023,53 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue
&imm0, int s)
 
    case OP_AND:
    {
-      CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp();
-      if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount()
> 1)
-         return;
-      if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
-         return;
-      if (imm0.reg.data.f32 != 1.0)
-         return;
-      if (i->getSrc(t)->getInsn()->dType != TYPE_U32)
-         return;
+      Instruction *src = i->getSrc(t)->getInsn();
+      ImmediateValue imm1;
+      if (imm0.reg.data.u32 == 0) {
+         i->op = OP_MOV;
+         i->setSrc(0, new_ImmediateValue(prog, 0u));
+         i->src(0).mod = Modifier(0);
+         i->setSrc(1, NULL);
+      } else if (imm0.reg.data.u32 == ~0U) {
+         i->op = i->src(t).mod.getOp();
+         if (t) {
+            i->setSrc(0, i->getSrc(t));
+            i->src(0).mod = i->src(t).mod;
+         }
+         i->setSrc(1, NULL);
+      } else if (src->asCmp()) {
+         CmpInstruction *cmp = src->asCmp();
+         if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount()
> 1)
+            return;
+         if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
+            return;
+         if (imm0.reg.data.f32 != 1.0)
+            return;
+         if (cmp->dType != TYPE_U32)
+            return;
 
-      i->getSrc(t)->getInsn()->dType = TYPE_F32;
-      if (i->src(t).mod != Modifier(0)) {
-         assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
-         i->src(t).mod = Modifier(0);
-         cmp->setCond = inverseCondCode(cmp->setCond);
-      }
-      i->op = OP_MOV;
-      i->setSrc(s, NULL);
-      if (t) {
-         i->setSrc(0, i->getSrc(t));
-         i->setSrc(t, NULL);
+         cmp->dType = TYPE_F32;
+         if (i->src(t).mod != Modifier(0)) {
+            assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
+            i->src(t).mod = Modifier(0);
+            cmp->setCond = inverseCondCode(cmp->setCond);
+         }
+         i->op = OP_MOV;
+         i->setSrc(s, NULL);
+         if (t) {
+            i->setSrc(0, i->getSrc(t));
+            i->setSrc(t, NULL);
+         }
+      } else if (prog->getTarget()->isOpSupported(OP_EXTBF, TYPE_U32)
&&
+                 src->op == OP_SHR &&
+                 src->src(1).getImmediate(imm1) &&
+                 i->src(t).mod == Modifier(0) &&
+                 util_is_power_of_two(imm0.reg.data.u32 + 1)) {
+         // low byte = offset, high byte = width
+         uint32_t ext = (util_last_bit(imm0.reg.data.u32) << 8) |
imm1.reg.data.u32;
+         i->op = OP_EXTBF;
+         i->setSrc(0, src->getSrc(0));
+         i->setSrc(1, new_ImmediateValue(prog, ext));
       }
    }
       break;
-- 
2.4.6

Ilia Mirkin

2015-Aug-19 01:49 UTC

head link

[Nouveau] [PATCH 2/2] nvc0/ir: detect i2f/i2i which operate on specific bytes/words

Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.

This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp |  1 +
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  2 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |  4 ++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 79 ++++++++++++++++++++--
 4 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index f06056f..8f15429 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -933,6 +933,7 @@ CodeEmitterGK110::emitCVT(const Instruction *i)
 
    code[0] |= typeSizeofLog2(dType) << 10;
    code[0] |= typeSizeofLog2(i->sType) << 12;
+   code[1] |= i->subOp << 12;
 
    if (isSignedIntType(dType))
       code[0] |= 0x4000;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index ef5c87d..6e22788 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -818,6 +818,7 @@ CodeEmitterGM107::emitI2F()
    emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
    emitCC   (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+   emitField(0x29, 2, insn->subOp);
    emitRND  (0x27, rnd, -1);
    emitField(0x0d, 1, isSignedType(insn->sType));
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
@@ -850,6 +851,7 @@ CodeEmitterGM107::emitI2I()
    emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
    emitCC   (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+   emitField(0x29, 2, insn->subOp);
    emitField(0x0d, 1, isSignedType(insn->sType));
    emitField(0x0c, 1, isSignedType(insn->dType));
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5703712..6bf5219 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1020,6 +1020,10 @@ CodeEmitterNVC0::emitCVT(Instruction *i)
       code[0] |= util_logbase2(typeSizeof(dType)) << 20;
       code[0] |= util_logbase2(typeSizeof(i->sType)) << 23;
 
+      // for 8/16 source types, the byte/word is in subOp. word 1 is
+      // represented as 2.
+      code[1] |= i->subOp << 0x17;
+
       if (sat)
          code[0] |= 0x20;
       if (abs)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index b0e74f0..e37420c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1312,7 +1312,8 @@ private:
    void handleRCP(Instruction *);
    void handleSLCT(Instruction *);
    void handleLOGOP(Instruction *);
-   void handleCVT(Instruction *);
+   void handleCVT_NEG(Instruction *);
+   void handleCVT_EXTBF(Instruction *);
    void handleSUCLAMP(Instruction *);
 
    BuildUtil bld;
@@ -1563,12 +1564,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
 // nv50:
 //  F2I(NEG(I2F(ABS(SET))))
 void
-AlgebraicOpt::handleCVT(Instruction *cvt)
+AlgebraicOpt::handleCVT_NEG(Instruction *cvt)
 {
+   Instruction *insn = cvt->getSrc(0)->getInsn();
    if (cvt->sType != TYPE_F32 ||
        cvt->dType != TYPE_S32 || cvt->src(0).mod != Modifier(0))
       return;
-   Instruction *insn = cvt->getSrc(0)->getInsn();
    if (!insn || insn->op != OP_NEG || insn->dType != TYPE_F32)
       return;
    if (insn->src(0).mod != Modifier(0))
@@ -1598,6 +1599,74 @@ AlgebraicOpt::handleCVT(Instruction *cvt)
    delete_Instruction(prog, cvt);
 }
 
+// Some shaders extract packed bytes out of words and convert them to
+// e.g. float. The Fermi+ CVT instruction can extract those directly, as can
+// nv50 for word sizes.
+//
+// CVT(EXTBF(x, byte/word))
+// CVT(AND(bytemask, x))
+// CVT(AND(bytemask, SHR(x, 8/16/24)))
+void
+AlgebraicOpt::handleCVT_EXTBF(Instruction *cvt)
+{
+   Instruction *insn = cvt->getSrc(0)->getInsn();
+   ImmediateValue imm0, imm1;
+   Value *arg = NULL;
+   unsigned width, offset;
+   if ((cvt->sType != TYPE_U32 && cvt->sType != TYPE_S32) ||
!insn)
+      return;
+   if (insn->op == OP_EXTBF && insn->src(1).getImmediate(imm0)) {
+      width = (imm0.reg.data.u32 >> 8) & 0xff;
+      offset = imm0.reg.data.u32 & 0xff;
+      arg = insn->getSrc(0);
+
+      if (width != 8 && width != 16)
+         return;
+      if (width == 8 && offset & 0x7)
+         return;
+      if (width == 16 && offset & 0xf)
+         return;
+   } else if (insn->op == OP_AND) {
+      int s;
+      if (insn->src(0).getImmediate(imm0))
+         s = 0;
+      else if (insn->src(1).getImmediate(imm0))
+         s = 1;
+      else
+         return;
+
+      if (imm0.reg.data.u32 == 0xff)
+         width = 8;
+      else if (imm0.reg.data.u32 == 0xffff)
+         width = 16;
+      else
+         return;
+
+      arg = insn->getSrc(!s);
+      Instruction *shift = arg->getInsn();
+      offset = 0;
+      if (shift && shift->op == OP_SHR &&
+          shift->src(1).getImmediate(imm1) &&
+          ((width == 8 && (imm1.reg.data.u32 & 0x7) == 0) ||
+           (width == 16 && (imm1.reg.data.u32 & 0xf) == 0))) {
+         arg = shift->getSrc(0);
+         offset = imm1.reg.data.u32;
+      }
+   }
+
+   if (!arg)
+      return;
+
+   if (width == 8) {
+      cvt->sType = cvt->sType == TYPE_U32 ? TYPE_U8 : TYPE_S8;
+   } else {
+      assert(width == 16);
+      cvt->sType = cvt->sType == TYPE_U32 ? TYPE_U16 : TYPE_S16;
+   }
+   cvt->setSrc(0, arg);
+   cvt->subOp = offset >> 3;
+}
+
 // SUCLAMP dst, (ADD b imm), k, 0 -> SUCLAMP dst, b, k, imm (if imm fits s6)
 void
 AlgebraicOpt::handleSUCLAMP(Instruction *insn)
@@ -1668,7 +1737,9 @@ AlgebraicOpt::visit(BasicBlock *bb)
          handleLOGOP(i);
          break;
       case OP_CVT:
-         handleCVT(i);
+         handleCVT_NEG(i);
+         if (prog->getTarget()->isOpSupported(OP_EXTBF, TYPE_U32))
+             handleCVT_EXTBF(i);
          break;
       case OP_SUCLAMP:
          handleSUCLAMP(i);
-- 
2.4.6

Matt Turner

2015-Aug-19 01:57 UTC

head link

[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin at alum.mit.edu>
wrote:> Some shaders appear to extract bits using shift/and combos. Detect
> (some) of those and convert to EXTBF instead.
What is EXTBF? Extract byte to float?

I ask because Unigine Heaven has shaders that pack 3x byte-integers
into one component of a vec4 and extracts them with shifts/ands and
converts them to floats, and i965 could do the extraction and
conversion in a single instruction. I'm curious if this is the same
thing you're optimizing.

I thought about adding an extract_byte(src, byte_num) operation, but
i965's copy propagation caused me some headache and I shelved it.

Matt Turner

2015-Aug-19 01:58 UTC

head link

[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

On Tue, Aug 18, 2015 at 6:57 PM, Matt Turner <mattst88 at gmail.com>
wrote:> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin at
alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
Well, I apparently just needed to read your second patch's commit
message to confirm my suspicions.
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.

Ilia Mirkin

2015-Aug-19 02:00 UTC

head link

[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

On Tue, Aug 18, 2015 at 9:57 PM, Matt Turner <mattst88 at gmail.com>
wrote:> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin at
alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
Extract Bitfield.
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
>
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.
Yes, I think it's the same shader... it's doing a texelFetch() and
then grabbing bytes 0, 1, 2 off that.

The generated shader code after the second patch does:

        /*05d0*/                   TLD.LL.P R0, R24, 0x0, 2D, 0x3;
        /*05d8*/                   TEXDEPBAR 0x0;
        /*05e0*/                   I2F.F32.U8 R2, R1;
        /*05e8*/                   FFMA.FTZ R2, R2, R15, R19;
        /*05f0*/                   I2F.F32.U8 R8, R1.B1;
        /*05f8*/                   FFMA.FTZ R8, R8, R15, R19;
        /*0608*/                   I2F.F32.U8 R1, R1.B2;

I'll let you guess what these things mean. TLD = texelfetch :)

  -ilia

Eric Anholt

2015-Aug-20 16:13 UTC

head link

[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

Matt Turner <mattst88 at gmail.com> writes:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin at
alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
>
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.
I could use this one, as int, uint, and unorm unpacks.  Right now for
int/uint I'm recognizing the pattern in vc4_program.c (in a branch).
I'd be interested in writing the NIR bits if others are interested in
having this.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 818 bytes
Desc: not available
URL:
<lists.freedesktop.org/archives/nouveau/attachments/20150820/80bab5fd/attachment.sig>

Apparently Analagous Threads

Search for more possibly parallel threads

Nouveau - Aug 2015 - [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

[Nouveau] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

[Nouveau] [PATCH 2/2] nvc0/ir: detect i2f/i2i which operate on specific bytes/words

[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

Apparently Analagous Threads