Ilia Mirkin
2015-Aug-19 01:49 UTC
[Nouveau] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 66 +++++++++++++++-------
 1 file changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3841c33..b0e74f0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1023,27 +1023,53 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)

    case OP_AND:
    {
-      CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp();
-      if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount() > 1)
-         return;
-      if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
-         return;
-      if (imm0.reg.data.f32 != 1.0)
-         return;
-      if (i->getSrc(t)->getInsn()->dType != TYPE_U32)
-         return;
+      Instruction *src = i->getSrc(t)->getInsn();
+      ImmediateValue imm1;
+      if (imm0.reg.data.u32 == 0) {
+         i->op = OP_MOV;
+         i->setSrc(0, new_ImmediateValue(prog, 0u));
+         i->src(0).mod = Modifier(0);
+         i->setSrc(1, NULL);
+      } else if (imm0.reg.data.u32 == ~0U) {
+         i->op = i->src(t).mod.getOp();
+         if (t) {
+            i->setSrc(0, i->getSrc(t));
+            i->src(0).mod = i->src(t).mod;
+         }
+         i->setSrc(1, NULL);
+      } else if (src->asCmp()) {
+         CmpInstruction *cmp = src->asCmp();
+         if (!cmp || cmp->op == OP_SLCT || cmp->getDef(0)->refCount() > 1)
+            return;
+         if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
+            return;
+         if (imm0.reg.data.f32 != 1.0)
+            return;
+         if (cmp->dType != TYPE_U32)
+            return;

-      i->getSrc(t)->getInsn()->dType = TYPE_F32;
-      if (i->src(t).mod != Modifier(0)) {
-         assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
-         i->src(t).mod = Modifier(0);
-         cmp->setCond = inverseCondCode(cmp->setCond);
-      }
-      i->op = OP_MOV;
-      i->setSrc(s, NULL);
-      if (t) {
-         i->setSrc(0, i->getSrc(t));
-         i->setSrc(t, NULL);
+         cmp->dType = TYPE_F32;
+         if (i->src(t).mod != Modifier(0)) {
+            assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
+            i->src(t).mod = Modifier(0);
+            cmp->setCond = inverseCondCode(cmp->setCond);
+         }
+         i->op = OP_MOV;
+         i->setSrc(s, NULL);
+         if (t) {
+            i->setSrc(0, i->getSrc(t));
+            i->setSrc(t, NULL);
+         }
+      } else if (prog->getTarget()->isOpSupported(OP_EXTBF, TYPE_U32) &&
+                 src->op == OP_SHR &&
+                 src->src(1).getImmediate(imm1) &&
+                 i->src(t).mod == Modifier(0) &&
+                 util_is_power_of_two(imm0.reg.data.u32 + 1)) {
+         // low byte = offset, high byte = width
+         uint32_t ext = (util_last_bit(imm0.reg.data.u32) << 8) | imm1.reg.data.u32;
+         i->op = OP_EXTBF;
+         i->setSrc(0, src->getSrc(0));
+         i->setSrc(1, new_ImmediateValue(prog, ext));
       }
    }
       break;
--
2.4.6
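As an illustration of the rewrite in this patch, the following is a minimal standalone sketch, not Mesa code, of the equivalence it relies on: `(x >> shift) & mask` with a mask of the form 2^n - 1 is a bitfield extract of width n at bit offset `shift`. All four helper names are hypothetical. `extbf_u32` is only a software model that assumes the control word layout given in the patch comment (low byte = offset, next byte = width); it does not claim to reproduce hardware behaviour for out-of-range fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of an unsigned bitfield extract. Assumes the
// control word holds the bit offset in bits 0..7 and the field width in
// bits 8..15, matching the comment in the patch. Out-of-range offsets and
// zero widths simply yield 0 in this model.
static uint32_t extbf_u32(uint32_t x, uint32_t ext)
{
   const uint32_t offset = ext & 0xff;
   const uint32_t width = (ext >> 8) & 0xff;
   if (width == 0 || offset >= 32)
      return 0;
   const uint32_t shifted = x >> offset;
   return width >= 32 ? shifted : (shifted & ((1u << width) - 1u));
}

// True when mask has the form 2^n - 1 with n >= 1 (a run of low set bits).
static bool is_low_bit_mask(uint32_t mask)
{
   return mask != 0 && (mask & (mask + 1u)) == 0;
}

// Number of significant bits in the mask, i.e. the width of the field.
static uint32_t mask_width(uint32_t mask)
{
   uint32_t n = 0;
   while (mask) {
      ++n;
      mask >>= 1;
   }
   return n;
}

// Build the control word equivalent to (x >> shift) & mask.
static uint32_t encode_extbf(uint32_t mask, uint32_t shift)
{
   assert(is_low_bit_mask(mask) && shift < 32);
   return (mask_width(mask) << 8) | shift;
}
```

With these definitions, `encode_extbf(0xff, 8)` gives `0x0808` (width 8, offset 8), and `extbf_u32(x, 0x0808)` matches `(x >> 8) & 0xff` for any `x`.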
Ilia Mirkin
2015-Aug-19 01:49 UTC
[Nouveau] [PATCH 2/2] nvc0/ir: detect i2f/i2i which operate on specific bytes/words
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.

This misses 16-bit word capabilities in nv50, but I haven't seen
shaders that would actually make use of that.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp |  1 +
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  2 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |  4 ++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 79 ++++++++++++++++++++--
 4 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index f06056f..8f15429 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -933,6 +933,7 @@ CodeEmitterGK110::emitCVT(const Instruction *i)

    code[0] |= typeSizeofLog2(dType) << 10;
    code[0] |= typeSizeofLog2(i->sType) << 12;
+   code[1] |= i->subOp << 12;

    if (isSignedIntType(dType))
       code[0] |= 0x4000;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index ef5c87d..6e22788 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -818,6 +818,7 @@ CodeEmitterGM107::emitI2F()
    emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
    emitCC (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+   emitField(0x29, 2, insn->subOp);
    emitRND (0x27, rnd, -1);
    emitField(0x0d, 1, isSignedType(insn->sType));
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
@@ -850,6 +851,7 @@ CodeEmitterGM107::emitI2I()
    emitField(0x31, 1, (insn->op == OP_ABS) || insn->src(0).mod.abs());
    emitCC (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
+   emitField(0x29, 2, insn->subOp);
    emitField(0x0d, 1, isSignedType(insn->sType));
    emitField(0x0c, 1, isSignedType(insn->dType));
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5703712..6bf5219 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1020,6 +1020,10 @@ CodeEmitterNVC0::emitCVT(Instruction *i)
    code[0] |= util_logbase2(typeSizeof(dType)) << 20;
    code[0] |= util_logbase2(typeSizeof(i->sType)) << 23;

+   // for 8/16 source types, the byte/word is in subOp. word 1 is
+   // represented as 2.
+   code[1] |= i->subOp << 0x17;
+
    if (sat)
       code[0] |= 0x20;
    if (abs)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index b0e74f0..e37420c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1312,7 +1312,8 @@ private:
    void handleRCP(Instruction *);
    void handleSLCT(Instruction *);
    void handleLOGOP(Instruction *);
-   void handleCVT(Instruction *);
+   void handleCVT_NEG(Instruction *);
+   void handleCVT_EXTBF(Instruction *);
    void handleSUCLAMP(Instruction *);

    BuildUtil bld;
@@ -1563,12 +1564,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
 // nv50:
 //  F2I(NEG(I2F(ABS(SET))))
 void
-AlgebraicOpt::handleCVT(Instruction *cvt)
+AlgebraicOpt::handleCVT_NEG(Instruction *cvt)
 {
+   Instruction *insn = cvt->getSrc(0)->getInsn();
    if (cvt->sType != TYPE_F32 ||
        cvt->dType != TYPE_S32 || cvt->src(0).mod != Modifier(0))
       return;
-   Instruction *insn = cvt->getSrc(0)->getInsn();
    if (!insn || insn->op != OP_NEG || insn->dType != TYPE_F32)
       return;
    if (insn->src(0).mod != Modifier(0))
@@ -1598,6 +1599,74 @@ AlgebraicOpt::handleCVT(Instruction *cvt)
    delete_Instruction(prog, cvt);
 }

+// Some shaders extract packed bytes out of words and convert them to
+// e.g. float. The Fermi+ CVT instruction can extract those directly, as can
+// nv50 for word sizes.
+//
+// CVT(EXTBF(x, byte/word))
+// CVT(AND(bytemask, x))
+// CVT(AND(bytemask, SHR(x, 8/16/24)))
+void
+AlgebraicOpt::handleCVT_EXTBF(Instruction *cvt)
+{
+   Instruction *insn = cvt->getSrc(0)->getInsn();
+   ImmediateValue imm0, imm1;
+   Value *arg = NULL;
+   unsigned width, offset;
+   if ((cvt->sType != TYPE_U32 && cvt->sType != TYPE_S32) || !insn)
+      return;
+   if (insn->op == OP_EXTBF && insn->src(1).getImmediate(imm0)) {
+      width = (imm0.reg.data.u32 >> 8) & 0xff;
+      offset = imm0.reg.data.u32 & 0xff;
+      arg = insn->getSrc(0);
+
+      if (width != 8 && width != 16)
+         return;
+      if (width == 8 && offset & 0x7)
+         return;
+      if (width == 16 && offset & 0xf)
+         return;
+   } else if (insn->op == OP_AND) {
+      int s;
+      if (insn->src(0).getImmediate(imm0))
+         s = 0;
+      else if (insn->src(1).getImmediate(imm0))
+         s = 1;
+      else
+         return;
+
+      if (imm0.reg.data.u32 == 0xff)
+         width = 8;
+      else if (imm0.reg.data.u32 == 0xffff)
+         width = 16;
+      else
+         return;
+
+      arg = insn->getSrc(!s);
+      Instruction *shift = arg->getInsn();
+      offset = 0;
+      if (shift && shift->op == OP_SHR &&
+          shift->src(1).getImmediate(imm1) &&
+          ((width == 8 && (imm1.reg.data.u32 & 0x7) == 0) ||
+           (width == 16 && (imm1.reg.data.u32 & 0xf) == 0))) {
+         arg = shift->getSrc(0);
+         offset = imm1.reg.data.u32;
+      }
+   }
+
+   if (!arg)
+      return;
+
+   if (width == 8) {
+      cvt->sType = cvt->sType == TYPE_U32 ? TYPE_U8 : TYPE_S8;
+   } else {
+      assert(width == 16);
+      cvt->sType = cvt->sType == TYPE_U32 ? TYPE_U16 : TYPE_S16;
+   }
+   cvt->setSrc(0, arg);
+   cvt->subOp = offset >> 3;
+}
+
 // SUCLAMP dst, (ADD b imm), k, 0 -> SUCLAMP dst, b, k, imm (if imm fits s6)
 void
 AlgebraicOpt::handleSUCLAMP(Instruction *insn)
@@ -1668,7 +1737,9 @@ AlgebraicOpt::visit(BasicBlock *bb)
          handleLOGOP(i);
          break;
       case OP_CVT:
-         handleCVT(i);
+         handleCVT_NEG(i);
+         if (prog->getTarget()->isOpSupported(OP_EXTBF, TYPE_U32))
+            handleCVT_EXTBF(i);
          break;
       case OP_SUCLAMP:
          handleSUCLAMP(i);
--
2.4.6
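The sub-word selection performed by `handleCVT_EXTBF` can be sketched outside the compiler as follows. This is a hedged illustration, not Mesa code: `SubwordSel`, `select_subword` and `cvt_u_subword_to_f32` are hypothetical names, only the unsigned case is modelled, and the `offset < 32` guard is an extra assumption of the sketch that the patch itself does not check. The selector is stored as `offset >> 3`, which is why the second 16-bit word is encoded as 2, as the nvc0 emitter comment notes.

```cpp
#include <cstdint>

// Hypothetical description of a sub-word source operand for a conversion.
struct SubwordSel {
   unsigned width;  // field width in bits: 8 or 16
   unsigned subOp;  // index of the field's first byte (bit offset >> 3)
};

// Mirrors the alignment checks in the patch: a byte must start on a byte
// boundary and a 16-bit word on a word boundary. Returns false when the
// field cannot be expressed as a byte/word select.
static bool select_subword(unsigned width, unsigned offset, SubwordSel *out)
{
   if (offset >= 32)  // assumption of this sketch, not checked in the patch
      return false;
   const bool byte_ok = (width == 8) && ((offset & 0x7) == 0);
   const bool word_ok = (width == 16) && ((offset & 0xf) == 0);
   if (!byte_ok && !word_ok)
      return false;
   out->width = width;
   out->subOp = offset >> 3;
   return true;
}

// Reference semantics for an unsigned int-to-float conversion that reads
// only the selected byte or word of x.
static float cvt_u_subword_to_f32(uint32_t x, const SubwordSel &sel)
{
   const uint32_t mask = (sel.width == 8) ? 0xffu : 0xffffu;
   return static_cast<float>((x >> (sel.subOp * 8u)) & mask);
}
```

For instance, a width of 8 at bit offset 16 selects byte 2, while a width of 8 at bit offset 4 is rejected because the field straddles a byte boundary.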
Matt Turner
2015-Aug-19 01:57 UTC
[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> Some shaders appear to extract bits using shift/and combos. Detect
> (some) of those and convert to EXTBF instead.

What is EXTBF? Extract byte to float?

I ask because Unigine Heaven has shaders that pack 3x byte-integers
into one component of a vec4 and extracts them with shifts/ands and
converts them to floats, and i965 could do the extraction and
conversion in a single instruction. I'm curious if this is the same
thing you're optimizing.

I thought about adding an extract_byte(src, byte_num) operation, but
i965's copy propagation caused me some headache and I shelved it.
Matt Turner
2015-Aug-19 01:58 UTC
[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
On Tue, Aug 18, 2015 at 6:57 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.

Well, I apparently just needed to read your second patch's commit
message to confirm my suspicions.

> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.
Ilia Mirkin
2015-Aug-19 02:00 UTC
[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
On Tue, Aug 18, 2015 at 9:57 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?

Extract Bitfield.

>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
>
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.

Yes, I think it's the same shader... it's doing a texelFetch() and
then grabbing bytes 0, 1, 2 off that. The generated shader code after
the second patch does:

/*05d0*/ TLD.LL.P R0, R24, 0x0, 2D, 0x3;
/*05d8*/ TEXDEPBAR 0x0;
/*05e0*/ I2F.F32.U8 R2, R1;
/*05e8*/ FFMA.FTZ R2, R2, R15, R19;
/*05f0*/ I2F.F32.U8 R8, R1.B1;
/*05f8*/ FFMA.FTZ R8, R8, R15, R19;
/*0608*/ I2F.F32.U8 R1, R1.B2;

I'll let you guess what these things mean. TLD = texelfetch :)

-ilia
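For readers who have not seen the shader, the source-level pattern being discussed can be approximated by the sketch below. It is an assumption-laden reconstruction, not the Unigine shader: `Vec3`, `extract_byte` and `unpack3` are hypothetical names, and the multiply-add with a shared scale and bias is inferred only from the I2F/FFMA pairs in the disassembly above. Each `extract_byte` followed by a float conversion is what the second patch folds into a single `I2F.F32.U8` with a byte selector.

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 {
   float x, y, z;
};

// Byte byte_num (0..3) of a 32-bit value, written as a shift and a mask.
// This is the shape of expression the peephole pass looks for.
static uint32_t extract_byte(uint32_t src, unsigned byte_num)
{
   return (src >> (8u * (byte_num & 3u))) & 0xffu;
}

// Unpack bytes 0, 1 and 2 of a packed texel, convert each to float and
// apply value * scale + bias with a fused multiply-add.
static Vec3 unpack3(uint32_t packed, float scale, float bias)
{
   return Vec3{
      std::fma(static_cast<float>(extract_byte(packed, 0)), scale, bias),
      std::fma(static_cast<float>(extract_byte(packed, 1)), scale, bias),
      std::fma(static_cast<float>(extract_byte(packed, 2)), scale, bias),
   };
}
```

Calling `unpack3(0x00030201u, 2.0f, 1.0f)` unpacks the bytes 1, 2 and 3 and scales them to 3, 5 and 7.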
Eric Anholt
2015-Aug-20 16:13 UTC
[Nouveau] [Mesa-dev] [PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
Matt Turner <mattst88 at gmail.com> writes:

> On Tue, Aug 18, 2015 at 6:49 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> Some shaders appear to extract bits using shift/and combos. Detect
>> (some) of those and convert to EXTBF instead.
>
> What is EXTBF? Extract byte to float?
>
> I ask because Unigine Heaven has shaders that pack 3x byte-integers
> into one component of a vec4 and extracts them with shifts/ands and
> converts them to floats, and i965 could do the extraction and
> conversion in a single instruction. I'm curious if this is the same
> thing you're optimizing.
>
> I thought about adding an extract_byte(src, byte_num) operation, but
> i965's copy propagation caused me some headache and I shelved it.

I could use this one, as int, uint, and unorm unpacks. Right now for
int/uint I'm recognizing the pattern in vc4_program.c (in a branch).
I'd be interested in writing the NIR bits if others are interested in
having this.