Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2
OK, so there's a bunch of stuff in here. The geometry stuff is based on the
work started by Bryan Cain and Christoph Bumiller.

Patches 01-12: Add support for geometry shaders and fix related issues
Patches 13-14: Make it possible for fb clears to operate on texture
               attachments with an explicit layer set (as is allowed in
               gl 3.2).
Patches 15-17: Make ARB_texture_multisample work
Patch 18: Enable GLSL 1.50
Patch 19: Turn on ARB_seamless_cube_map irrespective of HW support so that
          all nv50 cards can get OpenGL 3.2 and geometry shaders (which are
          otherwise unsupported)

There are still a few geometry-related piglits that fail -- specifically:

primitive-id-no-gs
gl-3.2-layered-rendering-gl-layer*

I need to trace the blob to figure out exactly how to configure the HW for
those situations, but I suspect that the fixes will be fairly small and
self-contained.

Note that there are also a bunch of EXT_framebuffer_multisample tests that
are failing, but that has nothing to do with these changes. There's something
wrong with the blit_3d function, at the very least to do with depth/stencil,
but also some color tests fail as well.

These patches are available at https://github.com/imirkin/mesa.git nv50-gs or
https://github.com/imirkin/mesa/commits/nv50-gs for those who prefer a web ui.

Bryan Cain (2):
  nv50/ir: delay calculation of indirect addresses
  nv50: add support for geometry shaders

Christoph Bumiller (1):
  nv50/ir: fix PFETCH and add RDSV to get VSTRIDE for GPs

Ilia Mirkin (16):
  nv50: allow vert_count to be >255
  nv50/ir: disallow predicates on emit/restart ops
  nv50/ir: disallow shader input propagation for gp
  nv50/ir: comment out code to allow input/immed loads
  nv50/ir: add support for gl_PrimitiveIDIn
  nv50: properly set the PRIMITIVE_ID enable flag when it is a gp input.
  nv50: VP_RESULT_MAP_SIZE has to be positive
  nv50: GP_REG_ALLOC_RESULT must be positive
  nv50: allocate an extra code bo to avoid dmesg spam
  nv50: don't forget to also clear additional layers
  nvc0: don't forget to also clear additional layers
  nv50: add comments about CB_AUX contents
  nv50: copy nvc0's get_sample_position implementation
  nv50: add support for textureFetch'ing MS textures, ARB_texture_multisample
  nv50: report glsl 1.50 now that gp tests pass
  nv50: enable seamless cube maps on all hw for OpenGL 3.2

 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |   9 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  |  92 ++++++++++--
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  41 ++++--
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp      | 164 ++++++++++++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |   7 +
 .../drivers/nouveau/codegen/nv50_ir_print.cpp      |   1 +
 .../nouveau/codegen/nv50_ir_target_nv50.cpp        |  18 ++-
 src/gallium/drivers/nouveau/nv50/nv50_context.c    |  46 ++++++
 src/gallium/drivers/nouveau/nv50/nv50_context.h    |  17 +++
 src/gallium/drivers/nouveau/nv50/nv50_program.c    |  30 +++-
 src/gallium/drivers/nouveau/nv50/nv50_program.h    |   2 +-
 src/gallium/drivers/nouveau/nv50/nv50_screen.c     |  23 ++-
 .../drivers/nouveau/nv50/nv50_shader_state.c       |   6 +
 .../drivers/nouveau/nv50/nv50_state_validate.c     |   2 +-
 src/gallium/drivers/nouveau/nv50/nv50_surface.c    |  25 ++--
 src/gallium/drivers/nouveau/nv50/nv50_tex.c        |  77 +++++++++-
 src/gallium/drivers/nouveau/nvc0/nvc0_surface.c    |  22 ++-
 17 files changed, 526 insertions(+), 56 deletions(-)

--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 01/19] nv50/ir: fix PFETCH and add RDSV to get VSTRIDE for GPs
From: Christoph Bumiller <e0425955 at student.tuwien.ac.at>

---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  | 62 ++++++++++++++++++++--
 .../drivers/nouveau/codegen/nv50_ir_print.cpp      |  1 +
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 68c76e5..6a001d3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -366,6 +366,7 @@ enum SVSemantic
    SV_CLOCK,
    SV_LBASE,
    SV_SBASE,
+   SV_VERTEX_STRIDE,
    SV_UNDEFINED,
    SV_LAST
 };
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index 3eca27d..cf82e2f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -87,6 +87,7 @@ private:
    void emitLOAD(const Instruction *);
    void emitSTORE(const Instruction *);
    void emitMOV(const Instruction *);
+   void emitRDSV(const Instruction *);
    void emitNOP();
    void emitINTERP(const Instruction *);
    void emitPFETCH(const Instruction *);
@@ -772,6 +773,29 @@ CodeEmitterNV50::emitMOV(const Instruction *i)
    }
 }
 
+static inline uint8_t getSRegEncoding(const ValueRef &ref)
+{
+   switch (SDATA(ref).sv.sv) {
+   case SV_PHYSID: return 0;
+   case SV_CLOCK: return 1;
+   case SV_VERTEX_STRIDE: return 3;
+// case SV_PM_COUNTER: return 4 + SDATA(ref).sv.index;
+   case SV_SAMPLE_INDEX: return 8;
+   default:
+      assert(!"no sreg for system value");
+      return 0;
+   }
+}
+
+void
+CodeEmitterNV50::emitRDSV(const Instruction *i)
+{
+   code[0] = 0x00000001;
+   code[1] = 0x60000000 | (getSRegEncoding(i->src(0)) << 14);
+   defId(i->def(0), 2);
+   emitFlagsRd(i);
+}
+
 void
 CodeEmitterNV50::emitNOP()
 {
@@ -794,15 +818,40 @@ CodeEmitterNV50::emitQUADOP(const Instruction *i, uint8_t lane, uint8_t quOp)
    srcId(i->src(0), 32 + 14);
 }
 
+/* NOTE: This returns the base address of a vertex inside the primitive.
+ * src0 is an immediate, the index (not offset) of the vertex
+ * inside the primitive. XXX: signed or unsigned ?
+ * src1 (may be NULL) should use whatever units the hardware requires
+ * (on nv50 this is bytes, so, relative index * 4; signed 16 bit value).
+ */
 void
 CodeEmitterNV50::emitPFETCH(const Instruction *i)
 {
-   code[0] = 0x11800001;
-   code[1] = 0x04200000 | (0xf << 14);
+   const uint32_t prim = i->src(0).get()->reg.data.u32;
+   assert(prim <= 127);
 
-   defId(i->def(0), 2);
-   srcAddr8(i->src(0), 9);
-   setAReg16(i, 0);
+   if (i->def(0).getFile() == FILE_ADDRESS) {
+      // shl $aX a[] 0
+      code[0] = 0x00000001 | ((DDATA(i->def(0)).id + 1) << 2);
+      code[1] = 0xc0200000;
+      code[0] |= prim << 9;
+      assert(!i->srcExists(1));
+   } else
+   if (i->srcExists(1)) {
+      // ld b32 $rX a[$aX+base]
+      code[0] = 0x00000001;
+      code[1] = 0x04200000 | (0xf << 14);
+      defId(i->def(0), 2);
+      code[0] |= prim << 9;
+      setARegBits(SDATA(i->src(1)).id + 1);
+   } else {
+      // mov b32 $rX a[]
+      code[0] = 0x10000001;
+      code[1] = 0x04200000 | (0xf << 14);
+      defId(i->def(0), 2);
+      code[0] |= prim << 9;
+   }
+   emitFlagsRd(i);
 }
 
 void
@@ -1620,6 +1669,9 @@ CodeEmitterNV50::emitInstruction(Instruction *insn)
    case OP_PFETCH:
       emitPFETCH(insn);
       break;
+   case OP_RDSV:
+      emitRDSV(insn);
+      break;
    case OP_LINTERP:
    case OP_PINTERP:
       emitINTERP(insn);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index ee39b3c..ae42d03 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -265,6 +265,7 @@ static const char *SemanticStr[SV_LAST + 1]
    "CLOCK",
    "LBASE",
    "SBASE",
+   "VERTEX_STRIDE",
    "?",
    "(INVALID)"
 };
--
1.8.3.2
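
Editorial sketch (not part of the patch): the RDSV encoding in patch 01 can be read as "place the special-register number at bit 14 of the upper opcode word". The snippet below models only that step. The `SysVal` enum and the `rdsvHighWord` helper are hypothetical names introduced for illustration, and the destination-register and flag bits that `defId()` and `emitFlagsRd()` add in the real emitter are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the IR's system-value enum; only the entries
// that the patch maps to a special register are listed.
enum class SysVal { PhysId, Clock, VertexStride, SampleIndex };

// Mirrors the switch in getSRegEncoding() from the patch.
inline uint8_t sregEncoding(SysVal sv)
{
   switch (sv) {
   case SysVal::PhysId:       return 0;
   case SysVal::Clock:        return 1;
   case SysVal::VertexStride: return 3;
   case SysVal::SampleIndex:  return 8;
   }
   return 0;
}

// Upper opcode word as emitRDSV() starts building it: a fixed opcode
// pattern with the special-register number shifted to bit 14.
inline uint32_t rdsvHighWord(SysVal sv)
{
   return 0x60000000u | (uint32_t(sregEncoding(sv)) << 14);
}

// Usage check: VERTEX_STRIDE is sreg 3, so bits 14 and 15 are set.
inline void checkRdsvEncoding()
{
   assert(rdsvHighWord(SysVal::VertexStride) == 0x6000c000u);
   assert(rdsvHighWord(SysVal::PhysId) == 0x60000000u);
}
```

The lower word stays `0x00000001` for every system value in the patch, so only the upper word is modelled here.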
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 02/19] nv50/ir: delay calculation of indirect addresses
From: Bryan Cain <bryancain3 at gmail.com>

Instead of emitting an SHL 4 into an address register on the TGSI ARL and
UARL instructions, emit the shift when the loaded address is actually used.
This is necessary because input vertex and attribute indices in geometry
shaders on nv50 need to be shifted left by 2 instead of 4.

Signed-off-by: Bryan Cain <bryancain3 at gmail.com>
[calim: various updates to the indirect address logic]
Signed-off-by: Christoph Bumiller <e0425955 at student.tuwien.ac.at>
[imirkin: remove OP_MAD change that calim made, add OP_RESTART handling same
 as OP_EMIT for code flow analysis]
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  38 ++++++--
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp      | 104 ++++++++++++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |   7 ++
 3 files changed, 136 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 49a45f8..3c790cf 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1126,6 +1126,7 @@ private:
       ValueMap values;
    };
 
+   Value *shiftAddress(Value *);
    Value *getVertexBase(int s);
    DataArray *getArrayForFile(unsigned file, int idx);
    Value *fetchSrc(int s, int c);
@@ -1344,7 +1345,8 @@ Converter::getVertexBase(int s)
       if (tgsi.getSrc(s).isIndirect(1))
          rel = fetchSrc(tgsi.getSrc(s).getIndirect(1), 0, NULL);
       vtxBaseValid |= 1 << s;
-      vtxBase[s] = mkOp2v(OP_PFETCH, TYPE_U32, getSSA(), mkImm(index), rel);
+      vtxBase[s] = mkOp2v(OP_PFETCH, TYPE_U32, getSSA(4, FILE_ADDRESS),
+                          mkImm(index), rel);
    }
    return vtxBase[s];
 }
@@ -1403,6 +1405,14 @@ Converter::getArrayForFile(unsigned file, int idx)
 }
 
 Value *
+Converter::shiftAddress(Value *index)
+{
+   if (!index)
+      return NULL;
+   return mkOp2v(OP_SHL, TYPE_U32, getSSA(4, FILE_ADDRESS), index, mkImm(4));
+}
+
+Value *
 Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
 {
    const int idx2d = src.is2D() ? src.getIndex(1) : 0;
@@ -1414,7 +1424,7 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
       assert(!ptr);
       return loadImm(NULL, info->immd.data[idx * 4 + swz]);
    case TGSI_FILE_CONSTANT:
-      return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
+      return mkLoadv(TYPE_U32, srcToSym(src, c), shiftAddress(ptr));
    case TGSI_FILE_INPUT:
       if (prog->getType() == Program::TYPE_FRAGMENT) {
          // don't load masked inputs, won't be assigned a slot
@@ -1422,9 +1432,17 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
             return loadImm(NULL, swz == TGSI_SWIZZLE_W ? 1.0f : 0.0f);
          if (!ptr && info->in[idx].sn == TGSI_SEMANTIC_FACE)
             return mkOp1v(OP_RDSV, TYPE_F32, getSSA(), mkSysVal(SV_FACE, 0));
-         return interpolate(src, c, ptr);
+         return interpolate(src, c, shiftAddress(ptr));
+      } else
+      if (ptr && prog->getType() == Program::TYPE_GEOMETRY) {
+         // XXX: This is going to be a problem with scalar arrays, i.e. when
+         // we cannot assume that the address is given in units of vec4.
+         //
+         // nv50 and nvc0 need different things here, so let the lowering
+         // passes decide what to do with the address
+         return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
       }
-      return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
+      return mkLoadv(TYPE_U32, srcToSym(src, c), shiftAddress(ptr));
    case TGSI_FILE_OUTPUT:
       assert(!"load from output file");
       return NULL;
@@ -1433,7 +1451,7 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
       return mkOp1v(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c));
    default:
       return getArrayForFile(src.getFile(), idx2d)->load(
-         sub.cur->values, idx, swz, ptr);
+         sub.cur->values, idx, swz, shiftAddress(ptr));
    }
 }
 
@@ -1476,8 +1494,9 @@ Converter::storeDst(int d, int c, Value *val)
       break;
    }
 
-   Value *ptr = dst.isIndirect(0) ?
-      fetchSrc(dst.getIndirect(0), 0, NULL) : NULL;
+   Value *ptr = NULL;
+   if (dst.isIndirect(0))
+      ptr = shiftAddress(fetchSrc(dst.getIndirect(0), 0, NULL));
 
    if (info->io.genUserClip > 0 &&
        dst.getFile() == TGSI_FILE_OUTPUT &&
@@ -2179,12 +2198,11 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
          src0 = fetchSrc(0, c);
          mkCvt(OP_CVT, TYPE_S32, dst0[c], TYPE_F32, src0)->rnd = ROUND_M;
-         mkOp2(OP_SHL, TYPE_U32, dst0[c], dst0[c], mkImm(4));
       }
       break;
    case TGSI_OPCODE_UARL:
       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
-         mkOp2(OP_SHL, TYPE_U32, dst0[c], fetchSrc(0, c), mkImm(4));
+         mkOp1(OP_MOV, TYPE_U32, dst0[c], fetchSrc(0, c));
       break;
    case TGSI_OPCODE_EX2:
    case TGSI_OPCODE_LG2:
@@ -2721,7 +2739,7 @@ Converter::Converter(Program *ir, const tgsi::Source *code) : BuildUtil(ir),
 
    tData.setup(TGSI_FILE_TEMPORARY, 0, 0, tSize, 4, 4, tFile, 0);
    pData.setup(TGSI_FILE_PREDICATE, 0, 0, pSize, 4, 4, FILE_PREDICATE, 0);
-   aData.setup(TGSI_FILE_ADDRESS, 0, 0, aSize, 4, 4, FILE_ADDRESS, 0);
+   aData.setup(TGSI_FILE_ADDRESS, 0, 0, aSize, 4, 4, FILE_GPR, 0);
    oData.setup(TGSI_FILE_OUTPUT, 0, 0, oSize, 4, 4, FILE_GPR, 0);
 
    zero = mkImm((uint32_t)0);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
index 07f3a21..1d13aea 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -278,10 +278,24 @@ NV50LegalizeSSA::propagateWriteToOutput(Instruction *st)
    // TODO: move exports (if beneficial) in common opt pass
    if (di->isPseudo() || isTextureOp(di->op) || di->defCount(0xff, true) > 1)
       return;
+
    for (int s = 0; di->srcExists(s); ++s)
       if (di->src(s).getFile() == FILE_IMMEDIATE)
          return;
 
+   if (prog->getType() == Program::TYPE_GEOMETRY) {
+      // Only propagate output writes in geometry shaders when we can be sure
+      // that we are propagating to the same output vertex.
+      if (di->bb != st->bb)
+         return;
+      Instruction *i;
+      for (i = di; i != st; i = i->next) {
+         if (i->op == OP_EMIT || i->op == OP_RESTART)
+            return;
+      }
+      assert(i); // st after di
+   }
+
    // We cannot set defs to non-lvalues before register allocation, so
    // save & remove (to save registers) the exports and replace later.
    outWrites->push_back(st);
@@ -307,6 +321,9 @@ NV50LegalizeSSA::handleAddrDef(Instruction *i)
 
    i->getDef(0)->reg.size = 2; // $aX are only 16 bit
 
+   // PFETCH can always write to $a
+   if (i->op == OP_PFETCH)
+      return;
    // only ADDR <- SHL(GPR, IMM) and ADDR <- ADD(ADDR, IMM) are valid
    if (i->srcExists(1) && i->src(1).getFile() == FILE_IMMEDIATE) {
       if (i->op == OP_SHL && i->src(0).getFile() == FILE_GPR)
@@ -473,6 +490,9 @@ NV50LegalizeSSA::visit(BasicBlock *bb)
    for (insn = bb->getEntry(); insn; insn = next) {
       next = insn->next;
 
+      if (insn->defExists(0) && insn->getDef(0)->reg.file == FILE_ADDRESS)
+         handleAddrDef(insn);
+
       switch (insn->op) {
       case OP_EXPORT:
          if (outWrites)
@@ -491,9 +511,6 @@ NV50LegalizeSSA::visit(BasicBlock *bb)
       default:
          break;
       }
-
-      if (insn->defExists(0) && insn->getDef(0)->reg.file == FILE_ADDRESS)
-         handleAddrDef(insn);
    }
    return true;
 }
@@ -510,7 +527,9 @@ private:
    bool handleRDSV(Instruction *);
    bool handleWRSV(Instruction *);
 
+   bool handlePFETCH(Instruction *);
    bool handleEXPORT(Instruction *);
+   bool handleLOAD(Instruction *);
 
    bool handleDIV(Instruction *);
    bool handleSQRT(Instruction *);
@@ -1002,6 +1021,81 @@ NV50LoweringPreSSA::handleEXPORT(Instruction *i)
    return true;
 }
 
+// Handle indirect addressing in geometry shaders:
+//
+// ld $r0 a[$a1][$a2+k] ->
+// ld $r0 a[($a1 + $a2 * $vstride) + k], where k *= $vstride is implicit
+//
+bool
+NV50LoweringPreSSA::handleLOAD(Instruction *i)
+{
+   ValueRef src = i->src(0);
+
+   if (src.isIndirect(1)) {
+      assert(prog->getType() == Program::TYPE_GEOMETRY);
+      Value *addr = i->getIndirect(0, 1);
+
+      if (src.isIndirect(0)) {
+         // base address is in an address register, so move to a GPR
+         Value *base = bld.getScratch();
+         bld.mkMov(base, addr);
+
+         Symbol *sv = bld.mkSysVal(SV_VERTEX_STRIDE, 0);
+         Value *vstride = bld.mkOp1v(OP_RDSV, TYPE_U32, bld.getSSA(), sv);
+         Value *attrib = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(),
+                                    i->getIndirect(0, 0), bld.mkImm(2));
+
+         // Calculate final address: addr = base + attr*vstride; use 16-bit
+         // multiplication since 32-bit would be lowered to multiple
+         // instructions, and we only need the low 16 bits of the result
+         Value *a[2], *b[2];
+         bld.mkSplit(a, 2, attrib);
+         bld.mkSplit(b, 2, vstride);
+         Value *sum = bld.mkOp3v(OP_MAD, TYPE_U16, bld.getSSA(), a[0], b[0],
+                                 base);
+
+         // move address from GPR into an address register
+         addr = bld.getSSA(2, FILE_ADDRESS);
+         bld.mkMov(addr, sum);
+      }
+
+      i->setIndirect(0, 1, NULL);
+      i->setIndirect(0, 0, addr);
+   }
+
+   return true;
+}
+
+bool
+NV50LoweringPreSSA::handlePFETCH(Instruction *i)
+{
+   assert(prog->getType() == Program::TYPE_GEOMETRY);
+
+   // NOTE: cannot use getImmediate here, not in SSA form yet, move to
+   // later phase if that assertion ever triggers:
+
+   ImmediateValue *imm = i->getSrc(0)->asImm();
+   assert(imm);
+
+   assert(imm->reg.data.u32 <= 127); // TODO: use address reg if that happens
+
+   if (i->srcExists(1)) {
+      // indirect addressing of vertex in primitive space
+
+      LValue *val = bld.getScratch();
+      Value *ptr = bld.getSSA(2, FILE_ADDRESS);
+      bld.mkOp2v(OP_SHL, TYPE_U32, ptr, i->getSrc(1), bld.mkImm(2));
+      bld.mkOp2v(OP_PFETCH, TYPE_U32, val, imm, ptr);
+
+      // NOTE: PFETCH directly to an $aX only works with direct addressing
+      i->op = OP_SHL;
+      i->setSrc(0, val);
+      i->setSrc(1, bld.mkImm(0));
+   }
+
+   return true;
+}
+
 // Set flags according to predicate and make the instruction read $cX.
 void
 NV50LoweringPreSSA::checkPredicate(Instruction *insn)
@@ -1060,6 +1154,8 @@ NV50LoweringPreSSA::visit(Instruction *i)
       return handleSQRT(i);
    case OP_EXPORT:
       return handleEXPORT(i);
+   case OP_LOAD:
+      return handleLOAD(i);
    case OP_RDSV:
       return handleRDSV(i);
    case OP_WRSV:
@@ -1070,6 +1166,8 @@ NV50LoweringPreSSA::visit(Instruction *i)
       return handlePRECONT(i);
    case OP_CONT:
       return handleCONT(i);
+   case OP_PFETCH:
+      return handlePFETCH(i);
    default:
       break;
    }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index a838004..3840f75 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1548,6 +1548,13 @@ NVC0LoweringPass::visit(Instruction *i)
          if (prog->getType() == Program::TYPE_COMPUTE) {
             i->getSrc(0)->reg.file = FILE_MEMORY_CONST;
             i->getSrc(0)->reg.fileIndex = 0;
+         } else
+         if (prog->getType() == Program::TYPE_GEOMETRY &&
+             i->src(0).isIndirect(0)) {
+            // XXX: this assumes vec4 units
+            Value *ptr = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(),
+                                    i->getIndirect(0, 0), bld.mkImm(4));
+            i->setIndirect(0, 0, ptr);
          } else {
             i->op = OP_VFETCH;
             assert(prog->getType() != Program::TYPE_FRAGMENT); // INTERP
--
1.8.3.2
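
Editorial sketch (not part of the patch): the arithmetic behind patch 02, reduced to plain integers. It assumes nothing about the IR. `vec4Address` and `gpInputAddress` are hypothetical helper names. The first models the shift by 4 that `shiftAddress()` now applies at the point of use, and the second models the `base + (attrib << 2) * vstride` computation from `handleLOAD()`, keeping only the low 16 bits as the comment in the patch describes.

```cpp
#include <cassert>
#include <cstdint>

// Shift applied to ordinary indirect indices (vec4 units, 16 bytes each).
constexpr unsigned kVec4Shift = 4;
// Shift applied to vertex/attribute indices in nv50 geometry shaders.
constexpr unsigned kGpShift = 2;

// Models shiftAddress(): index in vec4 units -> byte offset.
inline uint32_t vec4Address(uint32_t index)
{
   return index << kVec4Shift;
}

// Models the 16-bit MAD in handleLOAD(): only the low halves of the
// shifted attribute index and of vstride take part, and the result is
// truncated to the 16 bits an address register can hold.
inline uint16_t gpInputAddress(uint16_t base, uint32_t attribIndex,
                               uint32_t vstride)
{
   const uint32_t a = (attribIndex << kGpShift) & 0xffffu;
   const uint32_t b = vstride & 0xffffu;
   return static_cast<uint16_t>(a * b + base);
}

// Usage check with small, made-up numbers.
inline void checkAddressMath()
{
   assert(vec4Address(3) == 48);
   assert(gpInputAddress(0x10, 3, 4) == 0x40); // 0x10 + (3 << 2) * 4
}
```

The operands are widened to `uint32_t` before multiplying so that the product wraps in unsigned arithmetic instead of overflowing a promoted `int`.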
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 03/19] nv50: add support for geometry shaders
From: Bryan Cain <bryancain3 at gmail.com>

Layer output probably doesn't work yet, but other than that everything seems
to be working.

Signed-off-by: Bryan Cain <bryancain3 at gmail.com>
[calim: fix up minor bugs, code formatting]
Signed-off-by: Christoph Bumiller <e0425955 at student.tuwien.ac.at>
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  | 25 ++++++++++++++++------
 src/gallium/drivers/nouveau/nv50/nv50_program.c    | 16 ++++++++++++++
 .../drivers/nouveau/nv50/nv50_shader_state.c       |  2 ++
 src/gallium/drivers/nouveau/nv50/nv50_tex.c        |  2 ++
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index cf82e2f..f4db2ed 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -493,7 +493,12 @@ CodeEmitterNV50::emitForm_MAD(const Instruction *i)
    setSrc(i, 1, 1);
    setSrc(i, 2, 2);
 
-   setAReg16(i, 1);
+   if (i->getIndirect(0, 0)) {
+      assert(!i->getIndirect(1, 0));
+      setAReg16(i, 0);
+   } else {
+      setAReg16(i, 1);
+   }
 }
 
 // like default form, but 2nd source in slot 2, and no 3rd source
@@ -512,7 +517,12 @@ CodeEmitterNV50::emitForm_ADD(const Instruction *i)
    setSrc(i, 0, 0);
    setSrc(i, 1, 2);
 
-   setAReg16(i, 1);
+   if (i->getIndirect(0, 0)) {
+      assert(!i->getIndirect(1, 0));
+      setAReg16(i, 0);
+   } else {
+      setAReg16(i, 1);
+   }
 }
 
 // default short form (rr, ar, rc, gr)
@@ -602,8 +612,11 @@ CodeEmitterNV50::emitLOAD(const Instruction *i)
 
    switch (sf) {
    case FILE_SHADER_INPUT:
-      // use 'mov' where we can
-      code[0] = i->src(0).isIndirect(0) ? 0x00000001 : 0x10000001;
+      if (progType == Program::TYPE_GEOMETRY)
+         code[0] = 0x11800001;
+      else
+         // use 'mov' where we can
+         code[0] = i->src(0).isIndirect(0) ? 0x00000001 : 0x10000001;
       code[1] = 0x00200000 | (i->lanes << 14);
       if (typeSizeof(i->dType) == 4)
          code[1] |= 0x04000000;
@@ -1399,8 +1412,8 @@ CodeEmitterNV50::emitShift(const Instruction *i)
 void
 CodeEmitterNV50::emitOUT(const Instruction *i)
 {
-   code[0] = (i->op == OP_EMIT) ? 0xf0000200 : 0xf0000400;
-   code[1] = 0xc0000001;
+   code[0] = (i->op == OP_EMIT) ? 0xf0000201 : 0xf0000401;
+   code[1] = 0xc0000000;
 
    emitFlagsRd(i);
 }
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 97857d7..78a12e3 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -358,6 +358,22 @@ nv50_program_translate(struct nv50_program *prog, uint16_t chipset)
       }
       if (info->prop.fp.usesDiscard)
          prog->fp.flags[0] |= NV50_3D_FP_CONTROL_USES_KIL;
+   } else
+   if (prog->type == PIPE_SHADER_GEOMETRY) {
+      switch (info->prop.gp.outputPrim) {
+      case PIPE_PRIM_LINE_STRIP:
+         prog->gp.prim_type = NV50_3D_GP_OUTPUT_PRIMITIVE_TYPE_LINE_STRIP;
+         break;
+      case PIPE_PRIM_TRIANGLE_STRIP:
+         prog->gp.prim_type = NV50_3D_GP_OUTPUT_PRIMITIVE_TYPE_TRIANGLE_STRIP;
+         break;
+      case PIPE_PRIM_POINTS:
+      default:
+         assert(info->prop.gp.outputPrim == PIPE_PRIM_POINTS);
+         prog->gp.prim_type = NV50_3D_GP_OUTPUT_PRIMITIVE_TYPE_POINTS;
+         break;
+      }
+      prog->gp.vert_count = info->prop.gp.maxVertices;
    }
 
    if (prog->pipe.stream_output.num_outputs)
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index 9144fc4..ba4f592 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -193,6 +193,8 @@ nv50_gmtyprog_validate(struct nv50_context *nv50)
    struct nv50_program *gp = nv50->gmtyprog;
 
    if (gp) {
+      if (!nv50_program_validate(nv50, gp))
+         return;
       BEGIN_NV04(push, NV50_3D(GP_REG_ALLOC_TEMP), 1);
       PUSH_DATA (push, gp->max_gpr);
       BEGIN_NV04(push, NV50_3D(GP_REG_ALLOC_RESULT), 1);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_tex.c b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
index f7284fa..6663a61 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_tex.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
@@ -293,6 +293,7 @@ void nv50_validate_textures(struct nv50_context *nv50)
    boolean need_flush;
 
    need_flush = nv50_validate_tic(nv50, 0);
+   need_flush |= nv50_validate_tic(nv50, 1);
    need_flush |= nv50_validate_tic(nv50, 2);
 
    if (need_flush) {
@@ -343,6 +344,7 @@ void nv50_validate_samplers(struct nv50_context *nv50)
    boolean need_flush;
 
    need_flush = nv50_validate_tsc(nv50, 0);
+   need_flush |= nv50_validate_tsc(nv50, 1);
    need_flush |= nv50_validate_tsc(nv50, 2);
 
    if (need_flush) {
--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 04/19] nv50: allow vert_count to be >255
---
 src/gallium/drivers/nouveau/nv50/nv50_program.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 13b9516..f63352f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -88,7 +88,7 @@ struct nv50_program {
 
    struct {
       ubyte primid; /* primitive id output register */
-      uint8_t vert_count;
+      uint32_t vert_count;
       uint8_t prim_type; /* point, line strip or tri strip */
    } gp;
 
--
1.8.3.2
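
Editorial sketch (not part of the patch): patch 04 carries no commit message beyond its subject, so the following only illustrates what a `uint8_t` field does to a vertex count above 255. `GpInfoOld` and `GpInfoNew` are hypothetical, reduced copies of the `gp` sub-struct before and after the change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduced copies of the gp sub-struct, before and after.
struct GpInfoOld { uint8_t vert_count; };
struct GpInfoNew { uint32_t vert_count; };

// What the old field ends up holding: only the low 8 bits survive.
inline uint32_t storedOld(unsigned maxVertices)
{
   GpInfoOld gp;
   gp.vert_count = static_cast<uint8_t>(maxVertices);
   return gp.vert_count;
}

// What the widened field ends up holding: the value itself.
inline uint32_t storedNew(unsigned maxVertices)
{
   GpInfoNew gp;
   gp.vert_count = maxVertices;
   return gp.vert_count;
}

// Usage check: 256 wraps to 0 in the 8-bit field.
inline void checkVertCount()
{
   assert(storedOld(256) == 0);
   assert(storedNew(256) == 256);
}
```

With the old layout, a shader declaring 256 output vertices would have been recorded with a count of 0.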
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 05/19] nv50/ir: disallow predicates on emit/restart ops
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index ade9be0..52257a8 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -130,7 +130,8 @@ void TargetNV50::initOpInfo()
    };
    static const operation noPredList[] =
    {
-      OP_CALL, OP_PREBREAK, OP_PRERET, OP_QUADON, OP_QUADPOP, OP_JOINAT
+      OP_CALL, OP_PREBREAK, OP_PRERET, OP_QUADON, OP_QUADPOP, OP_JOINAT,
+      OP_EMIT, OP_RESTART
    };
 
    for (i = 0; i < DATA_FILE_COUNT; ++i)
--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 06/19] nv50/ir: disallow shader input propagation for gp
For some reason, shader input accesses don't work correctly in non-ld
instructions. Disallow those loads from being propagated.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

I'm not particularly happy with this patch. Some investigation needs to happen
as to what's going on here. NVIDIA's shaders include p[] accesses in various
instructions just fine. Perhaps this is just masking some other bug. However
this works for now for all the piglit tests in the repo.

 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 52257a8..18fa069 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -297,14 +297,19 @@ TargetNV50::insnCanLoad(const Instruction *i, int s,
 
    switch (mode) {
    case 0x00:
-   case 0x01:
    case 0x03:
    case 0x08:
-   case 0x09:
    case 0x0c:
    case 0x20:
    case 0x21:
       break;
+   case 0x01:
+   case 0x09:
+      // TODO: Figure out why a[] accesses can't be propagated into non-ld
+      // instructions. Something to do with vstride maybe?
+      if (ld->bb->getProgram()->getType() == Program::TYPE_GEOMETRY)
+         return false;
+      break;
    case 0x0d:
       if (ld->bb->getProgram()->getType() != Program::TYPE_GEOMETRY)
          return false;
--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 07/19] nv50/ir: comment out code to allow input/immed loads
This code was missing a break which made it ineffective. But since shader
input loads have been disallowed, define the code out.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 18fa069..a84a54a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -310,9 +310,12 @@ TargetNV50::insnCanLoad(const Instruction *i, int s,
       if (ld->bb->getProgram()->getType() == Program::TYPE_GEOMETRY)
          return false;
       break;
+#if 0
    case 0x0d:
       if (ld->bb->getProgram()->getType() != Program::TYPE_GEOMETRY)
          return false;
+      break;
+#endif
    default:
       return false;
    }
--
1.8.3.2
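
Editorial sketch (not part of the patch): why the missing `break` made the `0x0d` case ineffective. The two functions below are hypothetical, reduced models of the tail of the `switch` in `insnCanLoad()`; they keep only the `0x0d` and `default` labels and replace the program-type query with a boolean parameter.

```cpp
#include <cassert>

// Shape of the code before the patch: after the geometry check passes,
// control falls into `default` and the load is rejected anyway.
inline bool canLoadWithoutBreak(int mode, bool isGeometry)
{
   switch (mode) {
   case 0x0d:
      if (!isGeometry)
         return false;
      // fall through
   default:
      return false;
   }
   return true;
}

// Shape of the code with the `break` the patch adds (inside #if 0).
inline bool canLoadWithBreak(int mode, bool isGeometry)
{
   switch (mode) {
   case 0x0d:
      if (!isGeometry)
         return false;
      break;
   default:
      return false;
   }
   return true;
}

// Usage check: only the second variant ever accepts mode 0x0d.
inline void checkFallThrough()
{
   assert(!canLoadWithoutBreak(0x0d, true));
   assert(canLoadWithBreak(0x0d, true));
   assert(!canLoadWithBreak(0x0d, false));
}
```

Without the `break`, mode `0x0d` was rejected for every program type, which matches the commit message's "ineffective".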
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 08/19] nv50/ir: add support for gl_PrimitiveIDIn
Note that the primitive id is stored in a[0x18], while usually the geometry
instructions are of the form a[$a1 + 0x4] which gets mapped to p[] space. We
need to avoid the change from a[] to p[] here, so it's keyed on whether the
access is indirect or not.

Note that there's also a use-case for accessing e.g. a[$r1], however that's
not supported for now. (Could be added by checking the register file of the
indirect parameter.)

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp   | 6 +++---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp   | 7 +++++--
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 3 +++
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index f4db2ed..a6ed4b0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -381,7 +381,7 @@ CodeEmitterNV50::setSrcFileBits(const Instruction *i, int enc)
    case 0x00: // rrr
       break;
    case 0x01: // arr/grr
-      if (progType == Program::TYPE_GEOMETRY) {
+      if (progType == Program::TYPE_GEOMETRY && i->src(0).isIndirect(0)) {
          code[0] |= 0x01800000;
          if (enc == NV50_OP_ENC_LONG || enc == NV50_OP_ENC_LONG_ALT)
             code[1] |= 0x00200000;
@@ -407,7 +407,7 @@ CodeEmitterNV50::setSrcFileBits(const Instruction *i, int enc)
       code[1] |= (i->getSrc(1)->reg.fileIndex << 22);
       break;
    case 0x09: // acr/gcr
-      if (progType == Program::TYPE_GEOMETRY) {
+      if (progType == Program::TYPE_GEOMETRY && i->src(0).isIndirect(0)) {
          code[0] |= 0x01800000;
       } else {
          code[0] |= (enc == NV50_OP_ENC_LONG_ALT) ? 0x01000000 : 0x00800000;
@@ -612,7 +612,7 @@ CodeEmitterNV50::emitLOAD(const Instruction *i)
 
    switch (sf) {
    case FILE_SHADER_INPUT:
-      if (progType == Program::TYPE_GEOMETRY)
+      if (progType == Program::TYPE_GEOMETRY && i->src(0).isIndirect(0))
          code[0] = 0x11800001;
       else
          // use 'mov' where we can
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 3c790cf..321410e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1434,13 +1434,16 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
             return mkOp1v(OP_RDSV, TYPE_F32, getSSA(), mkSysVal(SV_FACE, 0));
          return interpolate(src, c, shiftAddress(ptr));
       } else
-      if (ptr && prog->getType() == Program::TYPE_GEOMETRY) {
+      if (prog->getType() == Program::TYPE_GEOMETRY) {
+         if (!ptr && info->in[idx].sn == TGSI_SEMANTIC_PRIMID)
+            return mkOp1v(OP_RDSV, TYPE_U32, getSSA(), mkSysVal(SV_PRIMITIVE_ID, 0));
          // XXX: This is going to be a problem with scalar arrays, i.e. when
          // we cannot assume that the address is given in units of vec4.
          //
          // nv50 and nvc0 need different things here, so let the lowering
          // passes decide what to do with the address
-         return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
+         if (ptr)
+            return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
       }
       return mkLoadv(TYPE_U32, srcToSym(src, c), shiftAddress(ptr));
    case TGSI_FILE_OUTPUT:
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index a84a54a..1925c09 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -238,6 +238,9 @@ TargetNV50::getSVAddress(DataFile shaderFile, const Symbol *sym) const
          addr += 4;
       return addr;
    }
+   case SV_PRIMITIVE_ID:
+      return shaderFile == FILE_SHADER_INPUT ? 0x18 :
+         sysvalLocation[sym->reg.data.sv.sv];
    case SV_NCTAID:
       return 0x8 + 2 * sym->reg.data.sv.index;
    case SV_CTAID:
--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 09/19] nv50: properly set the PRIMITIVE_ID enable flag when it is a gp input.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_program.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 78a12e3..f46f240 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -52,6 +52,9 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
       for (c = 0; c < 4; ++c)
          if (info->in[i].mask & (1 << c))
             info->in[i].slot[c] = n++;
+
+      if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
+         prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;
    }
    prog->in_nr = info->numInputs;
 
--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 10/19] nv50: VP_RESULT_MAP_SIZE has to be positive
Make sure that we never try to use a 0-sized map. This can happen when using
a gp, so add a dummy mapping when computing vp_gp_mapping in that case.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_shader_state.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index ba4f592..265ef20 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -457,6 +457,7 @@ nv50_fp_linkage_validate(struct nv50_context *nv50)
    BEGIN_NV04(push, NV50_3D(SEMANTIC_PRIM_ID), 1);
    PUSH_DATA (push, primid);
 
+   assert(m > 0);
    BEGIN_NV04(push, NV50_3D(VP_RESULT_MAP_SIZE), 1);
    PUSH_DATA (push, m);
    BEGIN_NV04(push, NV50_3D(VP_RESULT_MAP(0)), n);
@@ -516,6 +517,8 @@ nv50_vp_gp_mapping(uint8_t *map, int m,
          oid += mv & 1;
       }
    }
+   if (!m)
+      map[m++] = 0;
    return m;
 }
 
@@ -540,6 +543,7 @@ nv50_gp_linkage_validate(struct nv50_context *nv50)
    BEGIN_NV04(push, NV50_3D(VP_GP_BUILTIN_ATTR_EN), 1);
    PUSH_DATA (push, vp->vp.attrs[2] | gp->vp.attrs[2]);
 
+   assert(m > 0);
    BEGIN_NV04(push, NV50_3D(VP_RESULT_MAP_SIZE), 1);
    PUSH_DATA (push, m);
    BEGIN_NV04(push, NV50_3D(VP_RESULT_MAP(0)), n);
--
1.8.3.2
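
Editorial sketch (not part of the patch): the "dummy mapping" rule from patch 10 in isolation. `finishMapping` is a hypothetical helper that models only the last three lines of `nv50_vp_gp_mapping()`; the loop that fills the map from the vertex and geometry program outputs is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Models the tail of nv50_vp_gp_mapping(): if no entries were produced,
// append a single zero entry so the reported size is never 0.
inline int finishMapping(uint8_t *map, int m)
{
   assert(map != nullptr);
   if (!m)
      map[m++] = 0; // dummy entry
   return m;
}

// Usage check: an empty map grows to one entry, a filled one is untouched.
inline void checkFinishMapping()
{
   uint8_t empty[4] = {0xff, 0xff, 0xff, 0xff};
   assert(finishMapping(empty, 0) == 1);
   assert(empty[0] == 0);

   uint8_t filled[4] = {1, 2, 3, 0xff};
   assert(finishMapping(filled, 3) == 3);
   assert(filled[2] == 3);
}
```

The `assert(m > 0)` lines the patch adds before writing `VP_RESULT_MAP_SIZE` then hold by construction on the geometry path.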
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 11/19] nv50: GP_REG_ALLOC_RESULT must be positive
Set max_out to 1 when there are no outputs.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_program.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index f46f240..813795f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -118,6 +118,8 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
    }
    prog->out_nr = info->numOutputs;
    prog->max_out = n;
+   if (!prog->max_out)
+      prog->max_out = 1;

    if (prog->vp.psiz < info->numOutputs)
       prog->vp.psiz = prog->out[prog->vp.psiz].hw;
--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 12/19] nv50: allocate an extra code bo to avoid dmesg spam
Each code BO is a heap that allocates at the end first, and so GPs are
allocated at the very end of the allocated space. When executing, we see
PAGE_NOT_PRESENT errors for the next page. Just over-allocate to make sure that
there's something there.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 43e0f50..82b0207 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -739,8 +739,12 @@ nv50_screen_create(struct nouveau_device *dev)
       goto fail;
    }

+   /* This over-allocates by a whole code BO. The GP, which would execute at
+    * the end of the last page, would trigger faults. The going theory is that
+    * it prefetches up to a certain amount. This avoids dmesg spam.
+    */
    ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 1 << 16,
-                        3 << NV50_CODE_BO_SIZE_LOG2, NULL, &screen->code);
+                        4 << NV50_CODE_BO_SIZE_LOG2, NULL, &screen->code);
    if (ret) {
       NOUVEAU_ERR("Failed to allocate code bo: %d\n", ret);
       goto fail;
--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 13/19] nv50: don't forget to also clear additional layers
Fixes most of the tests/spec/gl-3.2/layered-rendering/* piglits.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_surface.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_surface.c b/src/gallium/drivers/nouveau/nv50/nv50_surface.c
index 358f57a..16a4369 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_surface.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_surface.c
@@ -395,7 +395,7 @@ nv50_clear(struct pipe_context *pipe, unsigned buffers,
    struct nv50_context *nv50 = nv50_context(pipe);
    struct nouveau_pushbuf *push = nv50->base.pushbuf;
    struct pipe_framebuffer_state *fb = &nv50->framebuffer;
-   unsigned i;
+   unsigned i, j;
    uint32_t mode = 0;

    /* don't need NEW_BLEND, COLOR_MASK doesn't affect CLEAR_BUFFERS */
@@ -408,9 +408,6 @@ nv50_clear(struct pipe_context *pipe, unsigned buffers,
       PUSH_DATAf(push, color->f[1]);
       PUSH_DATAf(push, color->f[2]);
       PUSH_DATAf(push, color->f[3]);
-      mode =
-         NV50_3D_CLEAR_BUFFERS_R | NV50_3D_CLEAR_BUFFERS_G |
-         NV50_3D_CLEAR_BUFFERS_B | NV50_3D_CLEAR_BUFFERS_A;
    }

    if (buffers & PIPE_CLEAR_DEPTH) {
@@ -425,12 +422,22 @@ nv50_clear(struct pipe_context *pipe, unsigned buffers,
       mode |= NV50_3D_CLEAR_BUFFERS_S;
    }

-   BEGIN_NV04(push, NV50_3D(CLEAR_BUFFERS), 1);
-   PUSH_DATA (push, mode);
+   if ((buffers & PIPE_CLEAR_DEPTH) || (buffers & PIPE_CLEAR_STENCIL)) {
+      for (j = fb->zsbuf->u.tex.first_layer; j <= fb->zsbuf->u.tex.last_layer; j++) {
+         BEGIN_NV04(push, NV50_3D(CLEAR_BUFFERS), 1);
+         PUSH_DATA(push, mode | (j << NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT));
+      }
+   }

-   for (i = 1; i < fb->nr_cbufs; i++) {
-      BEGIN_NV04(push, NV50_3D(CLEAR_BUFFERS), 1);
-      PUSH_DATA (push, (i << 6) | 0x3c);
+   if (buffers & PIPE_CLEAR_COLOR) {
+      for (i = 0; i < fb->nr_cbufs; i++) {
+         struct pipe_surface *sf = fb->cbufs[i];
+         for (j = sf->u.tex.first_layer; j <= sf->u.tex.last_layer; j++) {
+            BEGIN_NV04(push, NV50_3D(CLEAR_BUFFERS), 1);
+            PUSH_DATA (push, (i << 6) | 0x3c |
+                       (j << NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT));
+         }
+      }
    }
 }

--
1.8.3.2
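The per-layer color clear above boils down to emitting one CLEAR_BUFFERS word
per (render target, layer) pair. A minimal standalone sketch of how those words
are put together, not driver code: CLEAR_LAYER_SHIFT is an assumed stand-in for
NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT (the real value comes from the generated nv50
3D headers), and color_clear_words() is a hypothetical helper that only exists
for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: placeholder for NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT. */
#define CLEAR_LAYER_SHIFT 10
/* R|G|B|A channel bits, the same 0x3c literal the patch uses. */
#define CLEAR_RGBA 0x3c

/* Hypothetical helper: fill 'out' with one clear word per layer of render
 * target 'rt', mirroring the inner loop of the patch. Returns the number of
 * words written (at most 'max'). */
static unsigned
color_clear_words(unsigned rt, unsigned first_layer, unsigned last_layer,
                  uint32_t *out, unsigned max)
{
   unsigned n = 0;
   unsigned j;

   assert(first_layer <= last_layer);
   for (j = first_layer; j <= last_layer && n < max; j++)
      out[n++] = (rt << 6) | CLEAR_RGBA | (j << CLEAR_LAYER_SHIFT);
   return n;
}
```

With the assumed shift, clearing layers 0..2 of render target 1 yields three
words that differ only in the layer field.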
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 14/19] nvc0: don't forget to also clear additional layers
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_program.c | 2 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 22 ++++++++++++++++------
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 813795f..e7609fa 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -166,6 +166,8 @@ nv50_fragprog_assign_slots(struct nv50_ir_prog_info *info)

          if (info->in[i].sn == TGSI_SEMANTIC_COLOR)
             prog->vp.bfc[info->in[i].si] = j;
+         if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
+            prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;

          prog->in[j].id = i;
          prog->in[j].mask = info->in[i].mask;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
index 5375bd4..8cc7021 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
@@ -414,7 +414,7 @@ nvc0_clear(struct pipe_context *pipe, unsigned buffers,
    struct nvc0_context *nvc0 = nvc0_context(pipe);
    struct nouveau_pushbuf *push = nvc0->base.pushbuf;
    struct pipe_framebuffer_state *fb = &nvc0->framebuffer;
-   unsigned i;
+   unsigned i, j;
    uint32_t mode = 0;

    /* don't need NEW_BLEND, COLOR_MASK doesn't affect CLEAR_BUFFERS */
@@ -444,12 +444,22 @@ nvc0_clear(struct pipe_context *pipe, unsigned buffers,
       mode |= NVC0_3D_CLEAR_BUFFERS_S;
    }

-   BEGIN_NVC0(push, NVC0_3D(CLEAR_BUFFERS), 1);
-   PUSH_DATA (push, mode);
+   if ((buffers & PIPE_CLEAR_DEPTH) || (buffers & PIPE_CLEAR_STENCIL)) {
+      for (j = fb->zsbuf->u.tex.first_layer; j <= fb->zsbuf->u.tex.last_layer; j++) {
+         BEGIN_NVC0(push, NVC0_3D(CLEAR_BUFFERS), 1);
+         PUSH_DATA(push, mode | (j << NVC0_3D_CLEAR_BUFFERS_LAYER__SHIFT));
+      }
+   }

-   for (i = 1; i < fb->nr_cbufs; i++) {
-      BEGIN_NVC0(push, NVC0_3D(CLEAR_BUFFERS), 1);
-      PUSH_DATA (push, (i << 6) | 0x3c);
+   if (buffers & PIPE_CLEAR_COLOR) {
+      for (i = 0; i < fb->nr_cbufs; i++) {
+         struct pipe_surface *sf = fb->cbufs[i];
+         for (j = sf->u.tex.first_layer; j <= sf->u.tex.last_layer; j++) {
+            BEGIN_NVC0(push, NVC0_3D(CLEAR_BUFFERS), 1);
+            PUSH_DATA (push, (i << 6) | 0x3c |
+                       (j << NVC0_3D_CLEAR_BUFFERS_LAYER__SHIFT));
+         }
+      }
    }
 }

--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 15/19] nv50: add comments about CB_AUX contents
Updates a few inconsistencies as well, like the size of the buffer, location of
the runout, etc.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_context.h | 10 ++++++++++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 8 ++++----
 src/gallium/drivers/nouveau/nv50/nv50_state_validate.c | 2 +-
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h
index ee6eb0e..7bf4ce3 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h
@@ -70,7 +70,17 @@
 #define NV50_CB_PVP 124
 #define NV50_CB_PGP 126
 #define NV50_CB_PFP 125
+/* constant buffer permanently mapped in as c15[] */
 #define NV50_CB_AUX 127
+/* size of the buffer: 64k. not all taken up, can be reduced if needed. */
+#define NV50_CB_AUX_SIZE (1 << 16)
+/* 8 user clip planes, at 4 32-bit floats each */
+#define NV50_CB_AUX_UCP_OFFSET 0x0
+/* 256 textures, each with 2 16-bit integers specifying the x/y MS shift */
+#define NV50_CB_AUX_MS_OFFSET 0x80
+/* 4 32-bit floats for the vertex runout, put at the end */
+#define NV50_CB_AUX_RUNOUT_OFFSET (NV50_CB_AUX_SIZE - 0x10)
+


 struct nv50_blitctx;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 82b0207..9ed2d01 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -472,7 +472,7 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
    BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
    PUSH_DATAh(push, screen->uniforms->offset + (3 << 16));
    PUSH_DATA (push, screen->uniforms->offset + (3 << 16));
-   PUSH_DATA (push, (NV50_CB_AUX << 16) | 0x0200);
+   PUSH_DATA (push, (NV50_CB_AUX << 16) | (NV50_CB_AUX_SIZE & 0xffff));

    BEGIN_NI04(push, NV50_3D(SET_PROGRAM_CB), 3);
    PUSH_DATA (push, (NV50_CB_AUX << 12) | 0xf01);
@@ -481,15 +481,15 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)

    /* return { 0.0, 0.0, 0.0, 0.0 } on out-of-bounds vtxbuf access */
    BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
-   PUSH_DATA (push, ((1 << 9) << 6) | NV50_CB_AUX);
+   PUSH_DATA (push, (NV50_CB_AUX_RUNOUT_OFFSET << 6) | NV50_CB_AUX);
    BEGIN_NI04(push, NV50_3D(CB_DATA(0)), 4);
    PUSH_DATAf(push, 0.0f);
    PUSH_DATAf(push, 0.0f);
    PUSH_DATAf(push, 0.0f);
    PUSH_DATAf(push, 0.0f);
    BEGIN_NV04(push, NV50_3D(VERTEX_RUNOUT_ADDRESS_HIGH), 2);
-   PUSH_DATAh(push, screen->uniforms->offset + (3 << 16) + (1 << 9));
-   PUSH_DATA (push, screen->uniforms->offset + (3 << 16) + (1 << 9));
+   PUSH_DATAh(push, screen->uniforms->offset + (3 << 16) + NV50_CB_AUX_RUNOUT_OFFSET);
+   PUSH_DATA (push, screen->uniforms->offset + (3 << 16) + NV50_CB_AUX_RUNOUT_OFFSET);

    /* max TIC (bits 4:8) & TSC bindings, per program type */
    for (i = 0; i < 3; ++i) {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
index 86b9a23..3d99b73 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
@@ -238,7 +238,7 @@ nv50_validate_clip(struct nv50_context *nv50)

    if (nv50->dirty & NV50_NEW_CLIP) {
       BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
-      PUSH_DATA (push, (0 << 8) | NV50_CB_AUX);
+      PUSH_DATA (push, (NV50_CB_AUX_UCP_OFFSET << 8) | NV50_CB_AUX);
       BEGIN_NI04(push, NV50_3D(CB_DATA(0)), PIPE_MAX_CLIP_PLANES * 4);
       PUSH_DATAp(push, &nv50->clip.ucp[0][0], PIPE_MAX_CLIP_PLANES * 4);
    }
--
1.8.3.2
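The offsets in the new CB_AUX comments follow from simple arithmetic, which can
be checked in isolation. A minimal sketch, assuming only what the comments in
this patch say: the macro names below drop the NV50_ prefix to make clear they
are local copies, and the two *_BYTES sizes plus regions_disjoint() are
hypothetical names introduced for the check, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the values introduced by this patch. */
#define CB_AUX_SIZE          (1u << 16)
#define CB_AUX_UCP_OFFSET    0x0u
#define CB_AUX_MS_OFFSET     0x80u
#define CB_AUX_RUNOUT_OFFSET (CB_AUX_SIZE - 0x10u)

/* Hypothetical sizes derived from the comments:
 * 8 clip planes * 4 floats * 4 bytes, and 256 textures * 2 * 16-bit. */
#define CB_AUX_UCP_BYTES     (8u * 4u * 4u)
#define CB_AUX_MS_BYTES      (256u * 2u * 2u)
#define CB_AUX_RUNOUT_BYTES  (4u * 4u)

/* Hypothetical helper: returns 1 when [a, a + an) and [b, b + bn) do not
 * overlap. */
static int
regions_disjoint(uint32_t a, uint32_t an, uint32_t b, uint32_t bn)
{
   return a + an <= b || b + bn <= a;
}
```

Under these assumptions the clip planes end exactly where the MS area starts
(0x80), and the runout occupies the last 16 bytes of the 64k buffer (0xfff0).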
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 16/19] nv50: copy nvc0's get_sample_position implementation
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_context.c | 46 +++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.c b/src/gallium/drivers/nouveau/nv50/nv50_context.c
index 11afc48..db3bd3a 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.c
@@ -196,6 +196,10 @@ nv50_invalidate_resource_storage(struct nouveau_context *ctx,
    return ref;
 }

+static void
+nv50_context_get_sample_position(struct pipe_context *, unsigned, unsigned,
+                                 float *);
+
 struct pipe_context *
 nv50_create(struct pipe_screen *pscreen, void *priv)
 {
@@ -239,6 +243,7 @@ nv50_create(struct pipe_screen *pscreen, void *priv)

    pipe->flush = nv50_flush;
    pipe->texture_barrier = nv50_texture_barrier;
+   pipe->get_sample_position = nv50_context_get_sample_position;

    if (!screen->cur_ctx) {
       screen->cur_ctx = nv50;
@@ -317,3 +322,44 @@ nv50_bufctx_fence(struct nouveau_bufctx *bufctx, boolean on_flush)
          nv50_resource_validate(res, (unsigned)ref->priv_data);
    }
 }
+
+static void
+nv50_context_get_sample_position(struct pipe_context *pipe,
+                                 unsigned sample_count, unsigned sample_index,
+                                 float *xy)
+{
+   static const uint8_t ms1[1][2] = { { 0x8, 0x8 } };
+   static const uint8_t ms2[2][2] = {
+      { 0x4, 0x4 }, { 0xc, 0xc } }; /* surface coords (0,0), (1,0) */
+   static const uint8_t ms4[4][2] = {
+      { 0x6, 0x2 }, { 0xe, 0x6 },   /* (0,0), (1,0) */
+      { 0x2, 0xa }, { 0xa, 0xe } }; /* (0,1), (1,1) */
+   static const uint8_t ms8[8][2] = {
+      { 0x1, 0x7 }, { 0x5, 0x3 },   /* (0,0), (1,0) */
+      { 0x3, 0xd }, { 0x7, 0xb },   /* (0,1), (1,1) */
+      { 0x9, 0x5 }, { 0xf, 0x1 },   /* (2,0), (3,0) */
+      { 0xb, 0xf }, { 0xd, 0x9 } }; /* (2,1), (3,1) */
+#if 0
+   /* NOTE: there are alternative modes for MS2 and MS8, currently not used */
+   static const uint8_t ms8_alt[8][2] = {
+      { 0x9, 0x5 }, { 0x7, 0xb },   /* (2,0), (1,1) */
+      { 0xd, 0x9 }, { 0x5, 0x3 },   /* (3,1), (1,0) */
+      { 0x3, 0xd }, { 0x1, 0x7 },   /* (0,1), (0,0) */
+      { 0xb, 0xf }, { 0xf, 0x1 } }; /* (2,1), (3,0) */
+#endif
+
+   const uint8_t (*ptr)[2];
+
+   switch (sample_count) {
+   case 0:
+   case 1: ptr = ms1; break;
+   case 2: ptr = ms2; break;
+   case 4: ptr = ms4; break;
+   case 8: ptr = ms8; break;
+   default:
+      assert(0);
+      return; /* bad sample count -> undefined locations */
+   }
+   xy[0] = ptr[sample_index][0] * 0.0625f;
+   xy[1] = ptr[sample_index][1] * 0.0625f;
+}
--
1.8.3.2
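The tables in this patch store sample positions in units of 1/16 of a pixel,
which is why the lookup multiplies by 0.0625f. A minimal standalone sketch of
that conversion for the 4x case, with the ms4 table copied from the patch;
sample_position_4x() is a hypothetical helper for illustration, not the
function the driver registers.

```c
#include <assert.h>
#include <stdint.h>

/* 4x MSAA table copied from the patch; each entry is (x, y) in 1/16 pixel. */
static const uint8_t ms4[4][2] = {
   { 0x6, 0x2 }, { 0xe, 0x6 },
   { 0x2, 0xa }, { 0xa, 0xe } };

/* Hypothetical helper: write the position of 'sample_index' within the pixel
 * to xy[0], xy[1], each in the range [0, 1). */
static void
sample_position_4x(unsigned sample_index, float *xy)
{
   assert(sample_index < 4);
   xy[0] = ms4[sample_index][0] * 0.0625f;
   xy[1] = ms4[sample_index][1] * 0.0625f;
}
```

For example, sample 0 lands at (0.375, 0.125) and sample 3 at (0.625, 0.875).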
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [PATCH 17/19] nv50: add support for textureFetch'ing MS textures, ARB_texture_multisample
Creates two areas in the AUX constbuf:
 - Sample offsets for MS textures
 - Per-texture MS settings

When executing a textureFetch with a MS sampler, looks up that texture's
settings and adjusts the parameters given to the texfetch instruction.

With this change, all the ARB_texture_multisample piglits pass, so turn on
PIPE_CAP_TEXTURE_MULTISAMPLE.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h | 8 +++
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 1 +
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp | 60 +++++++++++++++++
 src/gallium/drivers/nouveau/nv50/nv50_context.h | 13 +++-
 src/gallium/drivers/nouveau/nv50/nv50_program.c | 7 +-
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 7 +-
 src/gallium/drivers/nouveau/nv50/nv50_tex.c | 75 +++++++++++++++++++++-
 7 files changed, 164 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 6a001d3..857980d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -827,6 +827,14 @@ public:
       int isShadow() const { return descTable[target].shadow ? 1 : 0; }
       int isMS() const {
         return target == TEX_TARGET_2D_MS || target == TEX_TARGET_2D_MS_ARRAY; }
+      void clearMS() {
+         if (isMS()) {
+            if (isArray())
+               target = TEX_TARGET_2D_ARRAY;
+            else
+               target = TEX_TARGET_2D;
+         }
+      }

       Target& operator=(TexTarget targ)
       {
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index a6ed4b0..8f9b7de 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -1221,6 +1221,7 @@ CodeEmitterNV50::emitCVT(const Instruction *i)
       case TYPE_S32: code[1] = 0x44014000; break;
       case TYPE_U32: code[1] = 0x44004000; break;
       case TYPE_F16: code[1] = 0xc4000000; break;
+      case TYPE_U16: code[1] = 0x44000000; break;
       default:
          assert(0);
          break;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
index 1d13aea..984a8ca 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -549,6 +549,8 @@ private:
    bool handleCONT(Instruction *);

    void checkPredicate(Instruction *);
+   void loadTexMsInfo(uint32_t off, Value **ms, Value **ms_x, Value **ms_y);
+   void loadMsInfo(Value *ms, Value *s, Value **dx, Value **dy);

 private:
    const Target *const targ;
@@ -582,6 +584,41 @@ NV50LoweringPreSSA::visit(Function *f)
    return true;
 }

+void NV50LoweringPreSSA::loadTexMsInfo(uint32_t off, Value **ms,
+                                       Value **ms_x, Value **ms_y) {
+   // This loads the texture-indexed ms setting from the constant buffer
+   Value *tmp = new_LValue(func, FILE_GPR);
+   uint8_t b = prog->driver->io.resInfoCBSlot;
+   off += prog->driver->io.suInfoBase;
+   *ms_x = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
+                             FILE_MEMORY_CONST, b, TYPE_U32, off + 0), NULL);
+   *ms_y = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
+                             FILE_MEMORY_CONST, b, TYPE_U32, off + 4), NULL);
+   *ms = bld.mkOp2v(OP_ADD, TYPE_U32, tmp, *ms_x, *ms_y);
+}
+
+void NV50LoweringPreSSA::loadMsInfo(Value *ms, Value *s, Value **dx, Value **dy) {
+   // Given a MS level, and a sample id, compute the delta x/y
+   uint8_t b = prog->driver->io.msInfoCBSlot;
+   Value *off = new_LValue(func, FILE_ADDRESS), *t = new_LValue(func, FILE_GPR);
+
+   // The required information is at mslevel * 16 * 4 + sample * 8
+   // = (mslevel * 8 + sample) * 8
+   bld.mkOp2(OP_SHL,
+             TYPE_U32,
+             off,
+             bld.mkOp2v(OP_ADD, TYPE_U32, t,
+                        bld.mkOp2v(OP_SHL, TYPE_U32, t, ms, bld.mkImm(3)),
+                        s),
+             bld.mkImm(3));
+   *dx = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
+                           FILE_MEMORY_CONST, b, TYPE_U32,
+                           prog->driver->io.msInfoBase), off);
+   *dy = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
+                           FILE_MEMORY_CONST, b, TYPE_U32,
+                           prog->driver->io.msInfoBase + 4), off);
+}
+
 bool
 NV50LoweringPreSSA::handleTEX(TexInstruction *i)
 {
@@ -589,6 +626,29 @@ NV50LoweringPreSSA::handleTEX(TexInstruction *i)
    const int dref = arg;
    const int lod = i->tex.target.isShadow() ? (arg + 1) : arg;

+   // handle MS, which means looking up the MS params for this texture, and
+   // adjusting the input coordinates to point at the right sample.
+   if (i->tex.target.isMS()) {
+      Value *x = i->getSrc(0);
+      Value *y = i->getSrc(1);
+      Value *s = i->getSrc(arg - 1);
+      Value *tx = new_LValue(func, FILE_GPR), *ty = new_LValue(func, FILE_GPR),
+         *ms, *ms_x, *ms_y, *dx, *dy;
+
+      i->tex.target.clearMS();
+
+      loadTexMsInfo(i->tex.r * 4 * 2, &ms, &ms_x, &ms_y);
+      loadMsInfo(ms, s, &dx, &dy);
+
+      bld.mkOp2(OP_SHL, TYPE_U32, tx, x, ms_x);
+      bld.mkOp2(OP_SHL, TYPE_U32, ty, y, ms_y);
+      bld.mkOp2(OP_ADD, TYPE_U32, tx, tx, dx);
+      bld.mkOp2(OP_ADD, TYPE_U32, ty, ty, dy);
+      i->setSrc(0, tx);
+      i->setSrc(1, ty);
+      i->setSrc(arg - 1, bld.loadImm(NULL, 0));
+   }
+
    // dref comes before bias/lod
    if (i->tex.target.isShadow())
       if (i->op == OP_TXB || i->op == OP_TXL)
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h
index 7bf4ce3..1ce52c9 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h
@@ -75,9 +75,15 @@
 /* size of the buffer: 64k. not all taken up, can be reduced if needed. */
 #define NV50_CB_AUX_SIZE (1 << 16)
 /* 8 user clip planes, at 4 32-bit floats each */
-#define NV50_CB_AUX_UCP_OFFSET 0x0
-/* 256 textures, each with 2 16-bit integers specifying the x/y MS shift */
-#define NV50_CB_AUX_MS_OFFSET 0x80
+#define NV50_CB_AUX_UCP_OFFSET 0x0000
+#define NV50_CB_AUX_UCP_SIZE (8 * 4 * 4)
+/* 256 textures, each with ms_x, ms_y u32 pairs */
+#define NV50_CB_AUX_TEX_MS_OFFSET 0x0080
+#define NV50_CB_AUX_TEX_MS_SIZE (256 * 2 * 4)
+/* For each MS level (4), 8 sets of 32-bit integer pairs sample offsets */
+#define NV50_CB_AUX_MS_OFFSET 0x880
+#define NV50_CB_AUX_MS_SIZE (4 * 8 * 4 * 2)
+/* next spot: 0x980 */
 /* 4 32-bit floats for the vertex runout, put at the end */
 #define NV50_CB_AUX_RUNOUT_OFFSET (NV50_CB_AUX_SIZE - 0x10)

@@ -251,6 +257,7 @@ extern void nv50_init_surface_functions(struct nv50_context *);
 /* nv50_tex.c */
 void nv50_validate_textures(struct nv50_context *);
 void nv50_validate_samplers(struct nv50_context *);
+void nv50_upload_ms_info(struct nouveau_pushbuf *);

 struct pipe_sampler_view *
 nv50_create_texture_view(struct pipe_context *,
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index e7609fa..73583bd 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -323,9 +323,14 @@ nv50_program_translate(struct nv50_program *prog, uint16_t chipset)
    info->bin.source = (void *)prog->pipe.tokens;

    info->io.ucpCBSlot = 15;
-   info->io.ucpBase = 0;
+   info->io.ucpBase = NV50_CB_AUX_UCP_OFFSET;
    info->io.genUserClip = prog->vp.clpd_nr;

+   info->io.resInfoCBSlot = 15;
+   info->io.suInfoBase = NV50_CB_AUX_TEX_MS_OFFSET;
+   info->io.msInfoCBSlot = 15;
+   info->io.msInfoBase = NV50_CB_AUX_MS_OFFSET;
+
    info->assignSlots = nv50_program_assign_varying_slots;

    prog->vp.bfc[0] = 0xff;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 9ed2d01..5732b21 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -184,8 +184,9 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
    case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
    case PIPE_CAP_TGSI_TEXCOORD:
-   case PIPE_CAP_TEXTURE_MULTISAMPLE:
       return 0;
+   case PIPE_CAP_TEXTURE_MULTISAMPLE:
+      return 1;
    case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
       return 1;
    case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
@@ -481,7 +482,7 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)

    /* return { 0.0, 0.0, 0.0, 0.0 } on out-of-bounds vtxbuf access */
    BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
-   PUSH_DATA (push, (NV50_CB_AUX_RUNOUT_OFFSET << 6) | NV50_CB_AUX);
+   PUSH_DATA (push, (NV50_CB_AUX_RUNOUT_OFFSET << (8 - 2)) | NV50_CB_AUX);
    BEGIN_NI04(push, NV50_3D(CB_DATA(0)), 4);
    PUSH_DATAf(push, 0.0f);
    PUSH_DATAf(push, 0.0f);
@@ -491,6 +492,8 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
    PUSH_DATAh(push, screen->uniforms->offset + (3 << 16) + NV50_CB_AUX_RUNOUT_OFFSET);
    PUSH_DATA (push, screen->uniforms->offset + (3 << 16) + NV50_CB_AUX_RUNOUT_OFFSET);

+   nv50_upload_ms_info(push);
+
    /* max TIC (bits 4:8) & TSC bindings, per program type */
    for (i = 0; i < 3; ++i) {
       BEGIN_NV04(push, NV50_3D(TEX_LIMITS(i)), 1);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_tex.c b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
index 6663a61..e76876d 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_tex.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
@@ -143,7 +143,7 @@ nv50_create_texture_view(struct pipe_context *pipe,
       tic[2] |= NV50_TIC_2_LINEAR | NV50_TIC_2_TARGET_RECT;
       tic[3] = mt->level[0].pitch;
       tic[4] = mt->base.base.width0;
-      tic[5] = (1 << 16) | mt->base.base.height0;
+      tic[5] = (1 << 16) | (mt->base.base.height0);
    }
    tic[6] =
    tic[7] = 0;
@@ -283,6 +283,24 @@ nv50_validate_tic(struct nv50_context *nv50, int s)
       BEGIN_NV04(push, NV50_3D(BIND_TIC(s)), 1);
       PUSH_DATA (push, (i << 1) | 0);
    }
+   if (nv50->num_textures[s]) {
+      BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
+      PUSH_DATA (push, (NV50_CB_AUX_TEX_MS_OFFSET << (8 - 2)) | NV50_CB_AUX);
+      BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nv50->num_textures[s] * 2);
+      for (i = 0; i < nv50->num_textures[s]; i++) {
+         struct nv50_tic_entry *tic = nv50_tic_entry(nv50->textures[s][i]);
+         struct nv50_miptree *res;
+
+         if (!tic) {
+            PUSH_DATA (push, 0);
+            PUSH_DATA (push, 0);
+            continue;
+         }
+         res = nv50_miptree(tic->pipe.texture);
+         PUSH_DATA (push, res->ms_x);
+         PUSH_DATA (push, res->ms_y);
+      }
+   }
    nv50->state.num_textures[s] = nv50->num_textures[s];

    return need_flush;
@@ -352,3 +370,58 @@ void nv50_validate_samplers(struct nv50_context *nv50)
       PUSH_DATA (nv50->base.pushbuf, 0);
    }
 }
+
+/* There can be up to 4 different MS levels (1, 2, 4, 8). To simplify the
+ * shader logic, allow each one to take up 8 offsets.
+ */
+#define COMBINE(x, y) x, y
+#define DUMMY 0, 0
+static const uint32_t msaa_sample_xy_offsets[] = {
+   /* MS1 */
+   COMBINE(0, 0),
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+
+   /* MS2 */
+   COMBINE(0, 0),
+   COMBINE(1, 0),
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+
+   /* MS4 */
+   COMBINE(0, 0),
+   COMBINE(1, 0),
+   COMBINE(0, 1),
+   COMBINE(1, 1),
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+
+   /* MS8 */
+   COMBINE(0, 0),
+   COMBINE(1, 0),
+   COMBINE(0, 1),
+   COMBINE(1, 1),
+   COMBINE(2, 0),
+   COMBINE(3, 0),
+   COMBINE(2, 1),
+   COMBINE(3, 1),
+};
+
+void nv50_upload_ms_info(struct nouveau_pushbuf *push)
+{
+   BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
+   PUSH_DATA (push, (NV50_CB_AUX_MS_OFFSET << (8 - 2)) | NV50_CB_AUX);
+   BEGIN_NI04(push, NV50_3D(CB_DATA(0)), Elements(msaa_sample_xy_offsets));
+   PUSH_DATAp(push, msaa_sample_xy_offsets, Elements(msaa_sample_xy_offsets));
+}
--
1.8.3.2
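The coordinate adjustment done in handleTEX can be followed on the CPU with
plain integers. A minimal sketch of the same arithmetic, not the shader code
itself: ms_texel_coords() is a hypothetical helper, the table is the patch's
msaa_sample_xy_offsets written out without the COMBINE/DUMMY macros, and the
(ms_x, ms_y) pairs used in the note below (1,1 for 4x and 2,1 for 8x) are
assumptions about what the miptree stores.

```c
#include <assert.h>
#include <stdint.h>

/* Same contents as msaa_sample_xy_offsets in the patch: 4 MS levels, 8 (x, y)
 * pairs each, unused slots zeroed. */
static const uint32_t ms_offsets[4 * 8 * 2] = {
   /* MS1 */ 0, 0,  0, 0,  0, 0,  0, 0,  0, 0,  0, 0,  0, 0,  0, 0,
   /* MS2 */ 0, 0,  1, 0,  0, 0,  0, 0,  0, 0,  0, 0,  0, 0,  0, 0,
   /* MS4 */ 0, 0,  1, 0,  0, 1,  1, 1,  0, 0,  0, 0,  0, 0,  0, 0,
   /* MS8 */ 0, 0,  1, 0,  0, 1,  1, 1,  2, 0,  3, 0,  2, 1,  3, 1,
};

/* Hypothetical helper: map pixel (x, y) and a sample index to the texel that
 * is actually fetched, given the per-texture shifts ms_x and ms_y. */
static void
ms_texel_coords(uint32_t ms_x, uint32_t ms_y, uint32_t sample,
                uint32_t x, uint32_t y, uint32_t *tx, uint32_t *ty)
{
   uint32_t ms = ms_x + ms_y;                     /* MS level index, 0..3 */
   uint32_t byte_off = ((ms << 3) + sample) << 3; /* (mslevel * 8 + sample) * 8 */
   uint32_t idx = byte_off / 4;                   /* index into the u32 table */

   assert(ms < 4 && sample < 8);
   *tx = (x << ms_x) + ms_offsets[idx];
   *ty = (y << ms_y) + ms_offsets[idx + 1];
}
```

Under those assumptions, sample 3 of pixel (5, 7) in a 4x texture maps to
texel (11, 15), and sample 5 of pixel (1, 1) in an 8x texture maps to (7, 2).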
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [RFC PATCH 18/19] nv50: report glsl 1.50 now that gp tests pass
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

There are still some things that fail -- mostly gl_Layer stuff, and also using
gl_PrimitiveID without a gp.

 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 5732b21..123bdab 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -126,7 +126,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_SM3:
       return 1;
    case PIPE_CAP_GLSL_FEATURE_LEVEL:
-      return 140;
+      return 150;
    case PIPE_CAP_MAX_RENDER_TARGETS:
       return 8;
    case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
--
1.8.3.2
Ilia Mirkin
2014-Jan-13 19:19 UTC
[Nouveau] [RFC PATCH 19/19] nv50: enable seamless cube maps on all hw for OpenGL 3.2
Some of the hardware support is missing. The NVIDIA-provided driver, which
claims 3.3 support, fails a slew of the relevant tests as well. This allows us
to expose geometry shaders without doing the additional work involved in
supporting ARB_geometry_shader4.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 123bdab..a108ece 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -111,7 +111,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
       return 65536;
    case PIPE_CAP_SEAMLESS_CUBE_MAP:
-      return nv50_screen(pscreen)->tesla->oclass >= NVA0_3D_CLASS;
+      return 1; //nv50_screen(pscreen)->tesla->oclass >= NVA0_3D_CLASS;
    case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
       return 0;
    case PIPE_CAP_CUBE_MAP_ARRAY:
--
1.8.3.2
Ilia Mirkin
2014-Jan-15 20:07 UTC
[Nouveau] [PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2
On Mon, Jan 13, 2014 at 2:19 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> OK, so there's a bunch of stuff in here. The geometry stuff is based on the
> work started by Bryan Cain and Christoph Bumiller.
>
> Patches 01-12: Add support for geometry shaders and fix related issues
> Patches 13-14: Make it possible for fb clears to operate on texture attachments
>                with an explicit layer set (as is allowed in gl 3.2).
> Patches 15-17: Make ARB_texture_multisample work
> Patch 18: Enable GLSL 1.50
> Patch 19: Turn on ARB_seamless_cube_map irrespective of HW support so that all
>           nv50 cards can get OpenGL 3.2 and geometry shaders (which
>           are otherwise unsupported)
>
> There are still a few geometry-related piglits that fail -- specifically:
>   primitive-id-no-gs
>   gl-3.2-layered-rendering-gl-layer*

For those who care, these should now be fixed in my github repo as well. I
won't repost the full series, as these are just incremental patches, but you
can see them at:

https://github.com/imirkin/mesa/commit/5eb4ad1115d0c4cb9f06a5ebb19501c1afc433bd
https://github.com/imirkin/mesa/commit/fcd6a8661ba9ac19faf205a2025b001bb31146a8

The nv50-gs branch now also contains Christoph Bumiller's patches from late
December which effectively allow us to enable GL3.3.

>
> I need to trace the blob to figure out exactly how to configure the HW for
> those situations, but I suspect that the fixes will be fairly small and
> self-contained.
>
> Note that there are also a bunch of EXT_framebuffer_multisample tests that are
> failing, but that has nothing to do with these changes. There's something
> wrong with the blit_3d function, at the very least to do with depth/stencil,
> but also some color tests fail as well.
>
> These patches are available at https://github.com/imirkin/mesa.git nv50-gs
> or https://github.com/imirkin/mesa/commits/nv50-gs for those who prefer a
> web ui.
>
> Bryan Cain (2):
>   nv50/ir: delay calculation of indirect addresses
>   nv50: add support for geometry shaders
>
> Christoph Bumiller (1):
>   nv50/ir: fix PFETCH and add RDSV to get VSTRIDE for GPs
>
> Ilia Mirkin (16):
>   nv50: allow vert_count to be >255
>   nv50/ir: disallow predicates on emit/restart ops
>   nv50/ir: disallow shader input propagation for gp
>   nv50/ir: comment out code to allow input/immed loads
>   nv50/ir: add support for gl_PrimitiveIDIn
>   nv50: properly set the PRIMITIVE_ID enable flag when it is a gp input.
>   nv50: VP_RESULT_MAP_SIZE has to be positive
>   nv50: GP_REG_ALLOC_RESULT must be positive
>   nv50: allocate an extra code bo to avoid dmesg spam
>   nv50: don't forget to also clear additional layers
>   nvc0: don't forget to also clear additional layers
>   nv50: add comments about CB_AUX contents
>   nv50: copy nvc0's get_sample_position implementation
>   nv50: add support for textureFetch'ing MS textures,
>     ARB_texture_multisample
>   nv50: report glsl 1.50 now that gp tests pass
>   nv50: enable seamless cube maps on all hw for OpenGL 3.2
>
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h | 9 ++
>  .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 92 ++++++++++--
>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 41 ++++--
>  .../nouveau/codegen/nv50_ir_lowering_nv50.cpp | 164 ++++++++++++++++++++-
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 7 +
>  .../drivers/nouveau/codegen/nv50_ir_print.cpp | 1 +
>  .../nouveau/codegen/nv50_ir_target_nv50.cpp | 18 ++-
>  src/gallium/drivers/nouveau/nv50/nv50_context.c | 46 ++++++
>  src/gallium/drivers/nouveau/nv50/nv50_context.h | 17 +++
>  src/gallium/drivers/nouveau/nv50/nv50_program.c | 30 +++-
>  src/gallium/drivers/nouveau/nv50/nv50_program.h | 2 +-
>  src/gallium/drivers/nouveau/nv50/nv50_screen.c | 23 ++-
>  .../drivers/nouveau/nv50/nv50_shader_state.c | 6 +
>  .../drivers/nouveau/nv50/nv50_state_validate.c | 2 +-
>  src/gallium/drivers/nouveau/nv50/nv50_surface.c | 25 ++--
>  src/gallium/drivers/nouveau/nv50/nv50_tex.c | 77 +++++++++-
>  src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 22 ++-
>  17 files changed, 526 insertions(+), 56 deletions(-)
>
> --
> 1.8.3.2
>