thr3ads.net - Nouveau - [Nouveau] [PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2 [Jan 2014]


Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2

OK, so there's a bunch of stuff in here. The geometry stuff is based on the
work started by Bryan Cain and Christoph Bumiller.

Patches 01-12: Add support for geometry shaders and fix related issues
Patches 13-14: Make it possible for fb clears to operate on texture attachments
               with an explicit layer set (as is allowed in gl 3.2).
Patches 15-17: Make ARB_texture_multisample work
Patch      18: Enable GLSL 1.50
Patch      19: Turn on ARB_seamless_cube_map irrespective of HW support so that
               all nv50 cards can get OpenGL 3.2 and geometry shaders (which
               are otherwise unsupported)

There are still a few geometry-related piglits that fail -- specifically:
  primitive-id-no-gs
  gl-3.2-layered-rendering-gl-layer*

I need to trace the blob to figure out exactly how to configure the HW for
those situations, but I suspect that the fixes will be fairly small and
self-contained.

Note that there are also a bunch of EXT_framebuffer_multisample tests that are
failing, but that has nothing to do with these changes. There's something
wrong with the blit_3d function, at the very least to do with depth/stencil,
but also some color tests fail as well.

These patches are available at https://github.com/imirkin/mesa.git nv50-gs
or https://github.com/imirkin/mesa/commits/nv50-gs for those who prefer a
web ui.

Bryan Cain (2):
  nv50/ir: delay calculation of indirect addresses
  nv50: add support for geometry shaders

Christoph Bumiller (1):
  nv50/ir: fix PFETCH and add RDSV to get VSTRIDE for GPs

Ilia Mirkin (16):
  nv50: allow vert_count to be >255
  nv50/ir: disallow predicates on emit/restart ops
  nv50/ir: disallow shader input propagation for gp
  nv50/ir: comment out code to allow input/immed loads
  nv50/ir: add support for gl_PrimitiveIDIn
  nv50: properly set the PRIMITIVE_ID enable flag when it is a gp input.
  nv50: VP_RESULT_MAP_SIZE has to be positive
  nv50: GP_REG_ALLOC_RESULT must be positive
  nv50: allocate an extra code bo to avoid dmesg spam
  nv50: don't forget to also clear additional layers
  nvc0: don't forget to also clear additional layers
  nv50: add comments about CB_AUX contents
  nv50: copy nvc0's get_sample_position implementation
  nv50: add support for textureFetch'ing MS textures,
    ARB_texture_multisample
  nv50: report glsl 1.50 now that gp tests pass
  nv50: enable seamless cube maps on all hw for OpenGL 3.2

 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |   9 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  |  92 ++++++++++--
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  41 ++++--
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp      | 164 ++++++++++++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |   7 +
 .../drivers/nouveau/codegen/nv50_ir_print.cpp      |   1 +
 .../nouveau/codegen/nv50_ir_target_nv50.cpp        |  18 ++-
 src/gallium/drivers/nouveau/nv50/nv50_context.c    |  46 ++++++
 src/gallium/drivers/nouveau/nv50/nv50_context.h    |  17 +++
 src/gallium/drivers/nouveau/nv50/nv50_program.c    |  30 +++-
 src/gallium/drivers/nouveau/nv50/nv50_program.h    |   2 +-
 src/gallium/drivers/nouveau/nv50/nv50_screen.c     |  23 ++-
 .../drivers/nouveau/nv50/nv50_shader_state.c       |   6 +
 .../drivers/nouveau/nv50/nv50_state_validate.c     |   2 +-
 src/gallium/drivers/nouveau/nv50/nv50_surface.c    |  25 ++--
 src/gallium/drivers/nouveau/nv50/nv50_tex.c        |  77 +++++++++-
 src/gallium/drivers/nouveau/nvc0/nvc0_surface.c    |  22 ++-
 17 files changed, 526 insertions(+), 56 deletions(-)

-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 01/19] nv50/ir: fix PFETCH and add RDSV to get VSTRIDE for GPs

From: Christoph Bumiller <e0425955 at student.tuwien.ac.at>

---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  | 62 ++++++++++++++++++++--
 .../drivers/nouveau/codegen/nv50_ir_print.cpp      |  1 +
 3 files changed, 59 insertions(+), 5 deletions(-)
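
As a rough illustration of what the new emitRDSV path in the diff below
encodes, here is a standalone sketch. The SysVal enum, the Opcode struct and
the function names are made up for this example and are not part of nv50_ir;
the sreg indices and the 0x60000000 | (sreg << 14) layout are taken from the
patch itself. The destination register (defId) and flag bits (emitFlagsRd) are
left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the subset of SVSemantic that emitRDSV handles.
enum class SysVal { PhysId, Clock, VertexStride, SampleIndex };

// Mirrors getSRegEncoding() in the patch: system value -> hardware sreg index.
inline uint8_t sregIndex(SysVal sv)
{
   switch (sv) {
   case SysVal::PhysId:       return 0;
   case SysVal::Clock:        return 1;
   case SysVal::VertexStride: return 3;
   case SysVal::SampleIndex:  return 8;
   }
   assert(!"no sreg for system value");
   return 0;
}

// Hypothetical container for the two 32-bit words of a long instruction.
struct Opcode { uint32_t lo, hi; };

// Builds the two words the way emitRDSV() does, minus the destination
// register and the flag bits.
inline Opcode encodeRdsv(SysVal sv)
{
   return { 0x00000001u, 0x60000000u | (uint32_t(sregIndex(sv)) << 14) };
}
```

For SV_VERTEX_STRIDE (sreg 3) this gives a high word of 0x6000c000 before the
destination register and flags are OR'd in.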

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 68c76e5..6a001d3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -366,6 +366,7 @@ enum SVSemantic
    SV_CLOCK,
    SV_LBASE,
    SV_SBASE,
+   SV_VERTEX_STRIDE,
    SV_UNDEFINED,
    SV_LAST
 };
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index 3eca27d..cf82e2f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -87,6 +87,7 @@ private:
    void emitLOAD(const Instruction *);
    void emitSTORE(const Instruction *);
    void emitMOV(const Instruction *);
+   void emitRDSV(const Instruction *);
    void emitNOP();
    void emitINTERP(const Instruction *);
    void emitPFETCH(const Instruction *);
@@ -772,6 +773,29 @@ CodeEmitterNV50::emitMOV(const Instruction *i)
    }
 }
 
+static inline uint8_t getSRegEncoding(const ValueRef &ref)
+{
+   switch (SDATA(ref).sv.sv) {
+   case SV_PHYSID:        return 0;
+   case SV_CLOCK:         return 1;
+   case SV_VERTEX_STRIDE: return 3;
+// case SV_PM_COUNTER:    return 4 + SDATA(ref).sv.index;
+   case SV_SAMPLE_INDEX:  return 8;
+   default:
+      assert(!"no sreg for system value");
+      return 0;
+   }
+}
+
+void
+CodeEmitterNV50::emitRDSV(const Instruction *i)
+{
+   code[0] = 0x00000001;
+   code[1] = 0x60000000 | (getSRegEncoding(i->src(0)) << 14);
+   defId(i->def(0), 2);
+   emitFlagsRd(i);
+}
+
 void
 CodeEmitterNV50::emitNOP()
 {
@@ -794,15 +818,40 @@ CodeEmitterNV50::emitQUADOP(const Instruction *i, uint8_t lane, uint8_t quOp)
       srcId(i->src(0), 32 + 14);
 }
 
+/* NOTE: This returns the base address of a vertex inside the primitive.
+ * src0 is an immediate, the index (not offset) of the vertex
+ * inside the primitive. XXX: signed or unsigned ?
+ * src1 (may be NULL) should use whatever units the hardware requires
+ * (on nv50 this is bytes, so, relative index * 4; signed 16 bit value).
+ */
 void
 CodeEmitterNV50::emitPFETCH(const Instruction *i)
 {
-   code[0] = 0x11800001;
-   code[1] = 0x04200000 | (0xf << 14);
+   const uint32_t prim = i->src(0).get()->reg.data.u32;
+   assert(prim <= 127);
 
-   defId(i->def(0), 2);
-   srcAddr8(i->src(0), 9);
-   setAReg16(i, 0);
+   if (i->def(0).getFile() == FILE_ADDRESS) {
+      // shl $aX a[] 0
+      code[0] = 0x00000001 | ((DDATA(i->def(0)).id + 1) << 2);
+      code[1] = 0xc0200000;
+      code[0] |= prim << 9;
+      assert(!i->srcExists(1));
+   } else
+   if (i->srcExists(1)) {
+      // ld b32 $rX a[$aX+base]
+      code[0] = 0x00000001;
+      code[1] = 0x04200000 | (0xf << 14);
+      defId(i->def(0), 2);
+      code[0] |= prim << 9;
+      setARegBits(SDATA(i->src(1)).id + 1);
+   } else {
+      // mov b32 $rX a[]
+      code[0] = 0x10000001;
+      code[1] = 0x04200000 | (0xf << 14);
+      defId(i->def(0), 2);
+      code[0] |= prim << 9;
+   }
+   emitFlagsRd(i);
 }
 
 void
@@ -1620,6 +1669,9 @@ CodeEmitterNV50::emitInstruction(Instruction *insn)
    case OP_PFETCH:
       emitPFETCH(insn);
       break;
+   case OP_RDSV:
+      emitRDSV(insn);
+      break;
    case OP_LINTERP:
    case OP_PINTERP:
       emitINTERP(insn);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index ee39b3c..ae42d03 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -265,6 +265,7 @@ static const char *SemanticStr[SV_LAST + 1] =
    "CLOCK",
    "LBASE",
    "SBASE",
+   "VERTEX_STRIDE",
    "?",
    "(INVALID)"
 };
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 02/19] nv50/ir: delay calculation of indirect addresses

From: Bryan Cain <bryancain3 at gmail.com>

Instead of emitting an SHL 4 to an address register on the TGSI ARL and UARL
instructions, emit the shift when the loaded address is actually used.  This
is necessary because input vertex and attribute indices in geometry shaders on
nv50 need to be shifted left by 2 instead of 4.

Signed-off-by: Bryan Cain <bryancain3 at gmail.com>
[calim: various updates to the indirect address logic]
Signed-off-by: Christoph Bumiller <e0425955 at student.tuwien.ac.at>
[imirkin: remove OP_MAD change that calim made, add OP_RESTART handling
          same as OP_EMIT for code flow analysis]
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  38 ++++++--
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp      | 104 ++++++++++++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |   7 ++
 3 files changed, 136 insertions(+), 13 deletions(-)
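
To make the shift amounts from the commit message concrete, here is a small
standalone sketch of "scale the index when it is used, not when it is loaded".
AddrUse, shiftFor and scaledAddress are names invented for this example; only
the shift values (4 for the usual vec4-indexed case, 2 for geometry shader
inputs on nv50) and the 16-bit width of $aX come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the two cases the commit message distinguishes.
enum class AddrUse { Vec4Slot, Nv50GsInput };

// Shift applied when the address is consumed, not when ARL/UARL loads it.
inline unsigned shiftFor(AddrUse use)
{
   return use == AddrUse::Nv50GsInput ? 2u : 4u;
}

// Turns the raw index kept in a GPR into the value an address register
// would hold for the given use.
inline uint16_t scaledAddress(uint32_t index, AddrUse use)
{
   const uint32_t scaled = index << shiftFor(use);
   assert(scaled <= 0xffffu); // the patch notes that $aX are only 16 bit
   return static_cast<uint16_t>(scaled);
}
```

With the old scheme, index 3 would already have become 48 at the ARL and there
would be no cheap way to get the 12 that a geometry shader input access wants.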

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 49a45f8..3c790cf 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1126,6 +1126,7 @@ private:
       ValueMap values;
    };
 
+   Value *shiftAddress(Value *);
    Value *getVertexBase(int s);
    DataArray *getArrayForFile(unsigned file, int idx);
    Value *fetchSrc(int s, int c);
@@ -1344,7 +1345,8 @@ Converter::getVertexBase(int s)
       if (tgsi.getSrc(s).isIndirect(1))
          rel = fetchSrc(tgsi.getSrc(s).getIndirect(1), 0, NULL);
       vtxBaseValid |= 1 << s;
-      vtxBase[s] = mkOp2v(OP_PFETCH, TYPE_U32, getSSA(), mkImm(index), rel);
+      vtxBase[s] = mkOp2v(OP_PFETCH, TYPE_U32, getSSA(4, FILE_ADDRESS),
+                          mkImm(index), rel);
    }
    return vtxBase[s];
 }
@@ -1403,6 +1405,14 @@ Converter::getArrayForFile(unsigned file, int idx)
 }
 
 Value *
+Converter::shiftAddress(Value *index)
+{
+   if (!index)
+      return NULL;
+   return mkOp2v(OP_SHL, TYPE_U32, getSSA(4, FILE_ADDRESS), index, mkImm(4));
+}
+
+Value *
 Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
 {
    const int idx2d = src.is2D() ? src.getIndex(1) : 0;
@@ -1414,7 +1424,7 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
       assert(!ptr);
       return loadImm(NULL, info->immd.data[idx * 4 + swz]);
    case TGSI_FILE_CONSTANT:
-      return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
+      return mkLoadv(TYPE_U32, srcToSym(src, c), shiftAddress(ptr));
    case TGSI_FILE_INPUT:
       if (prog->getType() == Program::TYPE_FRAGMENT) {
          // don't load masked inputs, won't be assigned a slot
@@ -1422,9 +1432,17 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
             return loadImm(NULL, swz == TGSI_SWIZZLE_W ? 1.0f : 0.0f);
 	 if (!ptr && info->in[idx].sn == TGSI_SEMANTIC_FACE)
             return mkOp1v(OP_RDSV, TYPE_F32, getSSA(), mkSysVal(SV_FACE, 0));
-         return interpolate(src, c, ptr);
+         return interpolate(src, c, shiftAddress(ptr));
+      } else
+      if (ptr && prog->getType() == Program::TYPE_GEOMETRY) {
+         // XXX: This is going to be a problem with scalar arrays, i.e. when
+         // we cannot assume that the address is given in units of vec4.
+         //
+         // nv50 and nvc0 need different things here, so let the lowering
+         // passes decide what to do with the address
+         return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
       }
-      return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
+      return mkLoadv(TYPE_U32, srcToSym(src, c), shiftAddress(ptr));
    case TGSI_FILE_OUTPUT:
       assert(!"load from output file");
       return NULL;
@@ -1433,7 +1451,7 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
       return mkOp1v(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c));
    default:
       return getArrayForFile(src.getFile(), idx2d)->load(
-         sub.cur->values, idx, swz, ptr);
+         sub.cur->values, idx, swz, shiftAddress(ptr));
    }
 }
 
@@ -1476,8 +1494,9 @@ Converter::storeDst(int d, int c, Value *val)
       break;
    }
 
-   Value *ptr = dst.isIndirect(0) ?
-      fetchSrc(dst.getIndirect(0), 0, NULL) : NULL;
+   Value *ptr = NULL;
+   if (dst.isIndirect(0))
+      ptr = shiftAddress(fetchSrc(dst.getIndirect(0), 0, NULL));
 
    if (info->io.genUserClip > 0 &&
        dst.getFile() == TGSI_FILE_OUTPUT &&
@@ -2179,12 +2198,11 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
          src0 = fetchSrc(0, c);
          mkCvt(OP_CVT, TYPE_S32, dst0[c], TYPE_F32, src0)->rnd = ROUND_M;
-         mkOp2(OP_SHL, TYPE_U32, dst0[c], dst0[c], mkImm(4));
       }
       break;
    case TGSI_OPCODE_UARL:
       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
-         mkOp2(OP_SHL, TYPE_U32, dst0[c], fetchSrc(0, c), mkImm(4));
+         mkOp1(OP_MOV, TYPE_U32, dst0[c], fetchSrc(0, c));
       break;
    case TGSI_OPCODE_EX2:
    case TGSI_OPCODE_LG2:
@@ -2721,7 +2739,7 @@ Converter::Converter(Program *ir, const tgsi::Source *code) : BuildUtil(ir),
 
    tData.setup(TGSI_FILE_TEMPORARY, 0, 0, tSize, 4, 4, tFile, 0);
    pData.setup(TGSI_FILE_PREDICATE, 0, 0, pSize, 4, 4, FILE_PREDICATE, 0);
-   aData.setup(TGSI_FILE_ADDRESS, 0, 0, aSize, 4, 4, FILE_ADDRESS, 0);
+   aData.setup(TGSI_FILE_ADDRESS, 0, 0, aSize, 4, 4, FILE_GPR, 0);
    oData.setup(TGSI_FILE_OUTPUT, 0, 0, oSize, 4, 4, FILE_GPR, 0);
 
    zero = mkImm((uint32_t)0);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
index 07f3a21..1d13aea 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -278,10 +278,24 @@ NV50LegalizeSSA::propagateWriteToOutput(Instruction *st)
    // TODO: move exports (if beneficial) in common opt pass
    if (di->isPseudo() || isTextureOp(di->op) || di->defCount(0xff, true) > 1)
       return;
+
    for (int s = 0; di->srcExists(s); ++s)
       if (di->src(s).getFile() == FILE_IMMEDIATE)
          return;
 
+   if (prog->getType() == Program::TYPE_GEOMETRY) {
+      // Only propagate output writes in geometry shaders when we can be sure
+      // that we are propagating to the same output vertex.
+      if (di->bb != st->bb)
+         return;
+      Instruction *i;
+      for (i = di; i != st; i = i->next) {
+         if (i->op == OP_EMIT || i->op == OP_RESTART)
+            return;
+      }
+      assert(i); // st after di
+   }
+
    // We cannot set defs to non-lvalues before register allocation, so
    // save & remove (to save registers) the exports and replace later.
    outWrites->push_back(st);
@@ -307,6 +321,9 @@ NV50LegalizeSSA::handleAddrDef(Instruction *i)
 
    i->getDef(0)->reg.size = 2; // $aX are only 16 bit
 
+   // PFETCH can always write to $a
+   if (i->op == OP_PFETCH)
+      return;
    // only ADDR <- SHL(GPR, IMM) and ADDR <- ADD(ADDR, IMM) are valid
    if (i->srcExists(1) && i->src(1).getFile() == FILE_IMMEDIATE) {
       if (i->op == OP_SHL && i->src(0).getFile() == FILE_GPR)
@@ -473,6 +490,9 @@ NV50LegalizeSSA::visit(BasicBlock *bb)
    for (insn = bb->getEntry(); insn; insn = next) {
       next = insn->next;
 
+      if (insn->defExists(0) && insn->getDef(0)->reg.file == FILE_ADDRESS)
+         handleAddrDef(insn);
+
       switch (insn->op) {
       case OP_EXPORT:
          if (outWrites)
@@ -491,9 +511,6 @@ NV50LegalizeSSA::visit(BasicBlock *bb)
       default:
          break;
       }
-
-      if (insn->defExists(0) && insn->getDef(0)->reg.file == FILE_ADDRESS)
-         handleAddrDef(insn);
    }
    return true;
 }
@@ -510,7 +527,9 @@ private:
    bool handleRDSV(Instruction *);
    bool handleWRSV(Instruction *);
 
+   bool handlePFETCH(Instruction *);
    bool handleEXPORT(Instruction *);
+   bool handleLOAD(Instruction *);
 
    bool handleDIV(Instruction *);
    bool handleSQRT(Instruction *);
@@ -1002,6 +1021,81 @@ NV50LoweringPreSSA::handleEXPORT(Instruction *i)
    return true;
 }
 
+// Handle indirect addressing in geometry shaders:
+//
+// ld $r0 a[$a1][$a2+k] ->
+// ld $r0 a[($a1 + $a2 * $vstride) + k], where k *= $vstride is implicit
+//
+bool
+NV50LoweringPreSSA::handleLOAD(Instruction *i)
+{
+   ValueRef src = i->src(0);
+
+   if (src.isIndirect(1)) {
+      assert(prog->getType() == Program::TYPE_GEOMETRY);
+      Value *addr = i->getIndirect(0, 1);
+
+      if (src.isIndirect(0)) {
+         // base address is in an address register, so move to a GPR
+         Value *base = bld.getScratch();
+         bld.mkMov(base, addr);
+
+         Symbol *sv = bld.mkSysVal(SV_VERTEX_STRIDE, 0);
+         Value *vstride = bld.mkOp1v(OP_RDSV, TYPE_U32, bld.getSSA(), sv);
+         Value *attrib = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(),
+                                    i->getIndirect(0, 0), bld.mkImm(2));
+
+         // Calculate final address: addr = base + attr*vstride; use 16-bit
+         // multiplication since 32-bit would be lowered to multiple
+         // instructions, and we only need the low 16 bits of the result
+         Value *a[2], *b[2];
+         bld.mkSplit(a, 2, attrib);
+         bld.mkSplit(b, 2, vstride);
+         Value *sum = bld.mkOp3v(OP_MAD, TYPE_U16, bld.getSSA(), a[0], b[0],
+                                 base);
+
+         // move address from GPR into an address register
+         addr = bld.getSSA(2, FILE_ADDRESS);
+         bld.mkMov(addr, sum);
+      }
+
+      i->setIndirect(0, 1, NULL);
+      i->setIndirect(0, 0, addr);
+   }
+
+   return true;
+}
+
+bool
+NV50LoweringPreSSA::handlePFETCH(Instruction *i)
+{
+   assert(prog->getType() == Program::TYPE_GEOMETRY);
+
+   // NOTE: cannot use getImmediate here, not in SSA form yet, move to
+   // later phase if that assertion ever triggers:
+
+   ImmediateValue *imm = i->getSrc(0)->asImm();
+   assert(imm);
+
+   assert(imm->reg.data.u32 <= 127); // TODO: use address reg if that happens
+
+   if (i->srcExists(1)) {
+      // indirect addressing of vertex in primitive space
+
+      LValue *val = bld.getScratch();
+      Value *ptr = bld.getSSA(2, FILE_ADDRESS);
+      bld.mkOp2v(OP_SHL, TYPE_U32, ptr, i->getSrc(1), bld.mkImm(2));
+      bld.mkOp2v(OP_PFETCH, TYPE_U32, val, imm, ptr);
+
+      // NOTE: PFETCH directly to an $aX only works with direct addressing
+      i->op = OP_SHL;
+      i->setSrc(0, val);
+      i->setSrc(1, bld.mkImm(0));
+   }
+
+   return true;
+}
+
 // Set flags according to predicate and make the instruction read $cX.
 void
 NV50LoweringPreSSA::checkPredicate(Instruction *insn)
@@ -1060,6 +1154,8 @@ NV50LoweringPreSSA::visit(Instruction *i)
       return handleSQRT(i);
    case OP_EXPORT:
       return handleEXPORT(i);
+   case OP_LOAD:
+      return handleLOAD(i);
    case OP_RDSV:
       return handleRDSV(i);
    case OP_WRSV:
@@ -1070,6 +1166,8 @@ NV50LoweringPreSSA::visit(Instruction *i)
       return handlePRECONT(i);
    case OP_CONT:
       return handleCONT(i);
+   case OP_PFETCH:
+      return handlePFETCH(i);
    default:
       break;
    }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index a838004..3840f75 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1548,6 +1548,13 @@ NVC0LoweringPass::visit(Instruction *i)
          if (prog->getType() == Program::TYPE_COMPUTE) {
             i->getSrc(0)->reg.file = FILE_MEMORY_CONST;
             i->getSrc(0)->reg.fileIndex = 0;
+         } else
+         if (prog->getType() == Program::TYPE_GEOMETRY &&
+             i->src(0).isIndirect(0)) {
+            // XXX: this assumes vec4 units
+            Value *ptr = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(),
+                                    i->getIndirect(0, 0), bld.mkImm(4));
+            i->setIndirect(0, 0, ptr);
          } else {
             i->op = OP_VFETCH;
             assert(prog->getType() != Program::TYPE_FRAGMENT); // INTERP
-- 
1.8.3.2
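
A note on the 16-bit MAD in handleLOAD above: the comment there relies on the
low 16 bits of base + attr * vstride depending only on the low 16 bits of each
operand. The sketch below checks that claim in plain C++; the function names
are invented for this example and nothing here touches the actual IR.

```cpp
#include <cassert>
#include <cstdint>

// Low 16 bits of a full 32-bit multiply-add (what a lowered 32-bit MAD
// followed by a truncation would give).
inline uint16_t madLow16Of32(uint32_t a, uint32_t b, uint32_t c)
{
   return static_cast<uint16_t>(a * b + c);
}

// Multiply-add on the low halves only, modelled on the split + U16 OP_MAD
// sequence. The arithmetic is done in uint32_t to avoid signed promotion.
inline uint16_t mad16(uint32_t a, uint32_t b, uint32_t c)
{
   const uint32_t lo_a = a & 0xffffu;
   const uint32_t lo_b = b & 0xffffu;
   const uint32_t lo_c = c & 0xffffu;
   return static_cast<uint16_t>(lo_a * lo_b + lo_c);
}

// Both routes must agree for any inputs.
inline void checkAgree(uint32_t a, uint32_t b, uint32_t c)
{
   assert(mad16(a, b, c) == madLow16Of32(a, b, c));
}
```

Since $aX only holds 16 bits anyway, the upper halves of the operands never
matter for the final address, which is why the single 16-bit instruction is
enough.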

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 03/19] nv50: add support for geometry shaders

From: Bryan Cain <bryancain3 at gmail.com>

Layer output probably doesn't work yet, but other than that everything seems
to be working.

Signed-off-by: Bryan Cain <bryancain3 at gmail.com>
[calim: fix up minor bugs, code formatting]
Signed-off-by: Christoph Bumiller <e0425955 at student.tuwien.ac.at>
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  | 25 ++++++++++++++++------
 src/gallium/drivers/nouveau/nv50/nv50_program.c    | 16 ++++++++++++++
 .../drivers/nouveau/nv50/nv50_shader_state.c       |  2 ++
 src/gallium/drivers/nouveau/nv50/nv50_tex.c        |  2 ++
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index cf82e2f..f4db2ed 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -493,7 +493,12 @@ CodeEmitterNV50::emitForm_MAD(const Instruction *i)
    setSrc(i, 1, 1);
    setSrc(i, 2, 2);
 
-   setAReg16(i, 1);
+   if (i->getIndirect(0, 0)) {
+      assert(!i->getIndirect(1, 0));
+      setAReg16(i, 0);
+   } else {
+      setAReg16(i, 1);
+   }
 }
 
 // like default form, but 2nd source in slot 2, and no 3rd source
@@ -512,7 +517,12 @@ CodeEmitterNV50::emitForm_ADD(const Instruction *i)
    setSrc(i, 0, 0);
    setSrc(i, 1, 2);
 
-   setAReg16(i, 1);
+   if (i->getIndirect(0, 0)) {
+      assert(!i->getIndirect(1, 0));
+      setAReg16(i, 0);
+   } else {
+      setAReg16(i, 1);
+   }
 }
 
 // default short form (rr, ar, rc, gr)
@@ -602,8 +612,11 @@ CodeEmitterNV50::emitLOAD(const Instruction *i)
 
    switch (sf) {
    case FILE_SHADER_INPUT:
-      // use 'mov' where we can
-      code[0] = i->src(0).isIndirect(0) ? 0x00000001 : 0x10000001;
+      if (progType == Program::TYPE_GEOMETRY)
+         code[0] = 0x11800001;
+      else
+         // use 'mov' where we can
+         code[0] = i->src(0).isIndirect(0) ? 0x00000001 : 0x10000001;
       code[1] = 0x00200000 | (i->lanes << 14);
       if (typeSizeof(i->dType) == 4)
          code[1] |= 0x04000000;
@@ -1399,8 +1412,8 @@ CodeEmitterNV50::emitShift(const Instruction *i)
 void
 CodeEmitterNV50::emitOUT(const Instruction *i)
 {
-   code[0] = (i->op == OP_EMIT) ? 0xf0000200 : 0xf0000400;
-   code[1] = 0xc0000001;
+   code[0] = (i->op == OP_EMIT) ? 0xf0000201 : 0xf0000401;
+   code[1] = 0xc0000000;
 
    emitFlagsRd(i);
 }
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 97857d7..78a12e3 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -358,6 +358,22 @@ nv50_program_translate(struct nv50_program *prog, uint16_t chipset)
       }
       if (info->prop.fp.usesDiscard)
          prog->fp.flags[0] |= NV50_3D_FP_CONTROL_USES_KIL;
+   } else
+   if (prog->type == PIPE_SHADER_GEOMETRY) {
+      switch (info->prop.gp.outputPrim) {
+      case PIPE_PRIM_LINE_STRIP:
+         prog->gp.prim_type = NV50_3D_GP_OUTPUT_PRIMITIVE_TYPE_LINE_STRIP;
+         break;
+      case PIPE_PRIM_TRIANGLE_STRIP:
+         prog->gp.prim_type = NV50_3D_GP_OUTPUT_PRIMITIVE_TYPE_TRIANGLE_STRIP;
+         break;
+      case PIPE_PRIM_POINTS:
+      default:
+         assert(info->prop.gp.outputPrim == PIPE_PRIM_POINTS);
+         prog->gp.prim_type = NV50_3D_GP_OUTPUT_PRIMITIVE_TYPE_POINTS;
+         break;
+      }
+      prog->gp.vert_count = info->prop.gp.maxVertices;
    }
 
    if (prog->pipe.stream_output.num_outputs)
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index 9144fc4..ba4f592 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -193,6 +193,8 @@ nv50_gmtyprog_validate(struct nv50_context *nv50)
    struct nv50_program *gp = nv50->gmtyprog;
 
    if (gp) {
+      if (!nv50_program_validate(nv50, gp))
+         return;
       BEGIN_NV04(push, NV50_3D(GP_REG_ALLOC_TEMP), 1);
       PUSH_DATA (push, gp->max_gpr);
       BEGIN_NV04(push, NV50_3D(GP_REG_ALLOC_RESULT), 1);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_tex.c b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
index f7284fa..6663a61 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_tex.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
@@ -293,6 +293,7 @@ void nv50_validate_textures(struct nv50_context *nv50)
    boolean need_flush;
 
    need_flush  = nv50_validate_tic(nv50, 0);
+   need_flush |= nv50_validate_tic(nv50, 1);
    need_flush |= nv50_validate_tic(nv50, 2);
 
    if (need_flush) {
@@ -343,6 +344,7 @@ void nv50_validate_samplers(struct nv50_context *nv50)
    boolean need_flush;
 
    need_flush  = nv50_validate_tsc(nv50, 0);
+   need_flush |= nv50_validate_tsc(nv50, 1);
    need_flush |= nv50_validate_tsc(nv50, 2);
 
    if (need_flush) {
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 04/19] nv50: allow vert_count to be >255

---
 src/gallium/drivers/nouveau/nv50/nv50_program.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
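
For anyone wondering why the field width matters: patch 03 assigns
info->prop.gp.maxVertices straight into gp.vert_count, so with a uint8_t field
any value of 256 or more is silently truncated. The sketch below shows the
effect with two made-up, cut-down structs (GpStateOld and GpStateNew are not
the real nv50_program layout).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cut-down versions of the gp sub-struct before and after.
struct GpStateOld { uint8_t  vert_count; };
struct GpStateNew { uint32_t vert_count; };

inline GpStateOld storeOld(unsigned maxVertices)
{
   GpStateOld gp;
   gp.vert_count = static_cast<uint8_t>(maxVertices); // keeps the low 8 bits only
   return gp;
}

inline GpStateNew storeNew(unsigned maxVertices)
{
   GpStateNew gp;
   gp.vert_count = maxVertices;
   return gp;
}

inline void exampleUsage()
{
   assert(storeOld(255).vert_count == 255); // still fits
   assert(storeOld(256).vert_count == 0);   // wraps to zero
   assert(storeNew(256).vert_count == 256); // preserved after the change
}
```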

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index 13b9516..f63352f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -88,7 +88,7 @@ struct nv50_program {
 
    struct {
       ubyte primid; /* primitive id output register */
-      uint8_t vert_count;
+      uint32_t vert_count;
       uint8_t prim_type; /* point, line strip or tri strip */
    } gp;
 
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 05/19] nv50/ir: disallow predicates on emit/restart ops

---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index ade9be0..52257a8 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -130,7 +130,8 @@ void TargetNV50::initOpInfo()
    };
    static const operation noPredList[] =
    {
-      OP_CALL, OP_PREBREAK, OP_PRERET, OP_QUADON, OP_QUADPOP, OP_JOINAT
+      OP_CALL, OP_PREBREAK, OP_PRERET, OP_QUADON, OP_QUADPOP, OP_JOINAT,
+      OP_EMIT, OP_RESTART
    };
 
    for (i = 0; i < DATA_FILE_COUNT; ++i)
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 06/19] nv50/ir: disallow shader input propagation for gp

For some reason, shader input accesses don't work correctly in non-ld
instructions. Disallow those loads from being propagated.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

I'm not particularly happy with this patch. Some investigation needs to happen
as to what's going on here. NVIDIA's shaders include p[] accesses in various
instructions just fine. Perhaps this is just masking some other bug. However
this works for now for all the piglit tests in the repo.

 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 52257a8..18fa069 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -297,14 +297,19 @@ TargetNV50::insnCanLoad(const Instruction *i, int s,
 
    switch (mode) {
    case 0x00:
-   case 0x01:
    case 0x03:
    case 0x08:
-   case 0x09:
    case 0x0c:
    case 0x20:
    case 0x21:
       break;
+   case 0x01:
+   case 0x09:
+      // TODO: Figure out why a[] accesses can't be propagated into non-ld
+      // instructions. Something to do with vstride maybe?
+      if (ld->bb->getProgram()->getType() == Program::TYPE_GEOMETRY)
+         return false;
+      break;
    case 0x0d:
       if (ld->bb->getProgram()->getType() != Program::TYPE_GEOMETRY)
          return false;
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 07/19] nv50/ir: comment out code to allow input/immed loads

This code was missing a break which made it ineffective. But since
shader input loads have been disallowed, define the code out.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 3 +++
 1 file changed, 3 insertions(+)
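
To spell out what "missing a break" means here, the sketch below is a reduced
model of the tail of the switch in insnCanLoad() as it stood before this patch.
The function name and the isGeometry parameter are made up for the example;
the real code derives the program type from ld->bb. Only the 0x0d and default
cases follow the original, and 0x00 stands in for the allowed modes.

```cpp
#include <cassert>

// Reduced, hypothetical model of the pre-patch switch tail.
inline bool canLoadBeforePatch(unsigned mode, bool isGeometry)
{
   switch (mode) {
   case 0x00:
      break;
   case 0x0d:
      if (!isGeometry)
         return false;
      // no break: control falls through into default
   default:
      return false;
   }
   return true;
}

inline void exampleUsage()
{
   assert(canLoadBeforePatch(0x00, true));
   assert(!canLoadBeforePatch(0x0d, false));
   assert(!canLoadBeforePatch(0x0d, true)); // geometry case was never allowed
}
```

So mode 0x0d was rejected for every program type, which is why disabling the
case with #if 0 does not change behaviour.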

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 18fa069..a84a54a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -310,9 +310,12 @@ TargetNV50::insnCanLoad(const Instruction *i, int s,
       if (ld->bb->getProgram()->getType() == Program::TYPE_GEOMETRY)
          return false;
       break;
+#if 0
    case 0x0d:
       if (ld->bb->getProgram()->getType() != Program::TYPE_GEOMETRY)
          return false;
+      break;
+#endif
    default:
       return false;
    }
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 08/19] nv50/ir: add support for gl_PrimitiveIDIn

Note that the primitive id is stored in a[0x18], while usually the
geometry instructions are of the form a[$a1 + 0x4] which gets mapped to
p[] space. We need to avoid the change from a[] to p[] here, so it's
keyed on whether the access is indirect or not.

Note that there's also a use-case for accessing e.g. a[$r1], however
that's not supported for now. (Could be added by checking the register
file of the indirect parameter.)

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp   | 6 +++---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp   | 7 +++++--
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 3 +++
 3 files changed, 11 insertions(+), 5 deletions(-)
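
The rule described in the commit message boils down to a two-input decision,
sketched below. InputSpace and gsInputSpace are names invented for this
example; the emitter in the diff below expresses the same condition as
progType == Program::TYPE_GEOMETRY && i->src(0).isIndirect(0) when deciding
whether to set the p[] bits.

```cpp
#include <cassert>

// Hypothetical labels: a[] attribute space versus p[] primitive space.
enum class InputSpace { A, P };

// Only indirect input accesses in a geometry shader are mapped to p[];
// a direct access such as the primitive id at a[0x18] stays in a[].
inline InputSpace gsInputSpace(bool isGeometry, bool isIndirect)
{
   return (isGeometry && isIndirect) ? InputSpace::P : InputSpace::A;
}

inline void exampleUsage()
{
   assert(gsInputSpace(true, true) == InputSpace::P);  // a[$a1 + 0x4]
   assert(gsInputSpace(true, false) == InputSpace::A); // a[0x18]
}
```

As the commit message says, this does not cover an access like a[$r1]; that
would need a further check on the register file of the indirect operand.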

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index f4db2ed..a6ed4b0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -381,7 +381,7 @@ CodeEmitterNV50::setSrcFileBits(const Instruction *i, int enc)
    case 0x00: // rrr
       break;
    case 0x01: // arr/grr
-      if (progType == Program::TYPE_GEOMETRY) {
+      if (progType == Program::TYPE_GEOMETRY && i->src(0).isIndirect(0)) {
          code[0] |= 0x01800000;
          if (enc == NV50_OP_ENC_LONG || enc == NV50_OP_ENC_LONG_ALT)
             code[1] |= 0x00200000;
@@ -407,7 +407,7 @@ CodeEmitterNV50::setSrcFileBits(const Instruction *i, int enc)
       code[1] |= (i->getSrc(1)->reg.fileIndex << 22);
       break;
    case 0x09: // acr/gcr
-      if (progType == Program::TYPE_GEOMETRY) {
+      if (progType == Program::TYPE_GEOMETRY && i->src(0).isIndirect(0)) {
          code[0] |= 0x01800000;
       } else {
          code[0] |= (enc == NV50_OP_ENC_LONG_ALT) ? 0x01000000 : 0x00800000;
@@ -612,7 +612,7 @@ CodeEmitterNV50::emitLOAD(const Instruction *i)
 
    switch (sf) {
    case FILE_SHADER_INPUT:
-      if (progType == Program::TYPE_GEOMETRY)
+      if (progType == Program::TYPE_GEOMETRY && i->src(0).isIndirect(0))
          code[0] = 0x11800001;
       else
          // use 'mov' where we can
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 3c790cf..321410e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -1434,13 +1434,16 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
             return mkOp1v(OP_RDSV, TYPE_F32, getSSA(), mkSysVal(SV_FACE, 0));
          return interpolate(src, c, shiftAddress(ptr));
       } else
-      if (ptr && prog->getType() == Program::TYPE_GEOMETRY) {
+      if (prog->getType() == Program::TYPE_GEOMETRY) {
+         if (!ptr && info->in[idx].sn == TGSI_SEMANTIC_PRIMID)
+            return mkOp1v(OP_RDSV, TYPE_U32, getSSA(), mkSysVal(SV_PRIMITIVE_ID, 0));
          // XXX: This is going to be a problem with scalar arrays, i.e. when
          // we cannot assume that the address is given in units of vec4.
          //
          // nv50 and nvc0 need different things here, so let the lowering
          // passes decide what to do with the address
-         return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
+         if (ptr)
+            return mkLoadv(TYPE_U32, srcToSym(src, c), ptr);
       }
       return mkLoadv(TYPE_U32, srcToSym(src, c), shiftAddress(ptr));
    case TGSI_FILE_OUTPUT:
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index a84a54a..1925c09 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -238,6 +238,9 @@ TargetNV50::getSVAddress(DataFile shaderFile, const Symbol *sym) const
             addr += 4;
       return addr;
    }
+   case SV_PRIMITIVE_ID:
+      return shaderFile == FILE_SHADER_INPUT ? 0x18 :
+         sysvalLocation[sym->reg.data.sv.sv];
    case SV_NCTAID:
       return 0x8 + 2 * sym->reg.data.sv.index;
    case SV_CTAID:
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 09/19] nv50: properly set the PRIMITIVE_ID enable flag when it is a gp input.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_program.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 78a12e3..f46f240 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -52,6 +52,9 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
       for (c = 0; c < 4; ++c)
          if (info->in[i].mask & (1 << c))
             info->in[i].slot[c] = n++;
+
+      if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
+         prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;
    }
    prog->in_nr = info->numInputs;
 
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 10/19] nv50: VP_RESULT_MAP_SIZE has to be positive

Make sure that we never try to use a 0-sized map. This can happen when
using a gp, so add a dummy mapping when computing vp_gp_mapping in that
case.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_shader_state.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index ba4f592..265ef20 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -457,6 +457,7 @@ nv50_fp_linkage_validate(struct nv50_context *nv50)
       BEGIN_NV04(push, NV50_3D(SEMANTIC_PRIM_ID), 1);
       PUSH_DATA (push, primid);
 
+      assert(m > 0);
       BEGIN_NV04(push, NV50_3D(VP_RESULT_MAP_SIZE), 1);
       PUSH_DATA (push, m);
       BEGIN_NV04(push, NV50_3D(VP_RESULT_MAP(0)), n);
@@ -516,6 +517,8 @@ nv50_vp_gp_mapping(uint8_t *map, int m,
          oid += mv & 1;
       }
    }
+   if (!m)
+      map[m++] = 0;
    return m;
 }
 
@@ -540,6 +543,7 @@ nv50_gp_linkage_validate(struct nv50_context *nv50)
    BEGIN_NV04(push, NV50_3D(VP_GP_BUILTIN_ATTR_EN), 1);
    PUSH_DATA (push, vp->vp.attrs[2] | gp->vp.attrs[2]);
 
+   assert(m > 0);
    BEGIN_NV04(push, NV50_3D(VP_RESULT_MAP_SIZE), 1);
    PUSH_DATA (push, m);
    BEGIN_NV04(push, NV50_3D(VP_RESULT_MAP(0)), n);
-- 
1.8.3.2
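
A minimal sketch of the dummy-mapping guard from patch 10, pulled out into a standalone helper. The name ensure_nonempty_map is hypothetical and is not a Mesa function; only the "if (!m) map[m++] = 0;" logic comes from the patch. The caller is assumed to provide room for at least one entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Mesa): make sure a result map has at
 * least one entry so that a size of 0 is never programmed. */
int ensure_nonempty_map(uint8_t *map, int m)
{
   assert(map != NULL && m >= 0);
   if (m == 0)
      map[m++] = 0; /* dummy slot, mirrors the patch */
   return m;
}
```

An empty map comes back with size 1 and a zeroed first slot, while a map that already has entries is returned unchanged, which is what lets the later assert(m > 0) hold.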

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 11/19] nv50: GP_REG_ALLOC_RESULT must be positive

Set max_out to 1 when there are no outputs.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_program.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index f46f240..813795f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -118,6 +118,8 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
    }
    prog->out_nr = info->numOutputs;
    prog->max_out = n;
+   if (!prog->max_out)
+      prog->max_out = 1;
 
    if (prog->vp.psiz < info->numOutputs)
       prog->vp.psiz = prog->out[prog->vp.psiz].hw;
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 12/19] nv50: allocate an extra code bo to avoid dmesg spam

Each code BO is a heap that allocates at the end first, and so GPs are
allocated at the very end of the allocated space. When executing, we see
PAGE_NOT_PRESENT errors for the next page. Just over-allocate to make
sure that there's something there.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 43e0f50..82b0207 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -739,8 +739,12 @@ nv50_screen_create(struct nouveau_device *dev)
       goto fail;
    }
 
+   /* This over-allocates by a whole code BO. The GP, which would execute at
+    * the end of the last page, would trigger faults. The going theory is that
+    * it prefetches up to a certain amount. This avoids dmesg spam.
+    */
    ret = nouveau_bo_new(dev, NOUVEAU_BO_VRAM, 1 << 16,
-                        3 << NV50_CODE_BO_SIZE_LOG2, NULL, &screen->code);
+                        4 << NV50_CODE_BO_SIZE_LOG2, NULL, &screen->code);
    if (ret) {
       NOUVEAU_ERR("Failed to allocate code bo: %d\n", ret);
       goto fail;
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 13/19] nv50: don't forget to also clear additional layers

Fixes most of the tests/spec/gl-3.2/layered-rendering/* piglits.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_surface.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_surface.c b/src/gallium/drivers/nouveau/nv50/nv50_surface.c
index 358f57a..16a4369 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_surface.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_surface.c
@@ -395,7 +395,7 @@ nv50_clear(struct pipe_context *pipe, unsigned buffers,
    struct nv50_context *nv50 = nv50_context(pipe);
    struct nouveau_pushbuf *push = nv50->base.pushbuf;
    struct pipe_framebuffer_state *fb = &nv50->framebuffer;
-   unsigned i;
+   unsigned i, j;
    uint32_t mode = 0;
 
    /* don't need NEW_BLEND, COLOR_MASK doesn't affect CLEAR_BUFFERS */
@@ -408,9 +408,6 @@ nv50_clear(struct pipe_context *pipe, unsigned buffers,
       PUSH_DATAf(push, color->f[1]);
       PUSH_DATAf(push, color->f[2]);
       PUSH_DATAf(push, color->f[3]);
-      mode =
-         NV50_3D_CLEAR_BUFFERS_R | NV50_3D_CLEAR_BUFFERS_G |
-         NV50_3D_CLEAR_BUFFERS_B | NV50_3D_CLEAR_BUFFERS_A;
    }
 
    if (buffers & PIPE_CLEAR_DEPTH) {
@@ -425,12 +422,22 @@ nv50_clear(struct pipe_context *pipe, unsigned buffers,
       mode |= NV50_3D_CLEAR_BUFFERS_S;
    }
 
-   BEGIN_NV04(push, NV50_3D(CLEAR_BUFFERS), 1);
-   PUSH_DATA (push, mode);
+   if ((buffers & PIPE_CLEAR_DEPTH) || (buffers & PIPE_CLEAR_STENCIL)) {
+      for (j = fb->zsbuf->u.tex.first_layer; j <= fb->zsbuf->u.tex.last_layer; j++) {
+         BEGIN_NV04(push, NV50_3D(CLEAR_BUFFERS), 1);
+         PUSH_DATA(push, mode | (j << NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT));
+      }
+   }
 
-   for (i = 1; i < fb->nr_cbufs; i++) {
-      BEGIN_NV04(push, NV50_3D(CLEAR_BUFFERS), 1);
-      PUSH_DATA (push, (i << 6) | 0x3c);
+   if (buffers & PIPE_CLEAR_COLOR) {
+      for (i = 0; i < fb->nr_cbufs; i++) {
+         struct pipe_surface *sf = fb->cbufs[i];
+         for (j = sf->u.tex.first_layer; j <= sf->u.tex.last_layer; j++) {
+            BEGIN_NV04(push, NV50_3D(CLEAR_BUFFERS), 1);
+            PUSH_DATA (push, (i << 6) | 0x3c |
+                       (j << NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT));
+         }
+      }
    }
 }
 
-- 
1.8.3.2
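
To make the per-layer clear in patch 13 concrete, here is a rough sketch that builds the CLEAR_BUFFERS data words for one color attachment. The helper color_clear_words is hypothetical, and the shift value of 10 is an assumption for illustration; the driver takes it from NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT in the generated class header. The (rt << 6) | 0x3c part is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, stands in for NV50_3D_CLEAR_BUFFERS_LAYER__SHIFT. */
#define CLEAR_LAYER_SHIFT 10
/* Color channel enables, written as 0x3c in the patch. */
#define CLEAR_COLOR_RGBA  0x3c

/* Hypothetical helper: write one CLEAR_BUFFERS data word per layer of
 * render target rt into out[], and return how many words were written. */
size_t color_clear_words(unsigned rt, unsigned first_layer,
                         unsigned last_layer, uint32_t *out, size_t cap)
{
   size_t n = 0;
   unsigned j;

   assert(out != NULL && first_layer <= last_layer);
   for (j = first_layer; j <= last_layer && n < cap; j++)
      out[n++] = (rt << 6) | CLEAR_COLOR_RGBA | (j << CLEAR_LAYER_SHIFT);
   return n;
}
```

With these assumptions, render target 1 with layers 0 through 2 yields three words that differ only in the layer field, which is the reason the patch replaces the single clear per attachment with an inner loop.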

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 14/19] nvc0: don't forget to also clear additional layers

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_program.c |  2 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 22 ++++++++++++++++------
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 813795f..e7609fa 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -166,6 +166,8 @@ nv50_fragprog_assign_slots(struct nv50_ir_prog_info *info)
 
          if (info->in[i].sn == TGSI_SEMANTIC_COLOR)
             prog->vp.bfc[info->in[i].si] = j;
+         if (info->in[i].sn == TGSI_SEMANTIC_PRIMID)
+            prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_PRIMITIVE_ID;
 
          prog->in[j].id = i;
          prog->in[j].mask = info->in[i].mask;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
index 5375bd4..8cc7021 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
@@ -414,7 +414,7 @@ nvc0_clear(struct pipe_context *pipe, unsigned buffers,
    struct nvc0_context *nvc0 = nvc0_context(pipe);
    struct nouveau_pushbuf *push = nvc0->base.pushbuf;
    struct pipe_framebuffer_state *fb = &nvc0->framebuffer;
-   unsigned i;
+   unsigned i, j;
    uint32_t mode = 0;
 
    /* don't need NEW_BLEND, COLOR_MASK doesn't affect CLEAR_BUFFERS */
@@ -444,12 +444,22 @@ nvc0_clear(struct pipe_context *pipe, unsigned buffers,
       mode |= NVC0_3D_CLEAR_BUFFERS_S;
    }
 
-   BEGIN_NVC0(push, NVC0_3D(CLEAR_BUFFERS), 1);
-   PUSH_DATA (push, mode);
+   if ((buffers & PIPE_CLEAR_DEPTH) || (buffers & PIPE_CLEAR_STENCIL)) {
+      for (j = fb->zsbuf->u.tex.first_layer; j <= fb->zsbuf->u.tex.last_layer; j++) {
+         BEGIN_NVC0(push, NVC0_3D(CLEAR_BUFFERS), 1);
+         PUSH_DATA(push, mode | (j << NVC0_3D_CLEAR_BUFFERS_LAYER__SHIFT));
+      }
+   }
 
-   for (i = 1; i < fb->nr_cbufs; i++) {
-      BEGIN_NVC0(push, NVC0_3D(CLEAR_BUFFERS), 1);
-      PUSH_DATA (push, (i << 6) | 0x3c);
+   if (buffers & PIPE_CLEAR_COLOR) {
+      for (i = 0; i < fb->nr_cbufs; i++) {
+         struct pipe_surface *sf = fb->cbufs[i];
+         for (j = sf->u.tex.first_layer; j <= sf->u.tex.last_layer; j++) {
+            BEGIN_NVC0(push, NVC0_3D(CLEAR_BUFFERS), 1);
+            PUSH_DATA (push, (i << 6) | 0x3c |
+                       (j << NVC0_3D_CLEAR_BUFFERS_LAYER__SHIFT));
+         }
+      }
    }
 }
 
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 15/19] nv50: add comments about CB_AUX contents

Updates a few inconsistencies as well, like the size of the buffer,
location of the runout, etc.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_context.h        | 10 ++++++++++
 src/gallium/drivers/nouveau/nv50/nv50_screen.c         |  8 ++++----
 src/gallium/drivers/nouveau/nv50/nv50_state_validate.c |  2 +-
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h
index ee6eb0e..7bf4ce3 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h
@@ -70,7 +70,17 @@
 #define NV50_CB_PVP 124
 #define NV50_CB_PGP 126
 #define NV50_CB_PFP 125
+/* constant buffer permanently mapped in as c15[] */
 #define NV50_CB_AUX 127
+/* size of the buffer: 64k. not all taken up, can be reduced if needed. */
+#define NV50_CB_AUX_SIZE          (1 << 16)
+/* 8 user clip planes, at 4 32-bit floats each */
+#define NV50_CB_AUX_UCP_OFFSET    0x0
+/* 256 textures, each with 2 16-bit integers specifying the x/y MS shift */
+#define NV50_CB_AUX_MS_OFFSET     0x80
+/* 4 32-bit floats for the vertex runout, put at the end */
+#define NV50_CB_AUX_RUNOUT_OFFSET (NV50_CB_AUX_SIZE - 0x10)
+
 
 
 struct nv50_blitctx;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 82b0207..9ed2d01 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -472,7 +472,7 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
    BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
    PUSH_DATAh(push, screen->uniforms->offset + (3 << 16));
    PUSH_DATA (push, screen->uniforms->offset + (3 << 16));
-   PUSH_DATA (push, (NV50_CB_AUX << 16) | 0x0200);
+   PUSH_DATA (push, (NV50_CB_AUX << 16) | (NV50_CB_AUX_SIZE & 0xffff));
 
    BEGIN_NI04(push, NV50_3D(SET_PROGRAM_CB), 3);
    PUSH_DATA (push, (NV50_CB_AUX << 12) | 0xf01);
@@ -481,15 +481,15 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
 
    /* return { 0.0, 0.0, 0.0, 0.0 } on out-of-bounds vtxbuf access */
    BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
-   PUSH_DATA (push, ((1 << 9) << 6) | NV50_CB_AUX);
+   PUSH_DATA (push, (NV50_CB_AUX_RUNOUT_OFFSET << 6) | NV50_CB_AUX);
    BEGIN_NI04(push, NV50_3D(CB_DATA(0)), 4);
    PUSH_DATAf(push, 0.0f);
    PUSH_DATAf(push, 0.0f);
    PUSH_DATAf(push, 0.0f);
    PUSH_DATAf(push, 0.0f);
    BEGIN_NV04(push, NV50_3D(VERTEX_RUNOUT_ADDRESS_HIGH), 2);
-   PUSH_DATAh(push, screen->uniforms->offset + (3 << 16) + (1 << 9));
-   PUSH_DATA (push, screen->uniforms->offset + (3 << 16) + (1 << 9));
+   PUSH_DATAh(push, screen->uniforms->offset + (3 << 16) + NV50_CB_AUX_RUNOUT_OFFSET);
+   PUSH_DATA (push, screen->uniforms->offset + (3 << 16) + NV50_CB_AUX_RUNOUT_OFFSET);
 
    /* max TIC (bits 4:8) & TSC bindings, per program type */
    for (i = 0; i < 3; ++i) {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
index 86b9a23..3d99b73 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state_validate.c
@@ -238,7 +238,7 @@ nv50_validate_clip(struct nv50_context *nv50)
 
    if (nv50->dirty & NV50_NEW_CLIP) {
       BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
-      PUSH_DATA (push, (0 << 8) | NV50_CB_AUX);
+      PUSH_DATA (push, (NV50_CB_AUX_UCP_OFFSET << 8) | NV50_CB_AUX);
       BEGIN_NI04(push, NV50_3D(CB_DATA(0)), PIPE_MAX_CLIP_PLANES * 4);
       PUSH_DATAp(push, &nv50->clip.ucp[0][0], PIPE_MAX_CLIP_PLANES * 4);
    }
-- 
1.8.3.2
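
The CB_ADDR writes in patch 15 combine a byte offset into the AUX buffer with the buffer index. The sketch below shows that packing for the runout slot. It assumes, based on the "<< (8 - 2)" spelling that patch 17 uses for the same write, that CB_ADDR carries a 32-bit word index from bit 8 upward and the buffer id in the low bits; the helper cb_addr_word and the shortened macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CB_AUX            127u               /* NV50_CB_AUX */
#define CB_AUX_SIZE       (1u << 16)         /* NV50_CB_AUX_SIZE */
#define CB_AUX_RUNOUT_OFF (CB_AUX_SIZE - 0x10)

/* Hypothetical helper: pack a byte offset and a constant buffer id into a
 * CB_ADDR data word. Assumption: the offset field is a dword index that
 * starts at bit 8, so a byte offset is shifted left by 8 - 2. */
uint32_t cb_addr_word(uint32_t byte_offset, uint32_t buffer)
{
   assert((byte_offset & 3) == 0); /* must be dword aligned */
   assert(buffer < (1u << 8));     /* must fit below the offset field */
   return (byte_offset << (8 - 2)) | buffer;
}
```

Under that reading, cb_addr_word(CB_AUX_RUNOUT_OFF, CB_AUX) addresses the last 16 bytes of the 64k buffer, which is where the patch moves the vertex runout.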

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 16/19] nv50: copy nvc0's get_sample_position implementation

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nv50/nv50_context.c | 46 +++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.c b/src/gallium/drivers/nouveau/nv50/nv50_context.c
index 11afc48..db3bd3a 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.c
@@ -196,6 +196,10 @@ nv50_invalidate_resource_storage(struct nouveau_context *ctx,
    return ref;
 }
 
+static void
+nv50_context_get_sample_position(struct pipe_context *, unsigned, unsigned,
+                                 float *);
+
 struct pipe_context *
 nv50_create(struct pipe_screen *pscreen, void *priv)
 {
@@ -239,6 +243,7 @@ nv50_create(struct pipe_screen *pscreen, void *priv)
 
    pipe->flush = nv50_flush;
    pipe->texture_barrier = nv50_texture_barrier;
+   pipe->get_sample_position = nv50_context_get_sample_position;
 
    if (!screen->cur_ctx) {
       screen->cur_ctx = nv50;
@@ -317,3 +322,44 @@ nv50_bufctx_fence(struct nouveau_bufctx *bufctx, boolean on_flush)
          nv50_resource_validate(res, (unsigned)ref->priv_data);
    }
 }
+
+static void
+nv50_context_get_sample_position(struct pipe_context *pipe,
+                                 unsigned sample_count, unsigned sample_index,
+                                 float *xy)
+{
+   static const uint8_t ms1[1][2] = { { 0x8, 0x8 } };
+   static const uint8_t ms2[2][2] = {
+      { 0x4, 0x4 }, { 0xc, 0xc } }; /* surface coords (0,0), (1,0) */
+   static const uint8_t ms4[4][2] = {
+      { 0x6, 0x2 }, { 0xe, 0x6 },   /* (0,0), (1,0) */
+      { 0x2, 0xa }, { 0xa, 0xe } }; /* (0,1), (1,1) */
+   static const uint8_t ms8[8][2] = {
+      { 0x1, 0x7 }, { 0x5, 0x3 },   /* (0,0), (1,0) */
+      { 0x3, 0xd }, { 0x7, 0xb },   /* (0,1), (1,1) */
+      { 0x9, 0x5 }, { 0xf, 0x1 },   /* (2,0), (3,0) */
+      { 0xb, 0xf }, { 0xd, 0x9 } }; /* (2,1), (3,1) */
+#if 0
+   /* NOTE: there are alternative modes for MS2 and MS8, currently not used */
+   static const uint8_t ms8_alt[8][2] = {
+      { 0x9, 0x5 }, { 0x7, 0xb },   /* (2,0), (1,1) */
+      { 0xd, 0x9 }, { 0x5, 0x3 },   /* (3,1), (1,0) */
+      { 0x3, 0xd }, { 0x1, 0x7 },   /* (0,1), (0,0) */
+      { 0xb, 0xf }, { 0xf, 0x1 } }; /* (2,1), (3,0) */
+#endif
+
+   const uint8_t (*ptr)[2];
+
+   switch (sample_count) {
+   case 0:
+   case 1: ptr = ms1; break;
+   case 2: ptr = ms2; break;
+   case 4: ptr = ms4; break;
+   case 8: ptr = ms8; break;
+   default:
+      assert(0);
+      return; /* bad sample count -> undefined locations */
+   }
+   xy[0] = ptr[sample_index][0] * 0.0625f;
+   xy[1] = ptr[sample_index][1] * 0.0625f;
+}
-- 
1.8.3.2
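
The tables in patch 16 store sample positions in sixteenths of a pixel, so the multiply by 0.0625f turns each entry into a coordinate in [0, 1). A small sketch of that conversion for the 4x case follows; the ms4 table is copied from the patch, while the function name ms4_sample_position is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4x MSAA sample positions in 1/16 pixel units, copied from the patch. */
static const uint8_t ms4[4][2] = {
   { 0x6, 0x2 }, { 0xe, 0x6 },
   { 0x2, 0xa }, { 0xa, 0xe } };

/* Hypothetical helper: convert one table entry to pixel-relative floats. */
void ms4_sample_position(unsigned sample_index, float *xy)
{
   assert(sample_index < 4 && xy != NULL);
   xy[0] = ms4[sample_index][0] * 0.0625f; /* 1/16 */
   xy[1] = ms4[sample_index][1] * 0.0625f;
}
```

Sample 0 of the 4x pattern therefore lands at (0.375, 0.125) inside the pixel.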

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [PATCH 17/19] nv50: add support for textureFetch'ing MS textures, ARB_texture_multisample

Creates two areas in the AUX constbuf:
 - Sample offsets for MS textures
 - Per-texture MS settings

When executing a textureFetch with a MS sampler, looks up that texture's
settings and adjusts the parameters given to the texfetch instruction.

With this change, all the ARB_texture_multisample piglits pass, so turn
on PIPE_CAP_TEXTURE_MULTISAMPLE.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  8 +++
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  |  1 +
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp      | 60 +++++++++++++++++
 src/gallium/drivers/nouveau/nv50/nv50_context.h    | 13 +++-
 src/gallium/drivers/nouveau/nv50/nv50_program.c    |  7 +-
 src/gallium/drivers/nouveau/nv50/nv50_screen.c     |  7 +-
 src/gallium/drivers/nouveau/nv50/nv50_tex.c        | 75 +++++++++++++++++++++-
 7 files changed, 164 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 6a001d3..857980d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -827,6 +827,14 @@ public:
       int isShadow() const { return descTable[target].shadow ? 1 : 0; }
       int isMS() const {
         return target == TEX_TARGET_2D_MS || target == TEX_TARGET_2D_MS_ARRAY; }
+      void clearMS() {
+         if (isMS()) {
+            if (isArray())
+               target = TEX_TARGET_2D_ARRAY;
+            else
+               target = TEX_TARGET_2D;
+         }
+      }
 
       Target& operator=(TexTarget targ)
       {
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index a6ed4b0..8f9b7de 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -1221,6 +1221,7 @@ CodeEmitterNV50::emitCVT(const Instruction *i)
       case TYPE_S32: code[1] = 0x44014000; break;
       case TYPE_U32: code[1] = 0x44004000; break;
       case TYPE_F16: code[1] = 0xc4000000; break;
+      case TYPE_U16: code[1] = 0x44000000; break;
       default:
          assert(0);
          break;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
index 1d13aea..984a8ca 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -549,6 +549,8 @@ private:
    bool handleCONT(Instruction *);
 
    void checkPredicate(Instruction *);
+   void loadTexMsInfo(uint32_t off, Value **ms, Value **ms_x, Value **ms_y);
+   void loadMsInfo(Value *ms, Value *s, Value **dx, Value **dy);
 
 private:
    const Target *const targ;
@@ -582,6 +584,41 @@ NV50LoweringPreSSA::visit(Function *f)
    return true;
 }
 
+void NV50LoweringPreSSA::loadTexMsInfo(uint32_t off, Value **ms,
+                                       Value **ms_x, Value **ms_y) {
+   // This loads the texture-indexed ms setting from the constant buffer
+   Value *tmp = new_LValue(func, FILE_GPR);
+   uint8_t b = prog->driver->io.resInfoCBSlot;
+   off += prog->driver->io.suInfoBase;
+   *ms_x = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
+                             FILE_MEMORY_CONST, b, TYPE_U32, off + 0), NULL);
+   *ms_y = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
+                             FILE_MEMORY_CONST, b, TYPE_U32, off + 4), NULL);
+   *ms = bld.mkOp2v(OP_ADD, TYPE_U32, tmp, *ms_x, *ms_y);
+}
+
+void NV50LoweringPreSSA::loadMsInfo(Value *ms, Value *s, Value **dx, Value **dy) {
+   // Given a MS level, and a sample id, compute the delta x/y
+   uint8_t b = prog->driver->io.msInfoCBSlot;
+   Value *off = new_LValue(func, FILE_ADDRESS), *t = new_LValue(func, FILE_GPR);
+
+   // The required information is at mslevel * 16 * 4 + sample * 8
+   // = (mslevel * 8 + sample) * 8
+   bld.mkOp2(OP_SHL,
+             TYPE_U32,
+             off,
+             bld.mkOp2v(OP_ADD, TYPE_U32, t,
+                        bld.mkOp2v(OP_SHL, TYPE_U32, t, ms, bld.mkImm(3)),
+                        s),
+             bld.mkImm(3));
+   *dx = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
+                           FILE_MEMORY_CONST, b, TYPE_U32,
+                           prog->driver->io.msInfoBase), off);
+   *dy = bld.mkLoadv(TYPE_U32, bld.mkSymbol(
+                           FILE_MEMORY_CONST, b, TYPE_U32,
+                           prog->driver->io.msInfoBase + 4), off);
+}
+
 bool
 NV50LoweringPreSSA::handleTEX(TexInstruction *i)
 {
@@ -589,6 +626,29 @@ NV50LoweringPreSSA::handleTEX(TexInstruction *i)
    const int dref = arg;
    const int lod = i->tex.target.isShadow() ? (arg + 1) : arg;
 
+   // handle MS, which means looking up the MS params for this texture, and
+   // adjusting the input coordinates to point at the right sample.
+   if (i->tex.target.isMS()) {
+      Value *x = i->getSrc(0);
+      Value *y = i->getSrc(1);
+      Value *s = i->getSrc(arg - 1);
+      Value *tx = new_LValue(func, FILE_GPR), *ty = new_LValue(func, FILE_GPR),
+         *ms, *ms_x, *ms_y, *dx, *dy;
+
+      i->tex.target.clearMS();
+
+      loadTexMsInfo(i->tex.r * 4 * 2, &ms, &ms_x, &ms_y);
+      loadMsInfo(ms, s, &dx, &dy);
+
+      bld.mkOp2(OP_SHL, TYPE_U32, tx, x, ms_x);
+      bld.mkOp2(OP_SHL, TYPE_U32, ty, y, ms_y);
+      bld.mkOp2(OP_ADD, TYPE_U32, tx, tx, dx);
+      bld.mkOp2(OP_ADD, TYPE_U32, ty, ty, dy);
+      i->setSrc(0, tx);
+      i->setSrc(1, ty);
+      i->setSrc(arg - 1, bld.loadImm(NULL, 0));
+   }
+
    // dref comes before bias/lod
    if (i->tex.target.isShadow())
       if (i->op == OP_TXB || i->op == OP_TXL)
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h
index 7bf4ce3..1ce52c9 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h
@@ -75,9 +75,15 @@
 /* size of the buffer: 64k. not all taken up, can be reduced if needed. */
 #define NV50_CB_AUX_SIZE          (1 << 16)
 /* 8 user clip planes, at 4 32-bit floats each */
-#define NV50_CB_AUX_UCP_OFFSET    0x0
-/* 256 textures, each with 2 16-bit integers specifying the x/y MS shift */
-#define NV50_CB_AUX_MS_OFFSET     0x80
+#define NV50_CB_AUX_UCP_OFFSET    0x0000
+#define NV50_CB_AUX_UCP_SIZE      (8 * 4 * 4)
+/* 256 textures, each with ms_x, ms_y u32 pairs */
+#define NV50_CB_AUX_TEX_MS_OFFSET 0x0080
+#define NV50_CB_AUX_TEX_MS_SIZE   (256 * 2 * 4)
+/* For each MS level (4), 8 sets of 32-bit integer pairs sample offsets */
+#define NV50_CB_AUX_MS_OFFSET     0x880
+#define NV50_CB_AUX_MS_SIZE       (4 * 8 * 4 * 2)
+/* next spot: 0x980 */
 /* 4 32-bit floats for the vertex runout, put at the end */
 #define NV50_CB_AUX_RUNOUT_OFFSET (NV50_CB_AUX_SIZE - 0x10)
 
@@ -251,6 +257,7 @@ extern void nv50_init_surface_functions(struct nv50_context *);
 /* nv50_tex.c */
 void nv50_validate_textures(struct nv50_context *);
 void nv50_validate_samplers(struct nv50_context *);
+void nv50_upload_ms_info(struct nouveau_pushbuf *);
 
 struct pipe_sampler_view *
 nv50_create_texture_view(struct pipe_context *,
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index e7609fa..73583bd 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -323,9 +323,14 @@ nv50_program_translate(struct nv50_program *prog, uint16_t chipset)
    info->bin.source = (void *)prog->pipe.tokens;
 
    info->io.ucpCBSlot = 15;
-   info->io.ucpBase = 0;
+   info->io.ucpBase = NV50_CB_AUX_UCP_OFFSET;
    info->io.genUserClip = prog->vp.clpd_nr;
 
+   info->io.resInfoCBSlot = 15;
+   info->io.suInfoBase = NV50_CB_AUX_TEX_MS_OFFSET;
+   info->io.msInfoCBSlot = 15;
+   info->io.msInfoBase = NV50_CB_AUX_MS_OFFSET;
+
    info->assignSlots = nv50_program_assign_varying_slots;
 
    prog->vp.bfc[0] = 0xff;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 9ed2d01..5732b21 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -184,8 +184,9 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
    case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
    case PIPE_CAP_TGSI_TEXCOORD:
-   case PIPE_CAP_TEXTURE_MULTISAMPLE:
       return 0;
+   case PIPE_CAP_TEXTURE_MULTISAMPLE:
+      return 1;
    case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
       return 1;
    case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
@@ -481,7 +482,7 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
 
    /* return { 0.0, 0.0, 0.0, 0.0 } on out-of-bounds vtxbuf access */
    BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
-   PUSH_DATA (push, (NV50_CB_AUX_RUNOUT_OFFSET << 6) | NV50_CB_AUX);
+   PUSH_DATA (push, (NV50_CB_AUX_RUNOUT_OFFSET << (8 - 2)) | NV50_CB_AUX);
    BEGIN_NI04(push, NV50_3D(CB_DATA(0)), 4);
    PUSH_DATAf(push, 0.0f);
    PUSH_DATAf(push, 0.0f);
@@ -491,6 +492,8 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
    PUSH_DATAh(push, screen->uniforms->offset + (3 << 16) + NV50_CB_AUX_RUNOUT_OFFSET);
    PUSH_DATA (push, screen->uniforms->offset + (3 << 16) + NV50_CB_AUX_RUNOUT_OFFSET);
 
+   nv50_upload_ms_info(push);
+
    /* max TIC (bits 4:8) & TSC bindings, per program type */
    for (i = 0; i < 3; ++i) {
       BEGIN_NV04(push, NV50_3D(TEX_LIMITS(i)), 1);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_tex.c b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
index 6663a61..e76876d 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_tex.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_tex.c
@@ -143,7 +143,7 @@ nv50_create_texture_view(struct pipe_context *pipe,
          tic[2] |= NV50_TIC_2_LINEAR | NV50_TIC_2_TARGET_RECT;
          tic[3] = mt->level[0].pitch;
          tic[4] = mt->base.base.width0;
-         tic[5] = (1 << 16) | mt->base.base.height0;
+         tic[5] = (1 << 16) | (mt->base.base.height0);
       }
       tic[6] =
       tic[7] = 0;
@@ -283,6 +283,24 @@ nv50_validate_tic(struct nv50_context *nv50, int s)
       BEGIN_NV04(push, NV50_3D(BIND_TIC(s)), 1);
       PUSH_DATA (push, (i << 1) | 0);
    }
+   if (nv50->num_textures[s]) {
+      BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
+      PUSH_DATA (push, (NV50_CB_AUX_TEX_MS_OFFSET << (8 - 2)) | NV50_CB_AUX);
+      BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nv50->num_textures[s] * 2);
+      for (i = 0; i < nv50->num_textures[s]; i++) {
+         struct nv50_tic_entry *tic = nv50_tic_entry(nv50->textures[s][i]);
+         struct nv50_miptree *res;
+
+         if (!tic) {
+            PUSH_DATA (push, 0);
+            PUSH_DATA (push, 0);
+            continue;
+         }
+         res = nv50_miptree(tic->pipe.texture);
+         PUSH_DATA (push, res->ms_x);
+         PUSH_DATA (push, res->ms_y);
+      }
+   }
    nv50->state.num_textures[s] = nv50->num_textures[s];
 
    return need_flush;
@@ -352,3 +370,58 @@ void nv50_validate_samplers(struct nv50_context *nv50)
       PUSH_DATA (nv50->base.pushbuf, 0);
    }
 }
+
+/* There can be up to 4 different MS levels (1, 2, 4, 8). To simplify the
+ * shader logic, allow each one to take up 8 offsets.
+ */
+#define COMBINE(x, y) x, y
+#define DUMMY 0, 0
+static const uint32_t msaa_sample_xy_offsets[] = {
+   /* MS1 */
+   COMBINE(0, 0),
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+
+   /* MS2 */
+   COMBINE(0, 0),
+   COMBINE(1, 0),
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+
+   /* MS4 */
+   COMBINE(0, 0),
+   COMBINE(1, 0),
+   COMBINE(0, 1),
+   COMBINE(1, 1),
+   DUMMY,
+   DUMMY,
+   DUMMY,
+   DUMMY,
+
+   /* MS8 */
+   COMBINE(0, 0),
+   COMBINE(1, 0),
+   COMBINE(0, 1),
+   COMBINE(1, 1),
+   COMBINE(2, 0),
+   COMBINE(3, 0),
+   COMBINE(2, 1),
+   COMBINE(3, 1),
+};
+
+void nv50_upload_ms_info(struct nouveau_pushbuf *push)
+{
+   BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
+   PUSH_DATA (push, (NV50_CB_AUX_MS_OFFSET << (8 - 2)) | NV50_CB_AUX);
+   BEGIN_NI04(push, NV50_3D(CB_DATA(0)), Elements(msaa_sample_xy_offsets));
+   PUSH_DATAp(push, msaa_sample_xy_offsets, Elements(msaa_sample_xy_offsets));
+}
-- 
1.8.3.2
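
The coordinate adjustment that patch 17 emits as shader IR can be followed on the CPU with a short sketch. It mirrors handleTEX: the texel coordinates are shifted by the texture's ms_x/ms_y, then the per-sample offset is added from the table slot at (ms * 8 + sample), where ms = ms_x + ms_y. The function names below are hypothetical, the table is a condensed copy of msaa_sample_xy_offsets, and the reading of ms_x/ms_y as log2 of the sample grid width and height (for example 1 and 1 for 4x, 2 and 1 for 8x) is an assumption that matches the table but is not stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as msaa_sample_xy_offsets: 4 MS levels, 8 slots per level,
 * one (dx, dy) pair per slot. Unused slots are zero. */
static const uint32_t ms_offsets[4][8][2] = {
   { {0,0} },                                                  /* MS1 */
   { {0,0}, {1,0} },                                           /* MS2 */
   { {0,0}, {1,0}, {0,1}, {1,1} },                             /* MS4 */
   { {0,0}, {1,0}, {0,1}, {1,1}, {2,0}, {3,0}, {2,1}, {3,1} }, /* MS8 */
};

/* Byte offset of a sample's (dx, dy) pair, as computed in loadMsInfo:
 * (mslevel * 8 + sample) * 8. */
uint32_t ms_info_byte_offset(uint32_t ms, uint32_t sample)
{
   return ((ms << 3) + sample) << 3;
}

/* Hypothetical CPU version of the coordinate rewrite in handleTEX. */
void ms_texel_coords(uint32_t x, uint32_t y, uint32_t sample,
                     uint32_t ms_x, uint32_t ms_y,
                     uint32_t *tx, uint32_t *ty)
{
   uint32_t ms = ms_x + ms_y;

   assert(ms < 4 && sample < 8);
   *tx = (x << ms_x) + ms_offsets[ms][sample][0];
   *ty = (y << ms_y) + ms_offsets[ms][sample][1];
}
```

For a 4x texture under these assumptions, fetching sample 3 of texel (5, 7) reads the underlying surface at (11, 15). Reserving 8 slots for every level is what keeps the offset computation down to two shifts and an add.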

Ilia Mirkin

2014-Jan-13 19:19 UTC

[Nouveau] [RFC PATCH 18/19] nv50: report glsl 1.50 now that gp tests pass

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

There are still some things that fail -- mostly gl_Layer stuff, and also using
gl_PrimitiveID without a gp.

 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 5732b21..123bdab 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -126,7 +126,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_SM3:
       return 1;
    case PIPE_CAP_GLSL_FEATURE_LEVEL:
-      return 140;
+      return 150;
    case PIPE_CAP_MAX_RENDER_TARGETS:
       return 8;
    case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
-- 
1.8.3.2
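
For context on why the feature level matters: GLSL 1.50 is the shading language version that accompanies OpenGL 3.2, just as 1.40 goes with 3.1 and 3.30 with 3.3. The helper below is a hypothetical illustration of that pairing only. It is not Mesa's actual version computation, which also has to check that the required extensions are exposed, and the name `gl_version_for_glsl` is invented for this sketch.

```c
#include <assert.h>

/* Hypothetical helper, not Mesa code: return the highest OpenGL version
 * (encoded as major * 10 + minor) whose shading language requirement is
 * met by the given GLSL feature level. */
unsigned gl_version_for_glsl(unsigned glsl_level)
{
   assert(glsl_level >= 110); /* GLSL 1.10 is the first GLSL version */

   if (glsl_level >= 330) return 33;
   if (glsl_level >= 150) return 32; /* the level this patch reports */
   if (glsl_level >= 140) return 31; /* the level reported before it */
   if (glsl_level >= 130) return 30;
   if (glsl_level >= 120) return 21;
   return 20;
}
```

Under this mapping, moving the reported level from 140 to 150 is what lifts the shading language ceiling from OpenGL 3.1 to 3.2.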

Ilia Mirkin

2014-Jan-13 19:19 UTC


[Nouveau] [RFC PATCH 19/19] nv50: enable seamless cube maps on all hw for OpenGL 3.2

Hardware below NVA0 lacks support for seamless cube maps. The
NVIDIA-provided driver, which claims 3.3 support, fails a slew of the
relevant tests as well. Reporting the cap unconditionally allows us to
expose geometry shaders without doing the additional work involved in
supporting ARB_geometry_shader4.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 123bdab..a108ece 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -111,7 +111,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
       return 65536;
    case PIPE_CAP_SEAMLESS_CUBE_MAP:
-      return nv50_screen(pscreen)->tesla->oclass >= NVA0_3D_CLASS;
+      return 1; //nv50_screen(pscreen)->tesla->oclass >= NVA0_3D_CLASS;
    case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
       return 0;
    case PIPE_CAP_CUBE_MAP_ARRAY:
-- 
1.8.3.2

Ilia Mirkin

2014-Jan-15 20:07 UTC


[Nouveau] [PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2

On Mon, Jan 13, 2014 at 2:19 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> OK, so there's a bunch of stuff in here. The geometry stuff is based on the
> work started by Bryan Cain and Christoph Bumiller.
>
> Patches 01-12: Add support for geometry shaders and fix related issues
> Patches 13-14: Make it possible for fb clears to operate on texture attachments
>                with an explicit layer set (as is allowed in gl 3.2).
> Patches 15-17: Make ARB_texture_multisample work
> Patch      18: Enable GLSL 1.50
> Patch      19: Turn on ARB_seamless_cube_map irrespective of HW support so that
>                all nv50 cards can get OpenGL 3.2 and geometry shaders (which
>                are otherwise unsupported)
>
> There are still a few geometry-related piglits that fail -- specifically:
>   primitive-id-no-gs
>   gl-3.2-layered-rendering-gl-layer*

For those who care, these should now be fixed in my github repo as
well. I won't repost the full series, as these are just incremental
patches, but you can see them at:

https://github.com/imirkin/mesa/commit/5eb4ad1115d0c4cb9f06a5ebb19501c1afc433bd
https://github.com/imirkin/mesa/commit/fcd6a8661ba9ac19faf205a2025b001bb31146a8

The nv50-gs branch now also contains Christoph Bumiller's patches from
late December which effectively allow us to enable GL3.3.
>
> I need to trace the blob to figure out exactly how to configure the HW for
> those situations, but I suspect that the fixes will be fairly small and
> self-contained.
>
> Note that there are also a bunch of EXT_framebuffer_multisample tests that are
> failing, but that has nothing to do with these changes. There's something
> wrong with the blit_3d function, at the very least to do with depth/stencil,
> but also some color tests fail as well.
>
> These patches are available at https://github.com/imirkin/mesa.git nv50-gs
> or https://github.com/imirkin/mesa/commits/nv50-gs for those who prefer a
> web ui.
>
> Bryan Cain (2):
>   nv50/ir: delay calculation of indirect addresses
>   nv50: add support for geometry shaders
>
> Christoph Bumiller (1):
>   nv50/ir: fix PFETCH and add RDSV to get VSTRIDE for GPs
>
> Ilia Mirkin (16):
>   nv50: allow vert_count to be >255
>   nv50/ir: disallow predicates on emit/restart ops
>   nv50/ir: disallow shader input propagation for gp
>   nv50/ir: comment out code to allow input/immed loads
>   nv50/ir: add support for gl_PrimitiveIDIn
>   nv50: properly set the PRIMITIVE_ID enable flag when it is a gp input.
>   nv50: VP_RESULT_MAP_SIZE has to be positive
>   nv50: GP_REG_ALLOC_RESULT must be positive
>   nv50: allocate an extra code bo to avoid dmesg spam
>   nv50: don't forget to also clear additional layers
>   nvc0: don't forget to also clear additional layers
>   nv50: add comments about CB_AUX contents
>   nv50: copy nvc0's get_sample_position implementation
>   nv50: add support for textureFetch'ing MS textures,
>     ARB_texture_multisample
>   nv50: report glsl 1.50 now that gp tests pass
>   nv50: enable seamless cube maps on all hw for OpenGL 3.2
>
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h      |   9 ++
>  .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  |  92 ++++++++++--
>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  41 ++++--
>  .../nouveau/codegen/nv50_ir_lowering_nv50.cpp      | 164 ++++++++++++++++++++-
>  .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |   7 +
>  .../drivers/nouveau/codegen/nv50_ir_print.cpp      |   1 +
>  .../nouveau/codegen/nv50_ir_target_nv50.cpp        |  18 ++-
>  src/gallium/drivers/nouveau/nv50/nv50_context.c    |  46 ++++++
>  src/gallium/drivers/nouveau/nv50/nv50_context.h    |  17 +++
>  src/gallium/drivers/nouveau/nv50/nv50_program.c    |  30 +++-
>  src/gallium/drivers/nouveau/nv50/nv50_program.h    |   2 +-
>  src/gallium/drivers/nouveau/nv50/nv50_screen.c     |  23 ++-
>  .../drivers/nouveau/nv50/nv50_shader_state.c       |   6 +
>  .../drivers/nouveau/nv50/nv50_state_validate.c     |   2 +-
>  src/gallium/drivers/nouveau/nv50/nv50_surface.c    |  25 ++--
>  src/gallium/drivers/nouveau/nv50/nv50_tex.c        |  77 +++++++++-
>  src/gallium/drivers/nouveau/nvc0/nvc0_surface.c    |  22 ++-
>  17 files changed, 526 insertions(+), 56 deletions(-)
>
> --
> 1.8.3.2
>
