Ilia Mirkin

2014-Jul-18 13:57 UTC

[Nouveau] [PATCH 0/5] nvc0: fp64 preparation

Most of codegen is already FP64-ready. There are a few edge-cases that I ran
into, many of which can apply even to non-fp64-enabled programs (although the
double-wide registers are not very common without fp64).

I've yet to give this a full piglit run, but wanted to send these out in case
someone wanted to comment. They do not depend on the preliminary core fp64
work.

Ilia Mirkin (5):
  nvc0: make sure that the local memory allocation is aligned to 0x10
  nv50/ir: keep track of whether the program uses fp64
  nvc0: mark shader header if fp64 is used
  nv50/ir: fix hard-coded TYPE_U32 sized register
  nv50/ir: fix phi/union sources when their def has been merged

 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h   |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp     | 15 ++++++++++++---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp |  8 ++++++--
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c        |  4 +++-
 4 files changed, 22 insertions(+), 6 deletions(-)

-- 
1.8.5.5

Ilia Mirkin

2014-Jul-18 13:57 UTC

[Nouveau] [PATCH 1/5] nvc0: make sure that the local memory allocation is aligned to 0x10

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: <mesa-stable at lists.freedesktop.org>
---

Was getting weird shader errors in dmat4*dmat4 which spilled one double-wide
register (i.e. size 8). envytools docs apparently list this as having to be
aligned to 0x10, and this indeed fixes it.
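
For reference, a minimal sketch of the round-up this relies on, assuming the
usual power-of-two alignment idiom. The names align_up and tls_header_size
are hypothetical and exist only for illustration; the patch itself calls the
driver's existing align() helper.

```cpp
#include <cassert>
#include <cstdint>

// Round `value` up to the next multiple of `alignment`.
// Assumption: `alignment` is a power of two (0x10 here).
constexpr uint32_t align_up(uint32_t value, uint32_t alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical wrapper mirroring the patched line: the l[] size that goes
// into the shader header is rounded up to a multiple of 0x10.
inline uint32_t tls_header_size(uint32_t tls_space)
{
   assert(tls_space < (1u << 24)); // same bound the driver asserts
   return align_up(tls_space, 0x10);
}
```

With a single spilled double-wide register (8 bytes), this yields 0x10 rather
than 8, which is the case that was failing.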

 src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 1c82a9a..c624e21 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -626,7 +626,7 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t chipset)
    if (info->bin.tlsSpace) {
       assert(info->bin.tlsSpace < (1 << 24));
       prog->hdr[0] |= 1 << 26;
-      prog->hdr[1] |= info->bin.tlsSpace; /* l[] size */
+      prog->hdr[1] |= align(info->bin.tlsSpace, 0x10); /* l[] size */
       prog->need_tls = TRUE;
    }
    /* TODO: factor 2 only needed where joinat/precont is used,
-- 
1.8.5.5

Ilia Mirkin

2014-Jul-18 13:57 UTC

[Nouveau] [PATCH 2/5] nv50/ir: keep track of whether the program uses fp64

---
 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h   | 1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index f829aac..3a89a29 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -183,6 +183,7 @@ struct nv50_ir_prog_info
       boolean sampleInterp;      /* perform sample interp on all fp inputs */
       uint8_t backFaceColor[2];  /* input/output indices of back face colour */
       uint8_t globalAccess;      /* 1 for read, 2 for wr, 3 for rw */
+      boolean fp64;              /* program uses fp64 math */
       boolean nv50styleSurfaces; /* generate gX[] access for raw buffers */
       uint8_t resInfoCBSlot;     /* cX[] used for tex handles, surface info */
       uint16_t texBindBase;      /* base address for tex handles (nve4) */
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
index 0397bdc..7992f53 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
@@ -379,9 +379,13 @@ Program::emitBinary(struct nv50_ir_prog_info *info)
 
       assert(emit->getCodeSize() == fn->binPos);
 
-      for (int b = 0; b < fn->bbCount; ++b)
-         for (Instruction *i = fn->bbArray[b]->getEntry(); i; i = i->next)
+      for (int b = 0; b < fn->bbCount; ++b) {
+         for (Instruction *i = fn->bbArray[b]->getEntry(); i; i = i->next) {
             emit->emitInstruction(i);
+            if (i->sType == TYPE_F64 || i->dType == TYPE_F64)
+               info->io.fp64 = true;
+         }
+      }
    }
    info->bin.relocData = emit->getRelocInfo();
 
-- 
1.8.5.5

Ilia Mirkin

2014-Jul-18 13:57 UTC

[Nouveau] [PATCH 3/5] nvc0: mark shader header if fp64 is used

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index c624e21..ce0207a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -640,6 +640,8 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t chipset)
    */
    if (info->io.globalAccess)
       prog->hdr[0] |= 1 << 16;
+   if (info->io.fp64)
+      prog->hdr[0] |= 1 << 27;
 
    if (prog->pipe.stream_output.num_outputs)
       prog->tfb = nvc0_program_create_tfb_state(info,
-- 
1.8.5.5

Ilia Mirkin

2014-Jul-18 13:57 UTC

[Nouveau] [PATCH 4/5] nv50/ir: fix hard-coded TYPE_U32 sized register

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

I noticed this in a review of the code trying to figure out why the next
problem was happening. This doesn't actually fix anything, but there's no
reason why phi nodes must be restricted to 32-bit registers. (Although they
are, for now.)
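
To illustrate the idea of deriving the mov's type from the register size
instead of hard-coding it, here is a simplified stand-in. The DataType enum
and type_of_size below are hypothetical and much smaller than the real
codegen types; they only show the size-to-type mapping the change depends on.

```cpp
#include <cstdint>

// Hypothetical, reduced set of unsigned integer types keyed by byte size.
enum class DataType { None, U8, U16, U32, U64 };

// Pick the unsigned type whose width matches a register of `size` bytes.
// Sizes outside this reduced set map to None in this sketch.
constexpr DataType type_of_size(unsigned size)
{
   switch (size) {
   case 1: return DataType::U8;
   case 2: return DataType::U16;
   case 4: return DataType::U32;
   case 8: return DataType::U64;
   default: return DataType::None;
   }
}
```

A 4-byte def still gets a 32-bit mov, so current behaviour is unchanged,
while an 8-byte def would get a 64-bit one.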

 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index e4f56b1..117da94 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -389,11 +389,12 @@ RegAlloc::PhiMovesPass::visit(BasicBlock *bb)
          pb->insertTail(new_FlowInstruction(func, OP_BRA, bb));
 
       for (phi = bb->getPhi(); phi && phi->op == OP_PHI; phi = phi->next) {
-         mov = new_Instruction(func, OP_MOV, TYPE_U32);
+         LValue *tmp = new_LValue(func, phi->getDef(0)->asLValue());
+         mov = new_Instruction(func, OP_MOV, typeOfSize(tmp->reg.size));
 
          mov->setSrc(0, phi->getSrc(j));
-         mov->setDef(0, new_LValue(func, phi->getDef(0)->asLValue()));
-         phi->setSrc(j, mov->getDef(0));
+         mov->setDef(0, tmp);
+         phi->setSrc(j, tmp);
 
          pb->insertBefore(pb->getExit(), mov);
       }
-- 
1.8.5.5

Ilia Mirkin

2014-Jul-18 13:57 UTC

[Nouveau] [PATCH 5/5] nv50/ir: fix phi/union sources when their def has been merged

In a situation where double-register values are used, the phi nodes can
still end up being u32 values. They all get merged into one RA node
though. When fixing up the merge (which comes after the phi node), the
phi node's def would get fixed, but not its sources which would remain
at the low register value.

This maintains the invariant that a phi node's defs and sources are
allocated the same register.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

I _think_ that the split case might also need this, in case there's a split
that feeds into phi nodes, and those phi nodes are never merged. But this
fixes a real issue, and this stuff is pretty finicky... rather not poke the
bear too hard.
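
As a toy model of the invariant in the commit message, assuming a value's
final register is read through its join pointer: the Value and Phi structs
and fix_phi_sources below are hypothetical simplifications, not the codegen
classes.

```cpp
#include <vector>

// Simplified value: a register id plus a pointer to its representative.
struct Value {
   int regId = -1;
   Value *join = this;                 // initially its own representative
   int resolvedId() const { return join->regId; }
};

// Simplified phi node: one def and any number of sources.
struct Phi {
   Value *def;
   std::vector<Value *> srcs;
};

// After the def has been moved to its final register, point every source
// at the def so that def and sources resolve to the same register.
inline void fix_phi_sources(Phi &phi)
{
   for (Value *src : phi.srcs)
      src->join = phi.def;
}
```

Without the fixup, the sources keep resolving to the low register they were
given before the merge was processed, which is the mismatch described above.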

 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 117da94..21d7fd0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -1702,6 +1702,14 @@ GCRA::resolveSplitsAndMerges()
          Value *v = merge->getSrc(s);
          v->reg.data.id = regs.bytesToId(v, reg);
          v->join = v;
+         // If the value is defined by a phi/union node, we also need to
+         // perform the same fixup on that node's sources, since after RA
+         // their registers should be identical.
+         if (v->getInsn()->op == OP_PHI || v->getInsn()->op == OP_UNION) {
+            Instruction *phi = v->getInsn();
+            for (int phis = 0; phi->srcExists(phis); ++phis)
+               phi->getSrc(phis)->join = v;
+         }
          reg += v->reg.size;
       }
    }
-- 
1.8.5.5
