Most of codegen is already FP64-ready. There are a few edge-cases that I ran
into, many of which can apply even to non-fp64-enabled programs (although the
double-wide registers are not very common without fp64).

I've yet to give this a full piglit run, but wanted to send these out in case
someone wanted to comment. They do not depend on the preliminary core fp64
work.

Ilia Mirkin (5):
  nvc0: make sure that the local memory allocation is aligned to 0x10
  nv50/ir: keep track of whether the program uses fp64
  nvc0: mark shader header if fp64 is used
  nv50/ir: fix hard-coded TYPE_U32 sized register
  nv50/ir: fix phi/union sources when their def has been merged

 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h   |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp     | 15 ++++++++++++---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp |  8 ++++++--
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c        |  4 +++-
 4 files changed, 22 insertions(+), 6 deletions(-)

--
1.8.5.5
Ilia Mirkin
2014-Jul-18 13:57 UTC
[Nouveau] [PATCH 1/5] nvc0: make sure that the local memory allocation is aligned to 0x10
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: <mesa-stable at lists.freedesktop.org>
---

Was getting weird shader errors in dmat4*dmat4 which spilled one double-wide
register (i.e. size 8). envytools docs apparently list this as having to be
aligned to 0x10, and this indeed fixes it.

 src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index 1c82a9a..c624e21 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -626,7 +626,7 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t chipset)
    if (info->bin.tlsSpace) {
       assert(info->bin.tlsSpace < (1 << 24));
       prog->hdr[0] |= 1 << 26;
-      prog->hdr[1] |= info->bin.tlsSpace; /* l[] size */
+      prog->hdr[1] |= align(info->bin.tlsSpace, 0x10); /* l[] size */
       prog->need_tls = TRUE;
    }
    /* TODO: factor 2 only needed where joinat/precont is used,
--
1.8.5.5
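For readers unfamiliar with the rounding involved, here is a minimal sketch of what aligning the l[] size to 0x10 does. The helper name align_pot is hypothetical and only stands in for the driver's align(); it assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the driver's align() helper: rounds `value` up
// to the next multiple of `alignment`, which must be a power of two.
static inline uint32_t align_pot(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}
```

With this, a single spilled double-wide register (size 8) yields a local memory size of 0x10 rather than 8, while sizes that are already multiples of 0x10 are left unchanged.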
Ilia Mirkin
2014-Jul-18 13:57 UTC
[Nouveau] [PATCH 2/5] nv50/ir: keep track of whether the program uses fp64
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h   | 1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
index f829aac..3a89a29 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h
@@ -183,6 +183,7 @@ struct nv50_ir_prog_info
       boolean sampleInterp;      /* perform sample interp on all fp inputs */
       uint8_t backFaceColor[2];  /* input/output indices of back face colour */
       uint8_t globalAccess;      /* 1 for read, 2 for wr, 3 for rw */
+      boolean fp64;              /* program uses fp64 math */
       boolean nv50styleSurfaces; /* generate gX[] access for raw buffers */
       uint8_t resInfoCBSlot;     /* cX[] used for tex handles, surface info */
       uint16_t texBindBase;      /* base address for tex handles (nve4) */
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
index 0397bdc..7992f53 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
@@ -379,9 +379,13 @@ Program::emitBinary(struct nv50_ir_prog_info *info)
       assert(emit->getCodeSize() == fn->binPos);
-      for (int b = 0; b < fn->bbCount; ++b)
-         for (Instruction *i = fn->bbArray[b]->getEntry(); i; i = i->next)
+      for (int b = 0; b < fn->bbCount; ++b) {
+         for (Instruction *i = fn->bbArray[b]->getEntry(); i; i = i->next) {
             emit->emitInstruction(i);
+            if (i->sType == TYPE_F64 || i->dType == TYPE_F64)
+               info->io.fp64 = true;
+         }
+      }
    }
    info->bin.relocData = emit->getRelocInfo();
--
1.8.5.5
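The detection rule in this patch can be shown in isolation. The following is only a sketch: DataType and Insn here are simplified, hypothetical stand-ins for the nv50_ir types, and usesFp64 is an illustrative name, not a function in the tree.

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for the nv50_ir types.
enum DataType { TYPE_U32, TYPE_F32, TYPE_F64 };

struct Insn {
   DataType sType; // source type
   DataType dType; // destination type
};

// Returns true if any instruction reads or writes a 64-bit float, which is
// the same per-instruction test the patch applies during emission.
static bool usesFp64(const std::vector<Insn> &insns)
{
   for (const Insn &i : insns)
      if (i.sType == TYPE_F64 || i.dType == TYPE_F64)
         return true;
   return false;
}
```

Checking both sType and dType matters because a conversion such as f32 to f64 has only one of the two set to TYPE_F64.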
Ilia Mirkin
2014-Jul-18 13:57 UTC
[Nouveau] [PATCH 3/5] nvc0: mark shader header if fp64 is used
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index c624e21..ce0207a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -640,6 +640,8 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t chipset)
     */
    if (info->io.globalAccess)
       prog->hdr[0] |= 1 << 16;
+   if (info->io.fp64)
+      prog->hdr[0] |= 1 << 27;
    if (prog->pipe.stream_output.num_outputs)
       prog->tfb = nvc0_program_create_tfb_state(info,
--
1.8.5.5
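To make the hdr[0] bits touched by this series easier to see at a glance, here is a small sketch. The bit positions (16, 26, 27) are the ones used in these patches; the constant names, ProgFlags, and buildHdr0Flags are hypothetical and do not exist in the driver.

```cpp
#include <cstdint>

// Bit positions as used in this series; the names are made up for clarity.
constexpr uint32_t HDR0_GLOBAL_ACCESS = 1u << 16; // set if io.globalAccess
constexpr uint32_t HDR0_USES_TLS      = 1u << 26; // set if bin.tlsSpace != 0
constexpr uint32_t HDR0_USES_FP64     = 1u << 27; // set if io.fp64

// Hypothetical summary of the program properties that feed those bits.
struct ProgFlags {
   bool globalAccess;
   bool usesTls;
   bool fp64;
};

// Builds only the flag bits of hdr[0] discussed here; the real header word
// carries other fields that this sketch leaves at zero.
static uint32_t buildHdr0Flags(const ProgFlags &f)
{
   uint32_t hdr0 = 0;
   if (f.globalAccess)
      hdr0 |= HDR0_GLOBAL_ACCESS;
   if (f.usesTls)
      hdr0 |= HDR0_USES_TLS;
   if (f.fp64)
      hdr0 |= HDR0_USES_FP64;
   return hdr0;
}
```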
Ilia Mirkin
2014-Jul-18 13:57 UTC
[Nouveau] [PATCH 4/5] nv50/ir: fix hard-coded TYPE_U32 sized register
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

I noticed this in a review of the code trying to figure out why the next
problem was happening. This doesn't actually fix anything, but there's no
reason why phi nodes must be restricted to 32-bit registers. (Although they
are, for now.)

 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index e4f56b1..117da94 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -389,11 +389,12 @@ RegAlloc::PhiMovesPass::visit(BasicBlock *bb)
       pb->insertTail(new_FlowInstruction(func, OP_BRA, bb));
       for (phi = bb->getPhi(); phi && phi->op == OP_PHI; phi = phi->next) {
-         mov = new_Instruction(func, OP_MOV, TYPE_U32);
+         LValue *tmp = new_LValue(func, phi->getDef(0)->asLValue());
+         mov = new_Instruction(func, OP_MOV, typeOfSize(tmp->reg.size));
          mov->setSrc(0, phi->getSrc(j));
-         mov->setDef(0, new_LValue(func, phi->getDef(0)->asLValue()));
-         phi->setSrc(j, mov->getDef(0));
+         mov->setDef(0, tmp);
+         phi->setSrc(j, tmp);
          pb->insertBefore(pb->getExit(), mov);
       }
--
1.8.5.5
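The idea of deriving the move type from the register size, instead of hard-coding TYPE_U32, can be sketched as below. This is a simplified, hypothetical mapping (unsignedTypeOfSize and this DataType enum are illustrative only); the typeOfSize() helper in nv50_ir may cover more sizes and type variants.

```cpp
// Hypothetical, reduced set of data types for illustration.
enum DataType { TYPE_NONE, TYPE_U8, TYPE_U16, TYPE_U32, TYPE_U64 };

// Maps a register size in bytes to an unsigned integer type of that width.
// Sizes without a matching type in this reduced enum yield TYPE_NONE.
static DataType unsignedTypeOfSize(unsigned size)
{
   switch (size) {
   case 1:  return TYPE_U8;
   case 2:  return TYPE_U16;
   case 4:  return TYPE_U32;
   case 8:  return TYPE_U64;
   default: return TYPE_NONE;
   }
}
```

For a 32-bit phi def this still yields TYPE_U32, so behavior is unchanged today, but a size 8 def would get a 64-bit move.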
Ilia Mirkin
2014-Jul-18 13:57 UTC
[Nouveau] [PATCH 5/5] nv50/ir: fix phi/union sources when their def has been merged
In a situation where double-register values are used, the phi nodes can still
end up being u32 values. They all get merged into one RA node though. When
fixing up the merge (which comes after the phi node), the phi node's def would
get fixed, but not its sources which would remain at the low register value.

This maintains the invariant that a phi node's defs and sources are allocated
the same register.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

I _think_ that the split case might also need this, in case there's a split
that feeds into phi nodes, and those phi nodes are never merged. But this
fixes a real issue, and this stuff is pretty finicky... rather not poke the
bear too hard.

 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 117da94..21d7fd0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -1702,6 +1702,14 @@ GCRA::resolveSplitsAndMerges()
          Value *v = merge->getSrc(s);
          v->reg.data.id = regs.bytesToId(v, reg);
          v->join = v;
+         // If the value is defined by a phi/union node, we also need to
+         // perform the same fixup on that node's sources, since after RA
+         // their registers should be identical.
+         if (v->getInsn()->op == OP_PHI || v->getInsn()->op == OP_UNION) {
+            Instruction *phi = v->getInsn();
+            for (int phis = 0; phi->srcExists(phis); ++phis)
+               phi->getSrc(phis)->join = v;
+         }
          reg += v->reg.size;
       }
    }
--
1.8.5.5
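The invariant being restored can be illustrated outside the compiler. This is a sketch only: Value, Insn, and fixupMergeSource are hypothetical, heavily simplified stand-ins for the nv50_ir classes and for the loop body in resolveSplitsAndMerges(). Unlike the patch, the sketch also checks for a missing defining instruction, which is an assumption of mine and not something the patch does.

```cpp
#include <vector>

struct Value;

// Hypothetical, simplified instruction: an opcode plus its source values.
struct Insn {
   enum Op { OP_MOV, OP_PHI, OP_UNION } op;
   std::vector<Value *> srcs;
};

// Hypothetical, simplified value.
struct Value {
   int regId = -1;       // assigned register id, -1 if none yet
   Value *join = this;   // representative whose register this value shares
   Insn *def = nullptr;  // defining instruction, if any
};

// Assigns `regId` to a merge source `v` and makes `v` its own representative.
// If `v` is defined by a phi/union, the sources of that node are re-pointed
// at `v` too, so the def and its sources end up in the same register.
static void fixupMergeSource(Value *v, int regId)
{
   v->regId = regId;
   v->join = v;
   if (v->def && (v->def->op == Insn::OP_PHI || v->def->op == Insn::OP_UNION)) {
      for (Value *src : v->def->srcs)
         src->join = v;
   }
}
```

Without the extra loop, the phi sources would keep pointing at the old representative (the low register of the merged node) while the phi def moved to its real slot.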