Hans de Goede
2015-Nov-05 13:32 UTC
[Nouveau] [PATCH mesa 0/5] nouveau: codegen: Make use of double immediates
Hi All,

This series implements the use of double immediates in the nouveau codegen code.

This turns the following (nvc0) code:

  1: mov u32 $r2 0x00000000 (8)
  2: mov u32 $r3 0x3fe00000 (8)
  3: add f64 $r0d $r0d $r2d (8)

Into:

  1: add f64 $r0d $r0d 0.500000 (8)

This has been tested on a gk208 (gk110 / SM35) card with the 2 double shader tests which I just sent to the piglit list, and by checking the output of nouveau_compiler with both nvdisasm and envydis on gf100 / gk104 / gm107.

Regards,

Hans
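As background for the example above: assuming IEEE-754 binary64 doubles, 0.5 has the bit pattern 0x3fe0000000000000, and its two 32-bit halves are exactly the constants moved into $r2 and $r3. A minimal standalone sketch follows; the helper names are made up for illustration and are not part of Mesa.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit pattern of a double, copied out with memcpy to avoid aliasing issues.
// Assumes the host uses IEEE-754 binary64 for double.
uint64_t doubleBits(double d)
{
   uint64_t u;
   std::memcpy(&u, &d, sizeof(u));
   return u;
}

// Low and high 32-bit halves, i.e. what ends up in $r2 and $r3.
uint32_t lowWord(double d)  { return static_cast<uint32_t>(doubleBits(d)); }
uint32_t highWord(double d) { return static_cast<uint32_t>(doubleBits(d) >> 32); }

// Usage: the halves of 0.5 match the two mov instructions in the listing.
void checkHalf()
{
   assert(lowWord(0.5) == 0x00000000u);
   assert(highWord(0.5) == 0x3fe00000u);
}
```

All the significant bits of 0.5 sit in the high word, which is why such a constant can be folded into the instruction as a short immediate.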
Hans de Goede
2015-Nov-05 13:32 UTC
[Nouveau] [PATCH mesa 1/5] nouveau: codegen: emit_nvc0: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision) into the generated nvc0 machine-code.

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index fd10314..8784f3b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -323,6 +323,14 @@ CodeEmitterNVC0::setImmediate(const Instruction *i, const int s)
    assert(imm);
    u32 = imm->reg.data.u32;

+   if ((code[0] & 0xf) == 0x1) {
+      // double immediate
+      uint64_t u64 = imm->reg.data.u64;
+      assert(!(u64 & 0x00000fffffffffffULL));
+      assert(!(code[1] & 0xc000));
+      code[0] |= ((u64 >> 44) & 0x3f) << 26;
+      code[1] |= 0xc000 | (u64 >> 50);
+   } else
    if ((code[0] & 0xf) == 0x2) {
       // LIMM
       code[0] |= (u32 & 0x3f) << 26;
--
2.5.0
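To make the bit manipulation in this patch easier to follow, here is a standalone sketch of the same arithmetic outside of Mesa. The struct and function names are hypothetical. Only the masks and shifts are taken from the patch, and the sketch returns the bits to be ORed in rather than modifying an instruction word.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when only the top 20 bits of the double's bit
// pattern are used, i.e. the low 44 bits are all zero.
bool fitsShortDoubleImm(uint64_t u64)
{
   return (u64 & 0x00000fffffffffffULL) == 0;
}

// The bits that the patch ORs into code[0] and code[1].
struct ImmWords {
   uint32_t word0;
   uint32_t word1;
};

ImmWords encodeShortDoubleImm(uint64_t u64)
{
   assert(fitsShortDoubleImm(u64));
   ImmWords w;
   // Bits 44..49 of the double go to bits 26..31 of the first word.
   w.word0 = static_cast<uint32_t>((u64 >> 44) & 0x3f) << 26;
   // Bits 50..63 go to bits 0..13 of the second word, plus the 0xc000 marker.
   w.word1 = 0xc000u | static_cast<uint32_t>(u64 >> 50);
   return w;
}
```

For 0.5 (0x3fe0000000000000) this yields word0 == 0 and word1 == 0xcff8. A value such as 0.1 (0x3fb999999999999a) has non-zero low bits and does not fit, so it would trip the assert.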
Hans de Goede
2015-Nov-05 13:32 UTC
[Nouveau] [PATCH mesa 2/5] nouveau: codegen: emit_gm107: Add support for double immediates
Add support for encoding double immediates (up to 20 bits of precision) into the generated gm107 machine-code.

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index a327d57..7e6ed84 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -310,9 +310,12 @@ CodeEmitterGM107::emitIMMD(int pos, int len, const ValueRef &ref)
    uint32_t val = imm->reg.data.u32;

    if (len == 19) {
-      if (isFloatType(insn->sType)) {
+      if (insn->sType == TYPE_F32 || insn->sType == TYPE_F16) {
          assert(!(val & 0x00000fff));
          val >>= 12;
+      } else if (insn->sType == TYPE_F64) {
+         assert(!(imm->reg.data.u64 & 0x00000fffffffffffULL));
+         val = imm->reg.data.u64 >> 44;
       }
       assert(!(val & 0xfff00000) || (val & 0xfff00000) == 0xfff00000);
       emitField( 56, 1, (val & 0x80000) >> 19);
--
2.5.0
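The gm107 change treats f32 and f64 symmetrically: both are narrowed to their top 20 bits, and bit 19 of the result is the sign that the patch context passes to emitField(56, 1, ...). The sketch below shows that narrowing in isolation. The function names are hypothetical and only the shifts and masks come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Narrow an f32 bit pattern to its top 20 bits (drops 12 mantissa bits).
uint32_t top20OfFloat(uint32_t bits)
{
   assert(!(bits & 0x00000fff));
   return bits >> 12;
}

// Narrow an f64 bit pattern to its top 20 bits (drops 44 mantissa bits).
uint32_t top20OfDouble(uint64_t bits)
{
   assert(!(bits & 0x00000fffffffffffULL));
   return static_cast<uint32_t>(bits >> 44);
}

// Sign bit of the 20-bit value, as extracted in the patch context.
uint32_t signOf20(uint32_t val)
{
   return (val & 0x80000) >> 19;
}
```

The same constant gives different 20-bit fields depending on the type, because the exponent widths differ: 0.5f (0x3f000000) narrows to 0x3f000, while 0.5 as a double narrows to 0x3fe00.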
Hans de Goede
2015-Nov-05 13:32 UTC
[Nouveau] [PATCH mesa 3/5] nouveau: codegen: Add support for merge-s to the ConstantFolding pass
This allows later passes like LoadPropagation to properly deal with 64 bit immediates.

If the new 64 bit load this introduces does not get optimized away then split64BitOpPostRA() will split this into 2 instructions again.

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 44f74c6..8e241f1 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -447,6 +447,7 @@ ConstantFolding::expr(Instruction *i,
 {
    struct Storage *const a = &imm0.reg, *const b = &imm1.reg;
    struct Storage res;
+   uint8_t fixSrc0Size = 0;

    memset(&res.data, 0, sizeof(res.data));

@@ -589,6 +590,18 @@ ConstantFolding::expr(Instruction *i,
       // the second argument will not be constant, but that can happen.
       res.data.u32 = a->data.u32 + b->data.u32;
       break;
+   case OP_MERGE:
+      switch (i->dType) {
+      case TYPE_U64:
+      case TYPE_S64:
+      case TYPE_F64:
+         res.data.u64 = (((uint64_t)b->data.u32) << 32) | a->data.u32;
+         fixSrc0Size = 8;
+         break;
+      default:
+         return;
+      }
+      break;
    default:
       return;
    }
@@ -602,6 +615,8 @@ ConstantFolding::expr(Instruction *i,
    i->setSrc(1, NULL);

    i->getSrc(0)->reg.data = res.data;
+   if (fixSrc0Size)
+      i->getSrc(0)->reg.size = fixSrc0Size;

    switch (i->op) {
    case OP_MAD:
--
2.5.0
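The fold itself is a single expression: the first merge source supplies the low 32 bits and the second the high 32 bits. The sketch below reproduces it outside of Mesa, with hypothetical function names and assuming IEEE-754 binary64 doubles for the reinterpretation step.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same expression as the OP_MERGE case: src0 is the low word, src1 the high.
uint64_t foldMerge(uint32_t lo, uint32_t hi)
{
   return (static_cast<uint64_t>(hi) << 32) | lo;
}

// Reinterpret a 64-bit pattern as a double (memcpy avoids aliasing issues).
double asDouble(uint64_t bits)
{
   double d;
   std::memcpy(&d, &bits, sizeof(d));
   return d;
}

// Usage: merging the two constants from the cover letter gives 0.5.
void checkMerge()
{
   assert(foldMerge(0x00000000u, 0x3fe00000u) == 0x3fe0000000000000ULL);
   assert(asDouble(foldMerge(0x00000000u, 0x3fe00000u)) == 0.5);
}
```

Once the two 32-bit movs are folded into one 64-bit immediate like this, a later pass can test whether the whole value fits the short immediate form.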
Hans de Goede
2015-Nov-05 13:32 UTC
[Nouveau] [PATCH mesa 4/5] nouveau: codegen: Teach insnCanLoad about double immediates
Teach insnCanLoad about double immediates. Together with the "Add support for merge-s to the ConstantFolding pass" commit, this turns the following (nvc0) code:

  1: mov u32 $r2 0x00000000 (8)
  2: mov u32 $r3 0x3fe00000 (8)
  3: add f64 $r0d $r0d $r2d (8)

Into:

  1: add f64 $r0d $r0d 0.500000 (8)

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 .../nouveau/codegen/nv50_ir_target_nvc0.cpp | 25 ++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index 27df0eb..8f59d86 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -338,17 +338,30 @@ TargetNVC0::insnCanLoad(const Instruction *i, int s,
    if (sf == FILE_IMMEDIATE) {
       Storage &reg = ld->getSrc(0)->asImm()->reg;

-      if (typeSizeof(i->sType) > 4)
-         return false;
-      if (opInfo[i->op].immdBits != 0xffffffff) {
-         if (i->sType == TYPE_F32) {
+      if (opInfo[i->op].immdBits != 0xffffffff || typeSizeof(i->sType) > 4) {
+         switch (i->sType) {
+         case TYPE_F64:
+            if (reg.data.u64 & 0x00000fffffffffffULL)
+               return false;
+            break;
+         case TYPE_F32:
             if (reg.data.u32 & 0xfff)
                return false;
-         } else
-         if (i->sType == TYPE_S32 || i->sType == TYPE_U32) {
+            break;
+         case TYPE_S32:
+         case TYPE_U32:
             // with u32, 0xfffff counts as 0xffffffff as well
             if (reg.data.s32 > 0x7ffff || reg.data.s32 < -0x80000)
                return false;
+            break;
+         case TYPE_U8:
+         case TYPE_S8:
+         case TYPE_U16:
+         case TYPE_S16:
+         case TYPE_F16:
+            break;
+         default:
+            return false;
          }
       } else
       if (i->op == OP_MAD || i->op == OP_FMA) {
--
2.5.0
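The per-type checks in the new switch can be read as one predicate: does this immediate fit the short (20-bit) form? The sketch below models only that predicate. The enum and function are hypothetical, and the sketch ignores the surrounding condition on opInfo[].immdBits and the operand size that decides whether the checks run at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the nv50_ir data types handled by the switch.
enum class ImmType { F64, F32, S32, U32, Small, Other };

bool fitsShortImm(ImmType type, uint64_t bits)
{
   switch (type) {
   case ImmType::F64:
      // Only the top 20 bits of the double may be set.
      return (bits & 0x00000fffffffffffULL) == 0;
   case ImmType::F32:
      // Only the top 20 bits of the float may be set.
      return (static_cast<uint32_t>(bits) & 0xfff) == 0;
   case ImmType::S32:
   case ImmType::U32: {
      // Interpreted as a signed 20-bit range, as in the patch.
      // Assumes two's complement conversion from uint32_t to int32_t.
      int32_t s32 = static_cast<int32_t>(static_cast<uint32_t>(bits));
      return s32 <= 0x7ffff && s32 >= -0x80000;
   }
   case ImmType::Small:
      // 8 and 16 bit types are accepted without a check in the patch.
      return true;
   default:
      return false;
   }
}
```

With this predicate, 0.5 as a double (0x3fe0000000000000) is accepted, while 0.1 (0x3fb999999999999a) is rejected and would stay in registers.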
Hans de Goede
2015-Nov-05 13:32 UTC
[Nouveau] [PATCH mesa 5/5] nouveau: codegen: Add support for 64bit immediates to checkSwapSrc01
Now that we support 64 bit immediates in insnCanLoad, we need to swap 64 bit immediate sources too for optimal effect.

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 8e241f1..b952c76 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -155,7 +155,7 @@ private:
    void checkSwapSrc01(Instruction *);

    bool isCSpaceLoad(Instruction *);
-   bool isImmd32Load(Instruction *);
+   bool isImmdLoad(Instruction *);
    bool isAttribOrSharedLoad(Instruction *);
 };

@@ -166,9 +166,10 @@ LoadPropagation::isCSpaceLoad(Instruction *ld)
 }

 bool
-LoadPropagation::isImmd32Load(Instruction *ld)
+LoadPropagation::isImmdLoad(Instruction *ld)
 {
-   if (!ld || (ld->op != OP_MOV) || (typeSizeof(ld->dType) != 4))
+   if (!ld || (ld->op != OP_MOV) ||
+       ((typeSizeof(ld->dType) != 4) && (typeSizeof(ld->dType) != 8)))
       return false;
    return ld->src(0).getFile() == FILE_IMMEDIATE;
 }
@@ -201,8 +202,8 @@ LoadPropagation::checkSwapSrc01(Instruction *insn)
       else
          return;
    } else
-   if (isImmd32Load(i0)) {
-      if (!isCSpaceLoad(i1) && !isImmd32Load(i1))
+   if (isImmdLoad(i0)) {
+      if (!isCSpaceLoad(i1) && !isImmdLoad(i1))
          insn->swapSources(0, 1);
       else
          return;
--
2.5.0
Ilia Mirkin
2015-Nov-07 00:59 UTC
[Nouveau] [PATCH mesa 0/5] nouveau: codegen: Make use of double immediates
Hi Hans,

All pushed. I made a few additional fixes and improvements to fp64 immediate handling along the way, but all your commits were fine as-is. (Except that they implicitly enabled fp64 immediates on nv50, which is wrong -- there are no immediate-taking variants on nv50, so I fixed that glitch. But only the G200 can do fp64 in the first place, and nouveau doesn't actually expose it. Corner case of a corner case :) )

Thanks for taking care of this... it was a small bit of fp64 which I always felt bad about not having finished up. (But not bad enough to actually finish it myself.)

Cheers,

-ilia

On Thu, Nov 5, 2015 at 8:32 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi All,
>
> This series implements using double immediates in the nouveau codegen code.
>
> This turns the following (nvc0) code:
>   1: mov u32 $r2 0x00000000 (8)
>   2: mov u32 $r3 0x3fe00000 (8)
>   3: add f64 $r0d $r0d $r2d (8)
>
> Into:
>   1: add f64 $r0d $r0d 0.500000 (8)
>
> This has been tested with the 2 double shader tests which I just sent to
> the piglit list. On a gk208 (gk110 / SM35) card, and by checking the output
> of nouveau_compiler with both nvdisasm and envydis on gf100 / gk104 / gm107.
>
> Regards,
>
> Hans
Hans de Goede
2015-Nov-08 11:58 UTC
[Nouveau] [PATCH mesa 0/5] nouveau: codegen: Make use of double immediates
Hi,

On 07-11-15 01:59, Ilia Mirkin wrote:
> Hi Hans,
>
> All pushed. I made a few additional fixes and improvement to fp64
> immediate handling along the way, but all your commits were fine
> as-is. (Except that they enabled fp64 immediates on nv50 implicitly
> which is wrong -- there are no immediate-taking variants on nv50, so I
> fixed that glitch. But only the G200 can do fp64 in the first place,
> and nouveau doesn't actually expose it. Corner case of a corner case
> :) )

Right, I did actually think about that one a bit, since Compute Capability 1.3 does include doubles, but I figured that since we do not support doubles on nv50 at all it would not be an issue. I guess I should have mentioned this in one of the commit messages.

> Thanks for taking care of this... it was a small bit of fp64 which I
> always felt bad about not having finished up. (But not bad enough to
> actually finish it myself.)

You're welcome, this was a fun learning experience for me and I look forward to doing more work on the codegen bits in the future. For now I will be spending my time on a tgsi backend for llvm, so sorry, I will not be looking into:

https://trello.com/c/DX357llE/71-fold-immediates-into-const-load-offsets

anytime soon, but I do plan to work more on the codegen code in the future. I will make sure to coordinate with you when I have time to work on codegen again, to avoid duplicating work.

Regards,

Hans