thr3ads.net - Nouveau - [Nouveau] [PATCH mesa 0/5] nouveau: codegen: Make use of double immediates [Nov 2015]


Hans de Goede

2015-Nov-05 13:32 UTC

[Nouveau] [PATCH mesa 0/5] nouveau: codegen: Make use of double immediates

Hi All,

This series implements using double immediates in the nouveau codegen code.

This turns the following (nvc0) code:
      1: mov u32 $r2 0x00000000 (8)
      2: mov u32 $r3 0x3fe00000 (8)
      3: add f64 $r0d $r0d $r2d (8)
    
Into:
      1: add f64 $r0d $r0d 0.500000 (8)
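Why the folded immediate prints as 0.500000: in the original listing, the 64-bit operand $r2d is the register pair $r2/$r3, with $r2 holding the low word (0x00000000) and $r3 the high word (0x3fe00000). Together they form the IEEE-754 bit pattern 0x3fe0000000000000, which is 0.5. A minimal sketch of that reinterpretation, assuming the low-word-in-even-register ordering implied by the listing (the helper name pairToDouble is hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");

// Hypothetical helper: combine the two 32-bit halves of a register pair
// (lo = even register such as $r2, hi = odd register such as $r3) and
// reinterpret the 64-bit result as an IEEE-754 double.
double pairToDouble(uint32_t lo, uint32_t hi)
{
   uint64_t bits = (static_cast<uint64_t>(hi) << 32) | lo;
   double d;
   std::memcpy(&d, &bits, sizeof(d)); // well-defined type punning
   return d;
}
```

For example, pairToDouble(0x00000000, 0x3fe00000) yields 0.5, and pairToDouble(0x00000000, 0xbfe00000) yields -0.5.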

This has been tested with the 2 double shader tests which I just sent to
the piglit list, on a gk208 (gk110 / SM35) card, and by checking the output
of nouveau_compiler with both nvdisasm and envydis on gf100 / gk104 / gm107.

Regards,

Hans

Hans de Goede

2015-Nov-05 13:32 UTC


[Nouveau] [PATCH mesa 1/5] nouveau: codegen: emit_nvc0: Add support for double immediates

Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index fd10314..8784f3b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -323,6 +323,14 @@ CodeEmitterNVC0::setImmediate(const Instruction *i, const int s)
    assert(imm);
    u32 = imm->reg.data.u32;
 
+   if ((code[0] & 0xf) == 0x1) {
+      // double immediate
+      uint64_t u64 = imm->reg.data.u64;
+      assert(!(u64 & 0x00000fffffffffffULL));
+      assert(!(code[1] & 0xc000));
+      code[0] |= ((u64 >> 44) & 0x3f) << 26;
+      code[1] |= 0xc000 | (u64 >> 50);
+   } else
    if ((code[0] & 0xf) == 0x2) {
       // LIMM
       code[0] |= (u32 & 0x3f) << 26;
-- 
2.5.0
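The bit shuffling in the nvc0 hunk above can be exercised on its own. The sketch below mirrors the added branch: it requires the low 44 bits of the double to be zero, so only the top 20 bits (sign, 11-bit exponent, 8 mantissa bits) remain. It puts the low 6 of those 20 bits into bits 26..31 of code[0], the remaining 14 into bits 0..13 of code[1], and sets 0xc000 in code[1]. The function name and the standalone code[2] array are assumptions for illustration; the real emitter works on the instruction words of CodeEmitterNVC0, which already carry the opcode bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the double-immediate branch added to
// CodeEmitterNVC0::setImmediate(). Only the top 20 bits of the IEEE-754
// double can be encoded.
void encodeF64Immediate(uint64_t u64, uint32_t code[2])
{
   assert(!(u64 & 0x00000fffffffffffULL)); // low 44 bits must be zero
   assert(!(code[1] & 0xc000));            // bits 14..15 of code[1] still clear
   // bits 44..49 of the double -> bits 26..31 of code[0]
   code[0] |= static_cast<uint32_t>((u64 >> 44) & 0x3f) << 26;
   // bits 50..63 of the double -> bits 0..13 of code[1], plus 0xc000
   code[1] |= 0xc000 | static_cast<uint32_t>(u64 >> 50);
}
```

Starting from zeroed words, 0.5 (0x3fe0000000000000) encodes as code[0] = 0x00000000, code[1] = 0x0000cff8, while 1.125 (0x3ff2000000000000) sets a bit in the code[0] field: code[0] = 0x80000000, code[1] = 0x0000cffc.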

Hans de Goede

2015-Nov-05 13:32 UTC


[Nouveau] [PATCH mesa 2/5] nouveau: codegen: emit_gm107: Add support for double immediates

Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index a327d57..7e6ed84 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -310,9 +310,12 @@ CodeEmitterGM107::emitIMMD(int pos, int len, const ValueRef &ref)
    uint32_t val = imm->reg.data.u32;
 
    if (len == 19) {
-      if (isFloatType(insn->sType)) {
+      if (insn->sType == TYPE_F32 || insn->sType == TYPE_F16) {
          assert(!(val & 0x00000fff));
          val >>= 12;
+      } else if (insn->sType == TYPE_F64) {
+         assert(!(imm->reg.data.u64 & 0x00000fffffffffffULL));
+         val = imm->reg.data.u64 >> 44;
       }
       assert(!(val & 0xfff00000) || (val & 0xfff00000) == 0xfff00000);
       emitField( 56,   1, (val & 0x80000) >> 19);
-- 
2.5.0
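On gm107 the same 20-bit value is derived differently per source type: f32 drops the low 12 bits, f64 drops the low 44. The hunk's context shows bit 19 of that value (the sign) going to bit 56 of the instruction; the sketch assumes the remaining 19 bits go into the len == 19 field at pos, a step the hunk does not show. ImmType, gm107FloatImm20 and splitImm20 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class ImmType { F32, F64 }; // hypothetical stand-in for TYPE_F32 / TYPE_F64

// Top 20 bits of a float immediate, as computed in CodeEmitterGM107::emitIMMD().
uint32_t gm107FloatImm20(ImmType type, uint64_t bits)
{
   if (type == ImmType::F32) {
      uint32_t val = static_cast<uint32_t>(bits);
      assert(!(val & 0x00000fff)); // low 12 bits must be zero
      return val >> 12;
   }
   assert(!(bits & 0x00000fffffffffffULL)); // low 44 bits must be zero
   return static_cast<uint32_t>(bits >> 44);
}

// Assumed split: bit 19 -> separate sign field (bit 56), bits 0..18 -> 19-bit field.
struct Imm20Fields { uint32_t sign; uint32_t low19; };

Imm20Fields splitImm20(uint32_t val)
{
   return { (val & 0x80000) >> 19, val & 0x7ffff };
}
```

Note that 0.5 gives different 20-bit values for the two types (0x3f000 as f32, 0x3fe00 as f64) because the f64 exponent is 3 bits wider; -0.5 as f64 splits into sign = 1 and low19 = 0x3fe00.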

Hans de Goede

2015-Nov-05 13:32 UTC


[Nouveau] [PATCH mesa 3/5] nouveau: codegen: Add support for merge-s to the ConstantFolding pass

This allows later passes like LoadPropagation to properly deal with 64
bit immediates.

If the new 64 bit load this introduces does not get optimized away, then
split64BitOpPostRA() will split it into 2 instructions again.

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 44f74c6..8e241f1 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -447,6 +447,7 @@ ConstantFolding::expr(Instruction *i,
 {
    struct Storage *const a = &imm0.reg, *const b = &imm1.reg;
    struct Storage res;
+   uint8_t fixSrc0Size = 0;
 
    memset(&res.data, 0, sizeof(res.data));
 
@@ -589,6 +590,18 @@ ConstantFolding::expr(Instruction *i,
       // the second argument will not be constant, but that can happen.
       res.data.u32 = a->data.u32 + b->data.u32;
       break;
+   case OP_MERGE:
+      switch (i->dType) {
+      case TYPE_U64:
+      case TYPE_S64:
+      case TYPE_F64:
+         res.data.u64 = (((uint64_t)b->data.u32) << 32) | a->data.u32;
+         fixSrc0Size = 8;
+         break;
+      default:
+         return;
+      }
+      break;
    default:
       return;
    }
@@ -602,6 +615,8 @@ ConstantFolding::expr(Instruction *i,
    i->setSrc(1, NULL);
 
    i->getSrc(0)->reg.data = res.data;
+   if (fixSrc0Size)
+      i->getSrc(0)->reg.size = fixSrc0Size;
 
    switch (i->op) {
    case OP_MAD:
-- 
2.5.0
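The new OP_MERGE case only folds when the destination type is 64-bit: src1 supplies the high word, src0 the low word, and the resulting immediate's size is bumped to 8 bytes so later passes see a single 64-bit value. A minimal model of that rule, using hypothetical DataType and Imm types in place of the nv50_ir classes; an empty optional stands for the early return in the patch, which leaves the instruction untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for nv50_ir's DataType and immediate Storage.
enum class DataType { U32, S32, F32, U64, S64, F64 };
struct Imm { uint64_t data; uint8_t size; };

// Fold MERGE(a, b) into one immediate; nullopt means "do not fold".
std::optional<Imm> foldMerge(DataType dType, uint32_t a, uint32_t b)
{
   switch (dType) {
   case DataType::U64:
   case DataType::S64:
   case DataType::F64:
      // b becomes the high word, a the low word; size is fixed up to 8 bytes
      return Imm{ (static_cast<uint64_t>(b) << 32) | a, 8 };
   default:
      return std::nullopt;
   }
}
```

With the values from the cover letter, foldMerge(DataType::F64, 0x00000000, 0x3fe00000) produces the 8-byte immediate 0x3fe0000000000000 (0.5), which the insnCanLoad change in the next patch can then accept. This model needs C++17 for std::optional.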

Hans de Goede

2015-Nov-05 13:32 UTC


[Nouveau] [PATCH mesa 4/5] nouveau: codegen: Teach insnCanLoad about double immediates

Teach insnCanLoad about double immediates. Together with the
"Add support for merge-s to the ConstantFolding pass" patch,
this turns the following (nvc0) code:
  1: mov u32 $r2 0x00000000 (8)
  2: mov u32 $r3 0x3fe00000 (8)
  3: add f64 $r0d $r0d $r2d (8)

Into:
  1: add f64 $r0d $r0d 0.500000 (8)

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 .../nouveau/codegen/nv50_ir_target_nvc0.cpp        | 25 ++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index 27df0eb..8f59d86 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -338,17 +338,30 @@ TargetNVC0::insnCanLoad(const Instruction *i, int s,
    if (sf == FILE_IMMEDIATE) {
       Storage &reg = ld->getSrc(0)->asImm()->reg;
 
-      if (typeSizeof(i->sType) > 4)
-         return false;
-      if (opInfo[i->op].immdBits != 0xffffffff) {
-         if (i->sType == TYPE_F32) {
+      if (opInfo[i->op].immdBits != 0xffffffff || typeSizeof(i->sType) > 4) {
+         switch (i->sType) {
+         case TYPE_F64:
+            if (reg.data.u64 & 0x00000fffffffffffULL)
+               return false;
+            break;
+         case TYPE_F32:
             if (reg.data.u32 & 0xfff)
                return false;
-         } else
-         if (i->sType == TYPE_S32 || i->sType == TYPE_U32) {
+            break;
+         case TYPE_S32:
+         case TYPE_U32:
             // with u32, 0xfffff counts as 0xffffffff as well
             if (reg.data.s32 > 0x7ffff || reg.data.s32 < -0x80000)
                return false;
+            break;
+         case TYPE_U8:
+         case TYPE_S8:
+         case TYPE_U16:
+         case TYPE_S16:
+         case TYPE_F16:
+            break;
+         default:
+            return false;
          }
       } else
       if (i->op == OP_MAD || i->op == OP_FMA) {
-- 
2.5.0
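Restated as a standalone predicate, the patched check (taken for opcodes with a short immediate field, and now also for any source type wider than 4 bytes) reads as follows: F64 needs the low 44 bits clear, F32 the low 12 bits, 32-bit integers must fit a signed 20-bit range, 8/16-bit types and F16 pass, and anything else, including 64-bit integers, is rejected. SrcType and fitsShortImmediate are hypothetical names; the real code reads reg.data from the immediate's Storage.

```cpp
#include <cassert>
#include <cstdint>

enum class SrcType { U8, S8, U16, S16, F16, U32, S32, F32, U64, S64, F64 };

// Hypothetical mirror of the short-immediate switch in TargetNVC0::insnCanLoad().
// `bits` holds the raw immediate (a 32-bit pattern for the 32-bit types).
bool fitsShortImmediate(SrcType type, uint64_t bits)
{
   switch (type) {
   case SrcType::F64:
      return !(bits & 0x00000fffffffffffULL); // only the top 20 bits may be set
   case SrcType::F32:
      return !(bits & 0xfff);                  // only the top 20 of 32 bits
   case SrcType::S32:
   case SrcType::U32: {
      int64_t s = static_cast<uint32_t>(bits); // zero-extended low word
      if (s & 0x80000000)
         s -= 0x100000000LL;                   // sign-extend from bit 31
      return s <= 0x7ffff && s >= -0x80000;    // signed 20-bit range
   }
   case SrcType::U8:
   case SrcType::S8:
   case SrcType::U16:
   case SrcType::S16:
   case SrcType::F16:
      return true;
   default:
      return false; // U64 / S64 immediates are still rejected
   }
}
```

So 0.5 as a double (0x3fe0000000000000) can be folded into the instruction, while 0.1 (0x3fb999999999999a) cannot and keeps its register-pair load.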

Hans de Goede

2015-Nov-05 13:32 UTC


[Nouveau] [PATCH mesa 5/5] nouveau: codegen: Add support for 64bit immediates to checkSwapSrc01

Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.

Signed-off-by: Hans de Goede <hdegoede at redhat.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 8e241f1..b952c76 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -155,7 +155,7 @@ private:
    void checkSwapSrc01(Instruction *);
 
    bool isCSpaceLoad(Instruction *);
-   bool isImmd32Load(Instruction *);
+   bool isImmdLoad(Instruction *);
    bool isAttribOrSharedLoad(Instruction *);
 };
 
@@ -166,9 +166,10 @@ LoadPropagation::isCSpaceLoad(Instruction *ld)
 }
 
 bool
-LoadPropagation::isImmd32Load(Instruction *ld)
+LoadPropagation::isImmdLoad(Instruction *ld)
 {
-   if (!ld || (ld->op != OP_MOV) || (typeSizeof(ld->dType) != 4))
+   if (!ld || (ld->op != OP_MOV) ||
+       ((typeSizeof(ld->dType) != 4) && (typeSizeof(ld->dType) != 8)))
       return false;
    return ld->src(0).getFile() == FILE_IMMEDIATE;
 }
@@ -201,8 +202,8 @@ LoadPropagation::checkSwapSrc01(Instruction *insn)
       else
          return;
    } else
-   if (isImmd32Load(i0)) {
-      if (!isCSpaceLoad(i1) && !isImmd32Load(i1))
+   if (isImmdLoad(i0)) {
+      if (!isCSpaceLoad(i1) && !isImmdLoad(i1))
          insn->swapSources(0, 1);
       else
          return;
-- 
2.5.0
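The rename from isImmd32Load to isImmdLoad widens the size check from 4 bytes to 4 or 8 bytes. The swap rule in the last hunk: if src0 comes from an immediate MOV and src1 is neither a constant-space load nor an immediate, the sources are swapped so the immediate ends up in src1, presumably the operand slot insnCanLoad can fold it into. A toy model with hypothetical LoadKind and Load types; it covers only the immediate branch shown in the diff, not the other checks in the real checkSwapSrc01().

```cpp
#include <cassert>

enum class LoadKind { Register, ConstSpace, Immediate };

// Hypothetical summary of the instruction that defines a source value.
struct Load { LoadKind kind; int sizeBytes; };

// Mirrors the patched LoadPropagation::isImmdLoad(): 32- or 64-bit immediate MOVs.
bool isImmdLoad(const Load *ld)
{
   if (!ld || ld->kind != LoadKind::Immediate)
      return false;
   return ld->sizeBytes == 4 || ld->sizeBytes == 8;
}

// Simplified isCSpaceLoad(): the value is loaded from constant space.
bool isCSpaceLoad(const Load *ld)
{
   return ld && ld->kind == LoadKind::ConstSpace;
}

// Only the immediate branch of checkSwapSrc01(): true means "swap src0/src1".
bool shouldSwapForImmediate(const Load *i0, const Load *i1)
{
   return isImmdLoad(i0) && !isCSpaceLoad(i1) && !isImmdLoad(i1);
}
```

Before this patch, an 8-byte immediate in src0 failed the size check and was never moved to src1; with it, a 64-bit immediate paired with a register source is swapped, while two immediates are left alone.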

Ilia Mirkin

2015-Nov-07 00:59 UTC


[Nouveau] [PATCH mesa 0/5] nouveau: codegen: Make use of double immediates

Hi Hans,

All pushed. I made a few additional fixes and improvements to fp64
immediate handling along the way, but all your commits were fine
as-is. (Except that they enabled fp64 immediates on nv50 implicitly,
which is wrong -- there are no immediate-taking variants on nv50, so I
fixed that glitch. But only the G200 can do fp64 in the first place,
and nouveau doesn't actually expose it. Corner case of a corner case
:) )

Thanks for taking care of this... it was a small bit of fp64 which I
always felt bad about not having finished up. (But not bad enough to
actually finish it myself.)

Cheers,

  -ilia

On Thu, Nov 5, 2015 at 8:32 AM, Hans de Goede <hdegoede at redhat.com> wrote:
> Hi All,
>
> This series implements using double immediates in the nouveau codegen code.
>
> This turns the following (nvc0) code:
>       1: mov u32 $r2 0x00000000 (8)
>       2: mov u32 $r3 0x3fe00000 (8)
>       3: add f64 $r0d $r0d $r2d (8)
>
> Into:
>       1: add f64 $r0d $r0d 0.500000 (8)
>
> This has been tested with the 2 double shader tests which I just sent to
> the piglit list, on a gk208 (gk110 / SM35) card, and by checking the output
> of nouveau_compiler with both nvdisasm and envydis on gf100 / gk104 / gm107.
>
> Regards,
>
> Hans

Hans de Goede

2015-Nov-08 11:58 UTC


[Nouveau] [PATCH mesa 0/5] nouveau: codegen: Make use of double immediates

Hi,

On 07-11-15 01:59, Ilia Mirkin wrote:
> Hi Hans,
>
> All pushed. I made a few additional fixes and improvement to fp64
> immediate handling along the way, but all your commits were fine
> as-is. (Except that they enabled fp64 immediates on nv50 implicitly
> which is wrong -- there are no immediate-taking variants on nv50, so I
> fixed that glitch. But only the G200 can do fp64 in the first place,
> and nouveau doesn't actually expose it. Corner case of a corner case
> :) )

Right, I did think about that one a bit, since compute capability
1.3 does include doubles, but I figured that since we do not support doubles
on nv50 at all, that would not be an issue. I guess I should have
mentioned this in one of the commit messages.

> Thanks for taking care of this... it was a small bit of fp64 which I
> always felt bad about not having finished up. (But not bad enough to
> actually finish it myself.)

You're welcome. This was a fun learning experience for me, and I look
forward to doing more work on the codegen bits in the future. But for now
I will be spending my time on a TGSI backend for LLVM, so I'm sorry, but I will
not be looking into:

https://trello.com/c/DX357llE/71-fold-immediates-into-const-load-offsets

anytime soon. I do plan to work more on the codegen code in the
future, and I will make sure to coordinate with you when I have time to
work on codegen again, to avoid doing double work.

Regards,

Hans
