thr3ads.net - Nouveau - [Nouveau] [RFC 0/9] Add precise/invariant semantics to TGSI [Jun 2017]

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 0/9] Add precise/invariant semantics to TGSI

Running Tomb Raider on Nouveau I found some flicker caused by ignoring precise
modifiers on variables inside Nouveau.

This series adds precise/invariant handling to TGSI, which drivers can then use
to disable certain unsafe optimisations that may otherwise alter calculations
that depend on having the same result across shaders.
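
To illustrate why fusing a mul and an add can change a result, here is a
minimal sketch in plain C. It is not Mesa or Nouveau code. The function names
are made up for this example, and the fused path is only emulated by doing the
arithmetic in double, which happens to be exact for the inputs chosen here.

```c
#include <assert.h>

/* Multiply, round the product to float, then add: two roundings. */
float mul_then_add(float a, float b, float c)
{
    /* volatile forces the product to be stored as a float and keeps the
     * compiler from contracting the expression into a fused operation. */
    volatile float product = a * b;
    return product + c;
}

/* Stand-in for a fused multiply-add (assumption: emulated in double).
 * The product of two floats is exact in double, so for the inputs used
 * below only the final conversion to float can round. */
float fused_like(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* a = b = 1 + 2^-23, c = -(1 + 2^-22).
 * Exact product: 1 + 2^-22 + 2^-46. Rounded to float: 1 + 2^-22. */
void demo(void)
{
    const float a = 0x1.000002p0f;
    const float c = -0x1.000004p0f;

    assert(mul_then_add(a, a, c) == 0.0f);     /* low bits were rounded away */
    assert(fused_like(a, a, c) == 0x1p-46f);   /* low bits survive */
}
```

If one shader ends up with the first form and another with the second, the two
disagree in the last bits for the same inputs, which is the kind of mismatch
the precise and invariant qualifiers are meant to rule out.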

This series fixes this bug in Tomb Raider and one CTS test for 4.4 and 4.5.

Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to apply the
precise flag to instructions emitted in ir_assignment->rhs->accept(), but I
found no other easy way to handle this. Maybe one of you has a better idea?
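
The approach described in this note can be sketched in plain C as below. This
is only an illustration of the pattern (set a flag on the visitor, walk the
right-hand side, clear the flag). The toy expression tree, the struct names
and visit_assignment() are hypothetical and are not the Mesa types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature expression tree, not Mesa's ir_expression. */
struct expr {
    char op;                  /* '+', '*', or 'v' for a leaf variable */
    const struct expr *lhs;
    const struct expr *rhs;
};

/* One emitted instruction: the opcode and whether it was tagged precise. */
struct emitted {
    char op;
    unsigned precise;
};

struct visitor {
    unsigned precise;         /* set while walking a precise assignment */
    struct emitted out[16];
    size_t count;
};

/* Post-order walk: every non-leaf node emits one instruction that
 * inherits whatever the visitor's precise flag currently is. */
static void visit_expr(struct visitor *v, const struct expr *e)
{
    if (e->op == 'v')
        return;               /* leaves emit nothing */
    visit_expr(v, e->lhs);
    visit_expr(v, e->rhs);
    assert(v->count < 16);
    v->out[v->count].op = e->op;
    v->out[v->count].precise = v->precise;
    v->count++;
}

/* Set the flag, walk the right-hand side, then clear the flag again so
 * later, unrelated instructions are not tagged. */
void visit_assignment(struct visitor *v, const struct expr *rhs,
                      unsigned lhs_is_precise)
{
    v->precise = lhs_is_precise;
    visit_expr(v, rhs);
    v->precise = 0;
}
```

The downside is the one named above: the flag is implicit state on the
visitor, so anything emitted while it is set gets tagged, whether or not it
belongs to the assignment.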

Karol Herbst (9):
  tgsi: add precise flag to tgsi_instruction
  tgsi/dump: print _PRECISE modifier on Instructions
  st/glsl_to_tgsi: handle precise modifier
  tgsi: populate precise
  tgsi/text: parse _PRECISE modifier
  nv50/ir: add precise field to Instruction
  nv50/ir/tgsi: handle precise for most ALU instructions
  nv50/ir: disable mul+add to mad for precise instructions
  nv50/ir/tgsi: split mad to mul+add

 src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
 src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
 src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
 src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
 src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
 src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
 src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
 src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
 src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
 src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
 src/mesa/state_tracker/st_pbo.c                    |  2 +-
 15 files changed, 172 insertions(+), 59 deletions(-)

-- 
2.13.1

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 1/9] tgsi: add precise flag to tgsi_instruction

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_build.c    | 1 +
 src/gallium/include/pipe/p_shader_tokens.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 00843241f8..55e4d064ed 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -642,6 +642,7 @@ tgsi_default_instruction( void )
    instruction.Label = 0;
    instruction.Texture = 0;
    instruction.Memory = 0;
+   instruction.Precise = 0;
    instruction.Padding = 0;
 
    return instruction;
diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index 1e08d97329..aa0fb3e3b3 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -638,7 +638,8 @@ struct tgsi_instruction
    unsigned Label      : 1;
    unsigned Texture    : 1;
    unsigned Memory     : 1;
-   unsigned Padding    : 2;
+   unsigned Precise    : 1;
+   unsigned Padding    : 1;
 };
 
 /*
-- 
2.13.1
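
Why taking the new bit from Padding leaves the token size unchanged can be
sketched as follows. This is a simplified stand-in, not the real
struct tgsi_instruction: all fields ahead of Label are lumped into one
hypothetical 27-bit member called Other, and the 32-bit result assumes a
common ABI with a 32-bit unsigned (bit-field layout is implementation-defined).

```c
#include <assert.h>

/* Hypothetical layout before the patch: 27 + 1 + 1 + 1 + 2 = 32 bits. */
struct toy_instruction_old {
    unsigned Other   : 27;   /* stand-in for the fields ahead of Label */
    unsigned Label   : 1;
    unsigned Texture : 1;
    unsigned Memory  : 1;
    unsigned Padding : 2;
};

/* Hypothetical layout after the patch: Precise takes one padding bit. */
struct toy_instruction_new {
    unsigned Other   : 27;
    unsigned Label   : 1;
    unsigned Texture : 1;
    unsigned Memory  : 1;
    unsigned Precise : 1;
    unsigned Padding : 1;
};

/* Returns 1 when both layouts occupy the same storage. */
int same_token_size(void)
{
    return sizeof(struct toy_instruction_old) ==
           sizeof(struct toy_instruction_new);
}

/* Mirrors the idea of tgsi_default_instruction(): every flag starts at 0. */
struct toy_instruction_new toy_default_instruction(void)
{
    struct toy_instruction_new insn = {0};
    assert(insn.Precise == 0);
    return insn;
}
```

Because the new flag comes out of Padding, existing code that copies or
compares whole tokens should keep working unchanged under these assumptions.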

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 2/9] tgsi/dump: print _PRECISE modifier on Instructions

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index f6eba7424b..b58e64511c 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -584,6 +584,10 @@ iter_instruction(
       TXT( "_SAT" );
    }
 
+   if (inst->Instruction.Precise) {
+      TXT( "_PRECISE" );
+   }
+
    for (i = 0; i < inst->Instruction.NumDstRegs; i++) {
       const struct tgsi_full_dst_register *dst = &inst->Dst[i];
 
-- 
2.13.1

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 3/9] st/glsl_to_tgsi: handle precise modifier

All subexpressions inside an ir_assignment need to be tagged as precise.

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 80 ++++++++++++++++++++++++------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index c5d2e0fcd2..19f90f21fe 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -87,6 +87,13 @@ static int swizzle_for_type(const glsl_type *type, int component = 0)
    return swizzle;
 }
 
+static unsigned is_precise(const ir_variable *ir)
+{
+   if (!ir)
+      return 0;
+   return ir->data.precise || ir->data.invariant;
+}
+
 /**
  * This struct is a corresponding struct to TGSI ureg_src.
  */
@@ -296,6 +303,7 @@ public:
    ir_instruction *ir;
 
    unsigned op:8; /**< TGSI opcode */
+   unsigned precise:1;
    unsigned saturate:1;
    unsigned is_64bit_expanded:1;
    unsigned sampler_base:5;
@@ -435,6 +443,7 @@ public:
    bool have_fma;
    bool use_shared_memory;
    bool has_tex_txf_lz;
+   unsigned precise;
 
    variable_storage *find_variable_storage(ir_variable *var);
 
@@ -505,13 +514,29 @@ public:
                                       st_src_reg src0 = undef_src,
                                       st_src_reg src1 = undef_src,
                                       st_src_reg src2 = undef_src,
-                                      st_src_reg src3 = undef_src);
+                                      st_src_reg src3 = undef_src,
+                                      unsigned precise = 0);
 
    glsl_to_tgsi_instruction *emit_asm(ir_instruction *ir, unsigned op,
                                       st_dst_reg dst, st_dst_reg dst1,
                                       st_src_reg src0 = undef_src,
                                       st_src_reg src1 = undef_src,
                                       st_src_reg src2 = undef_src,
+                                      st_src_reg src3 = undef_src,
+                                      unsigned precise = 0);
+
+   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
+                                      st_dst_reg dst = undef_dst,
+                                      st_src_reg src0 = undef_src,
+                                      st_src_reg src1 = undef_src,
+                                      st_src_reg src2 = undef_src,
+                                      st_src_reg src3 = undef_src);
+
+   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
+                                      st_dst_reg dst, st_dst_reg dst1,
+                                      st_src_reg src0 = undef_src,
+                                      st_src_reg src1 = undef_src,
+                                      st_src_reg src2 = undef_src,
                                       st_src_reg src3 = undef_src);
 
    unsigned get_opcode(unsigned op,
@@ -650,7 +675,8 @@ glsl_to_tgsi_instruction *
 glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
                                st_dst_reg dst, st_dst_reg dst1,
                                st_src_reg src0, st_src_reg src1,
-                               st_src_reg src2, st_src_reg src3)
+                               st_src_reg src2, st_src_reg src3,
+                               unsigned precise)
 {
    glsl_to_tgsi_instruction *inst = new(mem_ctx) glsl_to_tgsi_instruction();
    int num_reladdr = 0, i, j;
@@ -691,6 +717,7 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
    STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);
 
    inst->op = op;
+   inst->precise = precise;
    inst->info = tgsi_get_opcode_info(op);
    inst->dst[0] = dst;
    inst->dst[1] = dst1;
@@ -881,9 +908,28 @@ glsl_to_tgsi_instruction *
 glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
                                st_dst_reg dst,
                                st_src_reg src0, st_src_reg src1,
+                               st_src_reg src2, st_src_reg src3,
+                               unsigned precise)
+{
+   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, precise);
+}
+
+glsl_to_tgsi_instruction *
+glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
+                               st_dst_reg dst,
+                               st_src_reg src0, st_src_reg src1,
+                               st_src_reg src2, st_src_reg src3)
+{
+   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, this->precise);
+}
+
+glsl_to_tgsi_instruction *
+glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
+                               st_dst_reg dst, st_dst_reg dst1,
+                               st_src_reg src0, st_src_reg src1,
                                st_src_reg src2, st_src_reg src3)
 {
-   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3);
+   return emit_asm(ir, op, dst, dst1, src0, src1, src2, src3, this->precise);
 }
 
 /**
@@ -1116,7 +1162,7 @@ glsl_to_tgsi_visitor::emit_arl(ir_instruction *ir,
    if (dst.index >= this->num_address_regs)
       this->num_address_regs = dst.index + 1;
 
-   emit_asm(NULL, op, dst, src0);
+   emit_asm((ir_instruction *)NULL, op, dst, src0);
 }
 
 int
@@ -1406,11 +1452,11 @@ glsl_to_tgsi_visitor::visit(ir_variable *ir)
 void
 glsl_to_tgsi_visitor::visit(ir_loop *ir)
 {
-   emit_asm(NULL, TGSI_OPCODE_BGNLOOP);
+   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BGNLOOP);
 
    visit_exec_list(&ir->body_instructions, this);
 
-   emit_asm(NULL, TGSI_OPCODE_ENDLOOP);
+   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ENDLOOP);
 }
 
 void
@@ -1418,10 +1464,10 @@ glsl_to_tgsi_visitor::visit(ir_loop_jump *ir)
 {
    switch (ir->mode) {
    case ir_loop_jump::jump_break:
-      emit_asm(NULL, TGSI_OPCODE_BRK);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BRK);
       break;
    case ir_loop_jump::jump_continue:
-      emit_asm(NULL, TGSI_OPCODE_CONT);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_CONT);
       break;
    }
 }
@@ -2703,7 +2749,7 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
             st_dst_reg dst = st_dst_reg(get_temp(var->type));
             st_src_reg src = st_src_reg(PROGRAM_OUTPUT, decl->mesa_index,
                                         var->type, component, decl->array_id);
-            emit_asm(NULL, TGSI_OPCODE_FBFETCH, dst, src);
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_FBFETCH, dst, src);
             entry = new(mem_ctx) variable_storage(var, dst.file, dst.index,
                                                   dst.array_id);
          } else {
@@ -3148,7 +3194,10 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
    st_dst_reg l;
    st_src_reg r;
 
+   /* all generated instructions need to be flagged as precise */
+   this->precise = is_precise(ir->lhs->variable_referenced());
    ir->rhs->accept(this);
+   this->precise = 0;
    r = this->result;
 
    l = get_assignment_lhs(ir->lhs, this, &dst_component);
@@ -3233,7 +3282,8 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
        */
       glsl_to_tgsi_instruction *inst, *new_inst;
       inst = (glsl_to_tgsi_instruction *)this->instructions.get_tail();
-      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3]);
+      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3],
+                          is_precise(ir->lhs->variable_referenced()));
       new_inst->saturate = inst->saturate;
       inst->dead_mask = inst->dst[0].writemask;
    } else {
@@ -4072,16 +4122,16 @@ glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference *tail,
 
          deref_arr->array_index->accept(this);
          if (*array_elements != 1)
-            emit_asm(NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
          else
-            emit_asm(NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
 
          if (indirect->file == PROGRAM_UNDEFINED)
             *indirect = temp_reg;
          else {
             temp_dst = st_dst_reg(*indirect);
             temp_dst.writemask = 1;
-            emit_asm(NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
          }
       } else
          *index += array_index->value.u[0] * *array_elements;
@@ -4141,7 +4191,7 @@ glsl_to_tgsi_visitor::canonicalize_gather_offset(st_src_reg offset)
       st_src_reg tmp = get_temp(glsl_type::ivec2_type);
       st_dst_reg tmp_dst = st_dst_reg(tmp);
       tmp_dst.writemask = WRITEMASK_XY;
-      emit_asm(NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
       return tmp;
    }
 
@@ -6777,7 +6827,7 @@ get_mesa_program_tgsi(struct gl_context *ctx,
    v->renumber_registers();
 
    /* Write the END instruction. */
-   v->emit_asm(NULL, TGSI_OPCODE_END);
+   v->emit_asm((ir_instruction *)NULL, TGSI_OPCODE_END);
 
    if (ctx->_Shader->Flags & GLSL_DUMP) {
       _mesa_log("\n");
-- 
2.13.1

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 4/9] tgsi: populate precise

Only implemented for glsl->tgsi. Other converters just set precise to 0.

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_build.c       |  3 +++
 src/gallium/auxiliary/tgsi/tgsi_ureg.c        | 14 +++++++---
 src/gallium/auxiliary/tgsi/tgsi_ureg.h        | 20 +++++++++++---
 src/gallium/auxiliary/util/u_simple_shaders.c |  2 +-
 src/gallium/state_trackers/nine/nine_shader.c |  6 ++---
 src/mesa/state_tracker/st_atifs_to_tgsi.c     | 38 +++++++++++++--------------
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp    | 12 ++++-----
 src/mesa/state_tracker/st_mesa_to_tgsi.c      |  8 +++---
 src/mesa/state_tracker/st_pbo.c               |  2 +-
 9 files changed, 65 insertions(+), 40 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 55e4d064ed..144a017768 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -651,6 +651,7 @@ tgsi_default_instruction( void )
 static struct tgsi_instruction
 tgsi_build_instruction(unsigned opcode,
                        unsigned saturate,
+                       unsigned precise,
                        unsigned num_dst_regs,
                        unsigned num_src_regs,
                        struct tgsi_header *header)
@@ -665,6 +666,7 @@ tgsi_build_instruction(unsigned opcode,
    instruction = tgsi_default_instruction();
    instruction.Opcode = opcode;
    instruction.Saturate = saturate;
+   instruction.Precise = precise;
    instruction.NumDstRegs = num_dst_regs;
    instruction.NumSrcRegs = num_src_regs;
 
@@ -1061,6 +1063,7 @@ tgsi_build_full_instruction(
 
    *instruction = tgsi_build_instruction(full_inst->Instruction.Opcode,
                                          full_inst->Instruction.Saturate,
+                                         full_inst->Instruction.Precise,
                                          full_inst->Instruction.NumDstRegs,
                                          full_inst->Instruction.NumSrcRegs,
                                          header);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index 5bd779728a..56db2252c5 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -1213,6 +1213,7 @@ struct ureg_emit_insn_result
 ureg_emit_insn(struct ureg_program *ureg,
                unsigned opcode,
                boolean saturate,
+               unsigned precise,
                unsigned num_dst,
                unsigned num_src)
 {
@@ -1226,6 +1227,7 @@ ureg_emit_insn(struct ureg_program *ureg,
    out[0].insn = tgsi_default_instruction();
    out[0].insn.Opcode = opcode;
    out[0].insn.Saturate = saturate;
+   out[0].insn.Precise = precise;
    out[0].insn.NumDstRegs = num_dst;
    out[0].insn.NumSrcRegs = num_src;
 
@@ -1354,7 +1356,8 @@ ureg_insn(struct ureg_program *ureg,
           const struct ureg_dst *dst,
           unsigned nr_dst,
           const struct ureg_src *src,
-          unsigned nr_src )
+          unsigned nr_src,
+          unsigned precise )
 {
    struct ureg_emit_insn_result insn;
    unsigned i;
@@ -1369,6 +1372,7 @@ ureg_insn(struct ureg_program *ureg,
    insn = ureg_emit_insn(ureg,
                          opcode,
                          saturate,
+                         precise,
                          nr_dst,
                          nr_src);
 
@@ -1391,7 +1395,8 @@ ureg_tex_insn(struct ureg_program *ureg,
               const struct tgsi_texture_offset *texoffsets,
               unsigned nr_offset,
               const struct ureg_src *src,
-              unsigned nr_src )
+              unsigned nr_src,
+              unsigned precise )
 {
    struct ureg_emit_insn_result insn;
    unsigned i;
@@ -1406,6 +1411,7 @@ ureg_tex_insn(struct ureg_program *ureg,
    insn = ureg_emit_insn(ureg,
                          opcode,
                          saturate,
+                         precise,
                          nr_dst,
                          nr_src);
 
@@ -1434,7 +1440,8 @@ ureg_memory_insn(struct ureg_program *ureg,
                  unsigned nr_src,
                  unsigned qualifier,
                  unsigned texture,
-                 unsigned format)
+                 unsigned format,
+                 unsigned precise)
 {
    struct ureg_emit_insn_result insn;
    unsigned i;
@@ -1442,6 +1449,7 @@ ureg_memory_insn(struct ureg_program *ureg,
    insn = ureg_emit_insn(ureg,
                          opcode,
                          FALSE,
+                         precise,
                          nr_dst,
                          nr_src);
 
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
index 54f95ba565..105c85abd5 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
@@ -546,7 +546,8 @@ ureg_insn(struct ureg_program *ureg,
           const struct ureg_dst *dst,
           unsigned nr_dst,
           const struct ureg_src *src,
-          unsigned nr_src );
+          unsigned nr_src,
+          unsigned precise);
 
 
 void
@@ -559,7 +560,8 @@ ureg_tex_insn(struct ureg_program *ureg,
               const struct tgsi_texture_offset *texoffsets,
               unsigned nr_offset,
               const struct ureg_src *src,
-              unsigned nr_src );
+              unsigned nr_src,
+              unsigned precise);
 
 
 void
@@ -571,7 +573,8 @@ ureg_memory_insn(struct ureg_program *ureg,
                  unsigned nr_src,
                  unsigned qualifier,
                  unsigned texture,
-                 unsigned format);
+                 unsigned format,
+                 unsigned precise);
 
 /***********************************************************************
  * Internal instruction helpers, don't call these directly:
@@ -586,6 +589,7 @@ struct ureg_emit_insn_result
 ureg_emit_insn(struct ureg_program *ureg,
                unsigned opcode,
                boolean saturate,
+               unsigned precise,
                unsigned num_dst,
                unsigned num_src);
 
@@ -632,6 +636,7 @@ static inline void ureg_##op( struct ureg_program *ureg )   \
                          opcode,                                \
                          FALSE,                                 \
                          0,                                     \
+                         0,                                     \
                          0);                                    \
    ureg_fixup_insn_size( ureg, insn.insn_token );               \
 }
@@ -646,6 +651,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
                          opcode,                                \
                          FALSE,                                 \
                          0,                                     \
+                         0,                                     \
                          1);                                    \
    ureg_emit_src( ureg, src );                                  \
    ureg_fixup_insn_size( ureg, insn.insn_token );               \
@@ -661,6 +667,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
                          opcode,                                \
                          FALSE,                                 \
                          0,                                     \
+                         0,                                     \
                          0);                                    \
    ureg_emit_label( ureg, insn.extended_token, label_token );   \
    ureg_fixup_insn_size( ureg, insn.insn_token );               \
@@ -677,6 +684,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
                          opcode,                                \
                          FALSE,                                 \
                          0,                                     \
+                         0,                                     \
                          1);                                    \
    ureg_emit_label( ureg, insn.extended_token, label_token );   \
    ureg_emit_src( ureg, src );                                  \
@@ -694,6 +702,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          0);                                            \
    ureg_emit_dst( ureg, dst );                                          \
@@ -713,6 +722,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          1);                                            \
    ureg_emit_dst( ureg, dst );                                          \
@@ -733,6 +743,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          2);                                            \
    ureg_emit_dst( ureg, dst );                                          \
@@ -756,6 +767,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          2);                                            \
    ureg_emit_texture( ureg, insn.extended_token, target,                \
@@ -780,6 +792,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          3);                                            \
    ureg_emit_dst( ureg, dst );                                          \
@@ -806,6 +819,7 @@ static inline void ureg_##op( struct ureg_program *ureg,    \
    insn = ureg_emit_insn(ureg,                                          \
                          opcode,                                        \
                          dst.Saturate,                                  \
+                         0,                                             \
                          1,                                             \
                          4);                                            \
    ureg_emit_texture( ureg, insn.extended_token, target,                \
diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c b/src/gallium/auxiliary/util/u_simple_shaders.c
index 5874d0e9aa..79331b5638 100644
--- a/src/gallium/auxiliary/util/u_simple_shaders.c
+++ b/src/gallium/auxiliary/util/u_simple_shaders.c
@@ -954,7 +954,7 @@ util_make_geometry_passthrough_shader(struct pipe_context *pipe,
    }
 
    /* EMIT IMM[0] */
-   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1);
+   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1, 0);
 
    /* END */
    ureg_END(ureg);
diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c
index 40fb6be88f..f405090811 100644
--- a/src/gallium/state_trackers/nine/nine_shader.c
+++ b/src/gallium/state_trackers/nine/nine_shader.c
@@ -1879,7 +1879,7 @@ DECL_SPECIAL(IFC)
     struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
     src[0] = tx_src_param(tx, &tx->insn.src[0]);
     src[1] = tx_src_param(tx, &tx->insn.src[1]);
-    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
+    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
     ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
     return D3D_OK;
 }
@@ -1897,7 +1897,7 @@ DECL_SPECIAL(BREAKC)
     struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
     src[0] = tx_src_param(tx, &tx->insn.src[0]);
     src[1] = tx_src_param(tx, &tx->insn.src[1]);
-    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
+    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
     ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
     ureg_BRK(tx->ureg);
     tx_endcond(tx);
@@ -3029,7 +3029,7 @@ NineTranslateInstruction_Generic(struct shader_translator *tx)
 
     ureg_insn(tx->ureg, tx->insn.info->opcode,
               dst, tx->insn.ndst,
-              src, tx->insn.nsrc);
+              src, tx->insn.nsrc, 0);
     return D3D_OK;
 }
 
diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c
index 338ced56ed..e0a6ff7131 100644
--- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
+++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
@@ -105,18 +105,18 @@ apply_swizzle(struct st_translate *t,
       imm[0] = src;
       imm[1] = ureg_imm4f(t->ureg, 1.0f, 1.0f, 0.0f, 0.0f);
       imm[2] = ureg_imm4f(t->ureg, 0.0f, 0.0f, 1.0f, 1.0f);
-      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3);
+      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3, 0);
 
       if (swizzle == GL_SWIZZLE_STR_DR_ATI) {
          imm[0] = ureg_scalar(src, TGSI_SWIZZLE_Z);
       } else {
          imm[0] = ureg_scalar(src, TGSI_SWIZZLE_W);
       }
-      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1);
+      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1, 0);
 
       imm[0] = ureg_src(tmp[0]);
       imm[1] = ureg_src(tmp[1]);
-      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2, 0);
 
       return ureg_src(tmp[0]);
    }
@@ -170,35 +170,35 @@ prepare_argument(struct st_translate *t, const unsigned argId,
       src = ureg_scalar(src, TGSI_SWIZZLE_W);
       break;
    }
-   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1);
+   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1, 0);
 
    if (srcReg->argMod & GL_COMP_BIT_ATI) {
       struct ureg_src modsrc[2];
       modsrc[0] = ureg_imm1f(t->ureg, 1.0f);
       modsrc[1] = ureg_negate(ureg_src(arg));
 
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
    }
    if (srcReg->argMod & GL_BIAS_BIT_ATI) {
       struct ureg_src modsrc[2];
       modsrc[0] = ureg_src(arg);
       modsrc[1] = ureg_imm1f(t->ureg, -0.5f);
 
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
    }
    if (srcReg->argMod & GL_2X_BIT_ATI) {
       struct ureg_src modsrc[2];
       modsrc[0] = ureg_src(arg);
       modsrc[1] = ureg_src(arg);
 
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
    }
    if (srcReg->argMod & GL_NEGATE_BIT_ATI) {
       struct ureg_src modsrc[2];
       modsrc[0] = ureg_src(arg);
       modsrc[1] = ureg_imm1f(t->ureg, -1.0f);
 
-      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2, 0);
    }
    return  ureg_src(arg);
 }
@@ -217,25 +217,25 @@ emit_special_inst(struct st_translate *t, const struct instruction_desc *desc,
       tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI + 2); /* re-purpose a3 */
       src[0] = ureg_imm1f(t->ureg, 0.5f);
       src[1] = ureg_negate(args[2]);
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2, 0);
       src[0] = ureg_src(tmp[0]);
       src[1] = args[0];
       src[2] = args[1];
-      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
+      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
    } else if (!strcmp(desc->name, "CND0")) {
       src[0] = args[2];
       src[1] = args[1];
       src[2] = args[0];
-      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
+      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
    } else if (!strcmp(desc->name, "DOT2_ADD")) {
       /* note: DP2A is not implemented in most pipe drivers */
       tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI); /* re-purpose a1 */
       src[0] = args[0];
       src[1] = args[1];
-      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2, 0);
       src[0] = ureg_src(tmp[0]);
       src[1] = ureg_scalar(args[2], TGSI_SWIZZLE_Z);
-      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2);
+      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2, 0);
    }
 }
 
@@ -249,7 +249,7 @@ emit_arith_inst(struct st_translate *t,
       return;
    }
 
-   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount);
+   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount, 0);
 }
 
 static void
@@ -292,7 +292,7 @@ emit_dstmod(struct st_translate *t,
    if (dstMod & GL_SATURATE_BIT_ATI) {
       dst = ureg_saturate(dst);
    }
-   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2);
+   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2, 0);
 }
 
 /**
@@ -334,9 +334,9 @@ compile_setupinst(struct st_translate *t,
       src[1] = t->samplers[r];
       /* the texture target is still unknown, it will be fixed in the draw call */
       ureg_tex_insn(t->ureg, TGSI_OPCODE_TEX, dst, 1, TGSI_TEXTURE_2D,
-                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2);
+                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2, 0);
    } else if (texinst->Opcode == ATI_FRAGMENT_SHADER_PASS_OP) {
-      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
+      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
    }
 
    t->regs_written[t->current_pass][r] = true;
@@ -408,11 +408,11 @@ finalize_shader(struct st_translate *t, unsigned numPasses)
       /* copy the result into the OUT slot */
       dst[0] = t->outputs[t->outputMapping[FRAG_RESULT_COLOR]];
       src[0] = ureg_src(t->temps[0]);
-      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
+      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
    }
 
    /* signal the end of the program */
-   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0);
+   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0, 0);
 }
 
 /**
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 19f90f21fe..ecd9f9f280 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5900,7 +5900,7 @@ compile_tgsi_instruction(struct st_translate *t,
    case TGSI_OPCODE_IF:
    case TGSI_OPCODE_UIF:
       assert(num_dst == 0);
-      ureg_insn(ureg, inst->op, NULL, 0, src, num_src);
+      ureg_insn(ureg, inst->op, NULL, 0, src, num_src, inst->precise);
       return;
 
    case TGSI_OPCODE_TEX:
@@ -5935,7 +5935,7 @@ compile_tgsi_instruction(struct st_translate *t,
                     tex_target,
                     st_translate_texture_type(inst->tex_type),
                     texoffsets, inst->tex_offset_num_offset,
-                    src, num_src);
+                    src, num_src, inst->precise);
       return;
 
    case TGSI_OPCODE_RESQ:
@@ -5966,7 +5966,7 @@ compile_tgsi_instruction(struct st_translate *t,
       assert(src[0].File != TGSI_FILE_NULL);
       ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
                        inst->buffer_access,
-                       tex_target, inst->image_format);
+                       tex_target, inst->image_format, inst->precise);
       break;
 
    case TGSI_OPCODE_STORE:
@@ -5984,19 +5984,19 @@ compile_tgsi_instruction(struct st_translate *t,
       assert(dst[0].File != TGSI_FILE_NULL);
       ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
                        inst->buffer_access,
-                       tex_target, inst->image_format);
+                       tex_target, inst->image_format, inst->precise);
       break;
 
    case TGSI_OPCODE_SCS:
       dst[0] = ureg_writemask(dst[0], TGSI_WRITEMASK_XY);
-      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src);
+      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->precise);
       break;
 
    default:
       ureg_insn(ureg,
                 inst->op,
                 dst, num_dst,
-                src, num_src);
+                src, num_src, inst->precise);
       break;
    }
 }
diff --git a/src/mesa/state_tracker/st_mesa_to_tgsi.c b/src/mesa/state_tracker/st_mesa_to_tgsi.c
index 984ff92130..f11013c116 100644
--- a/src/mesa/state_tracker/st_mesa_to_tgsi.c
+++ b/src/mesa/state_tracker/st_mesa_to_tgsi.c
@@ -558,7 +558,7 @@ compile_instruction(
                                                inst->TexShadow ),
                      TGSI_RETURN_TYPE_FLOAT,
                      NULL, 0,
-                     src, num_src );
+                     src, num_src, 0 );
       return;
 
    case OPCODE_SCS:
@@ -566,7 +566,7 @@ compile_instruction(
       ureg_insn( ureg, 
                  translate_opcode( inst->Opcode ), 
                  dst, num_dst, 
-                 src, num_src );
+                 src, num_src, 0 );
       break;
 
    case OPCODE_XPD:
@@ -574,7 +574,7 @@ compile_instruction(
       ureg_insn( ureg, 
                  translate_opcode( inst->Opcode ), 
                  dst, num_dst, 
-                 src, num_src );
+                 src, num_src, 0 );
       break;
 
    case OPCODE_RSQ:
@@ -593,7 +593,7 @@ compile_instruction(
       ureg_insn( ureg, 
                  translate_opcode( inst->Opcode ), 
                  dst, num_dst, 
-                 src, num_src );
+                 src, num_src, 0);
       break;
    }
 }
diff --git a/src/mesa/state_tracker/st_pbo.c b/src/mesa/state_tracker/st_pbo.c
index 303c8535b2..3dff1609e8 100644
--- a/src/mesa/state_tracker/st_pbo.c
+++ b/src/mesa/state_tracker/st_pbo.c
@@ -528,7 +528,7 @@ create_fs(struct st_context *st, bool download, enum pipe_texture_target target,
       op[0] = ureg_src(temp0);
       op[1] = ureg_src(temp1);
       ureg_memory_insn(ureg, TGSI_OPCODE_STORE, &out, 1, op, 2, 0,
-                             TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE);
+                             TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE, 0);
 
       ureg_release_temporary(ureg, temp1);
    } else {
-- 
2.13.1

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 5/9] tgsi/text: parse _PRECISE modifier

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_text.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 93a05568f4..c5fcb3283d 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -999,6 +999,7 @@ parse_texoffset_operand(
 static boolean
 match_inst(const char **pcur,
            unsigned *saturate,
+           unsigned *precise,
            const struct tgsi_opcode_info *info)
 {
    const char *cur = *pcur;
@@ -1007,6 +1008,7 @@ match_inst(const char **pcur,
    if (str_match_nocase_whole(&cur, info->mnemonic)) {
       *pcur = cur;
       *saturate = 0;
+      *precise = 0;
       return TRUE;
    }
 
@@ -1015,8 +1017,15 @@ match_inst(const char **pcur,
       if (str_match_nocase_whole(&cur, "_SAT")) {
          *pcur = cur;
          *saturate = 1;
-         return TRUE;
       }
+
+      if (str_match_nocase_whole(&cur, "_PRECISE")) {
+         *pcur = cur;
+         *precise = 1;
+      }
+
+      if (*precise || *saturate)
+         return TRUE;
    }
 
    return FALSE;
@@ -1029,6 +1038,7 @@ parse_instruction(
 {
    uint i;
    uint saturate = 0;
+   uint precise = 0;
    const struct tgsi_opcode_info *info;
    struct tgsi_full_instruction inst;
    const char *cur;
@@ -1043,7 +1053,7 @@ parse_instruction(
       cur = ctx->cur;
 
       info = tgsi_get_opcode_info( i );
-      if (match_inst(&cur, &saturate, info)) {
+      if (match_inst(&cur, &saturate, &precise, info)) {
          if (info->num_dst + info->num_src + info->is_tex == 0) {
             ctx->cur = cur;
             break;
@@ -1064,6 +1074,7 @@ parse_instruction(
 
    inst.Instruction.Opcode = i;
    inst.Instruction.Saturate = saturate;
+   inst.Instruction.Precise = precise;
    inst.Instruction.NumDstRegs = info->num_dst;
    inst.Instruction.NumSrcRegs = info->num_src;
 
-- 
2.13.1

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 6/9] nv50/ir: add precise field to Instruction

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 5c09fed05c..6835c4fa8c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -884,6 +884,7 @@ public:
    unsigned perPatch   : 1;
    unsigned exit       : 1; // terminate program after insn
    unsigned mask       : 4; // for vector ops
+   unsigned precise    : 1; // prevent algebraic optimisations like mul+add to mad
 
    int8_t postFactor; // MUL/DIV(if < 0) by 1 << postFactor
 
-- 
2.13.1

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 7/9] nv50/ir/tgsi: handle precise for most ALU instructions

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 1264dd4834..c633185893 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -3179,6 +3179,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
          geni->subOp = tgsi::opcodeToSubOp(tgsi.getOpcode());
          if (op == OP_MUL && dstTy == TYPE_F32)
             geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
       }
       break;
    case TGSI_OPCODE_MAD:
@@ -3192,6 +3193,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
          geni = mkOp3(op, dstTy, dst0[c], src0, src1, src2);
          if (dstTy == TYPE_F32)
             geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
       }
       break;
    case TGSI_OPCODE_MOV:
-- 
2.13.1

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 8/9] nv50/ir: disable mul+add to mad for precise instructions

Fixes misrendering in Tomb Raider.

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 4c92a1efb5..85f3f44832 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1669,6 +1669,10 @@ AlgebraicOpt::handleABS(Instruction *abs)
 bool
 AlgebraicOpt::handleADD(Instruction *add)
 {
+   // we can't optimize to SAD/MAD if the instruction is tagged as precise
+   if (add->precise)
+      return false;
+
    Value *src0 = add->getSrc(0);
    Value *src1 = add->getSrc(1);
 
@@ -1712,7 +1716,7 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, operation toOp)
       return false;
 
    if (src->getInsn()->saturate || src->getInsn()->postFactor ||
-       src->getInsn()->dnz)
+       src->getInsn()->dnz || src->getInsn()->precise)
       return false;
 
    if (toOp == OP_SAD) {
-- 
2.13.1

Karol Herbst

2017-Jun-11 18:42 UTC

[Nouveau] [RFC 9/9] nv50/ir/tgsi: split mad to mul+add

fixes
KHR-GL44.gpu_shader5.precise_qualifier
KHR-GL45.gpu_shader5.precise_qualifier

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index c633185893..cd45e82426 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -3184,6 +3184,20 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       break;
    case TGSI_OPCODE_MAD:
    case TGSI_OPCODE_UMAD:
+      FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
+         val0 = getSSA();
+         src0 = fetchSrc(0, c);
+         src1 = fetchSrc(1, c);
+         src2 = fetchSrc(2, c);
+         geni = mkOp2(OP_MUL, dstTy, val0, src0, src1);
+         if (dstTy == TYPE_F32)
+            geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
+
+         geni = mkOp2(OP_ADD, dstTy, dst0[c], val0, src2);
+         geni->precise = insn->Instruction.Precise;
+      }
+      break;
    case TGSI_OPCODE_SAD:
    case TGSI_OPCODE_FMA:
       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
-- 
2.13.1
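
As a rough model of how patches 8/9 and 9/9 fit together, the sketch below uses a toy instruction struct (all names are hypothetical and unrelated to the real nv50 IR classes). As the diff reads, every MAD is split into MUL + ADD carrying the precise flag, and the peephole pass may fuse the pair again only when neither instruction is precise.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_MUL, TOY_ADD, TOY_MAD };

/* Toy instruction: registers are plain ints, -1 marks an unused source. */
struct toy_insn {
   enum toy_op op;
   int dst;
   int src[3];
   bool precise;
};

/* Lower "dst = s0 * s1 + s2" into MUL + ADD through temporary tmp.
 * Both halves inherit the precise flag. Returns the instruction count. */
static size_t
toy_split_mad(const struct toy_insn *mad, int tmp, struct toy_insn out[2])
{
   out[0] = (struct toy_insn){ .op = TOY_MUL, .dst = tmp,
                               .src = { mad->src[0], mad->src[1], -1 },
                               .precise = mad->precise };
   out[1] = (struct toy_insn){ .op = TOY_ADD, .dst = mad->dst,
                               .src = { tmp, mad->src[2], -1 },
                               .precise = mad->precise };
   return 2;
}

/* Peephole: fuse a MUL feeding src0 of an ADD back into a MAD, unless
 * either instruction is precise. Simplification: other uses of the MUL
 * result are not checked. */
static bool
toy_try_fuse(const struct toy_insn *mul, const struct toy_insn *add,
             struct toy_insn *mad)
{
   if (mul->op != TOY_MUL || add->op != TOY_ADD)
      return false;
   if (mul->precise || add->precise)
      return false;
   if (add->src[0] != mul->dst)
      return false;

   *mad = (struct toy_insn){ .op = TOY_MAD, .dst = add->dst,
                             .src = { mul->src[0], mul->src[1], add->src[1] },
                             .precise = false };
   return true;
}
```

In this model a non-precise MAD survives a split followed by a fuse unchanged, while a precise one stays as two separately rounded instructions.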

Nicolai Hähnle

2017-Jun-12 10:31 UTC

[Nouveau] [Mesa-dev] [RFC 5/9] tgsi/text: parse _PRECISE modifier

On 11.06.2017 20:42, Karol Herbst wrote:
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>   src/gallium/auxiliary/tgsi/tgsi_text.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
> index 93a05568f4..c5fcb3283d 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
> @@ -999,6 +999,7 @@ parse_texoffset_operand(
>   static boolean
>   match_inst(const char **pcur,
>              unsigned *saturate,
> +           unsigned *precise,
>              const struct tgsi_opcode_info *info)
>   {
>      const char *cur = *pcur;
> @@ -1007,6 +1008,7 @@ match_inst(const char **pcur,
>      if (str_match_nocase_whole(&cur, info->mnemonic)) {
>         *pcur = cur;
>         *saturate = 0;
> +      *precise = 0;
>         return TRUE;
>      }
>   
> @@ -1015,8 +1017,15 @@ match_inst(const char **pcur,
>         if (str_match_nocase_whole(&cur, "_SAT")) {
>            *pcur = cur;
>            *saturate = 1;
> -         return TRUE;
>         }
> +
> +      if (str_match_nocase_whole(&cur, "_PRECISE")) {
> +         *pcur = cur;
> +         *precise = 1;
> +      }
I think this doesn't properly handle the case where both _SAT and 
_PRECISE are present, because of using str_match_nocase_whole.

Cheers,
Nicolai
> +
> +      if (*precise || *saturate)
> +         return TRUE;
>      }
>   
>      return FALSE;
> @@ -1029,6 +1038,7 @@ parse_instruction(
>   {
>      uint i;
>      uint saturate = 0;
> +   uint precise = 0;
>      const struct tgsi_opcode_info *info;
>      struct tgsi_full_instruction inst;
>      const char *cur;
> @@ -1043,7 +1053,7 @@ parse_instruction(
>         cur = ctx->cur;
>   
>         info = tgsi_get_opcode_info( i );
> -      if (match_inst(&cur, &saturate, info)) {
> +      if (match_inst(&cur, &saturate, &precise, info)) {
>            if (info->num_dst + info->num_src + info->is_tex == 0) {
>               ctx->cur = cur;
>               break;
> @@ -1064,6 +1074,7 @@ parse_instruction(
>   
>      inst.Instruction.Opcode = i;
>      inst.Instruction.Saturate = saturate;
> +   inst.Instruction.Precise = precise;
>      inst.Instruction.NumDstRegs = info->num_dst;
>      inst.Instruction.NumSrcRegs = info->num_src;
>   
> 
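
One possible way to accept both suffixes on the same mnemonic, as raised in the comment above, is to match the mnemonic and each optional suffix as plain prefixes and apply the whole-word check only once at the end. The sketch below is standalone C with hypothetical helper names. It does not use the real tgsi_text.c helpers, and it assumes the suffix order is _SAT followed by _PRECISE.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: case-insensitive prefix match that advances *pcur
 * on success and does not care what character follows the match. */
static bool
match_prefix_nocase(const char **pcur, const char *str)
{
   const char *cur = *pcur;

   while (*str) {
      if (toupper((unsigned char)*cur) != toupper((unsigned char)*str))
         return false;
      cur++;
      str++;
   }
   *pcur = cur;
   return true;
}

/* True if c could continue an identifier. */
static bool
is_ident_char(char c)
{
   return isalnum((unsigned char)c) || c == '_';
}

/* Sketch: match "<mnemonic>[_SAT][_PRECISE]" as one whole word.
 * Outputs are only written when the full match succeeds. */
static bool
match_inst_sketch(const char **pcur, const char *mnemonic,
                  unsigned *saturate, unsigned *precise)
{
   const char *cur = *pcur;
   unsigned sat = 0, prec = 0;

   if (!match_prefix_nocase(&cur, mnemonic))
      return false;
   if (match_prefix_nocase(&cur, "_SAT"))
      sat = 1;
   if (match_prefix_nocase(&cur, "_PRECISE"))
      prec = 1;
   if (is_ident_char(*cur))
      return false;   /* trailing junk, e.g. "MULX" or "MUL_SATX" */

   *pcur = cur;
   *saturate = sat;
   *precise = prec;
   return true;
}
```

Writing the outputs only on success also avoids leaving a stale saturate or precise value behind when a later opcode in the lookup loop is the one that finally matches.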

-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.

Nicolai Hähnle

2017-Jun-12 10:33 UTC

[Nouveau] [Mesa-dev] [RFC 4/9] tgsi: populate precise

On 11.06.2017 20:42, Karol Herbst wrote:
> Only implemented for glsl->tgsi. Other converters just set precise to 0.
> 
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>   src/gallium/auxiliary/tgsi/tgsi_build.c       |  3 +++
>   src/gallium/auxiliary/tgsi/tgsi_ureg.c        | 14 +++++++---
>   src/gallium/auxiliary/tgsi/tgsi_ureg.h        | 20 +++++++++++---
>   src/gallium/auxiliary/util/u_simple_shaders.c |  2 +-
>   src/gallium/state_trackers/nine/nine_shader.c |  6 ++---
>   src/mesa/state_tracker/st_atifs_to_tgsi.c     | 38 +++++++++++++--------------
>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp    | 12 ++++-----
>   src/mesa/state_tracker/st_mesa_to_tgsi.c      |  8 +++---
>   src/mesa/state_tracker/st_pbo.c               |  2 +-
>   9 files changed, 65 insertions(+), 40 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
> index 55e4d064ed..144a017768 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_build.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
> @@ -651,6 +651,7 @@ tgsi_default_instruction( void )
>   static struct tgsi_instruction
>   tgsi_build_instruction(unsigned opcode,
>                          unsigned saturate,
> +                       unsigned precise,
>                          unsigned num_dst_regs,
>                          unsigned num_src_regs,
>                          struct tgsi_header *header)
> @@ -665,6 +666,7 @@ tgsi_build_instruction(unsigned opcode,
>      instruction = tgsi_default_instruction();
>      instruction.Opcode = opcode;
>      instruction.Saturate = saturate;
> +   instruction.Precise = precise;
>      instruction.NumDstRegs = num_dst_regs;
>      instruction.NumSrcRegs = num_src_regs;
>   
> @@ -1061,6 +1063,7 @@ tgsi_build_full_instruction(
>   
>      *instruction = tgsi_build_instruction(full_inst->Instruction.Opcode,
>                                            full_inst->Instruction.Saturate,
> +                                          full_inst->Instruction.Precise,
>                                            full_inst->Instruction.NumDstRegs,
>                                            full_inst->Instruction.NumSrcRegs,
>                                            header);
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> index 5bd779728a..56db2252c5 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> @@ -1213,6 +1213,7 @@ struct ureg_emit_insn_result
>   ureg_emit_insn(struct ureg_program *ureg,
>                  unsigned opcode,
>                  boolean saturate,
> +               unsigned precise,
>                  unsigned num_dst,
>                  unsigned num_src)
>   {
> @@ -1226,6 +1227,7 @@ ureg_emit_insn(struct ureg_program *ureg,
>      out[0].insn = tgsi_default_instruction();
>      out[0].insn.Opcode = opcode;
>      out[0].insn.Saturate = saturate;
> +   out[0].insn.Precise = precise;
>      out[0].insn.NumDstRegs = num_dst;
>      out[0].insn.NumSrcRegs = num_src;
>   
> @@ -1354,7 +1356,8 @@ ureg_insn(struct ureg_program *ureg,
>             const struct ureg_dst *dst,
>             unsigned nr_dst,
>             const struct ureg_src *src,
> -          unsigned nr_src )
> +          unsigned nr_src,
> +          unsigned precise )
>   {
>      struct ureg_emit_insn_result insn;
>      unsigned i;
> @@ -1369,6 +1372,7 @@ ureg_insn(struct ureg_program *ureg,
>      insn = ureg_emit_insn(ureg,
>                            opcode,
>                            saturate,
> +                         precise,
>                            nr_dst,
>                            nr_src);
>   
> @@ -1391,7 +1395,8 @@ ureg_tex_insn(struct ureg_program *ureg,
>                 const struct tgsi_texture_offset *texoffsets,
>                 unsigned nr_offset,
>                 const struct ureg_src *src,
> -              unsigned nr_src )
> +              unsigned nr_src,
> +              unsigned precise )
What does `precise' mean for tex instructions?

>   {
>      struct ureg_emit_insn_result insn;
>      unsigned i;
> @@ -1406,6 +1411,7 @@ ureg_tex_insn(struct ureg_program *ureg,
>      insn = ureg_emit_insn(ureg,
>                            opcode,
>                            saturate,
> +                         precise,
>                            nr_dst,
>                            nr_src);
>   
> @@ -1434,7 +1440,8 @@ ureg_memory_insn(struct ureg_program *ureg,
>                    unsigned nr_src,
>                    unsigned qualifier,
>                    unsigned texture,
> -                 unsigned format)
> +                 unsigned format,
> +                 unsigned precise)
Same question. I can't think of a possible meaning, in which case the 
parameter should be dropped.

Cheers,
Nicolai

>   {
>      struct ureg_emit_insn_result insn;
>      unsigned i;
> @@ -1442,6 +1449,7 @@ ureg_memory_insn(struct ureg_program *ureg,
>      insn = ureg_emit_insn(ureg,
>                            opcode,
>                            FALSE,
> +                         precise,
>                            nr_dst,
>                            nr_src);
>   
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> index 54f95ba565..105c85abd5 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> @@ -546,7 +546,8 @@ ureg_insn(struct ureg_program *ureg,
>             const struct ureg_dst *dst,
>             unsigned nr_dst,
>             const struct ureg_src *src,
> -          unsigned nr_src );
> +          unsigned nr_src,
> +          unsigned precise);
>   
>   
>   void
> @@ -559,7 +560,8 @@ ureg_tex_insn(struct ureg_program *ureg,
>                 const struct tgsi_texture_offset *texoffsets,
>                 unsigned nr_offset,
>                 const struct ureg_src *src,
> -              unsigned nr_src );
> +              unsigned nr_src,
> +              unsigned precise);
>   
>   
>   void
> @@ -571,7 +573,8 @@ ureg_memory_insn(struct ureg_program *ureg,
>                    unsigned nr_src,
>                    unsigned qualifier,
>                    unsigned texture,
> -                 unsigned format);
> +                 unsigned format,
> +                 unsigned precise);
>   
>   /***********************************************************************
>    * Internal instruction helpers, don't call these directly:
> @@ -586,6 +589,7 @@ struct ureg_emit_insn_result
>   ureg_emit_insn(struct ureg_program *ureg,
>                  unsigned opcode,
>                  boolean saturate,
> +               unsigned precise,
>                  unsigned num_dst,
>                  unsigned num_src);
>   
> @@ -632,6 +636,7 @@ static inline void ureg_##op( struct ureg_program *ureg )       \
>                            opcode,                                \
>                            FALSE,                                 \
>                            0,                                     \
> +                         0,                                     \
>                            0);                                    \
>      ureg_fixup_insn_size( ureg, insn.insn_token );               \
>   }
> @@ -646,6 +651,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
>                            opcode,                                \
>                            FALSE,                                 \
>                            0,                                     \
> +                         0,                                     \
>                            1);                                    \
>      ureg_emit_src( ureg, src );                                  \
>      ureg_fixup_insn_size( ureg, insn.insn_token );               \
> @@ -661,6 +667,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
>                            opcode,                                \
>                            FALSE,                                 \
>                            0,                                     \
> +                         0,                                     \
>                            0);                                    \
>      ureg_emit_label( ureg, insn.extended_token, label_token );   \
>      ureg_fixup_insn_size( ureg, insn.insn_token );               \
> @@ -677,6 +684,7 @@ static inline void ureg_##op( struct ureg_program *ureg,        \
>                            opcode,                                \
>                            FALSE,                                 \
>                            0,                                     \
> +                         0,                                     \
>                            1);                                    \
>      ureg_emit_label( ureg, insn.extended_token, label_token );   \
>      ureg_emit_src( ureg, src );                                  \
> @@ -694,6 +702,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            0);                                            \
>      ureg_emit_dst( ureg, dst );                                          \
> @@ -713,6 +722,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            1);                                            \
>      ureg_emit_dst( ureg, dst );                                          \
> @@ -733,6 +743,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            2);                                            \
>      ureg_emit_dst( ureg, dst );                                          \
> @@ -756,6 +767,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            2);                                            \
>      ureg_emit_texture( ureg, insn.extended_token, target,                \
> @@ -780,6 +792,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            3);                                            \
>      ureg_emit_dst( ureg, dst );                                          \
> @@ -806,6 +819,7 @@ static inline void ureg_##op( struct ureg_program *ureg,                \
>      insn = ureg_emit_insn(ureg,                                          \
>                            opcode,                                        \
>                            dst.Saturate,                                  \
> +                         0,                                             \
>                            1,                                             \
>                            4);                                            \
>      ureg_emit_texture( ureg, insn.extended_token, target,                \
> diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c b/src/gallium/auxiliary/util/u_simple_shaders.c
> index 5874d0e9aa..79331b5638 100644
> --- a/src/gallium/auxiliary/util/u_simple_shaders.c
> +++ b/src/gallium/auxiliary/util/u_simple_shaders.c
> @@ -954,7 +954,7 @@ util_make_geometry_passthrough_shader(struct pipe_context *pipe,
>      }
>   
>      /* EMIT IMM[0] */
> -   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1);
> +   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1, 0);
>   
>      /* END */
>      ureg_END(ureg);
> diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c
> index 40fb6be88f..f405090811 100644
> --- a/src/gallium/state_trackers/nine/nine_shader.c
> +++ b/src/gallium/state_trackers/nine/nine_shader.c
> @@ -1879,7 +1879,7 @@ DECL_SPECIAL(IFC)
>       struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
>       src[0] = tx_src_param(tx, &tx->insn.src[0]);
>       src[1] = tx_src_param(tx, &tx->insn.src[1]);
> -    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
> +    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
>       ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
>       return D3D_OK;
>   }
> @@ -1897,7 +1897,7 @@ DECL_SPECIAL(BREAKC)
>       struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
>       src[0] = tx_src_param(tx, &tx->insn.src[0]);
>       src[1] = tx_src_param(tx, &tx->insn.src[1]);
> -    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
> +    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
>       ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
>       ureg_BRK(tx->ureg);
>       tx_endcond(tx);
> @@ -3029,7 +3029,7 @@ NineTranslateInstruction_Generic(struct shader_translator *tx)
>   
>       ureg_insn(tx->ureg, tx->insn.info->opcode,
>                 dst, tx->insn.ndst,
> -              src, tx->insn.nsrc);
> +              src, tx->insn.nsrc, 0);
>       return D3D_OK;
>   }
>   
> diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> index 338ced56ed..e0a6ff7131 100644
> --- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
> +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> @@ -105,18 +105,18 @@ apply_swizzle(struct st_translate *t,
>         imm[0] = src;
>         imm[1] = ureg_imm4f(t->ureg, 1.0f, 1.0f, 0.0f, 0.0f);
>         imm[2] = ureg_imm4f(t->ureg, 0.0f, 0.0f, 1.0f, 1.0f);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3, 0);
>   
>         if (swizzle == GL_SWIZZLE_STR_DR_ATI) {
>            imm[0] = ureg_scalar(src, TGSI_SWIZZLE_Z);
>         } else {
>            imm[0] = ureg_scalar(src, TGSI_SWIZZLE_W);
>         }
> -      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1, 0);
>   
>         imm[0] = ureg_src(tmp[0]);
>         imm[1] = ureg_src(tmp[1]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2, 0);
>   
>         return ureg_src(tmp[0]);
>      }
> @@ -170,35 +170,35 @@ prepare_argument(struct st_translate *t, const unsigned argId,
>         src = ureg_scalar(src, TGSI_SWIZZLE_W);
>         break;
>      }
> -   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1);
> +   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1, 0);
>   
>      if (srcReg->argMod & GL_COMP_BIT_ATI) {
>         struct ureg_src modsrc[2];
>         modsrc[0] = ureg_imm1f(t->ureg, 1.0f);
>         modsrc[1] = ureg_negate(ureg_src(arg));
>   
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>      }
>      if (srcReg->argMod & GL_BIAS_BIT_ATI) {
>         struct ureg_src modsrc[2];
>         modsrc[0] = ureg_src(arg);
>         modsrc[1] = ureg_imm1f(t->ureg, -0.5f);
>   
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>      }
>      if (srcReg->argMod & GL_2X_BIT_ATI) {
>         struct ureg_src modsrc[2];
>         modsrc[0] = ureg_src(arg);
>         modsrc[1] = ureg_src(arg);
>   
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>      }
>      if (srcReg->argMod & GL_NEGATE_BIT_ATI) {
>         struct ureg_src modsrc[2];
>         modsrc[0] = ureg_src(arg);
>         modsrc[1] = ureg_imm1f(t->ureg, -1.0f);
>   
> -      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2, 0);
>      }
>      return  ureg_src(arg);
>   }
> @@ -217,25 +217,25 @@ emit_special_inst(struct st_translate *t, const struct instruction_desc *desc,
>         tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI + 2); /* re-purpose a3 */
>         src[0] = ureg_imm1f(t->ureg, 0.5f);
>         src[1] = ureg_negate(args[2]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2, 0);
>         src[0] = ureg_src(tmp[0]);
>         src[1] = args[0];
>         src[2] = args[1];
> -      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
>      } else if (!strcmp(desc->name, "CND0")) {
>         src[0] = args[2];
>         src[1] = args[1];
>         src[2] = args[0];
> -      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
>      } else if (!strcmp(desc->name, "DOT2_ADD")) {
>         /* note: DP2A is not implemented in most pipe drivers */
>         tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI); /* re-purpose a1 */
>         src[0] = args[0];
>         src[1] = args[1];
> -      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2, 0);
>         src[0] = ureg_src(tmp[0]);
>         src[1] = ureg_scalar(args[2], TGSI_SWIZZLE_Z);
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2, 0);
>      }
>   }
>   
> @@ -249,7 +249,7 @@ emit_arith_inst(struct st_translate *t,
>         return;
>      }
>   
> -   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount);
> +   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount, 0);
>   }
>   
>   static void
> @@ -292,7 +292,7 @@ emit_dstmod(struct st_translate *t,
>      if (dstMod & GL_SATURATE_BIT_ATI) {
>         dst = ureg_saturate(dst);
>      }
> -   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2);
> +   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2, 0);
>   }
>   
>   /**
> @@ -334,9 +334,9 @@ compile_setupinst(struct st_translate *t,
>         src[1] = t->samplers[r];
>         /* the texture target is still unknown, it will be fixed in the draw call */
>         ureg_tex_insn(t->ureg, TGSI_OPCODE_TEX, dst, 1, TGSI_TEXTURE_2D,
> -                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2);
> +                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2, 0);
>      } else if (texinst->Opcode == ATI_FRAGMENT_SHADER_PASS_OP) {
> -      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
>      }
>   
>      t->regs_written[t->current_pass][r] = true;
> @@ -408,11 +408,11 @@ finalize_shader(struct st_translate *t, unsigned numPasses)
>         /* copy the result into the OUT slot */
>         dst[0] = t->outputs[t->outputMapping[FRAG_RESULT_COLOR]];
>         src[0] = ureg_src(t->temps[0]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
>      }
>   
>      /* signal the end of the program */
> -   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0);
> +   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0, 0);
>   }
>   
>   /**
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 19f90f21fe..ecd9f9f280 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -5900,7 +5900,7 @@ compile_tgsi_instruction(struct st_translate *t,
>      case TGSI_OPCODE_IF:
>      case TGSI_OPCODE_UIF:
>         assert(num_dst == 0);
> -      ureg_insn(ureg, inst->op, NULL, 0, src, num_src);
> +      ureg_insn(ureg, inst->op, NULL, 0, src, num_src, inst->precise);
>         return;
>   
>      case TGSI_OPCODE_TEX:
> @@ -5935,7 +5935,7 @@ compile_tgsi_instruction(struct st_translate *t,
>                       tex_target,
>                       st_translate_texture_type(inst->tex_type),
>                       texoffsets, inst->tex_offset_num_offset,
> -                    src, num_src);
> +                    src, num_src, inst->precise);
>         return;
>   
>      case TGSI_OPCODE_RESQ:
> @@ -5966,7 +5966,7 @@ compile_tgsi_instruction(struct st_translate *t,
>         assert(src[0].File != TGSI_FILE_NULL);
>         ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
>                          inst->buffer_access,
> -                       tex_target, inst->image_format);
> +                       tex_target, inst->image_format, inst->precise);
>         break;
>   
>      case TGSI_OPCODE_STORE:
> @@ -5984,19 +5984,19 @@ compile_tgsi_instruction(struct st_translate *t,
>         assert(dst[0].File != TGSI_FILE_NULL);
>         ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
>                          inst->buffer_access,
> -                       tex_target, inst->image_format);
> +                       tex_target, inst->image_format, inst->precise);
>         break;
>   
>      case TGSI_OPCODE_SCS:
>         dst[0] = ureg_writemask(dst[0], TGSI_WRITEMASK_XY);
> -      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src);
> +      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->precise);
>         break;
>   
>      default:
>         ureg_insn(ureg,
>                   inst->op,
>                   dst, num_dst,
> -                src, num_src);
> +                src, num_src, inst->precise);
>         break;
>      }
>   }
> diff --git a/src/mesa/state_tracker/st_mesa_to_tgsi.c b/src/mesa/state_tracker/st_mesa_to_tgsi.c
> index 984ff92130..f11013c116 100644
> --- a/src/mesa/state_tracker/st_mesa_to_tgsi.c
> +++ b/src/mesa/state_tracker/st_mesa_to_tgsi.c
> @@ -558,7 +558,7 @@ compile_instruction(
>                                                  inst->TexShadow ),
>                        TGSI_RETURN_TYPE_FLOAT,
>                        NULL, 0,
> -                     src, num_src );
> +                     src, num_src, 0 );
>         return;
>   
>      case OPCODE_SCS:
> @@ -566,7 +566,7 @@ compile_instruction(
>         ureg_insn( ureg,
>                    translate_opcode( inst->Opcode ),
>                    dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0 );
>         break;
>   
>      case OPCODE_XPD:
> @@ -574,7 +574,7 @@ compile_instruction(
>         ureg_insn( ureg,
>                    translate_opcode( inst->Opcode ),
>                    dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0 );
>         break;
>   
>      case OPCODE_RSQ:
> @@ -593,7 +593,7 @@ compile_instruction(
>         ureg_insn( ureg,
>                    translate_opcode( inst->Opcode ),
>                    dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0);
>         break;
>      }
>   }
> diff --git a/src/mesa/state_tracker/st_pbo.c b/src/mesa/state_tracker/st_pbo.c
> index 303c8535b2..3dff1609e8 100644
> --- a/src/mesa/state_tracker/st_pbo.c
> +++ b/src/mesa/state_tracker/st_pbo.c
> @@ -528,7 +528,7 @@ create_fs(struct st_context *st, bool download, enum pipe_texture_target target,
>         op[0] = ureg_src(temp0);
>         op[1] = ureg_src(temp1);
>         ureg_memory_insn(ureg, TGSI_OPCODE_STORE, &out, 1, op, 2, 0,
> -                             TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE);
> +                             TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE, 0);
>   
>         ureg_release_temporary(ureg, temp1);
>      } else {
> 

-- 
Learn how the world really is,
but never forget how it ought to be.

Nicolai Hähnle

2017-Jun-12 10:41 UTC


[Nouveau] [Mesa-dev] [RFC 3/9] st/glsl_to_tgsi: handle precise modifier

On 11.06.2017 20:42, Karol Herbst wrote:
> all subexpression inside an ir_assignment needs to be tagged as precise.
> 
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 80 ++++++++++++++++++++++++------
>   1 file changed, 65 insertions(+), 15 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index c5d2e0fcd2..19f90f21fe 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -87,6 +87,13 @@ static int swizzle_for_type(const glsl_type *type, int component = 0)
>      return swizzle;
>   }
>   
> +static unsigned is_precise(const ir_variable *ir)
> +{
> +   if (!ir)
> +      return 0;
> +   return ir->data.precise || ir->data.invariant;
> +}
> +
>   /**
>    * This struct is a corresponding struct to TGSI ureg_src.
>    */
> @@ -296,6 +303,7 @@ public:
>      ir_instruction *ir;
>   
>      unsigned op:8; /**< TGSI opcode */
> +   unsigned precise:1;
>      unsigned saturate:1;
>      unsigned is_64bit_expanded:1;
>      unsigned sampler_base:5;
> @@ -435,6 +443,7 @@ public:
>      bool have_fma;
>      bool use_shared_memory;
>      bool has_tex_txf_lz;
> +   unsigned precise;
>   
>      variable_storage *find_variable_storage(ir_variable *var);
>   
> @@ -505,13 +514,29 @@ public:
>                                         st_src_reg src0 = undef_src,
>                                         st_src_reg src1 = undef_src,
>                                         st_src_reg src2 = undef_src,
> -                                      st_src_reg src3 = undef_src);
> +                                      st_src_reg src3 = undef_src,
> +                                      unsigned precise = 0);
>   
>      glsl_to_tgsi_instruction *emit_asm(ir_instruction *ir, unsigned op,
>                                         st_dst_reg dst, st_dst_reg dst1,
>                                         st_src_reg src0 = undef_src,
>                                         st_src_reg src1 = undef_src,
>                                         st_src_reg src2 = undef_src,
> +                                      st_src_reg src3 = undef_src,
> +                                      unsigned precise = 0);
> +
> +   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
> +                                      st_dst_reg dst = undef_dst,
> +                                      st_src_reg src0 = undef_src,
> +                                      st_src_reg src1 = undef_src,
> +                                      st_src_reg src2 = undef_src,
> +                                      st_src_reg src3 = undef_src);
> +
> +   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
> +                                      st_dst_reg dst, st_dst_reg dst1,
> +                                      st_src_reg src0 = undef_src,
> +                                      st_src_reg src1 = undef_src,
> +                                      st_src_reg src2 = undef_src,
>                                         st_src_reg src3 = undef_src);
Yeah, I don't like those overloads and the way they force you to add 
artificial casts for disambiguation.

I'd suggest to embrace the global precise flag: drop the precise 
parameter from emit_asm, and just source the bit from this->precise.

Please make precise a bool, and add a comment explaining that it's a 
flag for whether the currently evaluated expression should be precise.

Cheers,
Nicolai
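
To make the suggestion above concrete, here is a minimal sketch of what sourcing the bit from this->precise could look like. It is not the actual st_glsl_to_tgsi code: Instruction, Visitor and visit_assignment are simplified, hypothetical stand-ins, and only the idea (a bool member that emit_asm copies into every instruction, with no extra parameter or overload) is taken from the review comment.

```cpp
#include <vector>

// Hypothetical, heavily simplified stand-in for glsl_to_tgsi_instruction.
struct Instruction {
   unsigned op;
   bool precise;
};

// Hypothetical stand-in for glsl_to_tgsi_visitor.
class Visitor {
public:
   /* Whether the expression currently being evaluated must be precise.
    * Set while visiting the RHS of an assignment to a precise/invariant
    * variable; emit_asm() copies it into every instruction it creates. */
   bool precise = false;

   std::vector<Instruction> instructions;

   // No "precise" parameter and no ir_expression overload: the flag is
   // always taken from the visitor state.
   void emit_asm(unsigned op)
   {
      instructions.push_back(Instruction{op, this->precise});
   }

   // Stand-in for visit(ir_assignment *): rhs_ops plays the role of the
   // instructions generated by ir->rhs->accept(this).
   void visit_assignment(bool lhs_is_precise,
                         const std::vector<unsigned> &rhs_ops)
   {
      this->precise = lhs_is_precise;
      for (unsigned op : rhs_ops)
         emit_asm(op);
      this->precise = false;
   }
};
```

With this shape, call sites such as emit_asm(NULL, TGSI_OPCODE_END) would presumably no longer need the (ir_instruction *)NULL casts, since there is only one overload set left to resolve against.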

>      unsigned get_opcode(unsigned op,
> @@ -650,7 +675,8 @@ glsl_to_tgsi_instruction *
>   glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>                                  st_dst_reg dst, st_dst_reg dst1,
>                                  st_src_reg src0, st_src_reg src1,
> -                               st_src_reg src2, st_src_reg src3)
> +                               st_src_reg src2, st_src_reg src3,
> +                               unsigned precise)
>   {
>      glsl_to_tgsi_instruction *inst = new(mem_ctx) glsl_to_tgsi_instruction();
>      int num_reladdr = 0, i, j;
> @@ -691,6 +717,7 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>      STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);
>   
>      inst->op = op;
> +   inst->precise = precise;
>      inst->info = tgsi_get_opcode_info(op);
>      inst->dst[0] = dst;
>      inst->dst[1] = dst1;
> @@ -881,9 +908,28 @@ glsl_to_tgsi_instruction *
>   glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>                                  st_dst_reg dst,
>                                  st_src_reg src0, st_src_reg src1,
> +                               st_src_reg src2, st_src_reg src3,
> +                               unsigned precise)
> +{
> +   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, precise);
> +}
> +
> +glsl_to_tgsi_instruction *
> +glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
> +                               st_dst_reg dst,
> +                               st_src_reg src0, st_src_reg src1,
> +                               st_src_reg src2, st_src_reg src3)
> +{
> +   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, this->precise);
> +}
> +
> +glsl_to_tgsi_instruction *
> +glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
> +                               st_dst_reg dst, st_dst_reg dst1,
> +                               st_src_reg src0, st_src_reg src1,
>                                  st_src_reg src2, st_src_reg src3)
>   {
> -   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3);
> +   return emit_asm(ir, op, dst, dst1, src0, src1, src2, src3, this->precise);
>   }
>   
>   /**
> @@ -1116,7 +1162,7 @@ glsl_to_tgsi_visitor::emit_arl(ir_instruction *ir,
>      if (dst.index >= this->num_address_regs)
>         this->num_address_regs = dst.index + 1;
>   
> -   emit_asm(NULL, op, dst, src0);
> +   emit_asm((ir_instruction *)NULL, op, dst, src0);
>   }
>   
>   int
> @@ -1406,11 +1452,11 @@ glsl_to_tgsi_visitor::visit(ir_variable *ir)
>   void
>   glsl_to_tgsi_visitor::visit(ir_loop *ir)
>   {
> -   emit_asm(NULL, TGSI_OPCODE_BGNLOOP);
> +   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BGNLOOP);
>   
>      visit_exec_list(&ir->body_instructions, this);
>   
> -   emit_asm(NULL, TGSI_OPCODE_ENDLOOP);
> +   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ENDLOOP);
>   }
>   
>   void
> @@ -1418,10 +1464,10 @@ glsl_to_tgsi_visitor::visit(ir_loop_jump *ir)
>   {
>      switch (ir->mode) {
>      case ir_loop_jump::jump_break:
> -      emit_asm(NULL, TGSI_OPCODE_BRK);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BRK);
>         break;
>      case ir_loop_jump::jump_continue:
> -      emit_asm(NULL, TGSI_OPCODE_CONT);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_CONT);
>         break;
>      }
>   }
> @@ -2703,7 +2749,7 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>               st_dst_reg dst = st_dst_reg(get_temp(var->type));
>               st_src_reg src = st_src_reg(PROGRAM_OUTPUT, decl->mesa_index,
>                                           var->type, component, decl->array_id);
> -            emit_asm(NULL, TGSI_OPCODE_FBFETCH, dst, src);
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_FBFETCH, dst, src);
>               entry = new(mem_ctx) variable_storage(var, dst.file, dst.index,
>                                                     dst.array_id);
>            } else {
> @@ -3148,7 +3194,10 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
>      st_dst_reg l;
>      st_src_reg r;
>   
> +   /* all generated instructions need to be flaged as precise */
> +   this->precise = is_precise(ir->lhs->variable_referenced());
>      ir->rhs->accept(this);
> +   this->precise = 0;
>      r = this->result;
>   
>      l = get_assignment_lhs(ir->lhs, this, &dst_component);
> @@ -3233,7 +3282,8 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
>          */
>         glsl_to_tgsi_instruction *inst, *new_inst;
>         inst = (glsl_to_tgsi_instruction *)this->instructions.get_tail();
> -      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3]);
> +      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3],
> +                          is_precise(ir->lhs->variable_referenced()));
>         new_inst->saturate = inst->saturate;
>         inst->dead_mask = inst->dst[0].writemask;
>      } else {
> @@ -4072,16 +4122,16 @@ glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference *tail,
>   
>            deref_arr->array_index->accept(this);
>            if (*array_elements != 1)
> -            emit_asm(NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
>            else
> -            emit_asm(NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
>   
>            if (indirect->file == PROGRAM_UNDEFINED)
>               *indirect = temp_reg;
>            else {
>               temp_dst = st_dst_reg(*indirect);
>               temp_dst.writemask = 1;
> -            emit_asm(NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
>            }
>         } else
>            *index += array_index->value.u[0] * *array_elements;
> @@ -4141,7 +4191,7 @@ glsl_to_tgsi_visitor::canonicalize_gather_offset(st_src_reg offset)
>         st_src_reg tmp = get_temp(glsl_type::ivec2_type);
>         st_dst_reg tmp_dst = st_dst_reg(tmp);
>         tmp_dst.writemask = WRITEMASK_XY;
> -      emit_asm(NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
>         return tmp;
>      }
>   
> @@ -6777,7 +6827,7 @@ get_mesa_program_tgsi(struct gl_context *ctx,
>      v->renumber_registers();
>   
>      /* Write the END instruction. */
> -   v->emit_asm(NULL, TGSI_OPCODE_END);
> +   v->emit_asm((ir_instruction *)NULL, TGSI_OPCODE_END);
>   
>      if (ctx->_Shader->Flags & GLSL_DUMP) {
>         _mesa_log("\n");
> 

-- 
Learn how the world really is,
but never forget how it ought to be.

Nicolai Hähnle

2017-Jun-12 10:42 UTC


[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

On 11.06.2017 20:42, Karol Herbst wrote:
> Running Tomb Raider on Nouveau I found some flicker caused by ignoring precise
> modifiers on variables inside Nouveau.
>
> This series add precise/invariant handling to TGSI, which can be then used by
> drivers to disable certain unsafe optimisations which may otherwise alter
> calculations, which depend on having the same result across shaders.
It's kind of amazing that we got this far without doing this. On the 
radeonsi side, it's probably related to how conservative LLVM is.

But this series is a good idea, since it might allow us to become more 
aggressive with optimizations in radeonsi as well.
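
As a rough illustration of how a driver could use such a per-instruction flag, the sketch below fuses a MUL followed by a dependent ADD into a MAD unless either instruction is marked precise. This is not the nv50_ir peephole code: Op, Insn and fuse_mul_add are hypothetical names, and the pass assumes (for brevity) that the MUL result is only consumed by the ADD directly after it, as its first source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy IR, not the nv50_ir or TGSI data structures.
enum Op { OP_MUL, OP_ADD, OP_MAD, OP_NOP };

struct Insn {
   Op op;
   int dst, src0, src1, src2; // register numbers, -1 if unused
   bool precise;              // set for precise/invariant computations
};

// Rewrites "t = a * b; d = t + c" into "d = mad(a, b, c)".
// Returns the number of fused pairs.
std::size_t fuse_mul_add(std::vector<Insn> &code)
{
   std::size_t fused = 0;
   for (std::size_t i = 0; i + 1 < code.size(); ++i) {
      Insn &mul = code[i];
      Insn &add = code[i + 1];
      if (mul.op != OP_MUL || add.op != OP_ADD)
         continue;
      if (add.src0 != mul.dst)
         continue;
      // Fusing may change rounding, so leave precise instructions alone.
      if (mul.precise || add.precise)
         continue;
      add = Insn{OP_MAD, add.dst, mul.src0, mul.src1, add.src1, false};
      mul.op = OP_NOP;
      ++fused;
   }
   return fused;
}
```

A real pass would also have to check that the MUL result has no other users and handle the ADD operands in either order; the only point here is the early-out on the precise bit.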

> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and 4.5
> 
> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to apply the
> precise flag on instruction emited in ir_assignment->rhs->accept(); but I found
> no other easy way to handle this. Maybe somebody of you has a better idea?
Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2:

Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>

> 
> Karol Herbst (9):
>    tgsi: add precise flag to tgsi_instruction
>    tgsi/dump: print _PRECISE modifier on Instrutions
>    st/glsl_to_tgsi: handle precise modifier
>    tgsi: populate precise
>    tgsi/text: parse _PRECISE modifier
>    nv50/ir: add precise field to Instruction
>    nv50/ir/tgsi: handle precise for most ALU instructions
>    nv50/ir: disable mul+add to mad for precise instructions
>    nv50/ir/tgsi: split mad to mul+add
> 
>   src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
>   src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
>   src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
>   src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
>   src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
>   src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
>   src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
>   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
>   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
>   src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
>   src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
>   src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
>   src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
>   src/mesa/state_tracker/st_pbo.c                    |  2 +-
>   15 files changed, 172 insertions(+), 59 deletions(-)
> 

-- 
Learn how the world really is,
but never forget how it ought to be.

Roland Scheidegger

2017-Jun-12 23:57 UTC


[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

This looks like the right idea to me too. It may sound a bit weird to do
that per instruction, but d3d11 does that as well. (Some d3d versions
just have a global flag basically forbidding or allowing any such fast
math optimizations in the assembly, but I'm not actually sure everybody
honors that without tessellation...)

For 1/9:
Reviewed-by: Roland Scheidegger <sroland at vmware.com>

2/9 has a typo in the commit short log ("Instrutions").

FWIW surely on nv50 you could keep a single mad instruction for umad
(sad maybe too?). (I'm actually wondering if the hw really can't do
unfused float multiply+add as a single instruction but I know next to
nothing about nvidia hw...)
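
For readers wondering why fused versus unfused float multiply+add matters at all, here is a small worked example (plain C++, nothing Mesa-specific; MadResult and compare_mad are made-up names). With a = 1 + 2^-12 and c = -(1 + 2^-11), the exact value of a*a + c is 2^-24. Rounding the product to float first loses that last bit, so the unfused result is 0, while a fused multiply-add rounds only once and returns 2^-24. The volatile store is there only to force the intermediate rounding, in case the compiler would otherwise contract the expression.

```cpp
#include <cmath>

struct MadResult {
   float unfused; // round(round(a * a) + c)
   float fused;   // round(a * a + c), a single rounding
};

MadResult compare_mad()
{
   const float a = 1.0f + std::ldexp(1.0f, -12);    // 1 + 2^-12
   const float c = -(1.0f + std::ldexp(1.0f, -11)); // -(1 + 2^-11)

   // a * a = 1 + 2^-11 + 2^-24 exactly; as a float this is a tie and
   // rounds to 1 + 2^-11 under round-to-nearest-even.
   volatile float product = a * a;
   const float unfused = product + c;

   // std::fma computes a * a + c with one rounding step.
   const float fused = std::fma(a, a, c);

   return MadResult{unfused, fused};
}
```

Two shaders that compute the same expression, one compiled with the fused form and one without, can therefore disagree in the last bits, which is the kind of mismatch precise and invariant are meant to rule out.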

Roland

On 12.06.2017 at 12:42, Nicolai Hähnle wrote:
> On 11.06.2017 20:42, Karol Herbst wrote:
>> Running Tomb Raider on Nouveau I found some flicker caused by ignoring precise
>> modifiers on variables inside Nouveau.
>>
>> This series add precise/invariant handling to TGSI, which can be then used by
>> drivers to disable certain unsafe optimisations which may otherwise alter
>> calculations, which depend on having the same result across shaders.
> 
> It's kind of amazing that we got this far without doing this. On the
> radeonsi side, it's probably related to how conservative LLVM is.
> 
> But this series is a good idea, since it might allow us to become more
> aggressive with optimizations in radeonsi as well.
> 
> 
>> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and 4.5
>>
>> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to apply the
>> precise flag on instruction emited in ir_assignment->rhs->accept(); but I found
>> no other easy way to handle this. Maybe somebody of you has a better idea?
> 
> Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2:
> 
> Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
> 
> 
>>
>> Karol Herbst (9):
>>    tgsi: add precise flag to tgsi_instruction
>>    tgsi/dump: print _PRECISE modifier on Instrutions
>>    st/glsl_to_tgsi: handle precise modifier
>>    tgsi: populate precise
>>    tgsi/text: parse _PRECISE modifier
>>    nv50/ir: add precise field to Instruction
>>    nv50/ir/tgsi: handle precise for most ALU instructions
>>    nv50/ir: disable mul+add to mad for precise instructions
>>    nv50/ir/tgsi: split mad to mul+add
>>
>>   src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
>>   src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
>>   src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
>>   src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
>>   src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
>>   src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
>>   src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
>>   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
>>   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
>>   src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
>>   src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
>>   src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
>>   src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
>>   src/mesa/state_tracker/st_pbo.c                    |  2 +-
>>   15 files changed, 172 insertions(+), 59 deletions(-)
>>
> 
>
