Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 0/9] Add precise/invariant semantics to TGSI
Running Tomb Raider on Nouveau, I found some flicker caused by Nouveau ignoring precise modifiers on variables. This series adds precise/invariant handling to TGSI, which drivers can then use to disable certain unsafe optimisations that may otherwise alter calculations which depend on producing the same result across shaders.

This series fixes this bug in Tomb Raider and one CTS test for 4.4 and 4.5.

Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to apply the precise flag on instructions emitted in ir_assignment->rhs->accept(), but I found no other easy way to handle this. Maybe one of you has a better idea?

Karol Herbst (9):
  tgsi: add precise flag to tgsi_instruction
  tgsi/dump: print _PRECISE modifier on Instrutions
  st/glsl_to_tgsi: handle precise modifier
  tgsi: populate precise
  tgsi/text: parse _PRECISE modifier
  nv50/ir: add precise field to Instruction
  nv50/ir/tgsi: handle precise for most ALU instructions
  nv50/ir: disable mul+add to mad for precise instructions
  nv50/ir/tgsi: split mad to mul+add

 src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
 src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
 src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
 src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
 src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
 src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
 src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
 src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
 src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
 src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
 src/mesa/state_tracker/st_pbo.c                    |  2 +-
 15 files changed, 172 insertions(+), 59 deletions(-)

-- 
2.13.1
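To make the motivation concrete, here is a minimal sketch of why fusing a multiply and an add can change a result. It is not code from the series: the function names and input values are chosen for illustration, and fused_like() only emulates a fused multiply-add by keeping the product exact in double, which is an assumption standing in for what the hardware MAD/FMA does.

```c
#include <assert.h>

/* Unfused: the product is rounded to float before the add.
 * 'volatile' keeps the compiler from contracting both steps into one FMA. */
static float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;   /* first rounding happens here */
    return product + c;               /* second rounding happens here */
}

/* Fused-like emulation: the product is kept exact before the add.
 * A float*float product needs at most 48 mantissa bits, so it is exact in double. */
static float fused_like(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);   /* rounded to float only at the end */
}

/* Illustrative inputs where the two strategies disagree. */
static void demo_fusion_difference(void)
{
    float a = 1.0f + 1.0f / 4096.0f;      /* 1 + 2^-12, exactly representable */
    float c = -(1.0f + 1.0f / 2048.0f);   /* -(1 + 2^-11) */

    /* a*a = 1 + 2^-11 + 2^-24; rounding to float drops the 2^-24 term. */
    assert(mul_then_add(a, a, c) == 0.0f);
    /* Keeping the product exact preserves the 2^-24 term. */
    assert(fused_like(a, a, c) == 1.0f / 16777216.0f);
}
```

If one shader evaluates such an expression unfused and another evaluates it fused, values that are meant to be identical can differ in the last bits, which is the kind of mismatch the precise and invariant qualifiers are meant to rule out.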
Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 1/9] tgsi: add precise flag to tgsi_instruction
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_build.c    | 1 +
 src/gallium/include/pipe/p_shader_tokens.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index 00843241f8..55e4d064ed 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -642,6 +642,7 @@ tgsi_default_instruction( void )
    instruction.Label = 0;
    instruction.Texture = 0;
    instruction.Memory = 0;
+   instruction.Precise = 0;
    instruction.Padding = 0;

    return instruction;
diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index 1e08d97329..aa0fb3e3b3 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -638,7 +638,8 @@ struct tgsi_instruction
    unsigned Label : 1;
    unsigned Texture : 1;
    unsigned Memory : 1;
-   unsigned Padding : 2;
+   unsigned Precise : 1;
+   unsigned Padding : 1;
 };

 /*
-- 
2.13.1
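The change above takes one bit from the padding so the token does not grow. A minimal sketch of that idea follows; struct demo_instruction is a hypothetical stand-in, and the widths of the fields before Label are assumptions chosen so that all fields sum to 32 bits (only Label, Texture, Memory, Precise and Padding appear in the diff).

```c
#include <assert.h>

/* Hypothetical packed instruction token. Field widths before 'Label' are
 * assumed for illustration; the last five fields mirror the diff above. */
struct demo_instruction {
    unsigned Type       : 4;
    unsigned NrTokens   : 8;
    unsigned Opcode     : 8;
    unsigned Saturate   : 1;
    unsigned NumDstRegs : 2;
    unsigned NumSrcRegs : 4;
    unsigned Label      : 1;
    unsigned Texture    : 1;
    unsigned Memory     : 1;
    unsigned Precise    : 1;  /* new flag, carved out of the padding */
    unsigned Padding    : 1;  /* was 2 bits before the flag was added */
};

/* Returns a zeroed token, so Precise defaults to 0 like the other flags. */
static struct demo_instruction demo_default_instruction(void)
{
    struct demo_instruction insn = {0};
    /* Bit-field layout is implementation-defined, so the "still one word"
     * property is checked here rather than taken for granted. */
    assert(sizeof insn == sizeof(unsigned));
    return insn;
}
```

Because the total stays at 32 bits, code that treats the token as a single word should be unaffected on ABIs that pack these fields into one unsigned.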
Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 2/9] tgsi/dump: print _PRECISE modifier on Instrutions
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index f6eba7424b..b58e64511c 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -584,6 +584,10 @@ iter_instruction(
       TXT( "_SAT" );
    }

+   if (inst->Instruction.Precise) {
+      TXT( "_PRECISE" );
+   }
+
    for (i = 0; i < inst->Instruction.NumDstRegs; i++) {
       const struct tgsi_full_dst_register *dst = &inst->Dst[i];

-- 
2.13.1
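As the hunk reads, the dumper appends _PRECISE after any _SAT suffix, so an instruction with both flags would print as OPCODE_SAT_PRECISE. The sketch below shows that ordering in isolation; demo_format_mnemonic() is a hypothetical helper, not part of tgsi_dump.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "MNEMONIC[_SAT][_PRECISE]" into buf, in the same suffix order
 * the dump code above uses. Hypothetical helper for illustration only. */
static void demo_format_mnemonic(char *buf, size_t size, const char *mnemonic,
                                 unsigned saturate, unsigned precise)
{
    assert(buf != NULL && size > 0 && mnemonic != NULL);
    snprintf(buf, size, "%s%s%s", mnemonic,
             saturate ? "_SAT" : "",
             precise ? "_PRECISE" : "");
}
```

A text parser that wants to round-trip dumped shaders has to accept the suffixes in this same order.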
Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 3/9] st/glsl_to_tgsi: handle precise modifier
All subexpressions inside an ir_assignment need to be tagged as precise.

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 80 ++++++++++++++++++++++++------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index c5d2e0fcd2..19f90f21fe 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -87,6 +87,13 @@ static int swizzle_for_type(const glsl_type *type, int component = 0)
    return swizzle;
 }

+static unsigned is_precise(const ir_variable *ir)
+{
+   if (!ir)
+      return 0;
+   return ir->data.precise || ir->data.invariant;
+}
+
 /**
  * This struct is a corresponding struct to TGSI ureg_src.
  */
@@ -296,6 +303,7 @@ public:
    ir_instruction *ir;

    unsigned op:8; /**< TGSI opcode */
+   unsigned precise:1;
    unsigned saturate:1;
    unsigned is_64bit_expanded:1;
    unsigned sampler_base:5;
@@ -435,6 +443,7 @@ public:
    bool have_fma;
    bool use_shared_memory;
    bool has_tex_txf_lz;
+   unsigned precise;

    variable_storage *find_variable_storage(ir_variable *var);

@@ -505,13 +514,29 @@ public:
                                       st_src_reg src0 = undef_src,
                                       st_src_reg src1 = undef_src,
                                       st_src_reg src2 = undef_src,
-                                      st_src_reg src3 = undef_src);
+                                      st_src_reg src3 = undef_src,
+                                      unsigned precise = 0);

    glsl_to_tgsi_instruction *emit_asm(ir_instruction *ir, unsigned op,
                                       st_dst_reg dst, st_dst_reg dst1,
                                       st_src_reg src0 = undef_src,
                                       st_src_reg src1 = undef_src,
                                       st_src_reg src2 = undef_src,
+                                      st_src_reg src3 = undef_src,
+                                      unsigned precise = 0);
+
+   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
+                                      st_dst_reg dst = undef_dst,
+                                      st_src_reg src0 = undef_src,
+                                      st_src_reg src1 = undef_src,
+                                      st_src_reg src2 = undef_src,
+                                      st_src_reg src3 = undef_src);
+
+   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
+                                      st_dst_reg dst, st_dst_reg dst1,
+                                      st_src_reg src0 = undef_src,
+                                      st_src_reg src1 = undef_src,
+                                      st_src_reg src2 = undef_src,
                                       st_src_reg src3 = undef_src);

    unsigned get_opcode(unsigned op,
@@ -650,7 +675,8 @@ glsl_to_tgsi_instruction *
 glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
                                st_dst_reg dst, st_dst_reg dst1,
                                st_src_reg src0, st_src_reg src1,
-                               st_src_reg src2, st_src_reg src3)
+                               st_src_reg src2, st_src_reg src3,
+                               unsigned precise)
 {
    glsl_to_tgsi_instruction *inst = new(mem_ctx) glsl_to_tgsi_instruction();
    int num_reladdr = 0, i, j;
@@ -691,6 +717,7 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
    STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);

    inst->op = op;
+   inst->precise = precise;
    inst->info = tgsi_get_opcode_info(op);
    inst->dst[0] = dst;
    inst->dst[1] = dst1;
@@ -881,9 +908,28 @@ glsl_to_tgsi_instruction *
 glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
                                st_dst_reg dst,
                                st_src_reg src0, st_src_reg src1,
+                               st_src_reg src2, st_src_reg src3,
+                               unsigned precise)
+{
+   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, precise);
+}
+
+glsl_to_tgsi_instruction *
+glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
+                               st_dst_reg dst,
+                               st_src_reg src0, st_src_reg src1,
+                               st_src_reg src2, st_src_reg src3)
+{
+   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, this->precise);
+}
+
+glsl_to_tgsi_instruction *
+glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
+                               st_dst_reg dst, st_dst_reg dst1,
+                               st_src_reg src0, st_src_reg src1,
                                st_src_reg src2, st_src_reg src3)
 {
-   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3);
+   return emit_asm(ir, op, dst, dst1, src0, src1, src2, src3, this->precise);
 }

 /**
@@ -1116,7 +1162,7 @@ glsl_to_tgsi_visitor::emit_arl(ir_instruction *ir,
    if (dst.index >= this->num_address_regs)
       this->num_address_regs = dst.index + 1;

-   emit_asm(NULL, op, dst, src0);
+   emit_asm((ir_instruction *)NULL, op, dst, src0);
 }

 int
@@ -1406,11 +1452,11 @@ glsl_to_tgsi_visitor::visit(ir_variable *ir)
 void
 glsl_to_tgsi_visitor::visit(ir_loop *ir)
 {
-   emit_asm(NULL, TGSI_OPCODE_BGNLOOP);
+   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BGNLOOP);

    visit_exec_list(&ir->body_instructions, this);

-   emit_asm(NULL, TGSI_OPCODE_ENDLOOP);
+   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ENDLOOP);
 }

 void
@@ -1418,10 +1464,10 @@ glsl_to_tgsi_visitor::visit(ir_loop_jump *ir)
 {
    switch (ir->mode) {
    case ir_loop_jump::jump_break:
-      emit_asm(NULL, TGSI_OPCODE_BRK);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BRK);
       break;
    case ir_loop_jump::jump_continue:
-      emit_asm(NULL, TGSI_OPCODE_CONT);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_CONT);
       break;
    }
 }
@@ -2703,7 +2749,7 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
          st_dst_reg dst = st_dst_reg(get_temp(var->type));
          st_src_reg src = st_src_reg(PROGRAM_OUTPUT, decl->mesa_index,
                                      var->type, component, decl->array_id);
-         emit_asm(NULL, TGSI_OPCODE_FBFETCH, dst, src);
+         emit_asm((ir_instruction *)NULL, TGSI_OPCODE_FBFETCH, dst, src);
          entry = new(mem_ctx) variable_storage(var, dst.file, dst.index,
                                                dst.array_id);
       } else {
@@ -3148,7 +3194,10 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
    st_dst_reg l;
    st_src_reg r;

+   /* all generated instructions need to be flaged as precise */
+   this->precise = is_precise(ir->lhs->variable_referenced());
    ir->rhs->accept(this);
+   this->precise = 0;
    r = this->result;

    l = get_assignment_lhs(ir->lhs, this, &dst_component);
@@ -3233,7 +3282,8 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
        */
       glsl_to_tgsi_instruction *inst, *new_inst;
       inst = (glsl_to_tgsi_instruction *)this->instructions.get_tail();
-      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3]);
+      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3],
+                          is_precise(ir->lhs->variable_referenced()));
       new_inst->saturate = inst->saturate;
       inst->dead_mask = inst->dst[0].writemask;
    } else {
@@ -4072,16 +4122,16 @@ glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference *tail,

          deref_arr->array_index->accept(this);
          if (*array_elements != 1)
-            emit_asm(NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
          else
-            emit_asm(NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, temp_dst, this->result);

          if (indirect->file == PROGRAM_UNDEFINED)
             *indirect = temp_reg;
          else {
             temp_dst = st_dst_reg(*indirect);
             temp_dst.writemask = 1;
-            emit_asm(NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
+            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
          }
       } else
          *index += array_index->value.u[0] * *array_elements;
@@ -4141,7 +4191,7 @@ glsl_to_tgsi_visitor::canonicalize_gather_offset(st_src_reg offset)
       st_src_reg tmp = get_temp(glsl_type::ivec2_type);
       st_dst_reg tmp_dst = st_dst_reg(tmp);
       tmp_dst.writemask = WRITEMASK_XY;
-      emit_asm(NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
+      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
       return tmp;
    }

@@ -6777,7 +6827,7 @@ get_mesa_program_tgsi(struct gl_context *ctx,
    v->renumber_registers();

    /* Write the END instruction. */
-   v->emit_asm(NULL, TGSI_OPCODE_END);
+   v->emit_asm((ir_instruction *)NULL, TGSI_OPCODE_END);

    if (ctx->_Shader->Flags & GLSL_DUMP) {
       _mesa_log("\n");
-- 
2.13.1
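The mechanism in patch 3/9 is a piece of visitor state that is set before the right-hand side of an assignment is visited and cleared afterwards, so every instruction emitted in between inherits the flag. A toy sketch of that pattern in C follows. All names here (demo_visitor, demo_expr, and so on) are hypothetical, and the sketch saves and restores the previous value instead of resetting it to 0 as the patch does; that is a variation chosen for the example, not what the series implements.

```c
#include <assert.h>
#include <stddef.h>

enum demo_op { DEMO_LOAD, DEMO_MUL, DEMO_ADD };

/* Toy expression tree: leaves are loads, inner nodes are MUL or ADD. */
struct demo_expr {
    enum demo_op op;
    const struct demo_expr *lhs, *rhs;   /* NULL for DEMO_LOAD */
};

struct demo_insn {
    enum demo_op op;
    unsigned precise;
};

struct demo_visitor {
    struct demo_insn insns[16];
    size_t count;
    unsigned precise;                    /* current tagging state */
};

/* Post-order walk; every emitted instruction copies the visitor's flag. */
static void demo_visit_expr(struct demo_visitor *v, const struct demo_expr *e)
{
    if (e->op == DEMO_LOAD)
        return;                          /* loads emit nothing in this toy */
    demo_visit_expr(v, e->lhs);
    demo_visit_expr(v, e->rhs);
    assert(v->count < 16);
    v->insns[v->count].op = e->op;
    v->insns[v->count].precise = v->precise;
    v->count++;
}

/* Tags everything emitted for 'rhs' when the assignment target is precise. */
static void demo_visit_assignment(struct demo_visitor *v, unsigned lhs_is_precise,
                                  const struct demo_expr *rhs)
{
    unsigned saved = v->precise;         /* save/restore instead of reset-to-0 */
    v->precise = lhs_is_precise;
    demo_visit_expr(v, rhs);
    v->precise = saved;
}
```

The save/restore form only matters if assignments can be visited while another one is still being processed; whether that can happen in glsl_to_tgsi_visitor is not something this thread establishes.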
Karol Herbst
[Nouveau] [RFC 4/9] tgsi: populate precise
Only implemented for glsl->tgsi. Other converters just set precise to 0. Signed-off-by: Karol Herbst <karolherbst at gmail.com> --- src/gallium/auxiliary/tgsi/tgsi_build.c | 3 +++ src/gallium/auxiliary/tgsi/tgsi_ureg.c | 14 +++++++--- src/gallium/auxiliary/tgsi/tgsi_ureg.h | 20 +++++++++++--- src/gallium/auxiliary/util/u_simple_shaders.c | 2 +- src/gallium/state_trackers/nine/nine_shader.c | 6 ++--- src/mesa/state_tracker/st_atifs_to_tgsi.c | 38 +++++++++++++-------------- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 12 ++++----- src/mesa/state_tracker/st_mesa_to_tgsi.c | 8 +++--- src/mesa/state_tracker/st_pbo.c | 2 +- 9 files changed, 65 insertions(+), 40 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c index 55e4d064ed..144a017768 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_build.c +++ b/src/gallium/auxiliary/tgsi/tgsi_build.c @@ -651,6 +651,7 @@ tgsi_default_instruction( void ) static struct tgsi_instruction tgsi_build_instruction(unsigned opcode, unsigned saturate, + unsigned precise, unsigned num_dst_regs, unsigned num_src_regs, struct tgsi_header *header) @@ -665,6 +666,7 @@ tgsi_build_instruction(unsigned opcode, instruction = tgsi_default_instruction(); instruction.Opcode = opcode; instruction.Saturate = saturate; + instruction.Precise = precise; instruction.NumDstRegs = num_dst_regs; instruction.NumSrcRegs = num_src_regs; @@ -1061,6 +1063,7 @@ tgsi_build_full_instruction( *instruction = tgsi_build_instruction(full_inst->Instruction.Opcode, full_inst->Instruction.Saturate, + full_inst->Instruction.Precise, full_inst->Instruction.NumDstRegs, full_inst->Instruction.NumSrcRegs, header); diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c index 5bd779728a..56db2252c5 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c @@ -1213,6 +1213,7 @@ struct ureg_emit_insn_result ureg_emit_insn(struct ureg_program *ureg, unsigned 
opcode, boolean saturate, + unsigned precise, unsigned num_dst, unsigned num_src) { @@ -1226,6 +1227,7 @@ ureg_emit_insn(struct ureg_program *ureg, out[0].insn = tgsi_default_instruction(); out[0].insn.Opcode = opcode; out[0].insn.Saturate = saturate; + out[0].insn.Precise = precise; out[0].insn.NumDstRegs = num_dst; out[0].insn.NumSrcRegs = num_src; @@ -1354,7 +1356,8 @@ ureg_insn(struct ureg_program *ureg, const struct ureg_dst *dst, unsigned nr_dst, const struct ureg_src *src, - unsigned nr_src ) + unsigned nr_src, + unsigned precise ) { struct ureg_emit_insn_result insn; unsigned i; @@ -1369,6 +1372,7 @@ ureg_insn(struct ureg_program *ureg, insn = ureg_emit_insn(ureg, opcode, saturate, + precise, nr_dst, nr_src); @@ -1391,7 +1395,8 @@ ureg_tex_insn(struct ureg_program *ureg, const struct tgsi_texture_offset *texoffsets, unsigned nr_offset, const struct ureg_src *src, - unsigned nr_src ) + unsigned nr_src, + unsigned precise ) { struct ureg_emit_insn_result insn; unsigned i; @@ -1406,6 +1411,7 @@ ureg_tex_insn(struct ureg_program *ureg, insn = ureg_emit_insn(ureg, opcode, saturate, + precise, nr_dst, nr_src); @@ -1434,7 +1440,8 @@ ureg_memory_insn(struct ureg_program *ureg, unsigned nr_src, unsigned qualifier, unsigned texture, - unsigned format) + unsigned format, + unsigned precise) { struct ureg_emit_insn_result insn; unsigned i; @@ -1442,6 +1449,7 @@ ureg_memory_insn(struct ureg_program *ureg, insn = ureg_emit_insn(ureg, opcode, FALSE, + precise, nr_dst, nr_src); diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h index 54f95ba565..105c85abd5 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h @@ -546,7 +546,8 @@ ureg_insn(struct ureg_program *ureg, const struct ureg_dst *dst, unsigned nr_dst, const struct ureg_src *src, - unsigned nr_src ); + unsigned nr_src, + unsigned precise); void @@ -559,7 +560,8 @@ ureg_tex_insn(struct ureg_program *ureg, const struct 
tgsi_texture_offset *texoffsets, unsigned nr_offset, const struct ureg_src *src, - unsigned nr_src ); + unsigned nr_src, + unsigned precise); void @@ -571,7 +573,8 @@ ureg_memory_insn(struct ureg_program *ureg, unsigned nr_src, unsigned qualifier, unsigned texture, - unsigned format); + unsigned format, + unsigned precise); /*********************************************************************** * Internal instruction helpers, don't call these directly: @@ -586,6 +589,7 @@ struct ureg_emit_insn_result ureg_emit_insn(struct ureg_program *ureg, unsigned opcode, boolean saturate, + unsigned precise, unsigned num_dst, unsigned num_src); @@ -632,6 +636,7 @@ static inline void ureg_##op( struct ureg_program *ureg ) \ opcode, \ FALSE, \ 0, \ + 0, \ 0); \ ureg_fixup_insn_size( ureg, insn.insn_token ); \ } @@ -646,6 +651,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ opcode, \ FALSE, \ 0, \ + 0, \ 1); \ ureg_emit_src( ureg, src ); \ ureg_fixup_insn_size( ureg, insn.insn_token ); \ @@ -661,6 +667,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ opcode, \ FALSE, \ 0, \ + 0, \ 0); \ ureg_emit_label( ureg, insn.extended_token, label_token ); \ ureg_fixup_insn_size( ureg, insn.insn_token ); \ @@ -677,6 +684,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ opcode, \ FALSE, \ 0, \ + 0, \ 1); \ ureg_emit_label( ureg, insn.extended_token, label_token ); \ ureg_emit_src( ureg, src ); \ @@ -694,6 +702,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ insn = ureg_emit_insn(ureg, \ opcode, \ dst.Saturate, \ + 0, \ 1, \ 0); \ ureg_emit_dst( ureg, dst ); \ @@ -713,6 +722,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ insn = ureg_emit_insn(ureg, \ opcode, \ dst.Saturate, \ + 0, \ 1, \ 1); \ ureg_emit_dst( ureg, dst ); \ @@ -733,6 +743,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ insn = ureg_emit_insn(ureg, \ opcode, \ dst.Saturate, \ + 0, \ 1, \ 2); \ ureg_emit_dst( ureg, dst ); \ @@ -756,6 
+767,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ insn = ureg_emit_insn(ureg, \ opcode, \ dst.Saturate, \ + 0, \ 1, \ 2); \ ureg_emit_texture( ureg, insn.extended_token, target, \ @@ -780,6 +792,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ insn = ureg_emit_insn(ureg, \ opcode, \ dst.Saturate, \ + 0, \ 1, \ 3); \ ureg_emit_dst( ureg, dst ); \ @@ -806,6 +819,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \ insn = ureg_emit_insn(ureg, \ opcode, \ dst.Saturate, \ + 0, \ 1, \ 4); \ ureg_emit_texture( ureg, insn.extended_token, target, \ diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c b/src/gallium/auxiliary/util/u_simple_shaders.c index 5874d0e9aa..79331b5638 100644 --- a/src/gallium/auxiliary/util/u_simple_shaders.c +++ b/src/gallium/auxiliary/util/u_simple_shaders.c @@ -954,7 +954,7 @@ util_make_geometry_passthrough_shader(struct pipe_context *pipe, } /* EMIT IMM[0] */ - ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1); + ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1, 0); /* END */ ureg_END(ureg); diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c index 40fb6be88f..f405090811 100644 --- a/src/gallium/state_trackers/nine/nine_shader.c +++ b/src/gallium/state_trackers/nine/nine_shader.c @@ -1879,7 +1879,7 @@ DECL_SPECIAL(IFC) struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X); src[0] = tx_src_param(tx, &tx->insn.src[0]); src[1] = tx_src_param(tx, &tx->insn.src[1]); - ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2); + ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0); ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx)); return D3D_OK; } @@ -1897,7 +1897,7 @@ DECL_SPECIAL(BREAKC) struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X); src[0] = tx_src_param(tx, &tx->insn.src[0]); src[1] = tx_src_param(tx, &tx->insn.src[1]); - ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2); + ureg_insn(tx->ureg, 
cmp_op, &tmp, 1, src, 2, 0); ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx)); ureg_BRK(tx->ureg); tx_endcond(tx); @@ -3029,7 +3029,7 @@ NineTranslateInstruction_Generic(struct shader_translator *tx) ureg_insn(tx->ureg, tx->insn.info->opcode, dst, tx->insn.ndst, - src, tx->insn.nsrc); + src, tx->insn.nsrc, 0); return D3D_OK; } diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c index 338ced56ed..e0a6ff7131 100644 --- a/src/mesa/state_tracker/st_atifs_to_tgsi.c +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c @@ -105,18 +105,18 @@ apply_swizzle(struct st_translate *t, imm[0] = src; imm[1] = ureg_imm4f(t->ureg, 1.0f, 1.0f, 0.0f, 0.0f); imm[2] = ureg_imm4f(t->ureg, 0.0f, 0.0f, 1.0f, 1.0f); - ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3); + ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3, 0); if (swizzle == GL_SWIZZLE_STR_DR_ATI) { imm[0] = ureg_scalar(src, TGSI_SWIZZLE_Z); } else { imm[0] = ureg_scalar(src, TGSI_SWIZZLE_W); } - ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1); + ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1, 0); imm[0] = ureg_src(tmp[0]); imm[1] = ureg_src(tmp[1]); - ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2); + ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2, 0); return ureg_src(tmp[0]); } @@ -170,35 +170,35 @@ prepare_argument(struct st_translate *t, const unsigned argId, src = ureg_scalar(src, TGSI_SWIZZLE_W); break; } - ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1); + ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1, 0); if (srcReg->argMod & GL_COMP_BIT_ATI) { struct ureg_src modsrc[2]; modsrc[0] = ureg_imm1f(t->ureg, 1.0f); modsrc[1] = ureg_negate(ureg_src(arg)); - ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2); + ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0); } if (srcReg->argMod & GL_BIAS_BIT_ATI) { struct ureg_src modsrc[2]; modsrc[0] = ureg_src(arg); modsrc[1] = 
ureg_imm1f(t->ureg, -0.5f); - ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2); + ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0); } if (srcReg->argMod & GL_2X_BIT_ATI) { struct ureg_src modsrc[2]; modsrc[0] = ureg_src(arg); modsrc[1] = ureg_src(arg); - ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2); + ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0); } if (srcReg->argMod & GL_NEGATE_BIT_ATI) { struct ureg_src modsrc[2]; modsrc[0] = ureg_src(arg); modsrc[1] = ureg_imm1f(t->ureg, -1.0f); - ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2); + ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2, 0); } return ureg_src(arg); } @@ -217,25 +217,25 @@ emit_special_inst(struct st_translate *t, const struct instruction_desc *desc, tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI + 2); /* re-purpose a3 */ src[0] = ureg_imm1f(t->ureg, 0.5f); src[1] = ureg_negate(args[2]); - ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2); + ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2, 0); src[0] = ureg_src(tmp[0]); src[1] = args[0]; src[2] = args[1]; - ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3); + ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0); } else if (!strcmp(desc->name, "CND0")) { src[0] = args[2]; src[1] = args[1]; src[2] = args[0]; - ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3); + ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0); } else if (!strcmp(desc->name, "DOT2_ADD")) { /* note: DP2A is not implemented in most pipe drivers */ tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI); /* re-purpose a1 */ src[0] = args[0]; src[1] = args[1]; - ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2); + ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2, 0); src[0] = ureg_src(tmp[0]); src[1] = ureg_scalar(args[2], TGSI_SWIZZLE_Z); - ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2); + ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2, 0); } } @@ -249,7 +249,7 @@ emit_arith_inst(struct st_translate *t, 
return; } - ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount); + ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount, 0); } static void @@ -292,7 +292,7 @@ emit_dstmod(struct st_translate *t, if (dstMod & GL_SATURATE_BIT_ATI) { dst = ureg_saturate(dst); } - ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2); + ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2, 0); } /** @@ -334,9 +334,9 @@ compile_setupinst(struct st_translate *t, src[1] = t->samplers[r]; /* the texture target is still unknown, it will be fixed in the draw call */ ureg_tex_insn(t->ureg, TGSI_OPCODE_TEX, dst, 1, TGSI_TEXTURE_2D, - TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2); + TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2, 0); } else if (texinst->Opcode == ATI_FRAGMENT_SHADER_PASS_OP) { - ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1); + ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0); } t->regs_written[t->current_pass][r] = true; @@ -408,11 +408,11 @@ finalize_shader(struct st_translate *t, unsigned numPasses) /* copy the result into the OUT slot */ dst[0] = t->outputs[t->outputMapping[FRAG_RESULT_COLOR]]; src[0] = ureg_src(t->temps[0]); - ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1); + ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0); } /* signal the end of the program */ - ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0); + ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0, 0); } /** diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 19f90f21fe..ecd9f9f280 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -5900,7 +5900,7 @@ compile_tgsi_instruction(struct st_translate *t, case TGSI_OPCODE_IF: case TGSI_OPCODE_UIF: assert(num_dst == 0); - ureg_insn(ureg, inst->op, NULL, 0, src, num_src); + ureg_insn(ureg, inst->op, NULL, 0, src, num_src, inst->precise); return; case TGSI_OPCODE_TEX: @@ -5935,7 +5935,7 @@ compile_tgsi_instruction(struct 
st_translate *t, tex_target, st_translate_texture_type(inst->tex_type), texoffsets, inst->tex_offset_num_offset, - src, num_src); + src, num_src, inst->precise); return; case TGSI_OPCODE_RESQ: @@ -5966,7 +5966,7 @@ compile_tgsi_instruction(struct st_translate *t, assert(src[0].File != TGSI_FILE_NULL); ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->buffer_access, - tex_target, inst->image_format); + tex_target, inst->image_format, inst->precise); break; case TGSI_OPCODE_STORE: @@ -5984,19 +5984,19 @@ compile_tgsi_instruction(struct st_translate *t, assert(dst[0].File != TGSI_FILE_NULL); ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->buffer_access, - tex_target, inst->image_format); + tex_target, inst->image_format, inst->precise); break; case TGSI_OPCODE_SCS: dst[0] = ureg_writemask(dst[0], TGSI_WRITEMASK_XY); - ureg_insn(ureg, inst->op, dst, num_dst, src, num_src); + ureg_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->precise); break; default: ureg_insn(ureg, inst->op, dst, num_dst, - src, num_src); + src, num_src, inst->precise); break; } } diff --git a/src/mesa/state_tracker/st_mesa_to_tgsi.c b/src/mesa/state_tracker/st_mesa_to_tgsi.c index 984ff92130..f11013c116 100644 --- a/src/mesa/state_tracker/st_mesa_to_tgsi.c +++ b/src/mesa/state_tracker/st_mesa_to_tgsi.c @@ -558,7 +558,7 @@ compile_instruction( inst->TexShadow ), TGSI_RETURN_TYPE_FLOAT, NULL, 0, - src, num_src ); + src, num_src, 0 ); return; case OPCODE_SCS: @@ -566,7 +566,7 @@ compile_instruction( ureg_insn( ureg, translate_opcode( inst->Opcode ), dst, num_dst, - src, num_src ); + src, num_src, 0 ); break; case OPCODE_XPD: @@ -574,7 +574,7 @@ compile_instruction( ureg_insn( ureg, translate_opcode( inst->Opcode ), dst, num_dst, - src, num_src ); + src, num_src, 0 ); break; case OPCODE_RSQ: @@ -593,7 +593,7 @@ compile_instruction( ureg_insn( ureg, translate_opcode( inst->Opcode ), dst, num_dst, - src, num_src ); + src, num_src, 0); break; } } diff --git 
a/src/mesa/state_tracker/st_pbo.c b/src/mesa/state_tracker/st_pbo.c index 303c8535b2..3dff1609e8 100644 --- a/src/mesa/state_tracker/st_pbo.c +++ b/src/mesa/state_tracker/st_pbo.c @@ -528,7 +528,7 @@ create_fs(struct st_context *st, bool download, enum pipe_texture_target target, op[0] = ureg_src(temp0); op[1] = ureg_src(temp1); ureg_memory_insn(ureg, TGSI_OPCODE_STORE, &out, 1, op, 2, 0, - TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE); + TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE, 0); ureg_release_temporary(ureg, temp1); } else { -- 2.13.1
Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 5/9] tgsi/text: parse _PRECISE modifier
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/auxiliary/tgsi/tgsi_text.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index 93a05568f4..c5fcb3283d 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -999,6 +999,7 @@ parse_texoffset_operand(
 static boolean
 match_inst(const char **pcur,
            unsigned *saturate,
+           unsigned *precise,
            const struct tgsi_opcode_info *info)
 {
    const char *cur = *pcur;
@@ -1007,6 +1008,7 @@ match_inst(const char **pcur,
    if (str_match_nocase_whole(&cur, info->mnemonic)) {
       *pcur = cur;
       *saturate = 0;
+      *precise = 0;
       return TRUE;
    }

@@ -1015,8 +1017,15 @@ match_inst(const char **pcur,
       if (str_match_nocase_whole(&cur, "_SAT")) {
          *pcur = cur;
          *saturate = 1;
-         return TRUE;
       }
+
+      if (str_match_nocase_whole(&cur, "_PRECISE")) {
+         *pcur = cur;
+         *precise = 1;
+      }
+
+      if (*precise || *saturate)
+         return TRUE;
    }

    return FALSE;
@@ -1029,6 +1038,7 @@ parse_instruction(
 {
    uint i;
    uint saturate = 0;
+   uint precise = 0;
    const struct tgsi_opcode_info *info;
    struct tgsi_full_instruction inst;
    const char *cur;
@@ -1043,7 +1053,7 @@ parse_instruction(
       cur = ctx->cur;

       info = tgsi_get_opcode_info( i );
-      if (match_inst(&cur, &saturate, info)) {
+      if (match_inst(&cur, &saturate, &precise, info)) {
          if (info->num_dst + info->num_src + info->is_tex == 0) {
             ctx->cur = cur;
             break;
@@ -1064,6 +1074,7 @@ parse_instruction(

    inst.Instruction.Opcode = i;
    inst.Instruction.Saturate = saturate;
+   inst.Instruction.Precise = precise;
    inst.Instruction.NumDstRegs = info->num_dst;
    inst.Instruction.NumSrcRegs = info->num_src;

-- 
2.13.1
Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 6/9] nv50/ir: add precise field to Instruction
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 5c09fed05c..6835c4fa8c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -884,6 +884,7 @@ public:
    unsigned perPatch : 1;
    unsigned exit : 1; // terminate program after insn
    unsigned mask : 4; // for vector ops
+   unsigned precise : 1; // prevent algebraic optimisations like mul+add to mad

    int8_t postFactor; // MUL/DIV(if < 0) by 1 << postFactor

-- 
2.13.1
Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 7/9] nv50/ir/tgsi: handle precise for most ALU instructions
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 1264dd4834..c633185893 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -3179,6 +3179,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
          geni->subOp = tgsi::opcodeToSubOp(tgsi.getOpcode());
          if (op == OP_MUL && dstTy == TYPE_F32)
             geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
       }
       break;
    case TGSI_OPCODE_MAD:
@@ -3192,6 +3193,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
          geni = mkOp3(op, dstTy, dst0[c], src0, src1, src2);
          if (dstTy == TYPE_F32)
             geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
       }
       break;
    case TGSI_OPCODE_MOV:
-- 
2.13.1
Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 8/9] nv50/ir: disable mul+add to mad for precise instructions
Fixes misrendering in Tomb Raider.

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 4c92a1efb5..85f3f44832 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1669,6 +1669,10 @@ AlgebraicOpt::handleABS(Instruction *abs)
 bool
 AlgebraicOpt::handleADD(Instruction *add)
 {
+   // we can't optimize to SAD/MAD if the instruction is tagged as precise
+   if (add->precise)
+      return false;
+
    Value *src0 = add->getSrc(0);
    Value *src1 = add->getSrc(1);

@@ -1712,7 +1716,7 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, operation toOp)
       return false;

    if (src->getInsn()->saturate || src->getInsn()->postFactor ||
-       src->getInsn()->dnz)
+       src->getInsn()->dnz || src->getInsn()->precise)
       return false;

    if (toOp == OP_SAD) {
-- 
2.13.1
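The two hunks in patch 8/9 guard the ADD-to-MAD rewrite from both sides: the ADD itself must not be precise, and neither must the MUL that feeds it. A condensed sketch of that check follows, using a hypothetical demo_ir_insn struct rather than nv50_ir's Instruction class; the field names mirror the ones visible in the diff.

```c
#include <assert.h>
#include <stddef.h>

enum demo_opcode { DEMO_OP_MUL, DEMO_OP_ADD, DEMO_OP_OTHER };

/* Hypothetical, much reduced stand-in for an IR instruction. */
struct demo_ir_insn {
    enum demo_opcode op;
    unsigned saturate : 1;
    unsigned dnz : 1;
    unsigned precise : 1;
    int postFactor;
};

/* Returns 1 if 'add' may absorb the producing 'mul' into a single MAD. */
static int demo_can_fuse_to_mad(const struct demo_ir_insn *add,
                                const struct demo_ir_insn *mul)
{
    assert(add != NULL && mul != NULL);
    if (add->op != DEMO_OP_ADD || mul->op != DEMO_OP_MUL)
        return 0;
    if (add->precise)                  /* mirrors the early return in handleADD */
        return 0;
    if (mul->saturate || mul->postFactor || mul->dnz || mul->precise)
        return 0;                      /* mirrors the source-instruction check */
    return 1;
}
```

Checking both instructions means the rewrite is skipped even if only one half of the pair was tagged during translation.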
Karol Herbst
2017-Jun-11 18:42 UTC
[Nouveau] [RFC 9/9] nv50/ir/tgsi: split mad to mul+add
Fixes:
KHR-GL44.gpu_shader5.precise_qualifier
KHR-GL45.gpu_shader5.precise_qualifier

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index c633185893..cd45e82426 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -3184,6 +3184,20 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       break;
    case TGSI_OPCODE_MAD:
    case TGSI_OPCODE_UMAD:
+      FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
+         val0 = getSSA();
+         src0 = fetchSrc(0, c);
+         src1 = fetchSrc(1, c);
+         src2 = fetchSrc(2, c);
+         geni = mkOp2(OP_MUL, dstTy, val0, src0, src1);
+         if (dstTy == TYPE_F32)
+            geni->dnz = info->io.mul_zero_wins;
+         geni->precise = insn->Instruction.Precise;
+
+         geni = mkOp2(OP_ADD, dstTy, dst0[c], val0, src2);
+         geni->precise = insn->Instruction.Precise;
+      }
+      break;
    case TGSI_OPCODE_SAD:
    case TGSI_OPCODE_FMA:
       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
-- 
2.13.1
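The interplay between this split and the peephole change in patch 8 can be sketched with a toy IR. Everything below (Op, Insn, lowerMad, fuseMulAdd) is hypothetical and far simpler than nv50 IR: it assumes the MUL result is used only by the ADD that immediately follows it. The idea it shows is that a MAD is always lowered to MUL + ADD with the precise flag copied to both, and a later pass re-fuses the pair only when neither instruction is precise.

```cpp
#include <cstddef>
#include <vector>

enum class Op { Mul, Add, Mad };

struct Insn {
   Op op;
   int dst;       // destination value id
   int src[3];    // source value ids, -1 if unused
   bool precise;
};

// Lower "dst = a * b + c" to MUL + ADD, copying the precise flag to both.
void lowerMad(std::vector<Insn> &out, int dst, int a, int b, int c,
              bool precise, int &nextTemp)
{
   int tmp = nextTemp++;
   out.push_back({Op::Mul, tmp, {a, b, -1}, precise});
   out.push_back({Op::Add, dst, {tmp, c, -1}, precise});
}

// Fuse adjacent MUL + ADD pairs back into MAD unless either one is precise.
// Simplification: assumes the MUL result has no other users.
std::vector<Insn> fuseMulAdd(const std::vector<Insn> &in)
{
   std::vector<Insn> out;
   for (std::size_t i = 0; i < in.size(); ++i) {
      const Insn &mul = in[i];
      if (mul.op == Op::Mul && i + 1 < in.size()) {
         const Insn &add = in[i + 1];
         bool feeds = add.op == Op::Add && add.src[0] == mul.dst;
         if (feeds && !mul.precise && !add.precise) {
            out.push_back({Op::Mad, add.dst,
                           {mul.src[0], mul.src[1], add.src[1]}, false});
            ++i; // skip the ADD that was folded into the MAD
            continue;
         }
      }
      out.push_back(mul);
   }
   return out;
}
```

Under these assumptions a non-precise MAD round-trips back to a single MAD, while a precise one stays as two separately rounded instructions.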
Nicolai Hähnle
2017-Jun-12 10:31 UTC
[Nouveau] [Mesa-dev] [RFC 5/9] tgsi/text: parse _PRECISE modifier
On 11.06.2017 20:42, Karol Herbst wrote:
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>  src/gallium/auxiliary/tgsi/tgsi_text.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
> index 93a05568f4..c5fcb3283d 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
> @@ -999,6 +999,7 @@ parse_texoffset_operand(
>  static boolean
>  match_inst(const char **pcur,
>             unsigned *saturate,
> +           unsigned *precise,
>             const struct tgsi_opcode_info *info)
>  {
>     const char *cur = *pcur;
> @@ -1007,6 +1008,7 @@ match_inst(const char **pcur,
>     if (str_match_nocase_whole(&cur, info->mnemonic)) {
>        *pcur = cur;
>        *saturate = 0;
> +      *precise = 0;
>        return TRUE;
>     }
>
> @@ -1015,8 +1017,15 @@ match_inst(const char **pcur,
>        if (str_match_nocase_whole(&cur, "_SAT")) {
>           *pcur = cur;
>           *saturate = 1;
> -         return TRUE;
>        }
> +
> +      if (str_match_nocase_whole(&cur, "_PRECISE")) {
> +         *pcur = cur;
> +         *precise = 1;
> +      }

I think this doesn't properly handle the case where both _SAT and _PRECISE are present, because it uses str_match_nocase_whole.

Cheers,
Nicolai

> +
> +      if (*precise || *saturate)
> +         return TRUE;
>     }
>
>     return FALSE;
> @@ -1029,6 +1038,7 @@ parse_instruction(
>  {
>     uint i;
>     uint saturate = 0;
> +   uint precise = 0;
>     const struct tgsi_opcode_info *info;
>     struct tgsi_full_instruction inst;
>     const char *cur;
> @@ -1043,7 +1053,7 @@ parse_instruction(
>        cur = ctx->cur;
>
>        info = tgsi_get_opcode_info( i );
> -      if (match_inst(&cur, &saturate, info)) {
> +      if (match_inst(&cur, &saturate, &precise, info)) {
>           if (info->num_dst + info->num_src + info->is_tex == 0) {
>              ctx->cur = cur;
>              break;
> @@ -1064,6 +1074,7 @@ parse_instruction(
>
>     inst.Instruction.Opcode = i;
>     inst.Instruction.Saturate = saturate;
> +   inst.Instruction.Precise = precise;
>     inst.Instruction.NumDstRegs = info->num_dst;
>     inst.Instruction.NumSrcRegs = info->num_src;
>
>

-- 
Learn how the world really is,
but never forget how it should be.
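One possible way to handle both suffixes on the same mnemonic is to consume each suffix with a prefix match and check for the end of the token only once, after all suffixes have been tried. The sketch below is standalone and does not use the Mesa str_match_* helpers; match_prefix_nocase, Modifiers and match_inst_sketch are hypothetical names, and the suffix order (_SAT before _PRECISE) is an assumption of this example rather than something taken from the patches quoted here.

```cpp
#include <cctype>

// Hypothetical stand-in for a case-insensitive prefix match.
// Advances *pcur past str on success, leaves it untouched on failure.
static bool match_prefix_nocase(const char **pcur, const char *str)
{
   const char *cur = *pcur;
   while (*str) {
      if (std::toupper((unsigned char)*cur) != std::toupper((unsigned char)*str))
         return false;
      ++cur;
      ++str;
   }
   *pcur = cur;
   return true;
}

struct Modifiers {
   bool saturate = false;
   bool precise = false;
};

// Parses "MNEMONIC[_SAT][_PRECISE]" and requires the token to end there.
static bool match_inst_sketch(const char *text, const char *mnemonic,
                              Modifiers *out)
{
   const char *cur = text;
   Modifiers m;

   if (!match_prefix_nocase(&cur, mnemonic))
      return false;
   if (match_prefix_nocase(&cur, "_SAT"))
      m.saturate = true;
   if (match_prefix_nocase(&cur, "_PRECISE"))
      m.precise = true;

   // The whole-token check happens once, after every suffix was tried.
   if (std::isalnum((unsigned char)*cur) || *cur == '_')
      return false;

   *out = m;
   return true;
}
```

Because the end-of-token test is deferred, "MUL_SAT_PRECISE" sets both flags, while "MULX" is still rejected for the mnemonic "MUL".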
Nicolai Hähnle
2017-Jun-12 10:33 UTC
[Nouveau] [Mesa-dev] [RFC 4/9] tgsi: populate precise
On 11.06.2017 20:42, Karol Herbst wrote:
> Only implemented for glsl->tgsi. Other converters just set precise to 0.
>
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>  src/gallium/auxiliary/tgsi/tgsi_build.c       |  3 +++
>  src/gallium/auxiliary/tgsi/tgsi_ureg.c        | 14 +++++++---
>  src/gallium/auxiliary/tgsi/tgsi_ureg.h        | 20 +++++++++++---
>  src/gallium/auxiliary/util/u_simple_shaders.c |  2 +-
>  src/gallium/state_trackers/nine/nine_shader.c |  6 ++---
>  src/mesa/state_tracker/st_atifs_to_tgsi.c     | 38 +++++++++++++--------------
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp    | 12 ++++-----
>  src/mesa/state_tracker/st_mesa_to_tgsi.c      |  8 +++---
>  src/mesa/state_tracker/st_pbo.c               |  2 +-
>  9 files changed, 65 insertions(+), 40 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
> index 55e4d064ed..144a017768 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_build.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
> @@ -651,6 +651,7 @@ tgsi_default_instruction( void )
>  static struct tgsi_instruction
>  tgsi_build_instruction(unsigned opcode,
>                         unsigned saturate,
> +                       unsigned precise,
>                         unsigned num_dst_regs,
>                         unsigned num_src_regs,
>                         struct tgsi_header *header)
> @@ -665,6 +666,7 @@ tgsi_build_instruction(unsigned opcode,
>     instruction = tgsi_default_instruction();
>     instruction.Opcode = opcode;
>     instruction.Saturate = saturate;
> +   instruction.Precise = precise;
>     instruction.NumDstRegs = num_dst_regs;
>     instruction.NumSrcRegs = num_src_regs;
>
> @@ -1061,6 +1063,7 @@ tgsi_build_full_instruction(
>
>     *instruction = tgsi_build_instruction(full_inst->Instruction.Opcode,
>                                           full_inst->Instruction.Saturate,
> +                                         full_inst->Instruction.Precise,
>                                           full_inst->Instruction.NumDstRegs,
>                                           full_inst->Instruction.NumSrcRegs,
>                                           header);
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> index 5bd779728a..56db2252c5 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> @@ -1213,6 +1213,7 @@ struct ureg_emit_insn_result
>  ureg_emit_insn(struct ureg_program *ureg,
>                 unsigned opcode,
>                 boolean saturate,
> +               unsigned precise,
>                 unsigned num_dst,
>                 unsigned num_src)
>  {
> @@ -1226,6 +1227,7 @@ ureg_emit_insn(struct ureg_program *ureg,
>     out[0].insn = tgsi_default_instruction();
>     out[0].insn.Opcode = opcode;
>     out[0].insn.Saturate = saturate;
> +   out[0].insn.Precise = precise;
>     out[0].insn.NumDstRegs = num_dst;
>     out[0].insn.NumSrcRegs = num_src;
>
> @@ -1354,7 +1356,8 @@ ureg_insn(struct ureg_program *ureg,
>            const struct ureg_dst *dst,
>            unsigned nr_dst,
>            const struct ureg_src *src,
> -          unsigned nr_src )
> +          unsigned nr_src,
> +          unsigned precise )
>  {
>     struct ureg_emit_insn_result insn;
>     unsigned i;
> @@ -1369,6 +1372,7 @@ ureg_insn(struct ureg_program *ureg,
>     insn = ureg_emit_insn(ureg,
>                           opcode,
>                           saturate,
> +                         precise,
>                           nr_dst,
>                           nr_src);
>
> @@ -1391,7 +1395,8 @@ ureg_tex_insn(struct ureg_program *ureg,
>                const struct tgsi_texture_offset *texoffsets,
>                unsigned nr_offset,
>                const struct ureg_src *src,
> -              unsigned nr_src )
> +              unsigned nr_src,
> +              unsigned precise )

What does `precise' mean for tex instructions?

>  {
>     struct ureg_emit_insn_result insn;
>     unsigned i;
> @@ -1406,6 +1411,7 @@ ureg_tex_insn(struct ureg_program *ureg,
>     insn = ureg_emit_insn(ureg,
>                           opcode,
>                           saturate,
> +                         precise,
>                           nr_dst,
>                           nr_src);
>
> @@ -1434,7 +1440,8 @@ ureg_memory_insn(struct ureg_program *ureg,
>                   unsigned nr_src,
>                   unsigned qualifier,
>                   unsigned texture,
> -                 unsigned format)
> +                 unsigned format,
> +                 unsigned precise)

Same question. I can't think of a possible meaning, in which case the parameter should be dropped.

Cheers,
Nicolai

>  {
>     struct ureg_emit_insn_result insn;
>     unsigned i;
> @@ -1442,6 +1449,7 @@ ureg_memory_insn(struct ureg_program *ureg,
>     insn = ureg_emit_insn(ureg,
>                           opcode,
>                           FALSE,
> +                         precise,
>                           nr_dst,
>                           nr_src);
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> index 54f95ba565..105c85abd5 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> @@ -546,7 +546,8 @@ ureg_insn(struct ureg_program *ureg,
>            const struct ureg_dst *dst,
>            unsigned nr_dst,
>            const struct ureg_src *src,
> -          unsigned nr_src );
> +          unsigned nr_src,
> +          unsigned precise);
>
>
>  void
> @@ -559,7 +560,8 @@ ureg_tex_insn(struct ureg_program *ureg,
>                const struct tgsi_texture_offset *texoffsets,
>                unsigned nr_offset,
>                const struct ureg_src *src,
> -              unsigned nr_src );
> +              unsigned nr_src,
> +              unsigned precise);
>
>
>  void
> @@ -571,7 +573,8 @@ ureg_memory_insn(struct ureg_program *ureg,
>                   unsigned nr_src,
>                   unsigned qualifier,
>                   unsigned texture,
> -                 unsigned format);
> +                 unsigned format,
> +                 unsigned precise);
>
>  /***********************************************************************
>   * Internal instruction helpers, don't call these directly:
> @@ -586,6 +589,7 @@ struct ureg_emit_insn_result
>  ureg_emit_insn(struct ureg_program *ureg,
>                 unsigned opcode,
>                 boolean saturate,
> +               unsigned precise,
>                 unsigned num_dst,
>                 unsigned num_src);
>
> @@ -632,6 +636,7 @@ static inline void ureg_##op( struct ureg_program *ureg ) \
>                           opcode, \
>                           FALSE, \
>                           0, \
> +                         0, \
>                           0); \
>     ureg_fixup_insn_size( ureg, insn.insn_token ); \
>  }
> @@ -646,6 +651,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>                           opcode, \
>                           FALSE, \
>                           0, \
> +                         0, \
>                           1); \
>     ureg_emit_src( ureg, src ); \
>     ureg_fixup_insn_size( ureg, insn.insn_token ); \
> @@ -661,6 +667,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>                           opcode, \
>                           FALSE, \
>                           0, \
> +                         0, \
>                           0); \
>     ureg_emit_label( ureg, insn.extended_token, label_token ); \
>     ureg_fixup_insn_size( ureg, insn.insn_token ); \
> @@ -677,6 +684,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>                           opcode, \
>                           FALSE, \
>                           0, \
> +                         0, \
>                           1); \
>     ureg_emit_label( ureg, insn.extended_token, label_token ); \
>     ureg_emit_src( ureg, src ); \
> @@ -694,6 +702,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>     insn = ureg_emit_insn(ureg, \
>                           opcode, \
>                           dst.Saturate, \
> +                         0, \
>                           1, \
>                           0); \
>     ureg_emit_dst( ureg, dst ); \
> @@ -713,6 +722,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>     insn = ureg_emit_insn(ureg, \
>                           opcode, \
>                           dst.Saturate, \
> +                         0, \
>                           1, \
>                           1); \
>     ureg_emit_dst( ureg, dst ); \
> @@ -733,6 +743,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>     insn = ureg_emit_insn(ureg, \
>                           opcode, \
>                           dst.Saturate, \
> +                         0, \
>                           1, \
>                           2); \
>     ureg_emit_dst( ureg, dst ); \
> @@ -756,6 +767,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>     insn = ureg_emit_insn(ureg, \
>                           opcode, \
>                           dst.Saturate, \
> +                         0, \
>                           1, \
>                           2); \
>     ureg_emit_texture( ureg, insn.extended_token, target, \
> @@ -780,6 +792,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>     insn = ureg_emit_insn(ureg, \
>                           opcode, \
>                           dst.Saturate, \
> +                         0, \
>                           1, \
>                           3); \
>     ureg_emit_dst( ureg, dst ); \
> @@ -806,6 +819,7 @@ static inline void ureg_##op( struct ureg_program *ureg, \
>     insn = ureg_emit_insn(ureg, \
>                           opcode, \
>                           dst.Saturate, \
> +                         0, \
>                           1, \
>                           4); \
>     ureg_emit_texture( ureg, insn.extended_token, target, \
> diff --git a/src/gallium/auxiliary/util/u_simple_shaders.c b/src/gallium/auxiliary/util/u_simple_shaders.c
> index 5874d0e9aa..79331b5638 100644
> --- a/src/gallium/auxiliary/util/u_simple_shaders.c
> +++ b/src/gallium/auxiliary/util/u_simple_shaders.c
> @@ -954,7 +954,7 @@ util_make_geometry_passthrough_shader(struct pipe_context *pipe,
>     }
>
>     /* EMIT IMM[0] */
> -   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1);
> +   ureg_insn(ureg, TGSI_OPCODE_EMIT, NULL, 0, &imm, 1, 0);
>
>     /* END */
>     ureg_END(ureg);
> diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c
> index 40fb6be88f..f405090811 100644
> --- a/src/gallium/state_trackers/nine/nine_shader.c
> +++ b/src/gallium/state_trackers/nine/nine_shader.c
> @@ -1879,7 +1879,7 @@ DECL_SPECIAL(IFC)
>      struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
>      src[0] = tx_src_param(tx, &tx->insn.src[0]);
>      src[1] = tx_src_param(tx, &tx->insn.src[1]);
> -    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
> +    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
>      ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
>      return D3D_OK;
>  }
> @@ -1897,7 +1897,7 @@ DECL_SPECIAL(BREAKC)
>      struct ureg_dst tmp = ureg_writemask(tx_scratch(tx), TGSI_WRITEMASK_X);
>      src[0] = tx_src_param(tx, &tx->insn.src[0]);
>      src[1] = tx_src_param(tx, &tx->insn.src[1]);
> -    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2);
> +    ureg_insn(tx->ureg, cmp_op, &tmp, 1, src, 2, 0);
>      ureg_IF(tx->ureg, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), tx_cond(tx));
>      ureg_BRK(tx->ureg);
>      tx_endcond(tx);
> @@ -3029,7 +3029,7 @@ NineTranslateInstruction_Generic(struct shader_translator *tx)
>
>      ureg_insn(tx->ureg, tx->insn.info->opcode,
>                dst, tx->insn.ndst,
> -              src, tx->insn.nsrc);
> +              src, tx->insn.nsrc, 0);
>      return D3D_OK;
>  }
>
> diff --git a/src/mesa/state_tracker/st_atifs_to_tgsi.c b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> index 338ced56ed..e0a6ff7131 100644
> --- a/src/mesa/state_tracker/st_atifs_to_tgsi.c
> +++ b/src/mesa/state_tracker/st_atifs_to_tgsi.c
> @@ -105,18 +105,18 @@ apply_swizzle(struct st_translate *t,
>        imm[0] = src;
>        imm[1] = ureg_imm4f(t->ureg, 1.0f, 1.0f, 0.0f, 0.0f);
>        imm[2] = ureg_imm4f(t->ureg, 0.0f, 0.0f, 1.0f, 1.0f);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MAD, &tmp[0], 1, imm, 3, 0);
>
>        if (swizzle == GL_SWIZZLE_STR_DR_ATI) {
>           imm[0] = ureg_scalar(src, TGSI_SWIZZLE_Z);
>        } else {
>           imm[0] = ureg_scalar(src, TGSI_SWIZZLE_W);
>        }
> -      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_RCP, &tmp[1], 1, &imm[0], 1, 0);
>
>        imm[0] = ureg_src(tmp[0]);
>        imm[1] = ureg_src(tmp[1]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &tmp[0], 1, imm, 2, 0);
>
>        return ureg_src(tmp[0]);
>     }
> @@ -170,35 +170,35 @@ prepare_argument(struct st_translate *t, const unsigned argId,
>        src = ureg_scalar(src, TGSI_SWIZZLE_W);
>        break;
>     }
> -   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1);
> +   ureg_insn(t->ureg, TGSI_OPCODE_MOV, &arg, 1, &src, 1, 0);
>
>     if (srcReg->argMod & GL_COMP_BIT_ATI) {
>        struct ureg_src modsrc[2];
>        modsrc[0] = ureg_imm1f(t->ureg, 1.0f);
>        modsrc[1] = ureg_negate(ureg_src(arg));
>
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>     }
>     if (srcReg->argMod & GL_BIAS_BIT_ATI) {
>        struct ureg_src modsrc[2];
>        modsrc[0] = ureg_src(arg);
>        modsrc[1] = ureg_imm1f(t->ureg, -0.5f);
>
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>     }
>     if (srcReg->argMod & GL_2X_BIT_ATI) {
>        struct ureg_src modsrc[2];
>        modsrc[0] = ureg_src(arg);
>        modsrc[1] = ureg_src(arg);
>
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, &arg, 1, modsrc, 2, 0);
>     }
>     if (srcReg->argMod & GL_NEGATE_BIT_ATI) {
>        struct ureg_src modsrc[2];
>        modsrc[0] = ureg_src(arg);
>        modsrc[1] = ureg_imm1f(t->ureg, -1.0f);
>
> -      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MUL, &arg, 1, modsrc, 2, 0);
>     }
>     return ureg_src(arg);
>  }
> @@ -217,25 +217,25 @@ emit_special_inst(struct st_translate *t, const struct instruction_desc *desc,
>        tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI + 2); /* re-purpose a3 */
>        src[0] = ureg_imm1f(t->ureg, 0.5f);
>        src[1] = ureg_negate(args[2]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, tmp, 1, src, 2, 0);
>        src[0] = ureg_src(tmp[0]);
>        src[1] = args[0];
>        src[2] = args[1];
> -      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
>     } else if (!strcmp(desc->name, "CND0")) {
>        src[0] = args[2];
>        src[1] = args[1];
>        src[2] = args[0];
> -      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3);
> +      ureg_insn(t->ureg, TGSI_OPCODE_CMP, dst, 1, src, 3, 0);
>     } else if (!strcmp(desc->name, "DOT2_ADD")) {
>        /* note: DP2A is not implemented in most pipe drivers */
>        tmp[0] = get_temp(t, MAX_NUM_FRAGMENT_REGISTERS_ATI); /* re-purpose a1 */
>        src[0] = args[0];
>        src[1] = args[1];
> -      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_DP2, tmp, 1, src, 2, 0);
>        src[0] = ureg_src(tmp[0]);
>        src[1] = ureg_scalar(args[2], TGSI_SWIZZLE_Z);
> -      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2);
> +      ureg_insn(t->ureg, TGSI_OPCODE_ADD, dst, 1, src, 2, 0);
>     }
>  }
>
> @@ -249,7 +249,7 @@ emit_arith_inst(struct st_translate *t,
>        return;
>     }
>
> -   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount);
> +   ureg_insn(t->ureg, desc->TGSI_opcode, dst, 1, args, argcount, 0);
>  }
>
>  static void
> @@ -292,7 +292,7 @@ emit_dstmod(struct st_translate *t,
>     if (dstMod & GL_SATURATE_BIT_ATI) {
>        dst = ureg_saturate(dst);
>     }
> -   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2);
> +   ureg_insn(t->ureg, TGSI_OPCODE_MUL, &dst, 1, src, 2, 0);
>  }
>
>  /**
> @@ -334,9 +334,9 @@ compile_setupinst(struct st_translate *t,
>        src[1] = t->samplers[r];
>        /* the texture target is still unknown, it will be fixed in the draw call */
>        ureg_tex_insn(t->ureg, TGSI_OPCODE_TEX, dst, 1, TGSI_TEXTURE_2D,
> -                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2);
> +                    TGSI_RETURN_TYPE_FLOAT, NULL, 0, src, 2, 0);
>     } else if (texinst->Opcode == ATI_FRAGMENT_SHADER_PASS_OP) {
> -      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
>     }
>
>     t->regs_written[t->current_pass][r] = true;
> @@ -408,11 +408,11 @@ finalize_shader(struct st_translate *t, unsigned numPasses)
>        /* copy the result into the OUT slot */
>        dst[0] = t->outputs[t->outputMapping[FRAG_RESULT_COLOR]];
>        src[0] = ureg_src(t->temps[0]);
> -      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1);
> +      ureg_insn(t->ureg, TGSI_OPCODE_MOV, dst, 1, src, 1, 0);
>     }
>
>     /* signal the end of the program */
> -   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0);
> +   ureg_insn(t->ureg, TGSI_OPCODE_END, dst, 0, src, 0, 0);
>  }
>
>  /**
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 19f90f21fe..ecd9f9f280 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -5900,7 +5900,7 @@ compile_tgsi_instruction(struct st_translate *t,
>     case TGSI_OPCODE_IF:
>     case TGSI_OPCODE_UIF:
>        assert(num_dst == 0);
> -      ureg_insn(ureg, inst->op, NULL, 0, src, num_src);
> +      ureg_insn(ureg, inst->op, NULL, 0, src, num_src, inst->precise);
>        return;
>
>     case TGSI_OPCODE_TEX:
> @@ -5935,7 +5935,7 @@ compile_tgsi_instruction(struct st_translate *t,
>                      tex_target,
>                      st_translate_texture_type(inst->tex_type),
>                      texoffsets, inst->tex_offset_num_offset,
> -                    src, num_src);
> +                    src, num_src, inst->precise);
>        return;
>
>     case TGSI_OPCODE_RESQ:
> @@ -5966,7 +5966,7 @@ compile_tgsi_instruction(struct st_translate *t,
>        assert(src[0].File != TGSI_FILE_NULL);
>        ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
>                         inst->buffer_access,
> -                       tex_target, inst->image_format);
> +                       tex_target, inst->image_format, inst->precise);
>        break;
>
>     case TGSI_OPCODE_STORE:
> @@ -5984,19 +5984,19 @@ compile_tgsi_instruction(struct st_translate *t,
>        assert(dst[0].File != TGSI_FILE_NULL);
>        ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
>                         inst->buffer_access,
> -                       tex_target, inst->image_format);
> +                       tex_target, inst->image_format, inst->precise);
>        break;
>
>     case TGSI_OPCODE_SCS:
>        dst[0] = ureg_writemask(dst[0], TGSI_WRITEMASK_XY);
> -      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src);
> +      ureg_insn(ureg, inst->op, dst, num_dst, src, num_src, inst->precise);
>        break;
>
>     default:
>        ureg_insn(ureg,
>                  inst->op,
>                  dst, num_dst,
> -                src, num_src);
> +                src, num_src, inst->precise);
>        break;
>     }
>  }
> diff --git a/src/mesa/state_tracker/st_mesa_to_tgsi.c b/src/mesa/state_tracker/st_mesa_to_tgsi.c
> index 984ff92130..f11013c116 100644
> --- a/src/mesa/state_tracker/st_mesa_to_tgsi.c
> +++ b/src/mesa/state_tracker/st_mesa_to_tgsi.c
> @@ -558,7 +558,7 @@ compile_instruction(
>                                                 inst->TexShadow ),
>                       TGSI_RETURN_TYPE_FLOAT,
>                       NULL, 0,
> -                     src, num_src );
> +                     src, num_src, 0 );
>        return;
>
>     case OPCODE_SCS:
> @@ -566,7 +566,7 @@ compile_instruction(
>        ureg_insn( ureg,
>                   translate_opcode( inst->Opcode ),
>                   dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0 );
>        break;
>
>     case OPCODE_XPD:
> @@ -574,7 +574,7 @@ compile_instruction(
>        ureg_insn( ureg,
>                   translate_opcode( inst->Opcode ),
>                   dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0 );
>        break;
>
>     case OPCODE_RSQ:
> @@ -593,7 +593,7 @@ compile_instruction(
>        ureg_insn( ureg,
>                   translate_opcode( inst->Opcode ),
>                   dst, num_dst,
> -                 src, num_src );
> +                 src, num_src, 0);
>        break;
>     }
>  }
> diff --git a/src/mesa/state_tracker/st_pbo.c b/src/mesa/state_tracker/st_pbo.c
> index 303c8535b2..3dff1609e8 100644
> --- a/src/mesa/state_tracker/st_pbo.c
> +++ b/src/mesa/state_tracker/st_pbo.c
> @@ -528,7 +528,7 @@ create_fs(struct st_context *st, bool download, enum pipe_texture_target target,
>           op[0] = ureg_src(temp0);
>           op[1] = ureg_src(temp1);
>           ureg_memory_insn(ureg, TGSI_OPCODE_STORE, &out, 1, op, 2, 0,
> -                          TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE);
> +                          TGSI_TEXTURE_BUFFER, PIPE_FORMAT_NONE, 0);
>
>           ureg_release_temporary(ureg, temp1);
>        } else {
>

-- 
Learn how the world really is,
but never forget how it should be.
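If the precise parameter is dropped from the texture and memory entry points as suggested, the API shape could look roughly like the sketch below. This is a toy builder, not tgsi_ureg: Token, Builder and the method names alu, tex and memory are hypothetical, and the opcode values passed in are arbitrary. The point is only that a single low-level emitter writes the bit, ALU instructions may request it, and texture and memory instructions always leave it clear.

```cpp
#include <vector>

struct Token {
   unsigned opcode;
   bool saturate;
   bool precise;
};

struct Builder {
   std::vector<Token> tokens;

   // Low-level helper: the only place that writes the precise bit.
   void emit(unsigned opcode, bool saturate, bool precise)
   {
      tokens.push_back({opcode, saturate, precise});
   }

   // ALU instructions may be marked precise.
   void alu(unsigned opcode, bool saturate, bool precise)
   {
      emit(opcode, saturate, precise);
   }

   // Texture and memory instructions take no precise parameter;
   // the bit is always left clear for them.
   void tex(unsigned opcode, bool saturate) { emit(opcode, saturate, false); }
   void memory(unsigned opcode) { emit(opcode, false, false); }
};
```

Callers such as the texture and store paths then need no extra trailing argument at all.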
Nicolai Hähnle
2017-Jun-12 10:41 UTC
[Nouveau] [Mesa-dev] [RFC 3/9] st/glsl_to_tgsi: handle precise modifier
On 11.06.2017 20:42, Karol Herbst wrote:
> all subexpression inside an ir_assignment needs to be tagged as precise.
>
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 80 ++++++++++++++++++++++++------
>  1 file changed, 65 insertions(+), 15 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index c5d2e0fcd2..19f90f21fe 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -87,6 +87,13 @@ static int swizzle_for_type(const glsl_type *type, int component = 0)
>     return swizzle;
>  }
>
> +static unsigned is_precise(const ir_variable *ir)
> +{
> +   if (!ir)
> +      return 0;
> +   return ir->data.precise || ir->data.invariant;
> +}
> +
>  /**
>   * This struct is a corresponding struct to TGSI ureg_src.
>   */
> @@ -296,6 +303,7 @@ public:
>     ir_instruction *ir;
>
>     unsigned op:8; /**< TGSI opcode */
> +   unsigned precise:1;
>     unsigned saturate:1;
>     unsigned is_64bit_expanded:1;
>     unsigned sampler_base:5;
> @@ -435,6 +443,7 @@ public:
>     bool have_fma;
>     bool use_shared_memory;
>     bool has_tex_txf_lz;
> +   unsigned precise;
>
>     variable_storage *find_variable_storage(ir_variable *var);
>
> @@ -505,13 +514,29 @@ public:
>                                       st_src_reg src0 = undef_src,
>                                       st_src_reg src1 = undef_src,
>                                       st_src_reg src2 = undef_src,
> -                                     st_src_reg src3 = undef_src);
> +                                     st_src_reg src3 = undef_src,
> +                                     unsigned precise = 0);
>
>     glsl_to_tgsi_instruction *emit_asm(ir_instruction *ir, unsigned op,
>                                        st_dst_reg dst, st_dst_reg dst1,
>                                        st_src_reg src0 = undef_src,
>                                        st_src_reg src1 = undef_src,
>                                        st_src_reg src2 = undef_src,
> +                                      st_src_reg src3 = undef_src,
> +                                      unsigned precise = 0);
> +
> +   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
> +                                      st_dst_reg dst = undef_dst,
> +                                      st_src_reg src0 = undef_src,
> +                                      st_src_reg src1 = undef_src,
> +                                      st_src_reg src2 = undef_src,
> +                                      st_src_reg src3 = undef_src);
> +
> +   glsl_to_tgsi_instruction *emit_asm(ir_expression *ir, unsigned op,
> +                                      st_dst_reg dst, st_dst_reg dst1,
> +                                      st_src_reg src0 = undef_src,
> +                                      st_src_reg src1 = undef_src,
> +                                      st_src_reg src2 = undef_src,
>                                        st_src_reg src3 = undef_src);

Yeah, I don't like those overloads and the way they force you to add artificial casts for disambiguation.

I'd suggest embracing the global precise flag: drop the precise parameter from emit_asm, and just source the bit from this->precise.

Please make precise a bool, and add a comment explaining that it's a flag for whether the currently evaluated expression should be precise.

Cheers,
Nicolai

>     unsigned get_opcode(unsigned op,
> @@ -650,7 +675,8 @@ glsl_to_tgsi_instruction *
>  glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>                                 st_dst_reg dst, st_dst_reg dst1,
>                                 st_src_reg src0, st_src_reg src1,
> -                               st_src_reg src2, st_src_reg src3)
> +                               st_src_reg src2, st_src_reg src3,
> +                               unsigned precise)
>  {
>     glsl_to_tgsi_instruction *inst = new(mem_ctx) glsl_to_tgsi_instruction();
>     int num_reladdr = 0, i, j;
> @@ -691,6 +717,7 @@ glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>     STATIC_ASSERT(TGSI_OPCODE_LAST <= 255);
>
>     inst->op = op;
> +   inst->precise = precise;
>     inst->info = tgsi_get_opcode_info(op);
>     inst->dst[0] = dst;
>     inst->dst[1] = dst1;
> @@ -881,9 +908,28 @@ glsl_to_tgsi_instruction *
>  glsl_to_tgsi_visitor::emit_asm(ir_instruction *ir, unsigned op,
>                                 st_dst_reg dst,
>                                 st_src_reg src0, st_src_reg src1,
> +                               st_src_reg src2, st_src_reg src3,
> +                               unsigned precise)
> +{
> +   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, precise);
> +}
> +
> +glsl_to_tgsi_instruction *
> +glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
> +                               st_dst_reg dst,
> +                               st_src_reg src0, st_src_reg src1,
> +                               st_src_reg src2, st_src_reg src3)
> +{
> +   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3, this->precise);
> +}
> +
> +glsl_to_tgsi_instruction *
> +glsl_to_tgsi_visitor::emit_asm(ir_expression *ir, unsigned op,
> +                               st_dst_reg dst, st_dst_reg dst1,
> +                               st_src_reg src0, st_src_reg src1,
>                                 st_src_reg src2, st_src_reg src3)
>  {
> -   return emit_asm(ir, op, dst, undef_dst, src0, src1, src2, src3);
> +   return emit_asm(ir, op, dst, dst1, src0, src1, src2, src3, this->precise);
>  }
>
>  /**
> @@ -1116,7 +1162,7 @@ glsl_to_tgsi_visitor::emit_arl(ir_instruction *ir,
>     if (dst.index >= this->num_address_regs)
>        this->num_address_regs = dst.index + 1;
>
> -   emit_asm(NULL, op, dst, src0);
> +   emit_asm((ir_instruction *)NULL, op, dst, src0);
>  }
>
>  int
> @@ -1406,11 +1452,11 @@ glsl_to_tgsi_visitor::visit(ir_variable *ir)
>  void
>  glsl_to_tgsi_visitor::visit(ir_loop *ir)
>  {
> -   emit_asm(NULL, TGSI_OPCODE_BGNLOOP);
> +   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BGNLOOP);
>
>     visit_exec_list(&ir->body_instructions, this);
>
> -   emit_asm(NULL, TGSI_OPCODE_ENDLOOP);
> +   emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ENDLOOP);
>  }
>
>  void
> @@ -1418,10 +1464,10 @@ glsl_to_tgsi_visitor::visit(ir_loop_jump *ir)
>  {
>     switch (ir->mode) {
>     case ir_loop_jump::jump_break:
> -      emit_asm(NULL, TGSI_OPCODE_BRK);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_BRK);
>        break;
>     case ir_loop_jump::jump_continue:
> -      emit_asm(NULL, TGSI_OPCODE_CONT);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_CONT);
>        break;
>     }
>  }
> @@ -2703,7 +2749,7 @@ glsl_to_tgsi_visitor::visit(ir_dereference_variable *ir)
>           st_dst_reg dst = st_dst_reg(get_temp(var->type));
>           st_src_reg src = st_src_reg(PROGRAM_OUTPUT, decl->mesa_index,
>                                       var->type, component, decl->array_id);
> -         emit_asm(NULL, TGSI_OPCODE_FBFETCH, dst, src);
> +         emit_asm((ir_instruction *)NULL, TGSI_OPCODE_FBFETCH, dst, src);
>           entry = new(mem_ctx) variable_storage(var, dst.file, dst.index,
>                                                 dst.array_id);
>        } else {
> @@ -3148,7 +3194,10 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
>     st_dst_reg l;
>     st_src_reg r;
>
> +   /* all generated instructions need to be flaged as precise */
> +   this->precise = is_precise(ir->lhs->variable_referenced());
>     ir->rhs->accept(this);
> +   this->precise = 0;
>     r = this->result;
>
>     l = get_assignment_lhs(ir->lhs, this, &dst_component);
> @@ -3233,7 +3282,8 @@ glsl_to_tgsi_visitor::visit(ir_assignment *ir)
>         */
>        glsl_to_tgsi_instruction *inst, *new_inst;
>        inst = (glsl_to_tgsi_instruction *)this->instructions.get_tail();
> -      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3]);
> +      new_inst = emit_asm(ir, inst->op, l, inst->src[0], inst->src[1], inst->src[2], inst->src[3],
> +                          is_precise(ir->lhs->variable_referenced()));
>        new_inst->saturate = inst->saturate;
>        inst->dead_mask = inst->dst[0].writemask;
>     } else {
> @@ -4072,16 +4122,16 @@ glsl_to_tgsi_visitor::calc_deref_offsets(ir_dereference *tail,
>
>           deref_arr->array_index->accept(this);
>           if (*array_elements != 1)
> -            emit_asm(NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MUL, temp_dst, this->result, st_src_reg_for_int(*array_elements));
>           else
> -            emit_asm(NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, temp_dst, this->result);
>
>           if (indirect->file == PROGRAM_UNDEFINED)
>              *indirect = temp_reg;
>           else {
>              temp_dst = st_dst_reg(*indirect);
>              temp_dst.writemask = 1;
> -            emit_asm(NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
> +            emit_asm((ir_instruction *)NULL, TGSI_OPCODE_ADD, temp_dst, *indirect, temp_reg);
>           }
>        } else
>           *index += array_index->value.u[0] * *array_elements;
> @@ -4141,7 +4191,7 @@ glsl_to_tgsi_visitor::canonicalize_gather_offset(st_src_reg offset)
>        st_src_reg tmp = get_temp(glsl_type::ivec2_type);
>        st_dst_reg tmp_dst = st_dst_reg(tmp);
>        tmp_dst.writemask = WRITEMASK_XY;
> -      emit_asm(NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
> +      emit_asm((ir_instruction *)NULL, TGSI_OPCODE_MOV, tmp_dst, offset);
>        return tmp;
>     }
>
> @@ -6777,7 +6827,7 @@ get_mesa_program_tgsi(struct gl_context *ctx,
>     v->renumber_registers();
>
>     /* Write the END instruction. */
> -   v->emit_asm(NULL, TGSI_OPCODE_END);
> +   v->emit_asm((ir_instruction *)NULL, TGSI_OPCODE_END);
>
>     if (ctx->_Shader->Flags & GLSL_DUMP) {
>        _mesa_log("\n");
>

-- 
Learn how the world really is,
but never forget how it should be.
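The suggestion above (one bool flag on the visitor, no precise parameter on emit_asm) can be sketched as follows. This is a toy model, not glsl_to_tgsi_visitor: Visitor, Emitted, ScopedPrecise and visit_assignment are hypothetical names and the opcode numbers are arbitrary. The scoped guard is an addition of this sketch, not part of the review; it restores the previous value instead of resetting to false, which keeps nested evaluations consistent.

```cpp
#include <vector>

struct Emitted {
   unsigned op;
   bool precise;
};

class Visitor {
public:
   // True while the expression currently being evaluated must be precise.
   bool precise = false;
   std::vector<Emitted> instructions;

   // No precise parameter: the bit always comes from the visitor state.
   void emit_asm(unsigned op) { instructions.push_back({op, precise}); }
};

// Sets the flag for one scope and restores the previous value on exit.
class ScopedPrecise {
public:
   ScopedPrecise(Visitor &v, bool value) : v_(v), saved_(v.precise)
   {
      v_.precise = value;
   }
   ~ScopedPrecise() { v_.precise = saved_; }
   ScopedPrecise(const ScopedPrecise &) = delete;
   ScopedPrecise &operator=(const ScopedPrecise &) = delete;

private:
   Visitor &v_;
   bool saved_;
};

// Stand-in for visiting an assignment whose right-hand side
// expands to two instructions.
void visit_assignment(Visitor &v, bool lhs_is_precise)
{
   ScopedPrecise guard(v, lhs_is_precise);
   v.emit_asm(1); // hypothetical opcode: multiply
   v.emit_asm(2); // hypothetical opcode: add
}
```

With a single emit path there are no overloads to disambiguate, so call sites that pass NULL for the IR node would not need casts.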
Nicolai Hähnle
2017-Jun-12 10:42 UTC
[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
On 11.06.2017 20:42, Karol Herbst wrote:
> Running Tomb Raider on Nouveau I found some flicker caused by ignoring precise
> modifiers on variables inside Nouveau.
>
> This series add precise/invariant handling to TGSI, which can be then used by
> drivers to disable certain unsafe optimisations which may otherwise alter
> calculations, which depend on having the same result across shaders.

It's kind of amazing that we got this far without doing this. On the radeonsi side, it's probably related to how conservative LLVM is.

But this series is a good idea, since it might allow us to become more aggressive with optimizations in radeonsi as well.

> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and 4.5
>
> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to apply the
> precise flag on instruction emited in ir_assignment->rhs->accept(); but I found
> no other easy way to handle this. Maybe somebody of you has a better idea?

Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2:

Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>

>
> Karol Herbst (9):
>   tgsi: add precise flag to tgsi_instruction
>   tgsi/dump: print _PRECISE modifier on Instrutions
>   st/glsl_to_tgsi: handle precise modifier
>   tgsi: populate precise
>   tgsi/text: parse _PRECISE modifier
>   nv50/ir: add precise field to Instruction
>   nv50/ir/tgsi: handle precise for most ALU instructions
>   nv50/ir: disable mul+add to mad for precise instructions
>   nv50/ir/tgsi: split mad to mul+add
>
>  src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
>  src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
>  src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
>  src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
>  src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
>  src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
>  src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
>  src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
>  src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
>  src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
>  src/mesa/state_tracker/st_pbo.c                    |  2 +-
>  15 files changed, 172 insertions(+), 59 deletions(-)
>

-- 
Learn how the world really is,
but never forget how it should be.
Roland Scheidegger
2017-Jun-12 23:57 UTC
[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
This looks like the right idea to me too. It may sound a bit weird to do that per instruction, but d3d11 does that as well. (Some d3d versions just have a global flag basically forbidding or allowing any such fast math optimizations in the assembly, but I'm not actually sure everybody honors that without tessellation...)

For 1/9: Reviewed-by: Roland Scheidegger <sroland at vmware.com>

2/9 has a typo in the commit short log ("Instrutions").

FWIW surely on nv50 you could keep a single mad instruction for umad (sad maybe too?). (I'm actually wondering if the hw really can't do unfused float multiply+add as a single instruction but I know next to nothing about nvidia hw...)

Roland
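The distinction between float mad and umad can be illustrated with a small C sketch. The function names are made up for this example, and the single-rounding variant is only an emulation: it does the arithmetic in double, which holds the product of two floats exactly, so it approximates what a fused multiply-add computes (the standard C way would be `fmaf` from `<math.h>`). It assumes IEEE 754 float and double with round-to-nearest-even.

```c
#include <stdint.h>

/* Unfused: the product is rounded to float before the add. The volatile
 * temporary keeps the compiler from contracting this into a fused op. */
float mul_add_unfused(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod + c;
}

/* Emulated fused variant: the float product is exact in double, so the
 * result is rounded to float only at the end. Approximates fmaf(). */
float mul_add_single_rounding(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* Integer multiply-add as two separate wrapping steps. */
uint32_t umad_split(uint32_t a, uint32_t b, uint32_t c)
{
    volatile uint32_t prod = a * b;   /* wraps modulo 2^32 */
    return prod + c;                  /* wraps modulo 2^32 again */
}

/* Integer multiply-add computed in 64 bits and truncated once at the end. */
uint32_t umad_wide(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint32_t)((uint64_t)a * b + c);
}
```

With a = b = 0x1.001p0f (1 + 2^-12) and c = -0x1.002p0f, the unfused version returns 0 because the 2^-24 term of the product is rounded away, while the single-rounding version returns 0x1p-24f. The two integer versions always agree, since wrapping arithmetic modulo 2^32 has no rounding step, which is why splitting umad buys nothing for precise.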