Roland Scheidegger
2017-Jun-12 23:57 UTC
[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
This looks like the right idea to me too. It may sound a bit weird to do that per instruction, but d3d11 does that as well. (Some d3d versions just have a global flag basically forbidding or allowing any such fast math optimizations in the assembly, but I'm not actually sure everybody honors that without tesselation...) For 1/9: Reviewed-by: Roland Scheidegger <sroland at vmware.com> 2/9 has a typo in the commit short log ("Instrutions"). FWIW surely on nv50 you could keep a single mad instruction for umad (sad maybe too?). (I'm actually wondering if the hw really can't do unfused float multiply+add as a single instruction but I know next to nothing about nvidia hw...) Roland Am 12.06.2017 um 12:42 schrieb Nicolai Hähnle:> On 11.06.2017 20:42, Karol Herbst wrote: >> Running Tomb Raider on Nouveau I found some flicker caused by ignoring >> precise >> modifiers on variables inside Nouveau. >> >> This series add precise/invariant handling to TGSI, which can be then >> used by >> drivers to disable certain unsafe optimisations which may otherwise alter >> calculations, which depend on having the same result across shaders. > > It's kind of amazing that we got this far without doing this. On the > radeonsi side, it's probably related to how conservative LLVM is. > > But this series is a good idea, since it might allow us to become more > aggressive with optimizations in radeonsi as well. > > >> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and >> 4.5 >> >> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to >> apply the >> precise flag on instruction emited in ir_assignment->rhs->accept(); >> but I found >> no other easy way to handle this. Maybe somebody of you has a better >> idea? > > Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2: > > Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com> > > >> >> Karol Herbst (9): >> tgsi: add precise flag to tgsi_instruction >> tgsi/dump: print _PRECISE modifier on Instrutions >> st/glsl_to_tgsi: handle precise modifier >> tgsi: populate precise >> tgsi/text: parse _PRECISE modifier >> nv50/ir: add precise field to Instruction >> nv50/ir/tgsi: handle precise for most ALU instructions >> nv50/ir: disable mul+add to mad for precise instructions >> nv50/ir/tgsi: split mad to mul+add >> >> src/gallium/auxiliary/tgsi/tgsi_build.c | 4 + >> src/gallium/auxiliary/tgsi/tgsi_dump.c | 4 + >> src/gallium/auxiliary/tgsi/tgsi_text.c | 15 +++- >> src/gallium/auxiliary/tgsi/tgsi_ureg.c | 14 +++- >> src/gallium/auxiliary/tgsi/tgsi_ureg.h | 20 ++++- >> src/gallium/auxiliary/util/u_simple_shaders.c | 2 +- >> src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 + >> .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 16 ++++ >> .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +- >> src/gallium/include/pipe/p_shader_tokens.h | 3 +- >> src/gallium/state_trackers/nine/nine_shader.c | 6 +- >> src/mesa/state_tracker/st_atifs_to_tgsi.c | 38 ++++----- >> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 92 >> +++++++++++++++++----- >> src/mesa/state_tracker/st_mesa_to_tgsi.c | 8 +- >> src/mesa/state_tracker/st_pbo.c | 2 +- >> 15 files changed, 172 insertions(+), 59 deletions(-) >> > >
Roland Scheidegger
2017-Jun-13 00:01 UTC
[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
Am 13.06.2017 um 01:57 schrieb Roland Scheidegger:> This looks like the right idea to me too. It may sound a bit weird to do > that per instruction, but d3d11 does that as well. (Some d3d versions > just have a global flag basically forbidding or allowing any such fast > math optimizations in the assembly, but I'm not actually sure everybody > honors that without tesselation...) > > For 1/9: > Reviewed-by: Roland Scheidegger <sroland at vmware.com>I forgot to mention, could you add some bits in gallium docs (source/tgsi.rst) for this? Not sure where maybe under Modifiers or some such. Roland> > 2/9 has a typo in the commit short log ("Instrutions"). > > FWIW surely on nv50 you could keep a single mad instruction for umad > (sad maybe too?). (I'm actually wondering if the hw really can't do > unfused float multiply+add as a single instruction but I know next to > nothing about nvidia hw...) > > Roland > > Am 12.06.2017 um 12:42 schrieb Nicolai Hähnle: >> On 11.06.2017 20:42, Karol Herbst wrote: >>> Running Tomb Raider on Nouveau I found some flicker caused by ignoring >>> precise >>> modifiers on variables inside Nouveau. >>> >>> This series add precise/invariant handling to TGSI, which can be then >>> used by >>> drivers to disable certain unsafe optimisations which may otherwise alter >>> calculations, which depend on having the same result across shaders. >> >> It's kind of amazing that we got this far without doing this. On the >> radeonsi side, it's probably related to how conservative LLVM is. >> >> But this series is a good idea, since it might allow us to become more >> aggressive with optimizations in radeonsi as well. >> >> >>> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and >>> 4.5 >>> >>> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to >>> apply the >>> precise flag on instruction emited in ir_assignment->rhs->accept(); >>> but I found >>> no other easy way to handle this. Maybe somebody of you has a better >>> idea? >> >> Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2: >> >> Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com> >> >> >>> >>> Karol Herbst (9): >>> tgsi: add precise flag to tgsi_instruction >>> tgsi/dump: print _PRECISE modifier on Instrutions >>> st/glsl_to_tgsi: handle precise modifier >>> tgsi: populate precise >>> tgsi/text: parse _PRECISE modifier >>> nv50/ir: add precise field to Instruction >>> nv50/ir/tgsi: handle precise for most ALU instructions >>> nv50/ir: disable mul+add to mad for precise instructions >>> nv50/ir/tgsi: split mad to mul+add >>> >>> src/gallium/auxiliary/tgsi/tgsi_build.c | 4 + >>> src/gallium/auxiliary/tgsi/tgsi_dump.c | 4 + >>> src/gallium/auxiliary/tgsi/tgsi_text.c | 15 +++- >>> src/gallium/auxiliary/tgsi/tgsi_ureg.c | 14 +++- >>> src/gallium/auxiliary/tgsi/tgsi_ureg.h | 20 ++++- >>> src/gallium/auxiliary/util/u_simple_shaders.c | 2 +- >>> src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 + >>> .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 16 ++++ >>> .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +- >>> src/gallium/include/pipe/p_shader_tokens.h | 3 +- >>> src/gallium/state_trackers/nine/nine_shader.c | 6 +- >>> src/mesa/state_tracker/st_atifs_to_tgsi.c | 38 ++++----- >>> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 92 >>> +++++++++++++++++----- >>> src/mesa/state_tracker/st_mesa_to_tgsi.c | 8 +- >>> src/mesa/state_tracker/st_pbo.c | 2 +- >>> 15 files changed, 172 insertions(+), 59 deletions(-) >>> >> >> >
Ilia Mirkin
2017-Jun-13 00:05 UTC
[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote:> FWIW surely on nv50 you could keep a single mad instruction for umad > (sad maybe too?). (I'm actually wondering if the hw really can't do > unfused float multiply+add as a single instruction but I know next to > nothing about nvidia hw...)The compiler should reassociate a mul + add into a mad where possible. In actuality, IMAD is actually super-slow... allegedly slower than IMUL + IADD. Not sure why. Maxwell added a XMAD operation which is faster but we haven't figured out how to operate it yet. I'm not aware of a muladd version of fma on fermi and newer (GL 4.0). The tesla series does have a floating point mul+add (but no fma).
Roland Scheidegger
2017-Jun-13 00:33 UTC
[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
Am 13.06.2017 um 02:05 schrieb Ilia Mirkin:> On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote: >> FWIW surely on nv50 you could keep a single mad instruction for umad >> (sad maybe too?). (I'm actually wondering if the hw really can't do >> unfused float multiply+add as a single instruction but I know next to >> nothing about nvidia hw...) > > The compiler should reassociate a mul + add into a mad where possible. > In actuality, IMAD is actually super-slow... allegedly slower than > IMUL + IADD. Not sure why. Maxwell added a XMAD operation which is > faster but we haven't figured out how to operate it yet. I'm not aware > of a muladd version of fma on fermi and newer (GL 4.0). The tesla > series does have a floating point mul+add (but no fma). >Interesting. radeons seem to always have a unfused mad. pre-gcn parts apparently only have a 32bit fma with parts supporting double precision. The same restriction is stated for gcn parts in the isa docs, which obviously doesn't make sense, but I have no idea if the fma is full speed... Roland
Maybe Matching Threads
- [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
- [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
- [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
- [RFC 0/9] Add precise/invariant semantics to TGSI
- [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI