thr3ads.net - Nouveau - [Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI [Jun 2017]


Roland Scheidegger

2017-Jun-12 23:57 UTC

[Nouveau] [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

This looks like the right idea to me too. It may sound a bit weird to do
that per instruction, but d3d11 does that as well. (Some d3d versions
just have a global flag basically forbidding or allowing any such fast
math optimizations in the assembly, but I'm not actually sure everybody
honors that without tessellation...)

For 1/9:
Reviewed-by: Roland Scheidegger <sroland at vmware.com>

2/9 has a typo in the commit short log ("Instrutions").

FWIW surely on nv50 you could keep a single mad instruction for umad
(sad maybe too?). (I'm actually wondering if the hw really can't do
unfused float multiply+add as a single instruction but I know next to
nothing about nvidia hw...)

Roland

On 12.06.2017 at 12:42, Nicolai Hähnle wrote:
> On 11.06.2017 20:42, Karol Herbst wrote:
>> Running Tomb Raider on Nouveau I found some flicker caused by ignoring
>> precise modifiers on variables inside Nouveau.
>>
>> This series adds precise/invariant handling to TGSI, which drivers can
>> then use to disable certain unsafe optimisations that might otherwise
>> alter calculations which depend on having the same result across shaders.
> 
> It's kind of amazing that we got this far without doing this. On the
> radeonsi side, it's probably related to how conservative LLVM is.
> 
> But this series is a good idea, since it might allow us to become more
> aggressive with optimizations in radeonsi as well.
> 
> 
>> This series fixes this bug in Tomb Raider and one CTS test for 4.4 and
>> 4.5.
>>
>> Note on Patch 3: I really dislike how I tell glsl_to_tgsi_visitor to
>> apply the precise flag on instructions emitted in
>> ir_assignment->rhs->accept();
>> but I found no other easy way to handle this. Maybe one of you has a
>> better idea?
> 
> Sent a suggestion, as well as comments on patches 4 & 5. Patches 1 & 2:
> 
> Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
> 
> 
>>
>> Karol Herbst (9):
>>    tgsi: add precise flag to tgsi_instruction
>>    tgsi/dump: print _PRECISE modifier on Instrutions
>>    st/glsl_to_tgsi: handle precise modifier
>>    tgsi: populate precise
>>    tgsi/text: parse _PRECISE modifier
>>    nv50/ir: add precise field to Instruction
>>    nv50/ir/tgsi: handle precise for most ALU instructions
>>    nv50/ir: disable mul+add to mad for precise instructions
>>    nv50/ir/tgsi: split mad to mul+add
>>
>>   src/gallium/auxiliary/tgsi/tgsi_build.c            |  4 +
>>   src/gallium/auxiliary/tgsi/tgsi_dump.c             |  4 +
>>   src/gallium/auxiliary/tgsi/tgsi_text.c             | 15 +++-
>>   src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 14 +++-
>>   src/gallium/auxiliary/tgsi/tgsi_ureg.h             | 20 ++++-
>>   src/gallium/auxiliary/util/u_simple_shaders.c      |  2 +-
>>   src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  1 +
>>   .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 16 ++++
>>   .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   |  6 +-
>>   src/gallium/include/pipe/p_shader_tokens.h         |  3 +-
>>   src/gallium/state_trackers/nine/nine_shader.c      |  6 +-
>>   src/mesa/state_tracker/st_atifs_to_tgsi.c          | 38 ++++-----
>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp         | 92 +++++++++++++++++-----
>>   src/mesa/state_tracker/st_mesa_to_tgsi.c           |  8 +-
>>   src/mesa/state_tracker/st_pbo.c                    |  2 +-
>>   15 files changed, 172 insertions(+), 59 deletions(-)
>>
> 
>

Roland Scheidegger

2017-Jun-13 00:01 UTC


On 13.06.2017 at 01:57, Roland Scheidegger wrote:
> This looks like the right idea to me too. It may sound a bit weird to do
> that per instruction, but d3d11 does that as well. (Some d3d versions
> just have a global flag basically forbidding or allowing any such fast
> math optimizations in the assembly, but I'm not actually sure everybody
> honors that without tessellation...)
> 
> For 1/9:
> Reviewed-by: Roland Scheidegger <sroland at vmware.com>

I forgot to mention, could you add some bits to the gallium docs
(source/tgsi.rst) for this? Not sure where; maybe under Modifiers or some
such.

Roland

Ilia Mirkin

2017-Jun-13 00:05 UTC


On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> FWIW surely on nv50 you could keep a single mad instruction for umad
> (sad maybe too?). (I'm actually wondering if the hw really can't do
> unfused float multiply+add as a single instruction but I know next to
> nothing about nvidia hw...)

The compiler should reassociate a mul + add into a mad where possible.
In practice, IMAD is super-slow... allegedly slower than
IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is
faster, but we haven't figured out how to operate it yet. I'm not aware
of an unfused muladd counterpart to fma on fermi and newer (GL 4.0). The
tesla series does have a floating point mul+add (but no fma).

Roland Scheidegger

2017-Jun-13 00:33 UTC


On 13.06.2017 at 02:05, Ilia Mirkin wrote:
> On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> FWIW surely on nv50 you could keep a single mad instruction for umad
>> (sad maybe too?). (I'm actually wondering if the hw really can't do
>> unfused float multiply+add as a single instruction but I know next to
>> nothing about nvidia hw...)
> 
> The compiler should reassociate a mul + add into a mad where possible.
> In actuality, IMAD is actually super-slow... allegedly slower than
> IMUL + IADD. Not sure why. Maxwell added a XMAD operation which is
> faster but we haven't figured out how to operate it yet. I'm not aware
> of a muladd version of fma on fermi and newer (GL 4.0). The tesla
> series does have a floating point mul+add (but no fma).
> 
Interesting. Radeons seem to always have an unfused mad. Pre-GCN parts
apparently only have a 32-bit fma on parts that support double precision.
The same restriction is stated for GCN parts in the ISA docs, which
obviously doesn't make sense, but I have no idea if the fma is full speed...

Roland
