Displaying 3 results from an estimated 3 matches for "xmad".
2017 Jun 13 - [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
>> ...float multiply+add as a single instruction but I know next to
>> nothing about nvidia hw...)
>
> The compiler should reassociate a mul + add into a mad where possible.
> In actuality, IMAD is super-slow... allegedly slower than
> IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is
> faster, but we haven't figured out how to use it yet. I'm not aware
> of an unfused mul+add counterpart to fma on Fermi and newer (GL 4.0). The Tesla
> series does have a floating-point mul+add (but no fma).
>
Interesting. Radeons seem to always have an unfused mad. p...
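
The distinction under discussion: an fma rounds once, while an unfused mad rounds the intermediate product and then the sum. A minimal CUDA sketch of the two forms (kernel and variable names are illustrative, not from the thread):

    __global__ void mad_forms(const float *a, const float *b, const float *c,
                              float *fused, float *unfused, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;
        fused[i]   = __fmaf_rn(a[i], b[i], c[i]);             /* fma: one rounding step          */
        unfused[i] = __fadd_rn(__fmul_rn(a[i], b[i]), c[i]);  /* unfused mad: two rounding steps */
    }

With default nvcc options a plain a*b + c may also be contracted into a single FFMA; the intrinsic form is guaranteed to stay a separate multiply and add.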
2017 Jun 12 - [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
This looks like the right idea to me too. It may sound a bit weird to do
that per instruction, but d3d11 does that as well. (Some d3d versions
just have a global flag basically forbidding or allowing any such fast
math optimizations in the assembly, but I'm not actually sure everybody
honors that without tessellation...)
For 1/9:
Reviewed-by: Roland Scheidegger <sroland at vmware.com>
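
The global-flag vs per-instruction contrast can be illustrated with CUDA as an analogy (this is not the TGSI or d3d mechanism itself, and the kernel name is made up): nvcc's --fmad=false globally forbids contracting mul+add into FFMA, while the *_rn intrinsics pin down a single operation regardless of that flag, much like a per-instruction "precise" bit.

    __global__ void per_instruction_precise(const float *p, float *q, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;
        float x = p[i];
        float fast   = x * x + 1.0f;                     /* contraction allowed unless --fmad=false */
        float pinned = __fadd_rn(__fmul_rn(x, x), 1.0f); /* never contracted, whatever the flag     */
        q[i] = fast - pinned;                            /* non-zero wherever the rounding differs  */
    }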
2017 Jun 13 - [Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
> ...'t do
> unfused float multiply+add as a single instruction but I know next to
> nothing about nvidia hw...)
The compiler should reassociate a mul + add into a mad where possible.
In actuality, IMAD is super-slow... allegedly slower than
IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is
faster, but we haven't figured out how to use it yet. I'm not aware
of an unfused mul+add counterpart to fma on Fermi and newer (GL 4.0). The Tesla
series does have a floating-point mul+add (but no fma).
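
As far as public reverse-engineering notes go, XMAD appears to be a 16x16-bit multiply feeding a 32-bit add, so a full 32-bit integer multiply is built from a few such steps instead of one IMAD. A rough sketch of that decomposition in ordinary CUDA arithmetic (illustrative only, not actual SASS; the function name is made up):

    __device__ unsigned mul32_via_16bit_parts(unsigned a, unsigned b)
    {
        unsigned a_lo = a & 0xffffu, a_hi = a >> 16;
        unsigned b_lo = b & 0xffffu, b_hi = b >> 16;
        unsigned lo    = a_lo * b_lo;                /* 16x16 -> 32-bit partial product */
        unsigned cross = a_lo * b_hi + a_hi * b_lo;  /* two more 16x16 partial products */
        return lo + (cross << 16);                   /* equals a * b modulo 2^32        */
    }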