search for: xmad

Displaying 3 results from an estimated 3 matches for "xmad".

2017 Jun 13
1
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
...loat multiply+add as a single instruction but I know next to >> nothing about nvidia hw...) > > The compiler should reassociate a mul + add into a mad where possible. > IMAD is actually super-slow... allegedly slower than > IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is > faster but we haven't figured out how to use it yet. I'm not aware > of a muladd version of fma on fermi and newer (GL 4.0). The tesla > series does have a floating point mul+add (but no fma). > Interesting. radeons seem to always have an unfused mad. p...
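
The excerpt above contrasts a fused fma (one rounding for a*b + c) with an unfused mad (the multiply is rounded, then the add is rounded again). A minimal C sketch of that difference, using fma() from C99 <math.h>; the operands are chosen purely to make the two results diverge, and the comparison only holds if the compiler is told not to fuse the plain expression itself (e.g. -ffp-contract=off on GCC/Clang):

#include <stdio.h>
#include <math.h>
#include <float.h>

int main(void)
{
    /* (1 + eps) * (1 - eps) = 1 - eps^2, which rounds to exactly 1.0
     * in double precision, so the unfused path loses the eps^2 term. */
    double a = 1.0 + DBL_EPSILON;
    double b = 1.0 - DBL_EPSILON;
    double c = -1.0;

    double unfused = a * b + c;    /* two roundings: mul, then add      */
    double fused   = fma(a, b, c); /* one rounding of the exact a*b + c */

    printf("unfused mad: %.17g\n", unfused); /* 0              */
    printf("fused fma:   %.17g\n", fused);   /* about -4.9e-32 */
    return 0;
}

This is the same reason an unfused mad and an fma are not interchangeable once invariance is required: the results can legitimately differ in the last bits.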
2017 Jun 12
3
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
This looks like the right idea to me too. It may sound a bit weird to do that per instruction, but d3d11 does that as well. (Some d3d versions just have a global flag basically forbidding or allowing any such fast math optimizations in the assembly, but I'm not actually sure everybody honors that without tessellation...) For 1/9: Reviewed-by: Roland Scheidegger <sroland at vmware.com>
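
The global-flag-versus-per-instruction point above has a rough analogue in C: contraction of a*b + c into an fma is controlled globally by compiler flags (-ffp-contract=fast/off, or -ffast-math more broadly) and locally by the standard FP_CONTRACT pragma. The sketch below is only a loose analogy to a per-instruction precise/invariant marking, and note that Clang honors the pragma while GCC currently ignores it and relies on the command-line flag:

#include <math.h>

/* Default: the compiler may contract x*y + z into a single fma
 * (e.g. under -ffp-contract=fast), like the global fast-math flag
 * mentioned above. */
double loose_mad(double x, double y, double z)
{
    return x * y + z;   /* may or may not be fused */
}

/* Locally forbid contraction, loosely analogous to marking one
 * expression precise/invariant: the result must be the doubly
 * rounded mul + add, never an fma. */
double precise_mad(double x, double y, double z)
{
#pragma STDC FP_CONTRACT OFF
    return x * y + z;   /* must stay a separate mul + add */
}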
2017 Jun 13
0
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
...'t do > unfused float multiply+add as a single instruction but I know next to > nothing about nvidia hw...) The compiler should reassociate a mul + add into a mad where possible. IMAD is actually super-slow... allegedly slower than IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is faster but we haven't figured out how to use it yet. I'm not aware of a muladd version of fma on fermi and newer (GL 4.0). The tesla series does have a floating point mul+add (but no fma).
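
XMAD, as referenced above, is reported to be a 16x16-bit multiply with a 32-bit accumulate, which is why a full 32-bit IMAD does not map onto a single one. Under that assumption (the helper names here are mine, not hardware mnemonics), a 32-bit a*b + c can be rebuilt from 16-bit partial products as below; the decomposition itself is plain modular arithmetic and easy to verify:

#include <stdint.h>
#include <assert.h>

/* Hypothetical model of one XMAD-style op, assumed to be a 16x16-bit
 * multiply plus a 32-bit addend (results taken mod 2^32). */
static uint32_t xmad16(uint32_t a16, uint32_t b16, uint32_t c)
{
    return (a16 & 0xffff) * (b16 & 0xffff) + c;
}

/* 32-bit a*b + c from 16-bit pieces:
 * a*b = a_lo*b_lo + ((a_lo*b_hi + a_hi*b_lo) << 16)   (mod 2^32);
 * the a_hi*b_hi term only affects bits >= 32 and drops out. */
static uint32_t imad_via_xmad(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t lo  = xmad16(a, b, c);          /* a_lo*b_lo + c           */
    uint32_t mid = xmad16(a, b >> 16, 0);    /* a_lo*b_hi               */
    uint32_t sum = xmad16(a >> 16, b, mid);  /* a_hi*b_lo + a_lo*b_hi   */
    return lo + (sum << 16);
}

int main(void)
{
    uint32_t a = 0xdeadbeefu, b = 0x12345679u, c = 42u;
    assert(imad_via_xmad(a, b, c) == a * b + c); /* both wrap mod 2^32 */
    return 0;
}

Whether such a sequence is actually faster than a native IMAD is exactly the open question in the mail above; the sketch only shows why several XMADs are needed per 32-bit multiply.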