Arsenault, Matthew via llvm-dev
2018-Jul-23 10:56 UTC
[llvm-dev] RFC: What is the real behavior for the minnum/maxnum intrinsics?
Hi,

The specification for the llvm.minnum/llvm.maxnum intrinsics is currently too unclear to optimize usefully. There are two problems. First, the expected behavior for signaling NaNs needs to be clarified. Second, it is unclear whether the returned value is expected to be canonicalized (as if by llvm.canonicalize).

Currently, according to the LangRef:

    Follows the IEEE-754 semantics for minNum, which also match for libm's fmin.

    If either operand is a NaN, returns the other non-NaN operand. Returns NaN only if both operands are NaN. If the operands compare equal, returns a value that compares equal to both operands. This means that fmin(+/-0.0, +/-0.0) could return either -0.0 or 0.0.

This first line is a lie. It isn’t true for the case of signaling NaNs. The IEEE rule is that if either input is a signaling NaN, the operation returns a quieted NaN, not the other operand. The C standard definitions of fmin/fmax do not make this distinction, and just return the other operand. The constant folding for these intrinsics currently matches the libm behavior, returning the non-NaN operand, and will never quiet. The default lowering for these operations also just directly calls the system’s fmin/fmax.

Additionally, the IEEE standard specifies that minNum/maxNum return the “canonicalized” value. If the returned value is a NaN, my understanding is that the payload bits of the NaN are all 0, even if this was not the case for the input NaNs. This also contradicts just returning the raw value of the other operand if one is a NaN, as happens in the implemented constant folding and libm lowering.

On AMDGPU we select these to instructions that have either behavior, depending on a flag that can be considered part of the global floating point mode. We default to enabling the IEEE behavior, returning a quieted NaN. In order to match the expected behavior of fmin/fmax in the OpenCL builtin library, we use llvm.canonicalize to quiet the incoming NaNs to the minnum/maxnum intrinsics. This approximately triples the number of instructions inside the inner loops of an important kernel, where min/max are feeding into each other. My goal is to eliminate the canonicalizes, since the output of min/max is supposed to be canonical. I don’t necessarily care about getting correct FP exception behavior, but I do need these to return the correct value for signaling NaNs loaded from memory.

Since these intrinsics do have the IEEE name, I think they should probably be changed to match the IEEE behavior. This would mean that optimizing llvm.canonicalize(llvm.minnum(x, y)) -> llvm.minnum(x, y) is a correct transformation. Target lowering would then be expected to insert quieting canonicalizes for the inputs to the libm fmin call. Do we need another pair of intrinsics matching the fmin/fmax behavior?

TLDR:

1. What do these do for signaling NaNs?
2. If the target expects something to happen during llvm.canonicalize (e.g. flush denormals), can this be assumed to have been done by the implementation of llvm.minnum/maxnum?

-Matt
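The difference between the two signaling-NaN behaviors described above can be sketched over raw binary32 bit patterns. This is a minimal illustration in Python, not LLVM's constant folder or any libm. The function names fmin_libm_style and minnum_ieee2008 are hypothetical, the quiet bit is assumed to be the top fraction bit, and which operand's payload survives quieting is an assumption as well.

```python
import struct

EXP_MASK = 0x7F800000   # binary32 exponent field
FRAC_MASK = 0x007FFFFF  # binary32 fraction (NaN payload) field
QUIET_BIT = 0x00400000  # assumed quiet bit: top bit of the fraction


def is_nan(bits):
    """True if the 32-bit pattern encodes any NaN."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_snan(bits):
    """True if the pattern encodes a signaling NaN (quiet bit clear)."""
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def quiet(bits):
    """Quiet a NaN by setting the quiet bit; the payload is kept (assumption)."""
    return bits | QUIET_BIT


def to_float(bits):
    """Decode a non-NaN bit pattern so two values can be compared."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fmin_libm_style(a, b):
    """fmin as described in the thread: a NaN operand is simply dropped."""
    if is_nan(a):
        return b
    if is_nan(b):
        return a
    return a if to_float(a) < to_float(b) else b


def minnum_ieee2008(a, b):
    """minNum as described in the thread: a signaling NaN yields a quiet NaN."""
    if is_snan(a):
        return quiet(a)
    if is_snan(b):
        return quiet(b)
    return fmin_libm_style(a, b)


SNAN = 0x7FA00000  # signaling NaN with a non-zero payload
ONE = 0x3F800000   # 1.0

print(hex(fmin_libm_style(SNAN, ONE)))
print(hex(minnum_ieee2008(SNAN, ONE)))
```

With these definitions, a signaling NaN paired with 1.0 yields 1.0 under the fmin-style rule and a quiet NaN under the IEEE-style rule. Only the second rule guarantees that the result is never a signaling NaN, which is the property the llvm.canonicalize(llvm.minnum(x, y)) fold relies on as far as NaN quieting goes.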
Alex Bradbury via llvm-dev
2018-Jul-23 19:40 UTC
[llvm-dev] RFC: What is the real behavior for the minnum/maxnum intrinsics?
On 23 July 2018 at 11:56, Arsenault, Matthew via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi,
>
>
> The specification for the llvm.minnum/llvm.maxnum intrinsics is too unclear
> right now to usefully optimize. There are two problems. First the expected
> behavior for signaling NaNs needs to be clarified. Second, whether the
> returned value is expected to be canonicalized (as if by llvm.canonicalize).
>
> Currently according to the LangRef:
>
> Follows the IEEE-754 semantics for minNum, which also match for libm's
> fmin.
>
> If either operand is a NaN, returns the other non-NaN operand. Returns
> NaN only if both operands are NaN. If the operands compare equal,
> returns a value that compares equal to both operands. This means that
> fmin(+/-0.0, +/-0.0) could return either -0.0 or 0.0.
>
> This first line is a lie. This isn’t true for the case of signaling NaNs.
> The IEEE rule is if either input is a signaling nan, it returns a quieted
> NaN, not the other operand. The C standard definition for fmin/fmax do not
> make this distinction, and just return the other operand.

Sadly I can't seem to find an actual copy of the draft of the IEEE 754-201x standard, but I understand that it introduces new min/max functions that better match the min/max actually implemented by hardware and used in languages <https://github.com/WebAssembly/design/issues/214>. I agree, the first sentence you quote doesn't seem correct. Perhaps someone more familiar with the IEEE 754 development can give more insight.

Best,

Alex
Stephen Canon via llvm-dev
2018-Jul-26 15:51 UTC
[llvm-dev] RFC: What is the real behavior for the minnum/maxnum intrinsics?
> On Jul 23, 2018, at 3:40 PM, Alex Bradbury via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 23 July 2018 at 11:56, Arsenault, Matthew via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> Hi,
>>
>>
>> The specification for the llvm.minnum/llvm.maxnum intrinsics is too unclear
>> right now to usefully optimize. There are two problems. First the expected
>> behavior for signaling NaNs needs to be clarified. Second, whether the
>> returned value is expected to be canonicalized (as if by llvm.canonicalize).
>>
>> Currently according to the LangRef:
>>
>> Follows the IEEE-754 semantics for minNum, which also match for libm's
>> fmin.
>>
>> If either operand is a NaN, returns the other non-NaN operand. Returns
>> NaN only if both operands are NaN. If the operands compare equal,
>> returns a value that compares equal to both operands. This means that
>> fmin(+/-0.0, +/-0.0) could return either -0.0 or 0.0.
>>
>> This first line is a lie. This isn’t true for the case of signaling NaNs.
>> The IEEE rule is if either input is a signaling nan, it returns a quieted
>> NaN, not the other operand. The C standard definition for fmin/fmax do not
>> make this distinction, and just return the other operand.
>
> Sadly I can't seem to find an actual copy of the draft of the IEEE
> 754-201x standard, but I understand that it introduces new min/max
> functions that better match the min/max actually implemented by
> hardware and used in languages
> <https://github.com/WebAssembly/design/issues/214>. I agree, the first
> sentence you quote doesn't seem correct. Perhaps someone more familiar
> with the IEEE 754 development can give more insight.

It’s a bit more subtle. The minNum, maxNum, minNumMag, and maxNumMag operations have been removed from the normative clauses of IEEE 754 201x entirely, and replaced with a set of non-normative recommended operations.
This is not a great situation, because it doesn’t provide much clarity for compiler writers or CPU architects, but it at least removes the existing operations, which were known to be critically flawed because they are not associative.

The new non-normative operations of interest are defined in clause 9.6:

1. minimum / maximum: return NaN if any input is NaN.
2. minimumNumber / maximumNumber: return NaN only if *every* input is NaN, even if signaling NaNs are present. Otherwise returns the numerical min / max **ordering –0 before +0**.

The behavior of ordering zeros is a critical distinction from the behavior of the old operations, possibly more critical than the signaling NaN behavior.

I can’t in good faith recommend jumping to implement these semantics, because they are non-normative. They make sense, but there are other definitions that make sense as well, so there’s no guarantee that IEEE 754 won’t tweak them before adopting them in a normative clause in the next revision. I can’t really argue against adopting them either, because they do make perfect sense.

Some notes on how these definitions align with existing architectures of interest:

ARMv8:
1. FMIN / FMAX implement the new minimum / maximum exactly.
2. FMINNM / FMAXNM implement minimumNumber / maximumNumber if we can prove no sNaNs are present. If sNaN may be present, we need to canonicalize each argument first.

X86:
1. AFAIK there’s no trivial instruction for minimum / maximum, because MINxx / MAXxx return the second argument if either is NaN. So this will look like a compare + min/max + select, I think.
2. The new AVX-512 VRANGExx can be used to implement minimumNumber / maximumNumber if we can prove no sNaNs are present. If sNaN may be present, we need to canonicalize each argument first. Pre-AVX-512, this is also compare + min/max + select.

Someone else will need to provide details for other arches.

– Steve
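The two recommended operations listed above can be sketched on ordinary Python floats. This is only an illustration of the rules as summarized in this message, not an implementation of the standard's text. The names minimum, minimum_number, and _less are hypothetical, signaling NaNs are not modeled (Python floats do not distinguish them reliably), and applying the -0 before +0 ordering to both operations is an assumption.

```python
import math


def _less(a, b):
    """Order a before b, treating -0.0 as smaller than +0.0."""
    if a == 0.0 and b == 0.0:
        # Both are zeros, so only the sign bit can separate them.
        return math.copysign(1.0, a) < math.copysign(1.0, b)
    return a < b


def minimum(a, b):
    """NaN-propagating min: any NaN input gives a NaN result."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if _less(a, b) else b


def minimum_number(a, b):
    """NaN-dropping min: NaN only if both inputs are NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if _less(a, b) else b


print(minimum(math.nan, 1.0))
print(minimum_number(math.nan, 1.0))
print(minimum(0.0, -0.0))
```

Unlike the LangRef wording quoted earlier in the thread, which allows either zero for fmin(+/-0.0, +/-0.0), both functions here always pick -0.0 when given zeros of opposite sign, regardless of argument order.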