On Fri, Sep 12, 2014 at 3:04 PM, Owen Anderson <resistor at mac.com> wrote:

> On Sep 12, 2014, at 2:24 PM, Owen Anderson <resistor at mac.com> wrote:
>
> On Sep 12, 2014, at 10:27 AM, Dan Gohman <dan433584 at gmail.com> wrote:
>
>> More generally, I don’t see a compelling reason for LLVM to add intrinsic
>> support for the version you’re proposing. Your choice can easily be
>> expanded into IR, and does not have the wide hardware support (particularly
>> in GPUs) that the IEEE version does.
>
> The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU
> input languages leave the behavior on NaN unspecified, so it's not
> obviously the best guide.
>
> That’s not generally true. HLSL (DirectX), CUDA, OpenCL, and Metal all
> have defined semantics for NaNs which include not propagating them through
> min/max. GLSL (OpenGL) is the odd one out in this area.

HLSL leaves it undefined:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb509624%28v=vs.85%29.aspx

I guess Metal and others only have a "fast-math" flag which (among other things) makes behavior on NaN undefined, but it's my impression that it's a popular flag.

> Also, as a practical issues, many GPUs have ISA-level support for the
> IEEE-conforming version. Some (all?) of the AMD GPUs that Matt cares about
> support it, and PTX has native operations for it as well. The IR expansion
> of an IEEE-conforming fmin/fmax is at least three compares + selects, which
> makes it very difficult to pattern match for these targets.

It's 2 compares + selects:

    float nan_swallowing_fmin(float a, float b) {
      return b != b ? a : (a < b ? a : b);
    }

which is within the realm of pattern-matching.

> The inverse form (always propagating NaNs) is not widely natively
> supported.
>
>> I think AArch64 *might* have it?

It does. In fact, even armv7 has a NaN-propagating min/max:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489i/CIHDEEBE.html
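As a quick sanity check on the two-compare expansion above, here is a minimal sketch in C that exercises it on the NaN cases. The helper name `check_nan_swallowing_fmin` and the chosen inputs are illustrative assumptions, not something from the thread; the expected results are simply the "return the non-NaN operand" behavior being discussed.

```c
#include <assert.h>
#include <math.h> /* NAN, isnan */

/* Expansion quoted from the message above: two compares, two selects. */
static float nan_swallowing_fmin(float a, float b) {
  return b != b ? a : (a < b ? a : b);
}

/* Hypothetical self-check (not from the thread). */
static void check_nan_swallowing_fmin(void) {
  /* Ordinary operands: plain minimum. */
  assert(nan_swallowing_fmin(1.0f, 2.0f) == 1.0f);
  assert(nan_swallowing_fmin(2.0f, 1.0f) == 1.0f);

  /* a is NaN: b != b is false and a < b is false, so b is selected. */
  assert(nan_swallowing_fmin(NAN, 2.0f) == 2.0f);

  /* b is NaN: the first compare selects a. */
  assert(nan_swallowing_fmin(1.0f, NAN) == 1.0f);

  /* Both NaN: a is returned, which is itself NaN. */
  assert(isnan(nan_swallowing_fmin(NAN, NAN)));
}
```

Calling `check_nan_swallowing_fmin()` from a `main` should pass silently when compiled without fast-math style flags, which may let the compiler assume NaNs never occur. The sketch does not cover signed zeros or signaling NaNs.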
On Sep 12, 2014, at 5:39 PM, Dan Gohman <dan433584 at gmail.com> wrote:

> On Fri, Sep 12, 2014 at 3:04 PM, Owen Anderson <resistor at mac.com> wrote:
>
>> On Sep 12, 2014, at 2:24 PM, Owen Anderson <resistor at mac.com> wrote:
>>
>>> On Sep 12, 2014, at 10:27 AM, Dan Gohman <dan433584 at gmail.com> wrote:
>>>
>>> More generally, I don’t see a compelling reason for LLVM to add intrinsic support for the version you’re proposing. Your choice can easily be expanded into IR, and does not have the wide hardware support (particularly in GPUs) that the IEEE version does.
>>>
>>> The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU input languages leave the behavior on NaN unspecified, so it's not obviously the best guide.
>>
>> That’s not generally true. HLSL (DirectX), CUDA, OpenCL, and Metal all have defined semantics for NaNs which include not propagating them through min/max. GLSL (OpenGL) is the odd one out in this area.
>
> HLSL leaves it undefined:
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/bb509624%28v=vs.85%29.aspx

Not exactly. The HLSL language leaves it undefined, but HLSL bytecode specifies that it’s not NaN-propagating:

http://msdn.microsoft.com/en-us/library/windows/desktop/hh447185(v=vs.85).aspx

And I happen to know from experience that a lot of graphics shaders depend on it working that way in practice.

--Owen
Given IEEE-754's sway, and its saying what it does on this point, but given also the popularity of NaN-propagating min and max, how about a compromise? We add intrinsics following the IEEE-754 semantics, but we also follow IEEE-754 (and ARMv8) in renaming them to minnum and maxnum, to clarify which interpretation these intrinsics are using.
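To make the naming distinction concrete, here is a rough sketch in C of the two interpretations side by side. The functions `minnum_sketch`, `maxnum_sketch`, and `min_propagating_sketch` are hypothetical stand-ins written for illustration only; they are not the definitions of the proposed intrinsics, and they ignore signed zeros and signaling NaNs.

```c
#include <assert.h>
#include <math.h> /* NAN, isnan */

/* minnum-style: if exactly one operand is NaN, return the other one. */
static float minnum_sketch(float a, float b) {
  if (isnan(a)) return b;
  if (isnan(b)) return a;
  return a < b ? a : b;
}

/* maxnum-style counterpart with the same NaN handling. */
static float maxnum_sketch(float a, float b) {
  if (isnan(a)) return b;
  if (isnan(b)) return a;
  return a > b ? a : b;
}

/* NaN-propagating interpretation: any NaN operand yields NaN. */
static float min_propagating_sketch(float a, float b) {
  if (isnan(a)) return a;
  if (isnan(b)) return b;
  return a < b ? a : b;
}

/* Hypothetical self-check showing where the two interpretations differ. */
static void check_minnum_vs_propagating(void) {
  /* Without NaNs the two interpretations agree. */
  assert(minnum_sketch(1.0f, 2.0f) == min_propagating_sketch(1.0f, 2.0f));

  /* With one NaN operand they diverge. */
  assert(minnum_sketch(NAN, 2.0f) == 2.0f);
  assert(isnan(min_propagating_sketch(NAN, 2.0f)));

  assert(maxnum_sketch(1.0f, NAN) == 1.0f);
}
```

The only inputs on which the two families disagree are those with exactly one NaN operand, which is the case the "num" suffix is meant to flag.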