thr3ads.net - llvm dev - [LLVMdev] [PATCH][RFC]: Add fmin/fmax intrinsics [Sep 2014]

Owen Anderson

2014-Sep-12 21:24 UTC

[LLVMdev] [PATCH][RFC]: Add fmin/fmax intrinsics

> On Sep 12, 2014, at 10:27 AM, Dan Gohman <dan433584 at gmail.com> wrote:
>
>> More generally, I don’t see a compelling reason for LLVM to add intrinsic
>> support for the version you’re proposing.  Your choice can easily be
>> expanded into IR, and does not have the wide hardware support (particularly
>> in GPUs) that the IEEE version does.
>
> The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU
> input languages leave the behavior on NaN unspecified, so it's not obviously
> the best guide.

That’s not generally true.  HLSL (DirectX), CUDA, OpenCL, and Metal all have
defined semantics for NaNs which include not propagating them through min/max.
GLSL (OpenGL) is the odd one out in this area.

—Owen

Owen Anderson

2014-Sep-12 22:04 UTC

[LLVMdev] [PATCH][RFC]: Add fmin/fmax intrinsics

> On Sep 12, 2014, at 2:24 PM, Owen Anderson <resistor at mac.com> wrote:
>
>> On Sep 12, 2014, at 10:27 AM, Dan Gohman <dan433584 at gmail.com> wrote:
>>
>>> More generally, I don’t see a compelling reason for LLVM to add intrinsic
>>> support for the version you’re proposing.  Your choice can easily be
>>> expanded into IR, and does not have the wide hardware support
>>> (particularly in GPUs) that the IEEE version does.
>>
>> The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU
>> input languages leave the behavior on NaN unspecified, so it's not
>> obviously the best guide.
>
> That’s not generally true.  HLSL (DirectX), CUDA, OpenCL, and Metal all
> have defined semantics for NaNs which include not propagating them through
> min/max.  GLSL (OpenGL) is the odd one out in this area.

Also, as a practical issue, many GPUs have ISA-level support for the
IEEE-conforming version.  Some (all?) of the AMD GPUs that Matt cares about
support it, and PTX has native operations for it as well.  The IR expansion of
an IEEE-conforming fmin/fmax is at least three compares + selects, which makes
it very difficult to pattern match for these targets.

The inverse form (always propagating NaNs) is not widely natively supported.  I
think AArch64 *might* have it?  MAXPS in SSE performs a ternary operator form
that doesn’t match either definition.

—Owen
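
[Editor's sketch] The three behaviours named in this message can be compared side by side in C. This is a minimal illustration only: every function name below is hypothetical and comes from neither the thread nor LLVM. The ternary form assumes the documented MAXPS rule that the second operand is returned whenever either input is NaN.

```c
#include <assert.h>
#include <math.h>   /* NAN macro only; no libm functions are called */

/* Hypothetical helper: NaN is the only value that compares unequal to itself. */
static int is_nan(float x) { return x != x; }

/* "IEEE version" in the thread's sense: a lone NaN operand is dropped. */
static float max_nan_swallowing(float a, float b) {
  return b != b ? a : (a > b ? a : b);
}

/* "Inverse form": any NaN operand makes the result NaN. */
static float max_nan_propagating(float a, float b) {
  return a != a ? a : (b != b ? b : (a > b ? a : b));
}

/* MAXPS-style ternary: the comparison is false for NaN, so b is returned. */
static float max_ternary(float a, float b) {
  return a > b ? a : b;
}

/* Spot checks; call this from main() to run them. */
static void check_max_variants(void) {
  assert(max_nan_swallowing(NAN, 1.0f) == 1.0f);
  assert(max_nan_swallowing(1.0f, NAN) == 1.0f);
  assert(is_nan(max_nan_propagating(NAN, 1.0f)));
  assert(is_nan(max_nan_propagating(1.0f, NAN)));
  /* The ternary form is asymmetric, so it matches neither definition. */
  assert(max_ternary(NAN, 1.0f) == 1.0f);
  assert(is_nan(max_ternary(1.0f, NAN)));
}
```

The ternary form agrees with the NaN-swallowing form when the first operand is NaN and with the NaN-propagating form when the second operand is NaN, which is one way to read the remark that it matches neither.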


Dan Gohman

2014-Sep-13 00:39 UTC

[LLVMdev] [PATCH][RFC]: Add fmin/fmax intrinsics

On Fri, Sep 12, 2014 at 3:04 PM, Owen Anderson <resistor at mac.com> wrote:

> On Sep 12, 2014, at 2:24 PM, Owen Anderson <resistor at mac.com> wrote:
>
>> On Sep 12, 2014, at 10:27 AM, Dan Gohman <dan433584 at gmail.com> wrote:
>>
>>> More generally, I don’t see a compelling reason for LLVM to add intrinsic
>>> support for the version you’re proposing.  Your choice can easily be
>>> expanded into IR, and does not have the wide hardware support
>>> (particularly in GPUs) that the IEEE version does.
>>
>> The IEEE version can also be expanded in LLVM IR. And for GPUs, many GPU
>> input languages leave the behavior on NaN unspecified, so it's not
>> obviously the best guide.
>
> That’s not generally true.  HLSL (DirectX), CUDA, OpenCL, and Metal all
> have defined semantics for NaNs which include not propagating them through
> min/max.  GLSL (OpenGL) is the odd one out in this area.

HLSL leaves it undefined:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb509624%28v=vs.85%29.aspx

I guess Metal and others only have a "fast-math" flag which (among other
things) makes behavior on NaN undefined, but it's my impression that it's a
popular flag.

> Also, as a practical issue, many GPUs have ISA-level support for the
> IEEE-conforming version.  Some (all?) of the AMD GPUs that Matt cares about
> support it, and PTX has native operations for it as well.  The IR expansion
> of an IEEE-conforming fmin/fmax is at least three compares + selects, which
> makes it very difficult to pattern match for these targets.

It's 2 compares + selects:

float nan_swallowing_fmin(float a, float b) {
  return b != b ? a : (a < b ? a : b);
}

which is within the realm of pattern-matching.
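
[Editor's sketch] To make the count explicit, the same function can be written with each compare and each select on its own line. This only restates the code above; the name fmin_two_selects and the local variable names are made up for illustration, and the comments are a rough guide to the corresponding IR operations, not compiler output.

```c
#include <assert.h>
#include <math.h>   /* NAN macro only; no libm functions are called */

/* Same logic as nan_swallowing_fmin, split into two compares and two selects. */
static float fmin_two_selects(float a, float b) {
  int b_is_nan = (b != b);        /* compare 1: true only when b is NaN   */
  int a_lt_b = (a < b);           /* compare 2: false if either is NaN    */
  float inner = a_lt_b ? a : b;   /* select 1                             */
  return b_is_nan ? a : inner;    /* select 2                             */
}

/* Spot checks of the NaN cases; call this from main() to run them. */
static void check_fmin_two_selects(void) {
  float both_nan = fmin_two_selects(NAN, NAN);
  assert(fmin_two_selects(1.0f, 2.0f) == 1.0f);
  assert(fmin_two_selects(2.0f, 1.0f) == 1.0f);
  assert(fmin_two_selects(NAN, 2.0f) == 2.0f);  /* a is NaN: a < b fails, b wins */
  assert(fmin_two_selects(1.0f, NAN) == 1.0f);  /* b is NaN: first select picks a */
  assert(both_nan != both_nan);                 /* NaN only when both are NaN */
}
```

The case where a is NaN needs no compare of its own: a < b is already false, so the inner select falls through to b.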

> The inverse form (always propagating NaNs) is not widely natively
> supported.  I think AArch64 *might* have it?

It does. In fact, even armv7 has a NaN-propagating min/max:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489i/CIHDEEBE.html
