thr3ads.net - llvm dev - [llvm-dev] Subnormal handling attributes [Nov 2019]


Arsenault, Matthew via llvm-dev

2019-Nov-01 22:19 UTC

[llvm-dev] Subnormal handling attributes

Hi,

In order to handle subnormals more correctly, changes are necessary to the
“denormal-fp-math” string attribute. There are at least 3 different problems I
have with the current implementation.

There’s currently no documentation on the meaning of the attribute itself, only
the corresponding clang flag (documented at
https://clang.llvm.org/docs/UsersManual.html#cmdoption-fdenormal-fp-math). The
current description is not particularly clear to me: “Select which denormal
numbers the code is permitted to require.” Require in what sense? Does this mean
it’s assumed a floating point instruction in a function will never see a
denormal input? What are the restrictions on what happens if a denormal is used?
Does it mean denormals are required to be flushed by any floating point
instruction?

The claim that this “defaults to ieee” is simply untrue. If the flag is not
specified, clang does not emit the corresponding attribute in the IR. The one
user for code generation of this attribute
(https://github.com/llvm/llvm-project/blob/4531aee2ac1609e8ddf4f3deec200c5f793faa7b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L20456)
assumes some type of flushing if the attribute isn’t specified, which is
incorrect.

In the context of this use, I think what this attribute is intended to mean is
the behavior of subnormal outputs from the regular floating point instructions
in the default floating point environment. It does not necessarily mean denormal
inputs are not allowed or interpreted as zero. Is this a correct interpretation?

The first problem here is assuming a non-IEEE target by default, which doesn’t
even match the documented behavior of the flag. In order to fix this without
introducing performance regressions, clang needs to start emitting the attribute
for platforms where the default floating point mode is known to flush subnormal
outputs. What platforms are these? It’s overly difficult to find documentation
on what the default mode is on different platforms, and even more difficult to
find out the finer points, such as whether the flush to zero preserves the sign.
It would be helpful if interested developers could prepare to handle this switch
in the default behavior.
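
To make the intended default concrete, here is a minimal C++ sketch of how a consumer of the attribute could treat a missing value as IEEE rather than assuming flushing. The three value strings are the ones the clang flag accepts; the enum and the functions parseDenormalMode and flushesSubnormalOutputs are hypothetical names for illustration and are not LLVM's API.

```cpp
#include <optional>
#include <string_view>

// Hypothetical model of the values accepted by -fdenormal-fp-math.
enum class DenormalMode { IEEE, PreserveSign, PositiveZero, Invalid };

// Interpret the attribute value. An absent attribute (std::nullopt) is
// treated as IEEE, matching the documented default of the flag.
inline DenormalMode parseDenormalMode(std::optional<std::string_view> attr) {
  if (!attr)
    return DenormalMode::IEEE;
  if (*attr == "ieee")
    return DenormalMode::IEEE;
  if (*attr == "preserve-sign")
    return DenormalMode::PreserveSign;
  if (*attr == "positive-zero")
    return DenormalMode::PositiveZero;
  return DenormalMode::Invalid;
}

// A combine that is only valid when subnormal outputs are flushed would
// query something like this instead of assuming flushing by default.
inline bool flushesSubnormalOutputs(DenormalMode mode) {
  return mode == DenormalMode::PreserveSign ||
         mode == DenormalMode::PositiveZero;
}
```

With this shape, a target whose default mode flushes would need the frontend to emit the attribute explicitly, which is the change to clang described above.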

The second problem is that this attribute is insufficient to describe the
variety of subnormal behaviors. For example on X86 and AMDGPU, the floating point
control registers provide separate controls for instructions flushing their outputs, and
for treating input denormals as 0. ICC for instance provides a separate flag for
each of these cases:
https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-setting-the-ftz-and-daz-flags.
If we’re bothering to model this correctly, we might as well try to fully model
all of these.
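
The difference between the two controls can be sketched with a small software model in C++. This is a simplified illustration and not a description of any particular hardware: FlushControls, flushIfSubnormal and modelMul are hypothetical names, the flush is assumed to preserve the sign, and the host is assumed to run in the default IEEE mode.

```cpp
#include <cmath>

// Hypothetical pair of controls: one for treating subnormal inputs as
// zero, one for flushing subnormal results to zero.
struct FlushControls {
  bool flushInputs;
  bool flushOutputs;
};

// Replace a subnormal value with a zero of the same sign.
inline float flushIfSubnormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Model a multiply under the given controls.
inline float modelMul(float a, float b, FlushControls c) {
  if (c.flushInputs) {
    a = flushIfSubnormal(a);
    b = flushIfSubnormal(b);
  }
  float r = a * b;
  return c.flushOutputs ? flushIfSubnormal(r) : r;
}
```

In this model, multiplying the smallest subnormal float by 2^30 gives a normal result when only outputs are flushed, but zero when inputs are flushed, so a single "flushing" attribute cannot distinguish the two cases.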

The third problem is that AMDGPU has one flushing control for f32 instructions
and a separate one shared by f64 and f16 instructions. On AMDGPU there’s no
performance advantage for turning off f64 denormals (and only a limited benefit
to turning off f16 denormals, which enables selecting a handful of
instructions). However, there is a big
advantage to turning them off for f32 on most subtargets. The
-cl-denorms-are-zero flag gives the freedom to only flush on the desired types,
so ideally these attributes would be broken down per type.

My end goal is to be able to specify denormal flushing per-type that can map
into the default initialization fields for the AMDGPU FP mode register, which
will also be usable for selection and DAG combines. Currently we use subtarget
features for this, which is a bit hacky; I’m trying to replace them with some
form of attribute (with defaults determined by the calling convention).

With something that looks like the current attribute, we would potentially need
(flush input, flush output) * (f16, f32, f64) = 6 attributes to cover the basic
types. If you include the more exotic FP types, this would come to 12+
attributes, which is a bit ridiculous.

What would be the preferred form of replacement attributes? I think one
attribute per-type that looks something like a bitfield that describes both the
input and output denormal behavior would be most preferable. As for bikeshedding
issues, what are these attributes called? Should these use the IR names for the
FP types, or the MVT names (i.e. -float vs. -f32)? Are these still string
attributes, or should this be promoted to a real attribute? I do think the
naming scheme should move towards IEEE’s current preferred terminology of
subnormal over the commonly used denormal.
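
As a rough illustration of the per-type bitfield idea, here is a C++ sketch. Everything in it is hypothetical: the bit assignments, the FPType enum, the SubnormalModes struct and the flushF32Only helper are made up for this example and do not correspond to any existing or proposed LLVM encoding.

```cpp
#include <cstdint>

// Hypothetical two-bit encoding: one bit per direction.
enum SubnormalBits : std::uint8_t {
  FlushNone = 0,
  FlushInput = 1 << 0,
  FlushOutput = 1 << 1,
};

// Only the basic types are modeled here.
enum class FPType { F16 = 0, F32 = 1, F64 = 2 };

// One small bitfield per type instead of (input, output) x (type)
// separate attributes.
struct SubnormalModes {
  std::uint8_t bits[3] = {FlushNone, FlushNone, FlushNone};

  void set(FPType t, std::uint8_t mask) { bits[static_cast<int>(t)] = mask; }
  bool flushesInput(FPType t) const {
    return (bits[static_cast<int>(t)] & FlushInput) != 0;
  }
  bool flushesOutput(FPType t) const {
    return (bits[static_cast<int>(t)] & FlushOutput) != 0;
  }
};

// Example configuration: flush f32 in both directions, keep f16 and f64
// fully IEEE.
inline SubnormalModes flushF32Only() {
  SubnormalModes m;
  m.set(FPType::F32, FlushInput | FlushOutput);
  return m;
}
```

A configuration like flushF32Only is the kind of setting the -cl-denorms-are-zero case above would want, with three values to carry instead of six separate attributes.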

I started on some of the work towards some of these fixes in
https://reviews.llvm.org/D69598

-Matt
