llvm dev - Oct 2012 - [LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level

Introduction
---

LLVM IR currently does not have any support for specifying fine-grained control
over relaxing floating point requirements for the optimizer. The below is a
proposal to extend floating point IR instructions to support a number of flags
that a creator of IR can use to allow for greater optimizations when
desired. Such changes are sometimes referred to as fast-math, but this proposal
is about finer-grained specifications at a per-instruction level.

What this doesn't address
---

Default behavior is retained, and this proposal only addresses relaxing
restrictions. For example, the assumption of the default rounding mode remains
untouched. Discussion on changing the default behavior of LLVM or allowing for
more restrictive behavior is outside the scope of this proposal. This proposal
does not address behavior of denormals, which is more of a backend concern.

Specifying exact precision control or requirements is outside the scope of this
proposal, and can probably be handled with the existing metadata implementation.

This proposal covers changes to and optimizations over LLVM IR, and changes to
codegen are outside the scope of this proposal. The flags described in the next
section exist only at the IR level, and will not be propagated into codegen or
the SelectionDAG.

Flags
---
no NaNs (N)
  - ignore the existence of NaNs when convenient
no Infs (I)
  - ignore the existence of Infs when convenient
no signed zeros (S)
  - ignore the existence of negative zero when convenient
allow fusion (F)
  - fuse FP operations when convenient, despite possible differences in rounding
    (e.g. form FMAs)
unsafe algebra (A)
  - allow for algebraically equivalent transformations that may dramatically
    change results in floating point. (e.g. reassociation)

Throughout I'll refer to these options in their short-hand, e.g. 'A'.
Internally, these flags are to reside in SubclassData.
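
To make the "possible differences in rounding" behind the 'F' flag concrete,
here is a small sketch in Python (whose floats are IEEE doubles). The fused
result is emulated with exact rational arithmetic followed by a single rounding;
the input values are my own, picked so that the two roundings of the unfused
form lose the entire answer.

```python
from fractions import Fraction

# Illustrative inputs: the exact product a*b is 1 - 2**-54, which is not
# representable as a double and rounds to 1.0.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

# Unfused: the product is rounded to double, then the sum is rounded again.
unfused = (a * b) + c

# Fused (emulated): compute a*b + c exactly, then round once to double.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print("unfused:", unfused)  # 0.0
print("fused:  ", fused)    # -2**-54, about -5.55e-17
```

Neither answer is "wrong" in isolation; the point is only that forming an FMA
can change the result, which is why fusion needs its own permission bit.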

=====Question:

Not all combinations make sense (e.g. 'A' pretty much implies all other flags).

Basically, I have the below semilattice of sensible relations:
  A > S > I > N
  A > F
Meaning that 'A' implies all the others, 'S' implies 'I' and 'N', etc.

It might make sense to change the S, I, and N options to be some kind of finite
option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It
is still the case that A would imply pretty much everything else.
=====
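
As a rough illustration of the implication relations above, the sketch below
encodes the five flags as bits and computes the closure of a flag set under
A > S > I > N and A > F. The class name FMF, the bit values, and the helper
close() are hypothetical and chosen only for this example; nothing here is
meant to describe the eventual encoding in SubclassData.

```python
from enum import IntFlag

class FMF(IntFlag):  # hypothetical name and bit assignment
    N = 1   # no NaNs
    I = 2   # no Infs
    S = 4   # no signed zeros
    F = 8   # allow fusion
    A = 16  # unsafe algebra

# Direct implications taken from the relations A > S > I > N and A > F.
IMPLIES = {
    FMF.A: FMF.S | FMF.F,
    FMF.S: FMF.I,
    FMF.I: FMF.N,
}

def close(flags):
    """Return flags plus every flag implied by the ones already set."""
    flags = FMF(flags)
    changed = True
    while changed:
        changed = False
        for flag, implied in IMPLIES.items():
            if flag in flags and (flags & implied) != implied:
                flags |= implied
                changed = True
    return flags

print(close(FMF.S))  # S together with I and N
print(close(FMF.A))  # all five flags
print(close(FMF.F))  # F alone
```

With the "levels" alternative, N, I, and S would collapse into a single small
integer and close() would only need to handle A.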

Changes to LangRef
---

Change the definitions of the floating point arithmetic operations. Below is how
fadd will change:

'fadd' Instruction
Syntax:

  <result> = fadd {flag}* <ty> <op1>, <op2>   ; yields {ty}:result
...
Semantics:
 ...
 flag can be one of the following optimizer hints to enable otherwise unsafe
 floating point optimizations:
  N: no NaNs - ignore the existence of NaNs when convenient
  I: no infs - ignore the existence of Infs when convenient
  S: no signed zeros - ignore the existence of negative zero when convenient
  F: allow fusion - fuse FP operations when convenient, despite possible
      differences in rounding
  A: unsafe algebra - allow for algebraically equivalent transformations that
     may dramatically change results in floating point.

Changes to optimizations
---

Optimizations should be allowed to perform otherwise-unsafe transformations
provided every instruction involved has the corresponding restriction relaxed.
When combining instructions, optimizations must not remove restrictions that
previously existed; commonly, the result carries the bitwise-AND of the flags
of the instructions it replaces.
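
A minimal sketch of the bitwise-AND rule, assuming the flags are stored as
bits. The bit values and the helper names merged_flags() and show() are made up
for illustration.

```python
from functools import reduce
from operator import and_

# Illustrative bit encoding of the five flags.
N, I, S, F, A = 1, 2, 4, 8, 16

def merged_flags(*operand_flags):
    """Flags that are safe on an instruction replacing all the given ones.

    A relaxation survives only if every combined instruction carried it.
    """
    return reduce(and_, operand_flags)

def show(flags):
    """Render a flag set in the short-hand used in this proposal."""
    return "".join(name for name, bit in zip("NISFA", (N, I, S, F, A))
                   if flags & bit)

# Combining an fadd marked N, S, A with an fadd marked N, A keeps only N and A.
print(show(merged_flags(N | S | A, N | A)))  # NA
```

The intersection is the conservative choice: it never grants the combined
instruction a relaxation that one of its sources lacked.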

Below are some example optimizations that could be allowed with the given
relaxations.

N - no NaNs
  x == x ==> true

S - no signed zeros
  x - 0 ==> x
  0 - (x - y) ==> y - x

NS - no signed zeros AND no NaNs
  x * 0 ==> 0

NI - no infs AND no NaNs
  x - x ==> 0
  Inf > x ==> true

A - unsafe-algebra
  Reassociation
    (x + C1) + C2 ==> x + (C1 + C2)
  Redistribution
    (x * C) + x ==> x * (C+1)
    (x * C) + (x + x) ==> x * (C + 2)
  Reciprocal
   x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.
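
To show why several of the rewrites above need a flag at all, the sketch below
evaluates both sides under strict IEEE semantics using Python floats. The
helper same_float() and the chosen inputs are my own. For signed zeros it uses
x + 0, a closely related identity in which the sign of zero is directly
observable.

```python
import math

nan, inf = float("nan"), float("inf")

def same_float(a, b):
    """Equality that treats NaN as equal to NaN and +0.0 as unequal to -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

# Each entry: (rewrite and input, strict IEEE value, value after the rewrite)
cases = [
    ("N:  x == x ==> true   (x = NaN)",  nan == nan,  True),
    ("S:  x + 0 ==> x       (x = -0.0)", -0.0 + 0.0,  -0.0),
    ("NS: x * 0 ==> 0       (x = -1.0)", -1.0 * 0.0,  0.0),
    ("NS: x * 0 ==> 0       (x = NaN)",  nan * 0.0,   0.0),
    ("NI: x - x ==> 0       (x = Inf)",  inf - inf,   0.0),
    ("NI: Inf > x ==> true  (x = Inf)",  inf > inf,   True),
]

for label, strict, rewritten in cases:
    print(label, "| strict:", strict, "| rewritten:", rewritten,
          "| same:", same_float(strict, rewritten))
```

Every row reports a mismatch, so each rewrite is only valid once the
corresponding special values may be ignored.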

I propose to expand -instsimplify and -instcombine to perform these kinds of
optimizations. -reassociate will be expanded to reassociate floating point
operations when allowed. Similar to existing behavior regarding integer
wrapping, -early-cse will not CSE FP operations with mismatched flags, while
-gvn will (conservatively). This allows later optimizations to optimize the
expressions independently between runs of -early-cse and -gvn.

Changes to frontends
---

Frontends are free to generate code with flags set as they desire. Frontends
should continue to call llc with their desired options, as the flags apply only
at the IR level and not at codegen or the SelectionDAGs.

Below is a suggested change to clang's command-line options.

-ffast-math
  Currently described as:
  Enable the *frontend*'s 'fast-math' mode. This has no effect on
  optimizations, but provides a preprocessor macro __FAST_MATH__ the same as
  GCC's -ffast-math flag

  I propose to change the description and behavior to:

  Enable 'fast-math' mode. This allows for optimizations that may produce
  incorrect and unsafe results, and thus should only be used with care. This
  also provides a preprocessor macro __FAST_MATH__ the same as GCC's
  -ffast-math flag

  I propose that this turn on all flags for all floating point instructions. If
  this flag doesn't already cause clang to run llc with -enable-unsafe-fp-math,
  then I propose that it does so as well.

-fp-contract=<value>
  I'm not too familiar with this option, but I recommend that 'all' turn on the
  'F' bit for all FP instructions, 'default' do so when following the pragma,
  and 'off' never do so. This option should still be passed to the backend.

(Optional)
I propose adding the below flags:

-ffinite-math-only
  Allow optimizations to assume that floating point arguments and results are
  not NaNs or +/-Inf. This may produce incorrect results, and so should be used
  with care.

  This would set the 'I' and 'N' bits on all generated floating point
  instructions.

-fno-signed-zeros
  Allow optimizations to ignore the signedness of zero. This may produce
  incorrect results, and so should be used with care.

  This would set the 'S' bit on all FP instructions.
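
The option-to-flag mapping suggested in this section can be summarized in a
small table. The sketch below is only a restatement of the proposal in code;
the names OPTION_FLAGS and flags_for() are hypothetical and do not correspond
to anything in clang. The pragma-dependent 'default' value of -fp-contract is
left out because it is not a fixed set of bits.

```python
# Flags each proposed command-line option would set on every FP instruction.
OPTION_FLAGS = {
    "-ffast-math":        {"N", "I", "S", "F", "A"},
    "-fp-contract=all":   {"F"},
    "-fp-contract=off":   set(),
    "-ffinite-math-only": {"N", "I"},
    "-fno-signed-zeros":  {"S"},
}

def flags_for(options):
    """Union of the flags requested by the given options, in short-hand."""
    requested = set()
    for option in options:
        # Options unrelated to FP relaxation contribute nothing.
        requested |= OPTION_FLAGS.get(option, set())
    return "".join(flag for flag in "NISFA" if flag in requested)

print(flags_for(["-ffinite-math-only", "-fno-signed-zeros"]))  # NIS
print(flags_for(["-ffast-math"]))                              # NISFA
```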

Changes to llvm cli tools
---
opt and llc already have the command line options
  -enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
  -enable-fp-mad: Enable less precise MAD instructions to be generated
  -enable-no-infs-fp-math: Enable FP math optimizations that assume no +-Infs
  -enable-no-nans-fp-math: Enable FP math optimizations that assume no NaNs
However, opt makes no use of them as they are currently only considered to be
TargetOptions. llc will remain unchanged, as these options apply to DAG
optimizations while this proposal deals with IR optimizations.

(Optional)
Have an opt pass that adds the desired flags to floating point instructions.

Miscellaneous explanations in the form of Q&A
---

Why not just have "fast-math" rather than individual flags?

Having the individual flags gives the granularity to choose the levels of
optimizations. For example, unsafe-algebra can lead to dramatically different
results in corner cases, and may not be desired when a user just wants to ensure
that x*0 folds to 0.
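
As a concrete corner case, here is the reassociation
(x + C1) + C2 ==> x + (C1 + C2) evaluated with Python floats (IEEE doubles).
The constants are my own, picked so that x is absorbed when added to C1 first.

```python
# Illustrative values: 1.0 is below the rounding granularity of 1e16.
x, c1, c2 = 1.0, 1e16, -1e16

original = (x + c1) + c2      # x is lost when rounded into 1e16
reassociated = x + (c1 + c2)  # the constants cancel exactly first

print("original:    ", original)      # 0.0
print("reassociated:", reassociated)  # 1.0
```

A user who only wants x*0 folded to 0 has no reason to accept this kind of
change, which is the argument for keeping 'A' separate from the other flags.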

Why have these flags attached to the instruction itself, rather than be a
compiler mode?

Being attached to the instruction itself allows much greater flexibility both
for other optimizations and for the concerns of the source and target. For
example, a frontend may desire that x - x be folded to 0. This would require
no-NaNs for the subtract. However, the frontend may want to keep NaNs for its
comparisons.

Additionally, these properties can be set internally in the optimizer when the
property has been proven. For example, if x has been found to be positive, then
operations involving x and a constant can be marked to ignore signed zero.

Finally, having these flags allows for greater safety and optimization when code
with different flags is mixed. For example, a function author may set the
unsafe-algebra flag knowing that such transformations will not meaningfully
alter its result. If that function gets inlined into a caller, however, we don't
want to always assume that the function's expressions can be reassociated with
the caller's expressions. These properties allow us to preserve the
optimizations of the inlined function without affecting the caller.

Why not use metadata rather than flags?

There is existing metadata to denote precisions, and this proposal is orthogonal
to those efforts. These flags are analogous to nsw/nuw, and are inherent
properties of the IR instructions themselves that all transformations should
respect.
