thr3ads.net - llvm dev - [LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level [Nov 2012]

Dan Gohman

2012-Oct-30 23:19 UTC

[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level

On Tue, Oct 30, 2012 at 2:25 PM, Michael Ilseman <milseman at apple.com> wrote:
> Here's a new version of the RFC, incorporating and addressing the feedback
> from Krzysztof, Eli, Duncan, and Dan.
>
>
> Revision 1 changes:
>   * Removed Fusion flag from all sections
>   * Clarified and changed descriptions of remaining flags:
>     * Make 'N' and 'I' flags be explicitly concerning values of operands,
>       and producing undef values if a NaN/Inf is provided.
>     * 'S' is now only about distinguishing between +/-0.
>     * LangRef changes updated to reflect flags changes
>     * Updated Question section given the now simpler set of flags
>     * Optimizations changed to reflect 'N' and 'I' describing operands and
>       not results
>   * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
>   * Mention that this could be solved with metadata, and open the debate
>
> Introduction
> ---
>
> LLVM IR currently does not have any support for specifying fine-grained
> control over relaxing floating point requirements for the optimizer. The
> below is a proposal to extend floating point IR instructions to support a
> number of flags that a creator of IR can use to allow for greater
> optimizations when desired. Such changes are sometimes referred to as
> fast-math, but this proposal is about finer-grained specifications at a
> per-instruction level.
>
>
> What this doesn't address
> ---
>
> Default behavior is retained, and this proposal is only addressing relaxing
> restrictions. LLVM currently by default:
>  - ignores signaling NaNs
>  - assumes default rounding mode
>  - assumes FENV_ACCESS is off
>
> Discussion on changing the default behavior of LLVM or allowing for more
> restrictive behavior is outside the scope of this proposal. This proposal
> does not address behavior of denormals, which is more of a backend concern.
>
> Specifying exact precision control or requirements is outside the scope of
> this proposal, and can probably be handled with the existing metadata
> implementation.
>
> This proposal covers changes to and optimizations over LLVM IR, and changes
> to codegen are outside the scope of this proposal. The flags described in
> the next section exist only at the IR level, and will not be propagated into
> codegen or the SelectionDAG.
>
>
> Flags
> ---
> no NaNs (N)
>   - The optimizer is allowed to optimize under the assumption that the
>     operands' values are not NaN. If one of the operands is NaN, the value
>     of the result is undefined.
>
> no Infs (I)
>   - The optimizer is allowed to optimize under the assumption that the
>     operands' values are not +/-Inf. If one of the operands is +/-Inf, the
>     value of the result is undefined.
>
> no signed zeros (S)
>   - The optimizer is allowed to not distinguish between -0 and +0 for the
>     purposes of optimizations.
>
Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math
and -enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and
they all say they apply to results as well as arguments. Do you have a good
reason for varying from existing practice here?

Phrasing these from the perspective of the optimizer is a little confusing
here. Also, "The optimizer is allowed to [not care about X]" read literally
means that the semantics for X are unconstrained, which would be Undefined
Behavior. For I and N here you have a second sentence which says only the
result is undefined, but for S you don't. Also, even when you do have the
second sentence, it seems to contradict the first sentence.

> unsafe algebra (A)
>   - The optimizer is allowed to perform algebraically equivalent
>     transformations that may dramatically change results in floating point.
>     (e.g. reassociation)
>
> Throughout I'll refer to these options in their short-hand, e.g. 'A'.
> Internally, these flags are to reside in SubclassData.
>
>
> =====> Question:
>
> Not all combinations make sense (e.g. 'A' pretty much implies all other
> flags).
>
> Basically, I have the below lattice of sensible relations:
>   A > S > N
>   A > I > N
> Meaning that 'A' implies all the others, 'S' implies 'N', etc.
>
Why does S still imply N?

Also, I'm curious if there's a specific motivation to have I imply N. LLVM
CodeGen's existing options for these are independent.

> It might be desirable to simplify this into just being a fast-math level.
>
What would make this desirable?

> Changes to optimizations
> ---
>
> Optimizations should be allowed to perform unsafe optimizations provided
> the instructions involved have the corresponding restrictions relaxed. When
> combining instructions, optimizations should do what makes sense to not
> remove restrictions that previously existed (commonly, a bitwise-AND of the
> flags).
>
> Below are some example optimizations that could be allowed with the given
> relaxations.
>
> N - no NaNs
>   x == x ==> true
>
> S - no signed zeros
>   x - 0 ==> x
>   0 - (x - y) ==> y - x
>
> NIS - no signed zeros AND no NaNs AND no Infs
>   x * 0 ==> 0
>
> NI - no infs AND no NaNs
>   x - x ==> 0
>
> A - unsafe-algebra
>   Reassociation
>     (x + y) + z ==> x + (y + z)
>     (x + C1) + C2 ==> x + (C1 + C2)
>   Redistribution
>     (x * C) + x ==> x * (C+1)
>     (x * C) + (x + x) ==> x * (C + 2)
>   Reciprocal
>    x / C ==> x * (1/C)
>
> These examples apply when the new constants are permitted, e.g. not
> denormal, and all the instructions involved have the needed flags.
>
I'm still confused by what you mean in this sentence. Why are you talking
about constants, if you intend these optimizations to be valid for
non-constants? And, it's not clear what you're trying to say about denormal
values here.
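
For reference, the quoted rule about combining instructions ("commonly, a
bitwise-AND of the flags") can be sketched as below. This is only a minimal
Python model under assumptions: the FMF class, its bit values, and
combine_flags are hypothetical names for illustration, not LLVM API.

```python
from enum import IntFlag


class FMF(IntFlag):
    """Hypothetical bitmask mirroring the RFC's N, I, S, and A flags."""
    NONE = 0
    N = 1  # no NaNs
    I = 2  # no Infs
    S = 4  # no signed zeros
    A = 8  # unsafe algebra


def combine_flags(first: FMF, second: FMF) -> FMF:
    """Flags for an instruction built from two others.

    Only relaxations present on both inputs survive, so no restriction
    that previously existed is dropped.
    """
    return first & second


fadd1 = FMF.N | FMF.I | FMF.A
fadd2 = FMF.N | FMF.A
merged = combine_flags(fadd1, fadd2)
print(merged == (FMF.N | FMF.A))  # 'I' is dropped because fadd2 lacks it
```

The intersection is the conservative choice: a transformation that needs 'I'
is no longer allowed on the merged instruction, because one of its inputs
never promised it.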

Dan

Michael Ilseman

2012-Oct-31 03:28 UTC

On Oct 30, 2012, at 4:19 PM, Dan Gohman <dan433584 at gmail.com> wrote:
> On Tue, Oct 30, 2012 at 2:25 PM, Michael Ilseman <milseman at apple.com> wrote:
> Flags
> ---
> no NaNs (N)
>   - The optimizer is allowed to optimize under the assumption that the
>     operands' values are not NaN. If one of the operands is NaN, the value
>     of the result is undefined.
>
> no Infs (I)
>   - The optimizer is allowed to optimize under the assumption that the
>     operands' values are not +/-Inf. If one of the operands is +/-Inf, the
>     value of the result is undefined.
>
> no signed zeros (S)
>   - The optimizer is allowed to not distinguish between -0 and +0 for the
>     purposes of optimizations.
>
> Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math and
> -enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and they
> all say they apply to results as well as arguments. Do you have a good
> reason for varying from existing practice here?
>
The primary example I was trying to simplify with that change was
x * 0 ==> 0. It can be performed if you assume NIS inputs, or NS inputs and
N outputs. This is because Inf * 0 is NaN. In hindsight, this is all making
things more confusing, so I think I'll go back to "arguments and results" and
allow this optimization for NS. GCC gets around this by lumping Inf and NaN
under the same command line option.
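
To make the Inf * 0 case concrete, here is a minimal Python sketch. It
assumes Python floats are IEEE 754 doubles, which holds on mainstream
platforms; the variable names are illustrative only.

```python
import math

inf = math.inf
nan = math.nan

# Inf * 0 is NaN, so folding x * 0 to 0 is wrong if x may be infinite.
inf_times_zero = inf * 0.0

# Inf - Inf is NaN, so folding x - x to 0 has the same problem.
inf_minus_inf = inf - inf

# NaN compares unequal to itself, so folding x == x to true needs "no NaNs".
nan_equals_itself = (nan == nan)

print(math.isnan(inf_times_zero), math.isnan(inf_minus_inf), nan_equals_itself)
# prints: True True False
```
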

> Phrasing these from the perspective of the optimizer is a little confusing
> here.

I think it might be clearer to change "The optimizer is allowed to …" to
"Allow optimizations to …" and clean up the wording a bit.

> Also, "The optimizer is allowed to [not care about X]" read literally means
> that the semantics for X are unconstrained, which would be Undefined
> Behavior. For I and N here you have a second sentence which says only the
> result is undefined, but for S you don't.

'S' shouldn't have any undefined behavior, it just allows optimizations to
not distinguish between +/-0. It's perfectly legal for the operation to
receive a negative zero, the operation just might treat it exactly the same
as a positive zero. I would rather have that than undefined behavior.

This is similar to how gcc defines -fno-signed-zeros:
"Allow optimizations for floating point arithmetic that ignore the
signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0 and
-0.0 values, which then prohibits simplification of expressions such as x+0.0 or
0.0*x (even with -ffinite-math-only). This option implies that the sign of a
zero result isn't significant."

I'll revise my description to also mention that the sign of a zero result
isn't significant.
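
The two expressions named in the GCC text can be checked directly. The
following is a small Python sketch, assuming IEEE 754 doubles and the default
round-to-nearest mode; sign_of is a helper introduced here for illustration.

```python
import math


def sign_of(x: float) -> float:
    """Return +1.0 or -1.0 according to the sign bit, including for zeros."""
    return math.copysign(1.0, x)


# x + 0.0 with x = -0.0: IEEE gives +0.0, so rewriting x + 0.0 to x would
# change the sign of the result.
x = -0.0
sum_with_zero = x + 0.0
print(sign_of(x), sign_of(sum_with_zero))

# 0.0 * x with x = -1.0: IEEE gives -0.0, so rewriting 0.0 * x to 0.0 would
# also change the sign of the result.
product = 0.0 * -1.0
print(sign_of(product))

# The two zeros still compare equal; only the sign bit differs.
print(-0.0 == 0.0)
```

Both rewrites produce a zero of the other sign, which is exactly the
difference the 'S' flag declares insignificant.
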
> Also, even when you do have the second sentence, it seems to contradict the
> first sentence.
>

Why does it contradict the first sentence? I meant it as a clarification or
reinforcement of the first, not a contradiction.

> unsafe algebra (A)
>   - The optimizer is allowed to perform algebraically equivalent
>     transformations that may dramatically change results in floating point.
>     (e.g. reassociation)
>
> Throughout I'll refer to these options in their short-hand, e.g. 'A'.
> Internally, these flags are to reside in SubclassData.
>
>
> =====> Question:
>
> Not all combinations make sense (e.g. 'A' pretty much implies all other
> flags).
>
> Basically, I have the below lattice of sensible relations:
>   A > S > N
>   A > I > N
> Meaning that 'A' implies all the others, 'S' implies 'N', etc.
>
> Why does S still imply N?
>
> Also, I'm curious if there's a specific motivation to have I imply N. LLVM
> CodeGen's existing options for these are independent.
>
>
> It might be desirable to simplify this into just being a fast-math level.
>
> What would make this desirable?
>

I think this "Question" I had no longer makes too much sense, so I'm going to
delete this section.

> Changes to optimizations
> ---
>
> Optimizations should be allowed to perform unsafe optimizations provided
> the instructions involved have the corresponding restrictions relaxed. When
> combining instructions, optimizations should do what makes sense to not
> remove restrictions that previously existed (commonly, a bitwise-AND of the
> flags).
>
> Below are some example optimizations that could be allowed with the given
> relaxations.
> 
> N - no NaNs
>   x == x ==> true
> 
> S - no signed zeros
>   x - 0 ==> x
>   0 - (x - y) ==> y - x
> 
> NIS - no signed zeros AND no NaNs AND no Infs
>   x * 0 ==> 0
> 
> NI - no infs AND no NaNs
>   x - x ==> 0
> 
> A - unsafe-algebra
>   Reassociation
>     (x + y) + z ==> x + (y + z)
>     (x + C1) + C2 ==> x + (C1 + C2)
>   Redistribution
>     (x * C) + x ==> x * (C+1)
>     (x * C) + (x + x) ==> x * (C + 2)
>   Reciprocal
>    x / C ==> x * (1/C)
> 
> These examples apply when the new constants are permitted, e.g. not
> denormal, and all the instructions involved have the needed flags.
>
> I'm still confused by what you mean in this sentence. Why are you talking
> about constants, if you intend these optimizations to be valid for
> non-constants? And, it's not clear what you're trying to say about denormal
> values here.
>

I was mentioning denormals for one of the optimizations. I think it would be
more clear to say something like:

>  Reciprocal
>    x / C ==> x * (1/C)  when (1/C) is not denormal

I was mostly trying to say that the optimizations are not blindly applied, but
are applied when they are still legal. I think the sentence is more confusing
than helpful, though.
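
A minimal Python sketch of why the 'A' transformations change results, and of
the denormal caveat on the reciprocal rewrite. It assumes IEEE 754 doubles;
the specific operand values are chosen here only for illustration.

```python
import sys

# Reassociation: (x + y) + z versus x + (y + z).
x, y, z = 0.1, 0.2, 0.3
left = (x + y) + z
right = x + (y + z)
reassociation_changes_result = left != right

# Reciprocal: x / C versus x * (1/C).
numerator, c = 6.0, 10.0
quotient = numerator / c
via_reciprocal = numerator * (1.0 / c)
reciprocal_changes_result = quotient != via_reciprocal

# Denormal caveat: for a large enough C, 1/C falls below the smallest
# normal double, so the rewritten constant would be denormal.
big_c = 1.0e308
reciprocal = 1.0 / big_c
reciprocal_is_denormal = 0.0 < reciprocal < sys.float_info.min

print(reassociation_changes_result,
      reciprocal_changes_result,
      reciprocal_is_denormal)
```

In both rewrites the difference is in the last bit of the result, which is
why they are only acceptable when the instruction explicitly opts in.
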
> Dan
> 
Thanks!

Joshua Cranmer

2012-Oct-31 04:11 UTC

On 10/30/2012 10:28 PM, Michael Ilseman wrote:
>
> On Oct 30, 2012, at 4:19 PM, Dan Gohman <dan433584 at gmail.com> wrote:
>
>> On Tue, Oct 30, 2012 at 2:25 PM, Michael Ilseman <milseman at apple.com>
>> wrote:
>>
>>
>>     no signed zeros (S)
>>       - The optimizer is allowed to not distinguish between -0 and +0
>>         for the purposes of optimizations.
>>
>>
>> Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math and
>> -enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and
>> they all say they apply to results as well as arguments. Do you have a
>> good reason for varying from existing practice here?
>>
>
> The primary example I was trying to simplify with that change was
> x * 0 ==> 0. It can be performed if you assume NIS inputs, or NS inputs
> and N outputs. This is because Inf * 0 is NaN. In hindsight, this is all
> making things more confusing, so I think I'll go back to "arguments and
> results" and allow this optimization for NS. GCC gets around this by
> lumping Inf and NaN under the same command line option.
>
>> Phrasing these from the perspective of the optimizer is a little
>> confusing here.
>
> I think it might be clearer to change "The optimizer is allowed to …" to
> "Allow optimizations to …" and clean up the wording a bit.
>
>> Also, "The optimizer is allowed to [not care about X]" read literally
>> means that the semantics for X are unconstrained, which would be
>> Undefined Behavior. For I and N here you have a second sentence which
>> says only the result is undefined, but for S you don't.
>
> 'S' shouldn't have any undefined behavior, it just allows optimizations to
> not distinguish between +/-0. It's perfectly legal for the operation to
> receive a negative zero, the operation just might treat it exactly the
> same as a positive zero. I would rather have that than undefined behavior.

I'm not an expert in writing specifications, but I think defining the S 
flag in this manner would be preferable:
no signed zeros (S) - If present, then the result of a floating point 
operation with -0.0 or +0.0 as an operand is either the result of the 
operation with the original specified values or the result of the 
operation with the +0.0 or -0.0 replaced with its opposite sign.
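
One way to read this definition is as a set of permitted results per
operation. The sketch below models that reading in Python, assuming IEEE 754
doubles; zero_variants and allowed_results are hypothetical helpers, and
results are compared by repr so that -0.0 and 0.0 stay distinct.

```python
import operator
from itertools import product


def zero_variants(x: float) -> list:
    """A zero operand may be used as written or with the opposite sign."""
    return [x, -x] if x == 0.0 else [x]


def allowed_results(op, a: float, b: float) -> set:
    """Results permitted under 'S': the operation applied to the original
    operands, or to the operands with any zero replaced by the other zero."""
    return {repr(op(lhs, rhs))
            for lhs, rhs in product(zero_variants(a), zero_variants(b))}


print(allowed_results(operator.add, -0.0, 0.0))   # both zeros are permitted
print(allowed_results(operator.mul, -1.0, 0.0))   # both zeros are permitted
print(allowed_results(operator.add, 1.0, 0.0))    # only one result
```

Under this reading the flag never makes a result undefined; it only widens
the set of acceptable answers to include the zero of the other sign.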

As a side note, it's never explicitly stated in the language reference 
how much of IEEE 754 semantics floating point operations must follow.

-- 
Joshua Cranmer
News submodule owner
DXR coauthor


Dan Gohman

2012-Nov-01 22:08 UTC

On Tue, Oct 30, 2012 at 8:28 PM, Michael Ilseman <milseman at apple.com> wrote:
>
> This is similar to how gcc defines -fno-signed-zeros:
> "Allow optimizations for floating point arithmetic that ignore the
> signedness of zero. IEEE arithmetic specifies the behavior of distinct
> +0.0 and -0.0 values, which then prohibits simplification of expressions
> such as x+0.0 or 0.0*x (even with -ffinite-math-only). This option
> implies that the sign of a zero result isn't significant."
>
> I'll revise my description to also mention that the sign of a zero result
> isn't significant.
>
>
Ok, I see what you're saying here now.

>
> Also, even when you do have the second sentence, it seems to contradict
> the first sentence.
>
>
> Why does it contradict the first sentence? I meant it as a clarification
> or reinforcement of the first, not a contradiction.
>
Suppose I'm writing a backend for a target which has an instruction that
traps on any kind of NaN. Assuming I care about NaNs, I can't use such an
instruction for regular floating-point operations. However, would it be ok
to use it when the N flag is set?

If the "optimizer" may truly ignore the possibility of NaNs under the N
flag, this would seem to be ok. However, a trap is outside the boundaries
of "undefined result". So, which half is right?
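
The two readings can be contrasted in a small Python model. This is only a
sketch under assumptions: FloatingPointTrap stands in for a hardware trap,
and both function names are hypothetical.

```python
import math


class FloatingPointTrap(Exception):
    """Stands in for a hardware trap; hypothetical, for illustration only."""


def add_with_undefined_result(a: float, b: float) -> float:
    """Reading 1: a NaN operand makes the returned value unspecified,
    but the operation still completes and returns some float."""
    if math.isnan(a) or math.isnan(b):
        return 0.0  # any float would be acceptable here
    return a + b


def add_with_trap(a: float, b: float) -> float:
    """Reading 2: the operation is lowered to an instruction that traps
    on NaN, so control flow is affected and no value is produced."""
    if math.isnan(a) or math.isnan(b):
        raise FloatingPointTrap("NaN operand")
    return a + b


print(add_with_undefined_result(math.nan, 1.0))
try:
    add_with_trap(math.nan, 1.0)
except FloatingPointTrap as trap:
    print("trapped:", trap)
```

Only the first function stays within "the value of the result is undefined";
the second is observable beyond the result value.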

Dan