Dan Gohman
2012-Oct-30 23:19 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Tue, Oct 30, 2012 at 2:25 PM, Michael Ilseman <milseman at apple.com> wrote:

> Here's a new version of the RFC, incorporating and addressing the feedback
> from Krzysztof, Eli, Duncan, and Dan.
>
> Revision 1 changes:
>  * Removed Fusion flag from all sections
>  * Clarified and changed descriptions of remaining flags:
>    * Make 'N' and 'I' flags be explicitly concerning values of operands, and
>      producing undef values if a NaN/Inf is provided.
>    * 'S' is now only about distinguishing between +/-0.
>  * LangRef changes updated to reflect flags changes
>  * Updated Question section given the now simpler set of flags
>  * Optimizations changed to reflect 'N' and 'I' describing operands and not
>    results
>  * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
>  * Mention that this could be solved with metadata, and open the debate
>
> Introduction
> ---
>
> LLVM IR currently does not have any support for specifying fine-grained
> control over relaxing floating point requirements for the optimizer. The
> below is a proposal to extend floating point IR instructions to support a
> number of flags that a creator of IR can use to allow for greater
> optimizations when desired. Such changes are sometimes referred to as
> fast-math, but this proposal is about finer-grained specifications at a
> per-instruction level.
>
> What this doesn't address
> ---
>
> Default behavior is retained, and this proposal is only addressing relaxing
> restrictions. LLVM currently by default:
>  - ignores signaling NaNs
>  - assumes default rounding mode
>  - assumes FENV_ACCESS is off
>
> Discussion on changing the default behavior of LLVM or allowing for more
> restrictive behavior is outside the scope of this proposal. This proposal
> does not address behavior of denormals, which is more of a backend concern.
>
> Specifying exact precision control or requirements is outside the scope of
> this proposal, and can probably be handled with the existing metadata
> implementation.
>
> This proposal covers changes to and optimizations over LLVM IR, and changes
> to codegen are outside the scope of this proposal. The flags described in
> the next section exist only at the IR level, and will not be propagated into
> codegen or the SelectionDAG.
>
> Flags
> ---
> no NaNs (N)
>  - The optimizer is allowed to optimize under the assumption that the
>    operands' values are not NaN. If one of the operands is NaN, the value of
>    the result is undefined.
>
> no Infs (I)
>  - The optimizer is allowed to optimize under the assumption that the
>    operands' values are not +/-Inf. If one of the operands is +/-Inf, the
>    value of the result is undefined.
>
> no signed zeros (S)
>  - The optimizer is allowed to not distinguish between -0 and +0 for the
>    purposes of optimizations.

Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math and
-enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and they all
say they apply to results as well as arguments. Do you have a good reason for
varying from existing practice here?

Phrasing these from the perspective of the optimizer is a little confusing
here.

Also, "The optimizer is allowed to [not care about X]" read literally means
that the semantics for X are unconstrained, which would be Undefined Behavior.
For I and N here you have a second sentence which says only the result is
undefined, but for S you don't.

Also, even when you do have the second sentence, it seems to contradict the
first sentence.

> unsafe algebra (A)
>  - The optimizer is allowed to perform algebraically equivalent
>    transformations that may dramatically change results in floating point.
>    (e.g. reassociation)
>
> Throughout I'll refer to these options in their short-hand, e.g. 'A'.
> Internally, these flags are to reside in SubclassData.
>
> =====
> Question:
>
> Not all combinations make sense (e.g. 'A' pretty much implies all other
> flags).
>
> Basically, I have the below lattice of sensible relations:
>   A > S > N
>   A > I > N
> Meaning that 'A' implies all the others, 'S' implies 'N', etc.

Why does S still imply N?

Also, I'm curious if there's a specific motivation to have I imply N. LLVM
CodeGen's existing options for these are independent.

> It might be desirable to simplify this into just being a fast-math level.

What would make this desirable?

> Changes to optimizations
> ---
>
> Optimizations should be allowed to perform unsafe optimizations provided the
> instructions involved have the corresponding restrictions relaxed. When
> combining instructions, optimizations should do what makes sense to not
> remove restrictions that previously existed (commonly, a bitwise-AND of the
> flags).
>
> Below are some example optimizations that could be allowed with the given
> relaxations.
>
> N - no NaNs
>   x == x ==> true
>
> S - no signed zeros
>   x - 0 ==> x
>   0 - (x - y) ==> y - x
>
> NIS - no signed zeros AND no NaNs AND no Infs
>   x * 0 ==> 0
>
> NI - no infs AND no NaNs
>   x - x ==> 0
>
> A - unsafe-algebra
>   Reassociation
>     (x + y) + z ==> x + (y + z)
>     (x + C1) + C2 ==> x + (C1 + C2)
>   Redistribution
>     (x * C) + x ==> x * (C+1)
>     (x * C) + (x + x) ==> x * (C + 2)
>   Reciprocal
>     x / C ==> x * (1/C)
>
> These examples apply when the new constants are permitted, e.g. not
> denormal, and all the instructions involved have the needed flags.

I'm still confused by what you mean in this sentence. Why are you talking
about constants, if you intend these optimizations to be valid for
non-constants? And, it's not clear what you're trying to say about denormal
values here.

Dan
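For readers following the list of example rewrites quoted above, here is a small sketch of why several of them are not valid without the corresponding flag. It uses Python floats, which are IEEE 754 doubles on common platforms (an assumption of this sketch); the helper same_value and the particular inputs are illustrative choices, not part of the RFC.

```python
import math

NAN, INF = float("nan"), float("inf")

def same_value(a, b):
    """Strict comparison: NaN matches NaN, but +0.0 and -0.0 are different."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

# Each entry: (rewrite and input, value before the rewrite, value after it).
# Every input is one that the matching flag would let the optimizer ignore.
counterexamples = [
    ("N:   x == x ==> true, x = NaN", float(NAN == NAN), 1.0),
    ("S:   0 - (x - y) ==> y - x, x = +0, y = -0",
     0.0 - (0.0 - (-0.0)), (-0.0) - 0.0),
    ("NIS: x * 0 ==> 0, x = Inf", INF * 0.0, 0.0),
    ("NIS: x * 0 ==> 0, x = -1", -1.0 * 0.0, 0.0),
    ("NI:  x - x ==> 0, x = Inf", INF - INF, 0.0),
    ("A:   (x + y) + z ==> x + (y + z)",
     (1e20 + -1e20) + 1.0, 1e20 + (-1e20 + 1.0)),
    ("A:   x / C ==> x * (1/C), x = 5, C = 3", 5.0 / 3.0, 5.0 * (1.0 / 3.0)),
]

for rewrite, before, after in counterexamples:
    print(f"{rewrite}: {before!r} -> {after!r}")
```

In every row the value before and after the rewrite differs under strict comparison, which is the sense in which each rewrite needs its flag.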
Michael Ilseman
2012-Oct-31 03:28 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Oct 30, 2012, at 4:19 PM, Dan Gohman <dan433584 at gmail.com> wrote:

> On Tue, Oct 30, 2012 at 2:25 PM, Michael Ilseman <milseman at apple.com> wrote:
>
> > Here's a new version of the RFC, incorporating and addressing the feedback
> > from Krzysztof, Eli, Duncan, and Dan.
> >
> > Revision 1 changes:
> >  * Removed Fusion flag from all sections
> >  * Clarified and changed descriptions of remaining flags:
> >    * Make 'N' and 'I' flags be explicitly concerning values of operands,
> >      and producing undef values if a NaN/Inf is provided.
> >    * 'S' is now only about distinguishing between +/-0.
> >  * LangRef changes updated to reflect flags changes
> >  * Updated Question section given the now simpler set of flags
> >  * Optimizations changed to reflect 'N' and 'I' describing operands and
> >    not results
> >  * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
> >  * Mention that this could be solved with metadata, and open the debate
> >
> > Introduction
> > ---
> >
> > LLVM IR currently does not have any support for specifying fine-grained
> > control over relaxing floating point requirements for the optimizer. The
> > below is a proposal to extend floating point IR instructions to support a
> > number of flags that a creator of IR can use to allow for greater
> > optimizations when desired. Such changes are sometimes referred to as
> > fast-math, but this proposal is about finer-grained specifications at a
> > per-instruction level.
> >
> > What this doesn't address
> > ---
> >
> > Default behavior is retained, and this proposal is only addressing
> > relaxing restrictions. LLVM currently by default:
> >  - ignores signaling NaNs
> >  - assumes default rounding mode
> >  - assumes FENV_ACCESS is off
> >
> > Discussion on changing the default behavior of LLVM or allowing for more
> > restrictive behavior is outside the scope of this proposal. This proposal
> > does not address behavior of denormals, which is more of a backend
> > concern.
> >
> > Specifying exact precision control or requirements is outside the scope
> > of this proposal, and can probably be handled with the existing metadata
> > implementation.
> >
> > This proposal covers changes to and optimizations over LLVM IR, and
> > changes to codegen are outside the scope of this proposal. The flags
> > described in the next section exist only at the IR level, and will not be
> > propagated into codegen or the SelectionDAG.
> >
> > Flags
> > ---
> > no NaNs (N)
> >  - The optimizer is allowed to optimize under the assumption that the
> >    operands' values are not NaN. If one of the operands is NaN, the value
> >    of the result is undefined.
> >
> > no Infs (I)
> >  - The optimizer is allowed to optimize under the assumption that the
> >    operands' values are not +/-Inf. If one of the operands is +/-Inf, the
> >    value of the result is undefined.
> >
> > no signed zeros (S)
> >  - The optimizer is allowed to not distinguish between -0 and +0 for the
> >    purposes of optimizations.
>
> Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math and
> -enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and they
> all say they apply to results as well as arguments. Do you have a good
> reason for varying from existing practice here?

The primary example I was trying to simplify with that change was
x * 0 ==> 0. It can be performed if you assume NIS inputs, or NS inputs and N
outputs. This is because Inf * 0 is NaN. In hindsight, this is all making
things more confusing, so I think I'll go back to "arguments and results" and
allow this optimization for NS. GCC gets around this by lumping Inf and NaN
under the same command line option.

> Phrasing these from the perspective of the optimizer is a little confusing
> here.

I think it might be clearer to change "The optimizer is allowed to …" to
"Allow optimizations to …" and clean up the wording a bit.

> Also, "The optimizer is allowed to [not care about X]" read literally means
> that the semantics for X are unconstrained, which would be Undefined
> Behavior. For I and N here you have a second sentence which says only the
> result is undefined, but for S you don't.

'S' shouldn't have any undefined behavior, it just allows optimizations to not
distinguish between +/-0. It's perfectly legal for the operation to receive a
negative zero, the operation just might treat it exactly the same as a
positive zero. I would rather have that than undefined behavior. This is
similar to how gcc defines -fno-signed-zeros:

  "Allow optimizations for floating point arithmetic that ignore the
  signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0
  and -0.0 values, which then prohibits simplification of expressions such as
  x+0.0 or 0.0*x (even with -ffinite-math-only). This option implies that the
  sign of a zero result isn't significant."

I'll revise my description to also mention that the sign of a zero result
isn't significant.

> Also, even when you do have the second sentence, it seems to contradict the
> first sentence.

Why does it contradict the first sentence? I meant it as a clarification or
reinforcement of the first, not a contradiction.

> > unsafe algebra (A)
> >  - The optimizer is allowed to perform algebraically equivalent
> >    transformations that may dramatically change results in floating point.
> >    (e.g. reassociation)
> >
> > Throughout I'll refer to these options in their short-hand, e.g. 'A'.
> > Internally, these flags are to reside in SubclassData.
> >
> > =====
> > Question:
> >
> > Not all combinations make sense (e.g. 'A' pretty much implies all other
> > flags).
> >
> > Basically, I have the below lattice of sensible relations:
> >   A > S > N
> >   A > I > N
> > Meaning that 'A' implies all the others, 'S' implies 'N', etc.
>
> Why does S still imply N?
>
> Also, I'm curious if there's a specific motivation to have I imply N. LLVM
> CodeGen's existing options for these are independent.
>
> > It might be desirable to simplify this into just being a fast-math level.
>
> What would make this desirable?

I think this "Question" I had no longer makes too much sense, so I'm going to
delete this section.

> > Changes to optimizations
> > ---
> >
> > Optimizations should be allowed to perform unsafe optimizations provided
> > the instructions involved have the corresponding restrictions relaxed.
> > When combining instructions, optimizations should do what makes sense to
> > not remove restrictions that previously existed (commonly, a bitwise-AND
> > of the flags).
> >
> > Below are some example optimizations that could be allowed with the given
> > relaxations.
> >
> > N - no NaNs
> >   x == x ==> true
> >
> > S - no signed zeros
> >   x - 0 ==> x
> >   0 - (x - y) ==> y - x
> >
> > NIS - no signed zeros AND no NaNs AND no Infs
> >   x * 0 ==> 0
> >
> > NI - no infs AND no NaNs
> >   x - x ==> 0
> >
> > A - unsafe-algebra
> >   Reassociation
> >     (x + y) + z ==> x + (y + z)
> >     (x + C1) + C2 ==> x + (C1 + C2)
> >   Redistribution
> >     (x * C) + x ==> x * (C+1)
> >     (x * C) + (x + x) ==> x * (C + 2)
> >   Reciprocal
> >     x / C ==> x * (1/C)
> >
> > These examples apply when the new constants are permitted, e.g. not
> > denormal, and all the instructions involved have the needed flags.
>
> I'm still confused by what you mean in this sentence. Why are you talking
> about constants, if you intend these optimizations to be valid for
> non-constants? And, it's not clear what you're trying to say about denormal
> values here.

I was mentioning denormals for one of the optimizations. I think it would be
more clear to say something like:

    Reciprocal
      x / C ==> x * (1/C) when (1/C) is not denormal

I was mostly trying to say that the optimizations are not blindly applied, but
are applied when they are still legal. I think the sentence is more confusing
than helpful, though.

> Dan

Thanks!
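The points in this reply (x * 0 ==> 0 allowed under NS, the reciprocal rewrite only when 1/C is not denormal, and the bitwise-AND of flags when combining instructions) can be sketched as simple legality checks. This is only an illustration under stated assumptions: the FMF enum, its bit values, and the function names are hypothetical and are not LLVM's actual representation or API.

```python
import math
import sys
from enum import IntFlag

class FMF(IntFlag):
    """Hypothetical encoding of the proposed per-instruction flags."""
    NONE = 0
    N = 1  # no NaNs
    I = 2  # no Infs
    S = 4  # no signed zeros
    A = 8  # unsafe algebra

def combine(a, b):
    """Flags for an instruction formed from two others: keep only the
    relaxations both had, so no restriction is silently dropped."""
    return a & b

def may_fold_mul_zero(flags):
    """x * 0 ==> 0, allowed here when both N and S are set."""
    need = FMF.N | FMF.S
    return (flags & need) == need

def may_use_reciprocal(flags, c):
    """x / C ==> x * (1/C), only under A and only if 1/C is a normal number."""
    if not (flags & FMF.A) or c == 0.0:
        return False
    r = 1.0 / c
    # sys.float_info.min is the smallest positive normal double.
    return math.isfinite(r) and abs(r) >= sys.float_info.min

print(may_use_reciprocal(FMF.A, 4.0))    # 1/4 is normal
print(may_use_reciprocal(FMF.A, 1e308))  # 1/1e308 is denormal
```

Keeping the denormal test inside the legality check matches the intent stated above: the rewrite is not applied blindly, only when it is still legal for the constant at hand.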
Joshua Cranmer
2012-Oct-31 04:11 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On 10/30/2012 10:28 PM, Michael Ilseman wrote:

> On Oct 30, 2012, at 4:19 PM, Dan Gohman <dan433584 at gmail.com> wrote:
>
>> On Tue, Oct 30, 2012 at 2:25 PM, Michael Ilseman <milseman at apple.com> wrote:
>>
>>> no signed zeros (S)
>>>  - The optimizer is allowed to not distinguish between -0 and +0 for the
>>>    purposes of optimizations.
>>
>> Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math and
>> -enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and they
>> all say they apply to results as well as arguments. Do you have a good
>> reason for varying from existing practice here?
>
> The primary example I was trying to simplify with that change was
> x * 0 ==> 0. It can be performed if you assume NIS inputs, or NS inputs and
> N outputs. This is because Inf * 0 is NaN. In hindsight, this is all making
> things more confusing, so I think I'll go back to "arguments and results"
> and allow this optimization for NS. GCC gets around this by lumping Inf and
> NaN under the same command line option.
>
>> Phrasing these from the perspective of the optimizer is a little confusing
>> here.
>
> I think it might be clearer to change "The optimizer is allowed to …" to
> "Allow optimizations to …" and clean up the wording a bit.
>
>> Also, "The optimizer is allowed to [not care about X]" read literally
>> means that the semantics for X are unconstrained, which would be Undefined
>> Behavior. For I and N here you have a second sentence which says only the
>> result is undefined, but for S you don't.
>
> 'S' shouldn't have any undefined behavior, it just allows optimizations to
> not distinguish between +/-0. It's perfectly legal for the operation to
> receive a negative zero, the operation just might treat it exactly the same
> as a positive zero. I would rather have that than undefined behavior.

I'm not an expert in writing specifications, but I think defining the S flag
in this manner would be preferable:

no signed zeros (S)
 - If present, then the result of a floating point operation with -0.0 or
   +0.0 as an operand is either the result of the operation with the original
   specified values or the result of the operation with the +0.0 or -0.0
   replaced with its opposite sign.

As a side note, it's never explicitly stated in the language reference how
much of IEEE 754 semantics floating point operations must follow.

-- 
Joshua Cranmer
News submodule owner
DXR coauthor
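One way to read the suggested wording for S is as a set of permitted results: evaluate the operation on the original operands and on every variant in which a zero operand has its sign flipped. The sketch below does that with Python floats (assumed to be IEEE 754 doubles); the function names are hypothetical and the reading itself is an interpretation, not settled semantics.

```python
import math
import operator

def zero_variants(v):
    """A zero operand may be taken with either sign; anything else is fixed."""
    return [0.0, -0.0] if v == 0.0 else [v]

def key(v):
    """Distinguish +0.0 from -0.0, which compare equal with ==."""
    return (v, math.copysign(1.0, v))

def allowed_results(op, a, b):
    """All results permitted for op(a, b) when the S flag is present."""
    return {key(op(x, y)) for x in zero_variants(a) for y in zero_variants(b)}

def is_allowed(op, a, b, candidate):
    return key(candidate) in allowed_results(op, a, b)

# x + 0 ==> x with x = -0.0: strict IEEE gives +0.0, the rewrite gives -0.0.
print(is_allowed(operator.add, -0.0, 0.0, -0.0))
# x * 0 ==> 0 with x = -1.0: strict IEEE gives -0.0, the rewrite gives +0.0.
print(is_allowed(operator.mul, -1.0, 0.0, 0.0))
```

Under this reading both rewrites are permitted without introducing any undefined value, since each candidate is the exact result for some choice of zero signs.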
Dan Gohman
2012-Nov-01 22:08 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Tue, Oct 30, 2012 at 8:28 PM, Michael Ilseman <milseman at apple.com> wrote:

> This is similar to how gcc defines -fno-signed-zeros:
>
>   "Allow optimizations for floating point arithmetic that ignore the
>   signedness of zero. IEEE arithmetic specifies the behavior of distinct
>   +0.0 and -0.0 values, which then prohibits simplification of expressions
>   such as x+0.0 or 0.0*x (even with -ffinite-math-only). This option implies
>   that the sign of a zero result isn't significant."
>
> I'll revise my description to also mention that the sign of a zero result
> isn't significant.

Ok, I see what you're saying here now.

> > Also, even when you do have the second sentence, it seems to contradict
> > the first sentence.
>
> Why does it contradict the first sentence? I meant it as a clarification or
> reinforcement of the first, not a contradiction.

Suppose I'm writing a backend for a target which has an instruction that traps
on any kind of NaN. Assuming I care about NaNs, I can't use such an
instruction for regular floating-point operations. However, would it be ok to
use it when the N flag is set? If the "optimizer" may truly ignore the
possibility of NaNs under the N flag, this would seem to be ok. However, a
trap is outside the boundaries of "undefined result". So, which half is right?

Dan