Michael Ilseman
2012-Oct-29 23:34 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
Introduction
---

LLVM IR currently does not have any support for specifying fine-grained control over relaxing floating point requirements for the optimizer. The below is a proposal to extend floating point IR instructions to support a number of flags that a creator of IR can use to allow for greater optimizations when desired. Such changes are sometimes referred to as fast-math, but this proposal is about finer-grained specifications at a per-instruction level.

What this doesn't address
---

Default behavior is retained, and this proposal only addresses relaxing restrictions. For example, the assumption of the default rounding mode remains untouched. Discussion on changing the default behavior of LLVM or allowing for more restrictive behavior is outside the scope of this proposal. This proposal does not address behavior of denormals, which is more of a backend concern.

Specifying exact precision control or requirements is outside the scope of this proposal, and can probably be handled with the existing metadata implementation.

This proposal covers changes to and optimizations over LLVM IR, and changes to codegen are outside the scope of this proposal. The flags described in the next section exist only at the IR level, and will not be propagated into codegen or the SelectionDAG.

Flags
---

no NaNs (N)
  - ignore the existence of NaNs when convenient
no Infs (I)
  - ignore the existence of Infs when convenient
no signed zeros (S)
  - ignore the existence of negative zero when convenient
allow fusion (F)
  - fuse FP operations when convenient, despite possible differences in rounding (e.g. form FMAs)
unsafe algebra (A)
  - allow for algebraically equivalent transformations that may dramatically change results in floating point. (e.g. reassociation)

Throughout I'll refer to these options in their short-hand, e.g. 'A'. Internally, these flags are to reside in SubclassData.

=====
Question:

Not all combinations make sense (e.g. 'A' pretty much implies all other flags).

Basically, I have the below semilattice of sensible relations:
  A > S > I > N
  A > F
Meaning that 'A' implies all the others, 'S' implies 'I' and 'N', etc.

It might make sense to change the S, I, and N options to be some kind of finite option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It is still the case that A would imply pretty much everything else.
=====

Changes to LangRef
---

Change the definitions of floating point arithmetic operations. Below is how fadd will change:

'fadd' Instruction
Syntax:
  <result> = fadd {flag}* <ty> <op1>, <op2>   ; yields {ty}:result
...
Semantics:
...
flag can be one of the following optimizer hints to enable otherwise unsafe floating point optimizations:
  N: no NaNs - ignore the existence of NaNs when convenient
  I: no infs - ignore the existence of Infs when convenient
  S: no signed zeros - ignore the existence of negative zero when convenient
  F: allow fusion - fuse FP operations when convenient, despite possible differences in rounding
  A: unsafe algebra - allow for algebraically equivalent transformations that may dramatically change results in floating point.

Changes to optimizations
---

Optimizations should be allowed to perform unsafe optimizations provided the instructions involved have the corresponding restrictions relaxed. When combining instructions, optimizations should do what makes sense to not remove restrictions that previously existed (commonly, a bitwise-AND of the flags).

Below are some example optimizations that could be allowed with the given relaxations.

N - no NaNs
  x == x ==> true

S - no signed zeros
  x - 0 ==> x
  0 - (x - y) ==> y - x

NS - no signed zeros AND no NaNs
  x * 0 ==> 0

NI - no infs AND no NaNs
  x - x ==> 0
  Inf > x ==> true

A - unsafe-algebra
  Reassociation
    (x + C1) + C2 ==> x + (C1 + C2)
  Redistribution
    (x * C) + x ==> x * (C+1)
    (x * C) + (x + x) ==> x * (C + 2)
  Reciprocal
    x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal, and all the instructions involved have the needed flags.

I propose to expand -instsimplify and -instcombine to perform these kinds of optimizations. -reassociate will be expanded to reassociate floating point operations when allowed. Similar to existing behavior regarding integer wrapping, -early-cse will not CSE FP operations with mismatched flags, while -gvn will (conservatively). This allows later optimizations to optimize the expressions independently between runs of -early-cse and -gvn.

Changes to frontends
---

Frontends are free to generate code with flags set as they desire. Frontends should continue to call llc with their desired options, as the flags apply only at the IR level and not at codegen or the SelectionDAGs.

Below is a suggested change to clang's command-line options.

-ffast-math
  Currently described as:
    Enable the *frontend*'s 'fast-math' mode. This has no effect on optimizations, but provides a preprocessor macro __FAST_MATH__ the same as GCC's -ffast-math flag

  I propose to change the description and behavior to:
    Enable 'fast-math' mode. This allows for optimizations that may produce incorrect and unsafe results, and thus should only be used with care. This also provides a preprocessor macro __FAST_MATH__ the same as GCC's -ffast-math flag

  I propose that this turn on all flags for all floating point instructions. If this flag doesn't already cause clang to run llc with -enable-unsafe-fp-math, then I propose that it does so as well.

-fp-contract=<value>
  I'm not too familiar with this option, but I recommend that 'all' turn on the 'F' bit for all FP instructions, that the default do so when following the pragma, and that 'off' never do so. This option should still be passed to the backend.

(Optional)
I propose adding the below flags:

-ffinite-math-only
  Allow optimizations to assume that floating point arguments and results are not NaNs or +/-Inf. This may produce incorrect results, and so should be used with care.

  This would set the 'I' and 'N' bits on all generated floating point instructions.

-fno-signed-zeros
  Allow optimizations to ignore the signedness of zero. This may produce incorrect results, and so should be used with care.

  This would set the 'S' bit on all FP instructions.

Changes to llvm cli tools
---

opt and llc already have the command line options
  -enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
  -enable-fp-mad: Enable less precise MAD instructions to be generated
  -enable-no-infs-fp-math: Enable FP math optimizations that assume no +-Infs
  -enable-no-nans-fp-math: Enable FP math optimizations that assume no NaNs
However, opt makes no use of them as they are currently only considered to be TargetOptions. llc will remain unchanged, as these options apply to DAG optimizations while this proposal deals with IR optimizations.

(Optional)
Have an opt pass that adds the desired flags to floating point instructions.

Miscellaneous explanations in the form of Q&A
---

Why not just have "fast-math" rather than individual flags?

Having the individual flags gives the granularity to choose the levels of optimizations. For example, unsafe-algebra can lead to dramatically different results in corner cases, and may not be desired when a user just wants to ensure that x*0 folds to 0.

Why have these flags attached to the instruction itself, rather than be a compiler mode?

Being attached to the instruction itself allows much greater flexibility both for other optimizations and for the concerns of the source and target. For example, a frontend may desire that x - x be folded to 0. This would require no-NaNs for the subtract. However, the frontend may want to keep NaNs for its comparisons.

Additionally, these properties can be set internally in the optimizer when the property has been proven. For example, if x has been found to be positive, then operations involving x and a constant can be marked to ignore signed zero.

Finally, having these flags allows for greater safety and optimization when code with different flags is mixed. For example, a function author may set the unsafe-algebra flag knowing that such transformations will not meaningfully alter its result. If that function gets inlined into a caller, however, we don't want to always assume that the function's expressions can be reassociated with the caller's expressions. These properties allow us to preserve the optimizations of the inlined function without affecting the caller.

Why not use metadata rather than flags?

There is existing metadata to denote precisions, and this proposal is orthogonal to those efforts. These flags are analogous to nsw/nuw, and are inherent properties of the IR instructions themselves that all transformations should respect.
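
For readers who want to see why the example rewrites above are unsafe without the corresponding flags, here is a minimal Python sketch. It is not part of the proposal and does not model LLVM itself; it assumes only that Python floats are IEEE 754 doubles (true on mainstream platforms). The helper same_value is a hypothetical name introduced for illustration. Each case picks an input for which the rewritten expression gives a different answer than the original one.

```python
import math

nan = float("nan")
inf = float("inf")


def same_value(a, b):
    """Compare floats exactly: -0.0 differs from +0.0, and NaN matches NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


# N: "x == x ==> true" is wrong only when x is NaN; infinity compares equal.
x_eq_x_nan = (nan == nan)
x_eq_x_inf = (inf == inf)

# S: "0 - (x - y) ==> y - x" changes the sign of a zero result.
x, y = 0.0, -0.0
lhs = 0.0 - (x - y)   # +0.0
rhs = y - x           # -0.0

# NS: "x * 0 ==> 0" is wrong for negative x (gives -0.0) and for NaN.
neg_times_zero = -1.0 * 0.0
nan_times_zero = nan * 0.0

# NI: "x - x ==> 0" is wrong when x is infinite (Inf - Inf is NaN).
inf_minus_inf = inf - inf

# A: reassociation can change the result completely.
a, b, c = 1e100, -1e100, 1.0
left = (a + b) + c    # 1.0
right = a + (b + c)   # 0.0, because b + c rounds back to b

print(x_eq_x_nan, x_eq_x_inf)
print(lhs, rhs, same_value(lhs, rhs))
print(neg_times_zero, nan_times_zero, inf_minus_inf)
print(left, right)
```

The reassociation case is the reason 'A' is treated separately from the other flags: the N, I, and S cases only misbehave on special values, while reassociation can change the result for ordinary finite inputs.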
Krzysztof Parzyszek
2012-Oct-30 00:18 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On 10/29/2012 6:34 PM, Michael Ilseman wrote:
>
> N: no NaNs - ignore the existence of NaNs when convenient

Maybe distinguish between quiet and signaling NaNs?

> NI - no infs AND no NaNs
>   x - x ==> 0
>   Inf > x ==> true

Inf * x ==> 0?

I think that if an infinity appears when NI (or I) is given, the result should be left as "undefined". Similarly with NaNs. In such cases, it's impossible to predict the accuracy of the result, so trying to define what happens is pretty much moot. In this case Inf > x may as well be simplified to "false" without any loss of (already absent) meaning.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Eli Friedman
2012-Oct-30 00:30 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Mon, Oct 29, 2012 at 5:18 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> On 10/29/2012 6:34 PM, Michael Ilseman wrote:
>>
>> N: no NaNs - ignore the existence of NaNs when convenient
>
> Maybe distinguish between quiet and signaling NaNs?

We already ignore the existence of signaling NaNs by default. The proposal could make that more clear, though.

-Eli
Michael Ilseman
2012-Oct-30 03:22 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Oct 29, 2012, at 5:18 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:
> On 10/29/2012 6:34 PM, Michael Ilseman wrote:
> >
> > N: no NaNs - ignore the existence of NaNs when convenient
>
> Maybe distinguish between quiet and signaling NaNs?
>
> > NI - no infs AND no NaNs
> >   x - x ==> 0
> >   Inf > x ==> true
>
> Inf * x ==> 0?
>
> I think that if an infinity appears when NI (or I) is given, the result should be left as "undefined". Similarly with NaNs. In such cases, it's impossible to predict the accuracy of the result, so trying to define what happens is pretty much moot. In this case Inf > x may as well be simplified to "false" without any loss of (already absent) meaning.

The goal is not necessarily to un-define Inf/NaN, but to opt in to unsafe optimizations that would otherwise not be allowed, e.g. x*0 ==> 0. There may be examples where these optimizations produce arbitrary results, as though those constructs had no meaning, but that doesn't make Inf/NaN constants completely undefined in general. The "when convenient" wording is already a little vague/permissive, and could be re-worded to state that Values are assumed to not be Inf/NaN when convenient, but Constants may be honored.

> -Krzysztof
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Duncan Sands
2012-Oct-30 08:46 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
Hi Michael,

> Flags
> ---
> no NaNs (N)
>   - ignore the existence of NaNs when convenient
> no Infs (I)
>   - ignore the existence of Infs when convenient
> no signed zeros (S)
>   - ignore the existence of negative zero when convenient

while the above flags make perfect sense for me, the other two seem more dubious:

> allow fusion (F)
>   - fuse FP operations when convenient, despite possible differences in rounding
>     (e.g. form FMAs)
> unsafe algebra (A)
>   - allow for algebraically equivalent transformations that may dramatically
>     change results in floating point. (e.g. reassociation)

They don't seem to be capturing a clear concept, they seem more like a grab-bag of "everything else" (A) or "here's a random thing that is important today so let's have a flag for it" (F).

...

> Why not use metadata rather than flags?
>
> There is existing metadata to denote precisions, and this proposal is orthogonal
> to those efforts. These flags are analogous to nsw/nuw, and are inherent
> properties of the IR instructions themselves that all transformations should
> respect.

If you drop any of these flags then things are still conservatively correct, just like with metadata. In my opinion this could be implemented as metadata. (I'm not saying it should be represented as metadata, I'm saying it could be).

Disadvantages of metadata:

  - Bloats the IR (however my measurements suggest this is by < 2% for math heavy code)
  - More painful to work with (though helper classes can mitigate this)
  - Less efficient to modify (but will flags be cleared that often)?

Disadvantages of using subclass data bits:

  - Can only represent flags. Thus you might end up with a mix of flags and metadata for floating point math, with the metadata holding the non-flag info, and subclass data holding the flags. In which case it might be better to just have it all be metadata in the first place
  - Only a limited number of bits (but hey)

Hopefully Chris will weigh in with his opinion.

Ciao, Duncan.
Dan Gohman
2012-Oct-30 15:23 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
Hi Michael,

On Mon, Oct 29, 2012 at 4:34 PM, Michael Ilseman <milseman at apple.com> wrote:
> Flags
> ---
> no NaNs (N)
>   - ignore the existence of NaNs when convenient
> no Infs (I)
>   - ignore the existence of Infs when convenient
> no signed zeros (S)
>   - ignore the existence of negative zero when convenient

Does this mean ignore the possibility of NaNs as operands, as results, or both? Ditto for infinity and negative zero.

Also, what does "ignore" mean? As worded, it seems to imply Undefined Behavior if the value is encountered. Is that intended?

> allow fusion (F)
>   - fuse FP operations when convenient, despite possible differences in rounding
>     (e.g. form FMAs)

What do you intend to be the relationship between this and @llvm.fmuladd? It's not clear whether you're trying to replace it or trying to set up an alternative for different use cases.

Is your wording of "fusing" intended to imply fusing with infinite intermediate precision only, or is mere increased precision also valid?

> unsafe algebra (A)
>   - allow for algebraically equivalent transformations that may dramatically
>     change results in floating point. (e.g. reassociation)
> [...]
> Not all combinations make sense (e.g. 'A' pretty much implies all other flags).
>
> Basically, I have the below semilattice of sensible relations:
>   A > S > I > N
>   A > F
> Meaning that 'A' implies all the others, 'S' implies 'I' and 'N', etc.

Why does it make sense for S to imply I and N? GCC's -fno-signed-zeros flag doesn't seem to imply -ffinite-math-only, among other things. The concept of negative zero isn't inherently linked with the concepts of infinity or NaN.

> It might make sense to change the S, I, and N options to be some kind of finite
> option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It
> is still the case that A would imply pretty much everything else.
>
> N - no NaNs
>   x == x ==> true

This is not true if x is infinity.

> S - no signed zeros
>   x - 0 ==> x
>   0 - (x - y) ==> y - x
>
> NS - no signed zeros AND no NaNs
>   x * 0 ==> 0
>
> NI - no infs AND no NaNs
>   x - x ==> 0
>   Inf > x ==> true

With the I flag, would the infinity as an operand make this undefined?

> A - unsafe-algebra
>   Reassociation
>     (x + C1) + C2 ==> x + (C1 + C2)
>   Redistribution
>     (x * C) + x ==> x * (C+1)
>     (x * C) + (x + x) ==> x * (C + 2)
>   Reciprocal
>     x / C ==> x * (1/C)
>
> These examples apply when the new constants are permitted, e.g. not denormal,
> and all the instructions involved have the needed flags.

I'm confused. In other places, you seem to imply that reassociation would be valid even on non-constant values. It's not clear whether you meant to contradict that here.

[...]

> -fp-contract=<value>
>   I'm not too familiar with this option, but I recommend that 'all' turn on the
>   'F' bit for all FP instructions, default do so when following the pragma, and
>   off never doing so. This option should still be passed to the backend.

Please coordinate with Lang and others who have already done a fair amount of work on FP_CONTRACT.

> (Optional)
> I propose adding the below flags:
>
> -ffinite-math-only
>   Allow optimizations to assume that floating point arguments and results are
>   NaNs or +/-Inf. This may produce incorrect results, and so should be used with
>   care.
>
>   This would set the 'I' and 'N' bits on all generated floating point
>   instructions.
>
> -fno-signed-zeros
>   Allow optimizations to ignore the signedness of zero. This may produce
>   incorrect results, and so should be used with care.
>
>   This would set the 'S' bit on all FP instructions.

These are established flags in GCC. Do you know if there are any semantic differences between your proposed semantics and the semantics of these flags in GCC? If so, it would be good to either change to match them, or document the differences.

Dan
Dan Gohman
2012-Oct-30 15:31 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Tue, Oct 30, 2012 at 8:23 AM, Dan Gohman <dan433584 at gmail.com> wrote:
> On Mon, Oct 29, 2012 at 4:34 PM, Michael Ilseman <milseman at apple.com> wrote:
>
>> N - no NaNs
>>   x == x ==> true
>
> This is not true if x is infinity.

Oops, I was wrong here. Infinity is defined to be equal to infinity.

Dan
Michael Ilseman
2012-Oct-30 16:36 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Oct 30, 2012, at 1:46 AM, Duncan Sands <baldrick at free.fr> wrote:
> Hi Michael,
>
>> Flags
>> ---
>> no NaNs (N)
>>   - ignore the existence of NaNs when convenient
>> no Infs (I)
>>   - ignore the existence of Infs when convenient
>> no signed zeros (S)
>>   - ignore the existence of negative zero when convenient
>
> while the above flags make perfect sense for me, the other two seem more
> dubious:
>
>> allow fusion (F)
>>   - fuse FP operations when convenient, despite possible differences in rounding
>>     (e.g. form FMAs)
>> unsafe algebra (A)
>>   - allow for algebraically equivalent transformations that may dramatically
>>     change results in floating point. (e.g. reassociation)
>
> They don't seem to be capturing a clear concept, they seem more like a grab-bag
> of "everything else" (A) or "here's a random thing that is important today so
> let's have a flag for it" (F).

'A' is certainly a bit of a grab-bag, but I had difficulty breaking it apart into finer-grained pieces that a user would want to pick and choose between. I'd be interested in any suggestions you might have along these lines.

Why is 'F' such a random flag to have? 'F' implies ignoring intermediate rounding when a more efficient version exists, and it seems fair for it to be its own category.

> ...
>
>> Why not use metadata rather than flags?
>>
>> There is existing metadata to denote precisions, and this proposal is orthogonal
>> to those efforts. These flags are analogous to nsw/nuw, and are inherent
>> properties of the IR instructions themselves that all transformations should
>> respect.
>
> If you drop any of these flags then things are still conservatively correct,
> just like with metadata. In my opinion this could be implemented as metadata.
> (I'm not saying it should be represented as metadata, I'm saying it could be).
>
> Disadvantages of metadata:
>
>   - Bloats the IR (however my measurements suggest this is by < 2% for math heavy
>     code)
>   - More painful to work with (though helper classes can mitigate this)
>   - Less efficient to modify (but will flags be cleared that often)?
>
> Disadvantages of using subclass data bits:
>
>   - Can only represent flags. Thus you might end up with a mix of flags and
>     metadata for floating point math, with the metadata holding the non-flag
>     info, and subclass data holding the flags. In which case it might be better
>     to just have it all be metadata in the first place
>   - Only a limited number of bits (but hey)
>
> Hopefully Chris will weigh in with his opinion.
>
> Ciao, Duncan.

Thanks for the feedback!
Michael Ilseman
2012-Oct-30 17:18 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Oct 30, 2012, at 8:23 AM, Dan Gohman <dan433584 at gmail.com> wrote:
> Hi Michael,
>
> On Mon, Oct 29, 2012 at 4:34 PM, Michael Ilseman <milseman at apple.com> wrote:
> Flags
> ---
> no NaNs (N)
>   - ignore the existence of NaNs when convenient
> no Infs (I)
>   - ignore the existence of Infs when convenient
> no signed zeros (S)
>   - ignore the existence of negative zero when convenient
>
> Does this mean ignore the possibility of NaNs as operands, as results, or both? Ditto for infinity and negative zero.

I wrote this thinking both, though I could certainly imagine it being clearer if defined as operands. The example optimizations section is written along the lines of ignoring both.

> Also, what does "ignore" mean? As worded, it seems to imply Undefined Behavior if the value is encountered. Is that intended?

What I'm intending is for optimizations to be allowed to ignore the possibility of those values. Thinking about it more, this is pretty vague. With your and Krzysztof's feedback in mind, I think something along the lines of:

  no NaNs (N)
    - The operands' values can be assumed to be non-NaN by the optimizer. The result of this operator is Undef if passed a NaN.

might be more clear. I'll think about that more and revise the examples section too.

> allow fusion (F)
>   - fuse FP operations when convenient, despite possible differences in rounding
>     (e.g. form FMAs)
>
> What do you intend to be the relationship between this and @llvm.fmuladd? It's not clear whether you're trying to replace it or trying to set up an alternative for different use cases.

Interesting, I had not seen llvm.fmuladd. I'll have to think about this more; perhaps fmuladd can already provide what I was intending here.

> Is your wording of "fusing" intended to imply fusing with infinite intermediate precision only, or is mere increased precision also valid?

My intention is that increased precision is also valid, though I haven't thought too deeply about the difference.

> unsafe algebra (A)
>   - allow for algebraically equivalent transformations that may dramatically
>     change results in floating point. (e.g. reassociation)
> [...]
> Not all combinations make sense (e.g. 'A' pretty much implies all other flags).
>
> Basically, I have the below semilattice of sensible relations:
>   A > S > I > N
>   A > F
> Meaning that 'A' implies all the others, 'S' implies 'I' and 'N', etc.
>
> Why does it make sense for S to imply I and N? GCC's -fno-signed-zeros flag doesn't seem to imply -ffinite-math-only, among other things. The concept of negative zero isn't inherently linked with the concepts of infinity or NaN.

What I mean here is that I'm finding it hard to think of a case where a user would desire to specify 'I' and not specify 'N'. This is more a question of whether we could/should express this as a fast-math level rather than allow each flag to be individually toggleable. Any thoughts on this?

> It might make sense to change the S, I, and N options to be some kind of finite
> option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It
> is still the case that A would imply pretty much everything else.
>
> N - no NaNs
>   x == x ==> true
>
> This is not true if x is infinity.
>
> S - no signed zeros
>   x - 0 ==> x
>   0 - (x - y) ==> y - x
>
> NS - no signed zeros AND no NaNs
>   x * 0 ==> 0
>
> NI - no infs AND no NaNs
>   x - x ==> 0
>   Inf > x ==> true
>
> With the I flag, would the infinity as an operand make this undefined?

I'll think about this more with regards to the prior changes.

> A - unsafe-algebra
>   Reassociation
>     (x + C1) + C2 ==> x + (C1 + C2)
>   Redistribution
>     (x * C) + x ==> x * (C+1)
>     (x * C) + (x + x) ==> x * (C + 2)
>   Reciprocal
>     x / C ==> x * (1/C)
>
> These examples apply when the new constants are permitted, e.g. not denormal,
> and all the instructions involved have the needed flags.
>
> I'm confused. In other places, you seem to imply that reassociation would be valid even on non-constant values. It's not clear whether you meant to contradict that here.

Reassociation is still valid. These examples are just cases where there would be a clear optimization benefit to be had. I'll probably add in a general expression to clarify.

> [...]
> -fp-contract=<value>
>   I'm not too familiar with this option, but I recommend that 'all' turn on the
>   'F' bit for all FP instructions, default do so when following the pragma, and
>   off never doing so. This option should still be passed to the backend.
>
> Please coordinate with Lang and others who have already done a fair amount of work on FP_CONTRACT.

I will, thanks.

> (Optional)
> I propose adding the below flags:
>
> -ffinite-math-only
>   Allow optimizations to assume that floating point arguments and results are
>   NaNs or +/-Inf. This may produce incorrect results, and so should be used with
>   care.
>
>   This would set the 'I' and 'N' bits on all generated floating point instructions.
>
> -fno-signed-zeros
>   Allow optimizations to ignore the signedness of zero. This may produce
>   incorrect results, and so should be used with care.
>
>   This would set the 'S' bit on all FP instructions.
>
> These are established flags in GCC. Do you know if there are any semantic differences between your proposed semantics and the semantics of these flags in GCC? If so, it would be good to either change to match them, or document the differences.

I don't know of any differences, but I'll have to look into GCC's behavior more.

> Dan

Thanks for the feedback!
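
As an aside on the "levels versus individual flags" question discussed above, the following Python sketch models the first draft's relation (A > S > I > N and A > F) as a small graph and computes what each flag would imply. It is only an illustration of that draft relation, which Dan questions in this thread; the names IMPLIES, closure, and as_level are hypothetical and do not correspond to anything in LLVM.

```python
# Direct "implies" edges taken from the first draft of the RFC:
#   A > S > I > N   and   A > F
IMPLIES = {
    "A": {"S", "F"},
    "S": {"I"},
    "I": {"N"},
    "N": set(),
    "F": set(),
}


def closure(flags):
    """Return the given flags plus everything they transitively imply."""
    result = set()
    worklist = list(flags)
    while worklist:
        flag = worklist.pop()
        if flag not in result:
            result.add(flag)
            worklist.extend(IMPLIES[flag])
    return result


def as_level(flags):
    """Collapse the N/I/S chain into the draft's level 1/2/3 (0 means none)."""
    closed = closure(flags)
    for level, flag in ((3, "S"), (2, "I"), (1, "N")):
        if flag in closed:
            return level
    return 0


print(sorted(closure({"S"})))   # S pulls in I and N under this relation
print(as_level({"I"}), as_level({"F"}), as_level({"A"}))
```

Under this encoding a request for 'S' alone cannot be expressed without also getting 'I' and 'N', which is exactly the coupling the GCC comparison above calls into question. Keeping the flags independent simply means leaving the IMPLIES sets for S, I, and N empty.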
Michael Ilseman
2012-Oct-30 21:25 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
Here's a new version of the RFC, incorporating and addressing the feedback from Krzysztof, Eli, Duncan, and Dan. Revision 1 changes: * Removed Fusion flag from all sections * Clarified and changed descriptions of remaining flags: * Make 'N' and 'I' flags be explicitly concerning values of operands, and producing undef values if a NaN/Inf is provided. * 'S' is now only about distinguishing between +/-0. * LangRef changes updated to reflect flags changes * Updated Quesiton section given the now simpler set of flags * Optimizations changed to reflect 'N' and 'I' describing operands and not results * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc) * Mention that this could be solved with metadata, and open the debate Introduction --- LLVM IR currently does not have any support for specifying fine-grained control over relaxing floating point requirements for the optimizer. The below is a proposal to extend floating point IR instructions to support a number of flags that a creator of IR can use to allow for greater optimizations when desired. Such changes are sometimes referred to as fast-math, but this proposal is about finer-grained specifications at a per-instruction level. What this doesn't address --- Default behavior is retained, and this proposal is only addressing relaxing restrictions. LLVM currently by default: - ignores signaling NaNs - assumes default rounding mode - assumes FENV_ACCESS is off Discussion on changing the default behavior of LLVM or allowing for more restrictive behavior is outside the scope of this proposal. This proposal does not address behavior of denormals, which is more of a backend concern. Specifying exact precision control or requirements is outside the scope of this proposal, and can probably be handled with the existing metadata implementation. This proposal covers changes to and optimizations over LLVM IR, and changes to codegen are outside the scope of this proposal. 
The flags described in the next section exist only at the IR level, and will not be propagated into codegen or the SelectionDAG. Flags --- no NaNs (N) - The optimizer is allowed to optimize under the assumption that the operands' values are not NaN. If one of the operands is NaN, the value of the result is undefined. no Infs (I) - The optimizer is allowed to optimize under the assumption that the operands' values are not +/-Inf. If one of the operands is +/-Inf, the value of the result is undefined. no signed zeros (S) - The optimizer is allowed to not distinguish between -0 and +0 for the purposes of optimizations. unsafe algebra (A) - The optimizer is allowed to perform algebraically equivalent transformations that may dramatically change results in floating point. (e.g. reassociation) Throughout I'll refer to these options in their short-hand, e.g. 'A'. Internally, these flags are to reside in SubclassData. =====Question: Not all combinations make sense (e.g. 'A' pretty much implies all other flags). Basically, I have the below lattice of sensible relations: A > S > N A > I > N Meaning that 'A' implies all the others, 'S' implies 'N', etc. It might be desirable to simplify this into just being a fast-math level. ===== Changes to LangRef --- Change the definitions of floating point arithmetic operations, below is how fadd will change: 'fadd' Instruction Syntax: <result> = fadd {flag}* <ty> <op1>, <op2> ; yields {ty}:result ... Semantics: ... flag can be one of the following optimizer hints to enable otherwise unsafe floating point optimizations: N: no NaNs - The optimizer is allowed to optimize under the assumption that the operands' values are not NaN. If one of the operands is NaN, the value of the result is undefined. I: no infs - The optimizer is allowed to optimize under the assumption that the operands' values are not +/-Inf. If one of the operands is +/-Inf, the value of the result is undefined. 
S: no signed zeros - The optimizer is allowed to not distinguish between -0 and +0 for the purposes of optimizations. A: unsafe algebra - The optimizer is allowed to perform algebraically equivalent transformations that may dramatically change results in floating point. (e.g. reassociation) Changes to optimizations --- Optimizations should be allowed to perform unsafe optimizations provided the instructions involved have the corresponding restrictions relaxed. When combining instructions, optimizations should do what makes sense to not remove restrictions that previously existed (commonly, a bitwise-AND of the flags). Below are some example optimizations that could be allowed with the given relaxations. N - no NaNs x == x ==> true S - no signed zeros x - 0 ==> x 0 - (x - y) ==> y - x NIS - no signed zeros AND no NaNs AND no Infs x * 0 ==> 0 NI - no infs AND no NaNs x - x ==> 0 A - unsafe-algebra Reassociation (x + y) + z ==> x + (y + z) (x + C1) + C2 ==> x + (C1 + C2) Redistribution (x * C) + x ==> x * (C+1) (x * C) + (x + x) ==> x * (C + 2) Reciprocal x / C ==> x * (1/C) These examples apply when the new constants are permitted, e.g. not denormal, and all the instructions involved have the needed flags. I propose to expand -instsimplify and -instcombine to perform these kinds of optimizations. -reassociate will be expanded to reassociate floating point operations when allowed. Similar to existing behavior regarding integer wrapping, -early-cse will not CSE FP operations with mismatched flags, while -gvn will (conservatively). This allows later optimizations to optimize the expressions independently between runs of -early-cse and -gvn. Changes to frontends --- Frontends are free to generate code with flags set as they desire. Frontends should continue to call llc with their desired options, as the flags apply only at the IR level and not at codegen or the SelectionDAGs. Below is a suggested change to clang's command-line options. 
-ffast-math
  Currently described as:
    Enable the *frontend*'s 'fast-math' mode. This has no effect on
    optimizations, but provides a preprocessor macro __FAST_MATH__ the same as
    GCC's -ffast-math flag

  I propose to change the description and behavior to:
    Enable 'fast-math' mode. This allows for optimizations that may produce
    incorrect and unsafe results, and thus should only be used with care. This
    also provides a preprocessor macro __FAST_MATH__ the same as GCC's
    -ffast-math flag

  I propose that this turn on all flags for all floating point instructions. If
  this flag doesn't already cause clang to run llc with -enable-unsafe-fp-math,
  then I propose that it does so as well.

(Optional)
I propose adding the below flags:

-ffinite-math-only
  Allow optimizations to assume that floating point arguments and results are
  not NaNs or +/-Inf. This may produce incorrect results, and so should be used
  with care.

  This would set the 'I' and 'N' bits on all generated floating point
  instructions.

-fno-signed-zeros
  Allow optimizations to ignore the signedness of zero. This may produce
  incorrect results, and so should be used with care.

  This would set the 'S' bit on all FP instructions.

Changes to llvm cli tools
---

opt and llc already have the command line options
  -enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
  -enable-no-infs-fp-math: Enable FP math optimizations that assume no +-Infs
  -enable-no-nans-fp-math: Enable FP math optimizations that assume no NaNs
However, opt makes no use of them as they are currently only considered to be
TargetOptions. llc will remain unchanged, as these options apply to DAG
optimizations while this proposal deals with IR optimizations.

(Optional)
Have an opt pass that adds the desired flags to floating point instructions.

Miscellaneous explanations in the form of Q&A
---

Why not just have "fast-math" rather than individual flags?

Having the individual flags gives the granularity to choose the levels of
optimizations.
For example, unsafe-algebra can lead to dramatically different results in corner
cases, and may not be desired when a user just wants to ensure that x*0 folds
to 0.

Why have these flags attached to the instruction itself, rather than be a
compiler mode?

Being attached to the instruction itself allows much greater flexibility both
for other optimizations and for the concerns of the source and target. For
example, a frontend may desire that x - x be folded to 0. This would require
no-NaNs for the subtract. However, the frontend may want to keep NaNs for its
comparisons.

Additionally, these properties can be set internally in the optimizer when the
property has been proven. For example, if x has been found to be positive, then
operations involving x and a constant can be marked to ignore signed zero.

Finally, having these flags allows for greater safety and optimization when
code with different flags is mixed. For example, a function author may set the
unsafe-algebra flag knowing that such transformations will not meaningfully
alter its result. If that function gets inlined into a caller, however, we
don't want to always assume that the function's expressions can be reassociated
with the caller's expressions. These properties allow us to preserve the
optimizations of the inlined function without affecting the caller.

Why not use metadata rather than flags?

There is existing metadata to denote precisions, and this proposal is
orthogonal to those efforts. While these properties could still be expressed as
metadata, the proposed flags are analogous to nsw/nuw and are inherent
properties of the IR instructions themselves that all transformations should
respect.

There is still some debate on what form, metadata vs flags, should be used.
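The example folds listed under "Changes to optimizations" can be checked against ordinary IEEE 754 doubles. The sketch below is not LLVM code. It uses Python floats (assumed to be IEEE 754 binary64, as on mainstream platforms) and a hypothetical helper named same_value to show, for each fold, an input on which the folded form gives a different answer. Those inputs are the ones the corresponding flags promise the optimizer it may ignore.

```python
import math


def same_value(a: float, b: float) -> bool:
    """Hypothetical helper: equality that treats NaN as equal to NaN
    and distinguishes -0.0 from +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


nan = float("nan")
inf = float("inf")

# N: "x == x ==> true" is wrong only when x is NaN.
n_counterexample = (nan == nan)  # False

# NI: "x - x ==> 0" is wrong for NaN and for infinities.
ni_counterexamples = [x - x for x in (nan, inf)]  # NaN, NaN

# NIS: "x * 0 ==> 0" is wrong for NaN, for Inf, and (in sign only)
# for negative x.
nis_counterexamples = [x * 0.0 for x in (nan, inf, -1.0)]  # NaN, NaN, -0.0

# A: reassociation can change the result completely.
big = 1e17
left = (big + -big) + 1.0   # 1.0
right = big + (-big + 1.0)  # 0.0, since -1e17 + 1.0 rounds back to -1e17
```

Each counterexample involves exactly the class of value named by the flag (a NaN, an infinity, a negative zero), except the last one, where all inputs are ordinary finite numbers. That is the sense in which 'A' is a broader relaxation than 'N', 'I' and 'S'.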
Evan Cheng
2012-Oct-30 22:11 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Oct 30, 2012, at 1:46 AM, Duncan Sands <baldrick at free.fr> wrote:

> Hi Michael,
>
>> Flags
>> ---
>> no NaNs (N)
>>   - ignore the existence of NaNs when convenient
>> no Infs (I)
>>   - ignore the existence of Infs when convenient
>> no signed zeros (S)
>>   - ignore the existence of negative zero when convenient
>
> while the above flags make perfect sense for me, the other two seem more
> dubious:
>
>> allow fusion (F)
>>   - fuse FP operations when convenient, despite possible differences in rounding
>>     (e.g. form FMAs)
>> unsafe algebra (A)
>>   - allow for algebraically equivalent transformations that may dramatically
>>     change results in floating point. (e.g. reassociation)
>
> They don't seem to be capturing a clear concept, they seem more like a grab-bag
> of "everything else" (A) or "here's a random thing that is important today so
> let's have a flag for it" (F).
>
> ...
>
>> Why not use metadata rather than flags?
>>
>> There is existing metadata to denote precisions, and this proposal is orthogonal
>> to those efforts. These flags are analogous to nsw/nuw, and are inherent
>> properties of the IR instructions themselves that all transformations should
>> respect.
>
> If you drop any of these flags then things are still conservatively correct,
> just like with metadata. In my opinion this could be implemented as metadata.
> (I'm not saying it should be represented as metadata, I'm saying it could be).
>
> Disadvantages of metadata:
>
>   - Bloats the IR (however my measurements suggest this is by < 2% for math heavy
>     code)
>   - More painful to work with (though helper classes can mitigate this)
>   - Less efficient to modify (but will flags be cleared that often)?
>
> Disadvantages of using subclass data bits:
>
>   - Can only represent flags. Thus you might end up with a mix of flags and
>     metadata for floating point math, with the metadata holding the non-flag
>     info, and subclass data holding the flags. In which case it might be better
>     to just have it all be metadata in the first place
>   - Only a limited number of bits (but hey)
>
> Hopefully Chris will weigh in with his opinion.

FYI. We've already had extensive discussion with Chris on this. He has made it
clear this *must* be implemented with subclass data bits, not with metadata.

Evan

>
> Ciao, Duncan.
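Duncan's observation that dropping a flag is always conservatively correct is also what makes the RFC's bitwise-AND rule work when two instructions are combined. A minimal sketch of that rule, assuming a hypothetical bit assignment: the RFC only says the flags reside in SubclassData and does not fix an encoding, and FastMathFlags, merge and allows are illustrative names here, not LLVM API.

```python
from enum import IntFlag


class FastMathFlags(IntFlag):
    """Hypothetical encoding of the four RFC flags as individual bits."""
    NONE = 0
    N = 1  # no NaNs
    I = 2  # no Infs
    S = 4  # no signed zeros
    A = 8  # unsafe algebra


def merge(a: FastMathFlags, b: FastMathFlags) -> FastMathFlags:
    """Flags for an instruction formed from two others: keep only the
    relaxations that both inputs allowed (bitwise AND)."""
    return a & b


def allows(flags: FastMathFlags, required: FastMathFlags) -> bool:
    """True if every relaxation in `required` is present in `flags`."""
    return (flags & required) == required


x_flags = FastMathFlags.N | FastMathFlags.I | FastMathFlags.S
y_flags = FastMathFlags.N | FastMathFlags.S
merged = merge(x_flags, y_flags)  # N and S survive, I is dropped
```

With these definitions, allows(merged, FastMathFlags.N | FastMathFlags.S) holds, while the "x * 0 ==> 0" fold, which needs N, I and S, is no longer permitted on the merged instruction. Merging with an unflagged instruction yields no flags at all, which is the default, fully strict behavior.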
Dan Gohman
2012-Oct-30 23:19 UTC
[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level
On Tue, Oct 30, 2012 at 2:25 PM, Michael Ilseman <milseman at apple.com> wrote:

> Here's a new version of the RFC, incorporating and addressing the feedback
> from Krzysztof, Eli, Duncan, and Dan.
>
>
> Revision 1 changes:
>   * Removed Fusion flag from all sections
>   * Clarified and changed descriptions of remaining flags:
>     * Make 'N' and 'I' flags be explicitly concerning values of operands, and
>       producing undef values if a NaN/Inf is provided.
>     * 'S' is now only about distinguishing between +/-0.
>   * LangRef changes updated to reflect flags changes
>   * Updated Question section given the now simpler set of flags
>   * Optimizations changed to reflect 'N' and 'I' describing operands and not
>     results
>   * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
>   * Mention that this could be solved with metadata, and open the debate
>
> Introduction
> ---
>
> LLVM IR currently does not have any support for specifying fine-grained control
> over relaxing floating point requirements for the optimizer. The below is a
> proposal to extend floating point IR instructions to support a number of flags
> that a creator of IR can use to allow for greater optimizations when
> desired. Such changes are sometimes referred to as fast-math, but this proposal
> is about finer-grained specifications at a per-instruction level.
>
>
> What this doesn't address
> ---
>
> Default behavior is retained, and this proposal is only addressing relaxing
> restrictions. LLVM currently by default:
>   - ignores signaling NaNs
>   - assumes default rounding mode
>   - assumes FENV_ACCESS is off
>
> Discussion on changing the default behavior of LLVM or allowing for more
> restrictive behavior is outside the scope of this proposal. This proposal does
> not address behavior of denormals, which is more of a backend concern.
>
> Specifying exact precision control or requirements is outside the scope of this
> proposal, and can probably be handled with the existing metadata
> implementation.
>
> This proposal covers changes to and optimizations over LLVM IR, and changes to
> codegen are outside the scope of this proposal. The flags described in the next
> section exist only at the IR level, and will not be propagated into codegen or
> the SelectionDAG.
>
>
> Flags
> ---
> no NaNs (N)
>   - The optimizer is allowed to optimize under the assumption that the operands'
>     values are not NaN. If one of the operands is NaN, the value of the result
>     is undefined.
>
> no Infs (I)
>   - The optimizer is allowed to optimize under the assumption that the operands'
>     values are not +/-Inf. If one of the operands is +/-Inf, the value of the
>     result is undefined.
>
> no signed zeros (S)
>   - The optimizer is allowed to not distinguish between -0 and +0 for the
>     purposes of optimizations.
>

Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math and
-enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and they all
say they apply to results as well as arguments. Do you have a good reason for
varying from existing practice here?

Phrasing these from the perspective of the optimizer is a little confusing
here. Also, "The optimizer is allowed to [not care about X]" read literally
means that the semantics for X are unconstrained, which would be Undefined
Behavior. For I and N here you have a second sentence which says only the
result is undefined, but for S you don't. Also, even when you do have the
second sentence, it seems to contradict the first sentence.

> unsafe algebra (A)
>   - The optimizer is allowed to perform algebraically equivalent transformations
>     that may dramatically change results in floating point. (e.g.
>     reassociation)
>
> Throughout I'll refer to these options in their short-hand, e.g. 'A'.
> Internally, these flags are to reside in SubclassData.
>
>
> =====
> Question:
>
> Not all combinations make sense (e.g. 'A' pretty much implies all other flags).
>
> Basically, I have the below lattice of sensible relations:
>   A > S > N
>   A > I > N
> Meaning that 'A' implies all the others, 'S' implies 'N', etc.
>

Why does S still imply N?

Also, I'm curious if there's a specific motivation to have I imply N. LLVM
CodeGen's existing options for these are independent.

> It might be desirable to simplify this into just being a fast-math level.
>

What would make this desirable?

> Changes to optimizations
> ---
>
> Optimizations should be allowed to perform unsafe optimizations provided the
> instructions involved have the corresponding restrictions relaxed. When
> combining instructions, optimizations should do what makes sense to not remove
> restrictions that previously existed (commonly, a bitwise-AND of the flags).
>
> Below are some example optimizations that could be allowed with the given
> relaxations.
>
> N - no NaNs
>   x == x ==> true
>
> S - no signed zeros
>   x - 0 ==> x
>   0 - (x - y) ==> y - x
>
> NIS - no signed zeros AND no NaNs AND no Infs
>   x * 0 ==> 0
>
> NI - no infs AND no NaNs
>   x - x ==> 0
>
> A - unsafe-algebra
>   Reassociation
>     (x + y) + z ==> x + (y + z)
>     (x + C1) + C2 ==> x + (C1 + C2)
>   Redistribution
>     (x * C) + x ==> x * (C+1)
>     (x * C) + (x + x) ==> x * (C + 2)
>   Reciprocal
>     x / C ==> x * (1/C)
>
> These examples apply when the new constants are permitted, e.g. not denormal,
> and all the instructions involved have the needed flags.
>

I'm still confused by what you mean in this sentence. Why are you talking about
constants, if you intend these optimizations to be valid for non-constants?
And, it's not clear what you're trying to say about denormal values here.

Dan
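On the "new constants" sentence questioned above: one possible reading is that a fold such as x / C ==> x * (1/C) materializes a new constant 1/C, and that this constant can be inexact or even denormal. The sketch below illustrates that reading with Python floats (assumed to be IEEE 754 binary64). It is an interpretation offered for illustration, not a statement of what the RFC author meant, and reciprocal_fold is a hypothetical name.

```python
import sys


def reciprocal_fold(x: float, c: float) -> float:
    """The 'Reciprocal' rewrite from the RFC: x / C ==> x * (1/C)."""
    return x * (1.0 / c)


# The rewrite is not exact even for ordinary values: 1/49 is rounded
# before the multiply, so the product lands just below 1.0.
exact = 49.0 / 49.0                    # 1.0
folded = reciprocal_fold(49.0, 49.0)   # slightly less than 1.0

# For a very large C, the new constant 1/C falls below the smallest
# normal double, i.e. it is a denormal.
huge = 1.7e308
recip = 1.0 / huge
recip_is_denormal = 0.0 < recip < sys.float_info.min
```

Under this reading, the caveat is about constants because only a constant divisor lets the optimizer compute 1/C at compile time and inspect it. For a non-constant divisor there is no new constant to check.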