Mehdi Amini via llvm-dev
2016-Feb-03 23:04 UTC
[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
Hi everyone,

Sergey (CC’ed) worked on a series of patches to add support for the floating-point environment and floating-point rounding modes in LLVM. This started *in 2014*, and after multiple rounds of review in the last months (involving, among others, Steve Canon, Hal Finkel, David Majnemer, and myself) the patches are getting very close (IMO) to a state where we can land them.

This is the thread that started this development: "[LLVMdev] More careful treatment of floating point exceptions" http://marc.info/?l=llvm-dev&m=141113983302113&w=2

And this is the thread where most of the discussion on the design occurred: "[PATCH] Flag to enable IEEE-754 friendly FP optimizations" http://marc.info/?l=llvm-commits&m=141235814915999&w=2

Since Chandler raised some concerns on IRC today, I figured I should send a heads-up on this topic to allow anyone to comment on the current plan.

We plan on adding two new FP env flags to the existing FMF (fast-math flags). Without these flags set, the optimizer has to assume that the FP env can be observed or that the rounding mode can be changed. For clang, these flags would be set unless a command line option requires preserving the FP env.
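To make the constant-folding concern concrete, here is a minimal C++ sketch (not part of the patch series) that evaluates the same division under two rounding modes through the standard `<cfenv>` interface. The helper name `divide_in_mode` is hypothetical, and `volatile` is used only to keep the compiler from folding the division at compile time, which is the kind of transformation the proposed flags are meant to control.

```cpp
#include <cfenv>

// Hypothetical helper: computes lhs / rhs with the given <cfenv> rounding
// mode active, then restores the caller's rounding mode.
double divide_in_mode(double lhs, double rhs, int rounding_mode) {
  const int saved_mode = std::fegetround();
  std::fesetround(rounding_mode);
  // volatile forces a run-time division instead of a compile-time constant.
  volatile double a = lhs;
  volatile double b = rhs;
  volatile double quotient = a / b;
  std::fesetround(saved_mode);
  return quotient;
}
```

On hardware that honors the rounding mode, `divide_in_mode(1.0, 3.0, FE_UPWARD)` should be strictly greater than `divide_in_mode(1.0, 3.0, FE_DOWNWARD)`, so a folder that always assumes round-to-nearest would change observable results.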
Here is the list of patches:

[FPEnv Core 01/14] Add flags and command-line switches for FPEnv: http://reviews.llvm.org/D14066
[FPEnv Core 02/14] Add FPEnv access flags to fast-math flags: http://reviews.llvm.org/D14067
[FPEnv Core 03/14] Make SelectionDAG aware of FPEnv flags: http://reviews.llvm.org/D14068
[FPEnv Core 04/14] Skip constant folding to preserve FPEnv: http://reviews.llvm.org/D14069
[FPEnv Core 05/14] Teach IR builder and folders about new flags: http://reviews.llvm.org/D14070
[FPEnv Core 06/14] Do not fold constants on reading in IR asm/bitcode: http://reviews.llvm.org/D14071
[FPEnv Core 07/14] Prevent undesired folding by InstSimplify: http://reviews.llvm.org/D14072
[FPEnv Core 08/14] Do not simplify expressions with FPEnv access: http://reviews.llvm.org/D14073
[FPEnv Core 09/14] Make Strict flag available for more clients: http://reviews.llvm.org/D14074
[FPEnv Core 10/14] Use Strict in IRBuilder: http://reviews.llvm.org/D14075
[FPEnv Core 11/14] Don't convert fpops to constexprs in SCCP: http://reviews.llvm.org/D14076
[FPEnv Core 13/14] Don't hoist FP-ops with side-effects in LICM: http://reviews.llvm.org/D14078
[FPEnv Core 14/14] Introduce F*_W_CHAIN instrs to prevent reordering: http://reviews.llvm.org/D14079

--
Mehdi
Chandler Carruth via llvm-dev
2016-Feb-05 02:05 UTC
[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
First, thanks Mehdi for putting something on llvm-dev and getting wider awareness of this.

I am actually really interested in finding a way for LLVM to support the interesting functionality we are missing from fenv-like interfaces: things like rounding modes, exceptions, etc. However, I think the current design is going to be a really high burden for the entire optimizer and I think there is a simpler model that we might pursue instead.

I'll start off with some underlying principles that I'm operating from:

a) Most code in the world will be very happy with the default floating point environment, doesn't need to carefully model floating point exceptions, etc. Essentially, I think that LLVM's behavior today is probably right for most code. Now, the code which needs support for the other features of floating point isn't bad or unimportant! But it is relatively speaking rare, and so I think it is reasonable to optimize the *representation* model for the common case provided we don't lose support for functionality.

b) When outside the default floating point environment's rules, there are few if any optimizations that we realistically expect from LLVM. Certainly, any changes to the LLVM optimizer which impact code outside the default need to be done *much* more carefully to avoid introducing subtle bugs.

OK, based on that, consider the following model:

We provide intrinsics that mirror the instructions 'fadd', 'fsub', 'fmul', 'fdiv', and 'frem' (so 5 total). From here on out, I'll exclusively use 'fadd' as my example. The intrinsics would look like:

    declare {float, i1} @llvm.fadd.with.environment.f32(float %lhs, float %rhs, i8 %rounding_mode, i8 %exception_behavior)

Then we define specific values to be used for the IEEE rounding modes. And we define values to control exception behavior. I'm not an expert on floating point exceptions in particular (my platforms don't use them) but I'm imagining three states: "ignore", "return", and "trap".
I've used a single 'i1', but I'm assuming it would need to be several i1s or an iN in order to model the set of FP exceptions. I'm using i1 here just to simplify the explanation; I think it generalizes and I'll let the experts suggest the exact formulation.

If the default rounding mode and the "ignore" exception behavior are provided to these intrinsics, they behave exactly as the existing instructions do, and instcombine should canonicalize to the existing instructions.

The semantics of non-default rounding modes are to perform the operation with that rounding mode.

If "return" is provided for the exception behavior, then the i1 component of the result is true if an FP exception occurred and false otherwise. If "ignore" is provided then any FP exceptions are ignored and the i1 is always false. If "trap" is provided then the i1 is always false, but the call to the intrinsic might trap. We could either define a trap as precisely the same as a call to @llvm.trap(), or we could introduce an @llvm.fp.trap() and define it as a call to that.

The frontend would then be responsible for lowering floating point arithmetic using these intrinsics. This may be somewhat challenging because in some languages this behavior is controlled dynamically. In those situations, we can either allow these intrinsics to accept non-constant arguments for %rounding_mode and %exception_behavior so that frontends can emit code that just dynamically computes them, or we could follow the same model that atomics use: if the frontend cannot trivially compute a constant, it can emit a switch over the possible states with a specific intrinsic call in each case. I don't have strong opinions about which would be best; I think either could be made to work.

If we go with constant arguments being required, we could use "metadata arguments" which aren't actually metadata but just encoded arguments for intrinsics.
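The semantics described above can be emulated in portable C++ to see how the three exception behaviors differ. This is only a sketch under stated assumptions: `fadd_with_environment`, `ExceptionBehavior`, and `FAddResult` are hypothetical names mirroring the proposed intrinsic, the rounding mode is passed as a `<cfenv>` macro rather than an `i8` encoding, "trap" is modeled with `std::abort()`, and the inexact flag is deliberately not reported because almost every operation raises it.

```cpp
#include <cfenv>
#include <cstdlib>

// Hypothetical mirror of the three proposed exception behaviors.
enum class ExceptionBehavior { Ignore, Return, Trap };

// Hypothetical mirror of the {float, i1} aggregate result.
struct FAddResult {
  float value;
  bool exception;
};

FAddResult fadd_with_environment(float lhs, float rhs, int rounding_mode,
                                 ExceptionBehavior behavior) {
  std::fenv_t saved;
  std::feholdexcept(&saved);       // save the environment and clear all flags
  std::fesetround(rounding_mode);  // apply the requested rounding mode
  volatile float a = lhs;          // volatile keeps the add at run time
  volatile float b = rhs;
  volatile float sum = a + b;
  // Assumption: report every exception except FE_INEXACT.
  const int reported = FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW | FE_UNDERFLOW;
  const bool raised = std::fetestexcept(reported) != 0;
  std::fesetenv(&saved);           // leave the caller's environment untouched
  if (behavior == ExceptionBehavior::Trap && raised) {
    std::abort();                  // stand-in for a call to @llvm.trap()
  }
  // The flag is only surfaced for "return"; it is always false otherwise.
  return {sum, behavior == ExceptionBehavior::Return && raised};
}
```

For example, adding `3.0e38f` to itself overflows, so the "return" behavior yields a true flag while "ignore" yields false for the same operands.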
When emitting constants and trying to respect floating point environment settings, frontends will have to emit runtime calls instead of actual constants. But this actually seems good because that is what we'll need anyway: we aren't able to emulate all the environment options with full generality, if I understand things correctly (and let me know if I've misunderstood).

The two really big reasons why I like this model much more than extending flags are:

1) This avoids implicit state. The implicit state of the floating point environment makes things like code motion extremely hard to reason about. I think we will just get it wrong too often to make this a good approach. By modeling all of this as actual SSA values I think there is a much better chance we'll get this stuff right. For example, fetestexcept can be implemented by or-ing all the i1s for floating point exceptions and testing the result. Then the backend can spill the state when necessary and reload it when needed even if other floating point math is introduced. I admit that first class aggregate returns aren't a beautiful way to encapsulate this, but they are an *effective* way that we know how to work with in the LLVM IR. If we ever come up with a better multi-def model, we can always switch these and all the other intrinsics which need this to that model.

2) Every pass will conservatively correctly model the operations. This is most significant when modeling trapping on exceptions. We need every pass to realize that control flow might not proceed past such operations. We already have this logic for calls, and it seems a really nice fit for allowing most of the optimizer to be unaware of these constructs while respecting them and preserving behavior in the face of them.
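The idea in point 1, carrying exception state as explicit values and or-ing them together, can be sketched in C++ as follows. The names `Checked`, `checked`, and `div_then_add` are hypothetical, the flags are kept as a `<cfenv>` bit mask rather than individual i1 values, and each operation is isolated with `feholdexcept`/`fesetenv` so that nothing leaks into the global environment. Treat it as an illustration of the data flow, not as the proposed lowering.

```cpp
#include <cfenv>

// Hypothetical result type: the value plus the exception flags it raised.
struct Checked {
  double value;
  int flags;
};

// Runs one binary operation in isolation and returns its flags explicitly.
template <typename Op>
Checked checked(Op op, double lhs, double rhs) {
  std::fenv_t saved;
  std::feholdexcept(&saved);  // save the environment and clear all flags
  volatile double a = lhs;    // volatile keeps the operation at run time
  volatile double b = rhs;
  volatile double result = op(a, b);
  const int flags = std::fetestexcept(FE_ALL_EXCEPT);
  std::fesetenv(&saved);      // global flags are left as they were
  return {result, flags};
}

// Computes (x / y) + z and or-s the per-operation flags together, which is
// the information an fetestexcept-style query would need.
Checked div_then_add(double x, double y, double z) {
  const Checked quotient = checked([](double a, double b) { return a / b; }, x, y);
  const Checked sum = checked([](double a, double b) { return a + b; }, quotient.value, z);
  return {sum.value, quotient.flags | sum.flags};
}
```

A caller can then test `div_then_add(1.0, 0.0, 1.0).flags & FE_DIVBYZERO` without consulting, or disturbing, the hardware status register.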
I suspect that there are things this model doesn't handle that I've not thought of (as this is outside the area of FP that I'm deeply familiar with), but I really think this model would be easier to reason about and would be much less invasive within the IR and optimizer. I wonder if folks think this could work and would be up for moving their efforts in this direction?

-Chandler
Mueller-Roemer, Johannes Sebastian via llvm-dev
2016-Feb-05 07:38 UTC
[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
I strongly agree with this. A further reason why explicit modes are desirable is that (pseudo)architectures such as PTX encode the rounding mode within the instruction itself. It should also make *some* optimizations on x86 possible by reducing the number of environment mode changes.

PS Sorry for sending this twice, I initially forgot to add the list.

--
Johannes S. Mueller-Roemer, MSc
Fraunhofer-Institut für Graphische Datenverarbeitung IGD
Jonas Maebe via llvm-dev
2016-Feb-05 11:18 UTC
[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
Chandler Carruth via llvm-dev wrote on Fri, 05 Feb 2016:

> If "return" is provided for the exception behavior, then the i1 component
> of the result is true if an FP exception occurred and false otherwise. If
> "ignore" is provided then any FP exceptions are ignored and the i1 is
> always false. If "trap" is provided then the i1 is always false, but the
> call to the intrinsic might trap. We could either define a trap as
> precisely the same as a call to @llvm.trap(), or we could introduce an
> @llvm.fp.trap() and define it as a call to that.

Our run time library installs signal handlers/exception filters to catch FPU exceptions. Can that be modeled in this way too?

> The frontend would then be responsible for lowering floating point
> arithmetic using these intrinsics. This may be somewhat challenging because
> in the frontend behavior is controlled dynamically in some languages. In
> those situations, we can either allow these intrinsics to accept
> non-constant arguments for %rounding_mode and %exception_behavior so that
> frontends can emit code that just dynamically computes them, or we could
> follow the same model that atomics use, and if the frontend cannot
> trivially compute a constant, it can emit a switch over the possible states
> with a specific intrinsic call in each case. I don't have strong opinions
> about which would be best, I think either could be made to work.

In our run time library you have calls to dynamically change the rounding mode of the FPU, and to dynamically mask individual floating point exceptions. With our current (non-llvm) code generators, we simply emit regular FPU instructions and depending on those settings, they always do "the right thing". It's true that we cannot perform a number of optimisations because of this, but on the other hand there is no overhead at run time for any kind of checks.
If I understood your proposal above correctly, you propose that for LLVM this would be implemented by our frontend emitting a bunch of checking code for each (sequence of) FPU instructions to determine the current FPU exception mask and rounding mode? That seems rather heavy, even if LLVM can optimise away a bunch of those calls if they're annotated correctly as not changing any state themselves.

> When emitting constants and trying to respect floating point environment
> settings, frontends will have to emit runtime calls instead of actual
> constants. But this seems actually good because that is what we'll need
> anyways -- we aren't able to with full generality emulate all the
> environment options if I understand things correctly (and let me know if
> I've misunderstood).

You indeed can't, but I don't understand how calling these run time functions will help:

1) at compile time, you still can't do anything about it, unless you want to generate umpteen different versions of the FPU code that are then selected at run time depending on which results those functions returned (like with your "switch" proposal above, but I think that would completely kill performance in many cases -- atomics are used sparingly and are slow by definition; that's not true for floating point code)
2) at run time, you get the extra overhead of the extra function calls everywhere

I wonder whether this won't result in enormous code bloat, and under which circumstances this would result in better performance than simply an option whereby the frontend instructs LLVM to

1) assume that all FPU instructions may trap and may use any rounding mode
2) emit regular FPU opcodes without the need for any extra calls etc.

At least such an option would seem desirable for our language.
Having a similar option for telling LLVM to stop assuming that the results of null-pointer dereferences and integer divisions-by-zero are undefined would be even better. (They are not undefined in our case; only if the hardware/OS does not support exceptions for them do we generate explicit checks in our non-LLVM code generators.)

Jonas
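The "switch" lowering debated in this message can be sketched in C++ to show the kind of per-operation dispatch whose cost is at issue. The names `add_fixed_mode` and `add_dynamic_mode` are hypothetical, `add_fixed_mode<Mode>` stands in for an intrinsic call with a constant rounding-mode argument, and the dynamic mode is read with `std::fegetround()`. It is an illustration of the shape of the generated code under those assumptions, not a measurement of its overhead.

```cpp
#include <cfenv>

// Stand-in for an operation whose rounding mode is a compile-time constant.
template <int Mode>
double add_fixed_mode(double lhs, double rhs) {
  const int saved_mode = std::fegetround();
  std::fesetround(Mode);
  volatile double a = lhs;  // volatile keeps the add at run time
  volatile double b = rhs;
  volatile double sum = a + b;
  std::fesetround(saved_mode);
  return sum;
}

// Frontend-style lowering when the mode is only known at run time: query the
// current mode once and branch to the matching constant-mode version.
double add_dynamic_mode(double lhs, double rhs) {
  switch (std::fegetround()) {
    case FE_UPWARD:
      return add_fixed_mode<FE_UPWARD>(lhs, rhs);
    case FE_DOWNWARD:
      return add_fixed_mode<FE_DOWNWARD>(lhs, rhs);
    case FE_TOWARDZERO:
      return add_fixed_mode<FE_TOWARDZERO>(lhs, rhs);
    default:
      return add_fixed_mode<FE_TONEAREST>(lhs, rhs);
  }
}
```

Every floating-point operation lowered this way carries a mode query and a branch, whereas emitting a plain add and letting the hardware consult its control register carries neither.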
Antoine Pitrou via llvm-dev
2016-Feb-05 12:47 UTC
[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
Hello,

On Fri, 05 Feb 2016 02:05:38 +0000 Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> 1) This avoids implicit state. The implicit state of the floating point
> environment makes things like code motion extremely hard to reason about. I
> think we will just get it wrong too often to make this a good approach. By
> modeling all of this as actual SSA values I think there is a much better
> chance we'll get this stuff right. For example by or-ing all the i1s for
> floating point exceptions and testing the result to implement fetestexcept.

I'm not sure I understand everything here, but as a data point we would like to access the FP error status (get and set its individual bits). However, we don't need to change the rounding mode or the exception mode.

Does your proposal mean we would need to use the mentioned intrinsics and get a performance hit, or am I missing something obvious? (in the context of Numba, a performance hit on FP calculations is certainly unacceptable for us)

Regards

Antoine.
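For reference, the use case described here (reading and writing individual FP status bits without touching the rounding mode) corresponds to the following standard `<cfenv>` calls. The three wrapper names are hypothetical and only label the operations; this shows the C++ library interface, not how Numba or the proposed intrinsics would implement it.

```cpp
#include <cfenv>

// Returns true if the given status flag (e.g. FE_DIVBYZERO) is currently set.
bool status_flag_is_set(int flag) {
  return std::fetestexcept(flag) != 0;
}

// Sets the given status flag by raising the corresponding exception.
void set_status_flag(int flag) {
  std::feraiseexcept(flag);
}

// Clears the given status flag and leaves the others untouched.
void clear_status_flag(int flag) {
  std::feclearexcept(flag);
}
```

None of these calls changes the rounding mode; they only inspect or modify the sticky exception flags.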
Hal Finkel via llvm-dev
2016-Feb-05 22:10 UTC
[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
Hi Chandler,

This scheme has significant advantages over what was being pursued, but one question (or two)...

Under the proposed system, how would you represent the necessary dependency edges between the fp intrinsics and function calls? How is the state 'returned' to the caller? [I was thinking that our new operand bundles could help for the inputs, but the outputs? Plus what about the live-in state?] This is important because any external subroutine call could (potentially) change the rounding mode or any other part of the floating-point environment.

Thanks again,
Hal
> > Here is the list of patches: > > [FPEnv Core 01/14] Add flags and command-line switches for FPEnv: > http://reviews.llvm.org/D14066 > [FPEnv Core 02/14] Add FPEnv access flags to fast-math flags: > http://reviews.llvm.org/D14067 > [FPEnv Core 03/14] Make SelectionDAG aware of FPEnv flags: > http://reviews.llvm.org/D14068 > [FPEnv Core 04/14] Skip constant folding to preserve FPEnv: > http://reviews.llvm.org/D14069 > [FPEnv Core 05/14] Teach IR builder and folders about new flags: > http://reviews.llvm.org/D14070 > [FPEnv Core 06/14] Do not fold constants on reading in IR > asm/bitcode: http://reviews.llvm.org/D14071 > [FPEnv Core 07/14] Prevent undesired folding by InstSimplify: > http://reviews.llvm.org/D14072 > [FPEnv Core 08/14] Do not simplify expressions with FPEnv access: > http://reviews.llvm.org/D14073 > [FPEnv Core 09/14] Make Strict flag available for more clients: > http://reviews.llvm.org/D14074 > [FPEnv Core 10/14] Use Strict in IRBuilder: > http://reviews.llvm.org/D14075 > [FPEnv Core 11/14] Don't convert fpops to constexprs in SCCP: > http://reviews.llvm.org/D14076 > [FPEnv Core 13/14] Don't hoist FP-ops with side-effects in LICM: > http://reviews.llvm.org/D14078 > [FPEnv Core 14/14] Introduce F*_W_CHAIN instrs to prevent reordering: > http://reviews.llvm.org/D14079 > > > — > Mehdi > >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
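As a rough illustration of the intrinsic model Chandler describes in the quoted message above, here is a minimal Python sketch of its semantics. It is not LLVM code and not anyone's proposed implementation. Python's standard decimal module stands in for hardware floating point purely because it exposes explicit rounding modes, sticky flags and traps. The function name fadd_with_environment, the mode strings, the 7-digit precision and the choice of which signals count as "FP exceptions" are all assumptions made for this example.

```python
from decimal import (
    Context, Decimal,
    ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN,
    InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact,
)

# Hypothetical names for the rounding-mode argument; the thread leaves the
# actual encoding to be defined.
ROUNDING_MODES = {
    "nearest": ROUND_HALF_EVEN,
    "upward": ROUND_CEILING,
    "downward": ROUND_FLOOR,
    "toward_zero": ROUND_DOWN,
}

# The signals counted as "FP exceptions" in this model (an assumption).
FP_EXCEPTIONS = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)


def fadd_with_environment(lhs, rhs, rounding_mode="nearest",
                          exception_behavior="ignore", precision=7):
    """Return (sum, flag), loosely mirroring the proposed {value, i1} result."""
    if exception_behavior not in ("ignore", "return", "trap"):
        raise ValueError(f"unknown exception behavior: {exception_behavior!r}")
    # A fresh context per call: nothing is inherited from, or leaked to,
    # any other operation, so there is no implicit environment state.
    ctx = Context(
        prec=precision,
        rounding=ROUNDING_MODES[rounding_mode],
        traps=list(FP_EXCEPTIONS) if exception_behavior == "trap" else [],
    )
    # With "trap", this call raises if one of FP_EXCEPTIONS is signalled.
    result = ctx.add(Decimal(lhs), Decimal(rhs))
    raised = any(ctx.flags[signal] for signal in FP_EXCEPTIONS)
    # Only "return" reports the flag; "ignore" and "trap" always yield False.
    return result, (raised if exception_behavior == "return" else False)


if __name__ == "__main__":
    up, up_flag = fadd_with_environment("1", "0.00000001", "upward", "return")
    down, down_flag = fadd_with_environment("1", "0.00000001", "downward", "return")
    print(up, down)
    # fetestexcept-style query: OR the per-operation flags together.
    print(up_flag or down_flag)
```

Because the rounding mode and exception behavior are ordinary arguments and the flag is an ordinary return value, each operation's dependence on the environment is visible at the call site. That is the property the message argues makes code motion easier to reason about than an implicit global environment.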
Tom Stellard via llvm-dev
2016-Apr-11 16:08 UTC
[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
Hi,

Is anyone still working on this?

Based on the discussion in this thread:
http://lists.llvm.org/pipermail/llvm-dev/2016-February/094869.html,
it seems like there is a preference to start with an intrinsic based
approach. Is this a correct interpretation of the discussion?

Thanks,
Tom
Mehdi Amini via llvm-dev
2016-Apr-11 17:22 UTC
[llvm-dev] [RFC] FP Environment and Rounding mode handling in LLVM
> On Apr 11, 2016, at 9:08 AM, Tom Stellard <tom at stellard.net> wrote:
>
> Hi,
>
> Is anyone still working on this?

Not that I am aware of.

> Based on the discussion in this thread:
> http://lists.llvm.org/pipermail/llvm-dev/2016-February/094869.html,
> it seems like there is a preference to start with an intrinsic based
> approach. Is this a correct interpretation of the discussion?

Yes!

--
Mehdi