Hello,

at the recent EuroLLVM developer meeting in Bristol I held a BoF session on the topic "Towards implementing #pragma STDC FENV_ACCESS". I've also had a number of follow-on discussions both on-site in Bristol and online since. This post is intended as a summary of my current understanding of the set of requirements and implementation details covering the overall topic.

I'm posting this here in the hope that it can serve as a basis for the various more detailed discussions that are still ongoing (e.g. in various Phabricator proposals right now). Any comments are welcome!


Semantics of #pragma STDC FENV_ACCESS
=====================================

To provide a baseline for the implementation discussion, first an overview of the features required to handle the strict floating-point mode defined by the C and IEEE standards:

1. Floating-point rounding modes
2. Default floating-point exception handling
3. Trapping floating-point exception handling

Each of these separate features imposes different constraints on the optimizations that LLVM may perform involving FP expressions:

1. Floating-point rounding modes

Outside of FENV_ACCESS regions, all FP operations are supposed to be performed in the "default" rounding mode.

But inside FENV_ACCESS regions, FP operations implicitly depend on a "current" rounding mode setting, which may be changed by certain C library calls (plus some platform-specific intrinsics). In addition, those calls may be performed within subroutines (as long as those are also within FENV_ACCESS), so *any* function call within a FENV_ACCESS region must be considered as potentially changing the rounding mode.

In effect, this means the compiler may not move or combine FP operations across function call sites.

2. Default floating-point exception handling

Inside FENV_ACCESS regions, every floating-point operation that causes an exception must be considered to set a "status flag" associated with this exception type.
Those flags can be queried using C library calls (plus some platform-specific intrinsics), and there are other such calls to explicitly set or clear those flags as well. As with the rounding modes, those calls may be performed in subroutines as well, so any function call within a FENV_ACCESS region must be considered as potentially *using* and changing the floating-point exception status flags.

The values of the status flags on entry to a FENV_ACCESS region are to be considered undefined according to the C standard.

Compiler optimizations are supposed to preserve the values of all exception status bits at any point where they can be (potentially) inspected by the program, i.e. at all call sites within FENV_ACCESS regions. This still allows a number of optimizations, e.g. to reorder FP operations or combine two identical operations within a region uninterrupted by calls. But other optimizations should be avoided, e.g. optimizing away an unused FP operation may result in an exception flag now being unset that would otherwise have been set. The same applies to floating-point constant folding.

3. Trapping floating-point exception handling

Within a FENV_ACCESS region, library calls may be used to switch exception handling semantics to a "trapping" mode by setting corresponding mask bits. Any subsequent FP instruction that raises an exception with the associated mask bit set will cause a trap. Usually, this will be a hardware trap that is translated by the operating system into some form of software exception that can be handled by the application; on Linux systems this takes the form of a SIGFPE signal.

As above, those mask bits can be set and reset via (operating-system specific) library calls and/or platform-specific intrinsics, all of which may also be done within subroutine calls.

In effect, this requires the compiler to treat any floating-point operation within a FENV_ACCESS region as potentially trapping, which means the same restrictions apply as with e.g.
memory accesses (cannot be speculated etc.). However, according to the C standard, the implementation is not required to preserve the *number* of different traps, so identical operations may still be combined (unless there is an intervening function call).

The C standard requires all user code to explicitly switch back to non-trapping mode for all exceptions whenever leaving a FENV_ACCESS region (both by "falling off the end" of the region and by calling a subroutine defined outside of FENV_ACCESS).


Implementation requirements on parts of the compiler
====================================================

A. clang front end

The front end needs to determine which instructions are part of FENV_ACCESS regions and which are not. This takes into account both the semantics of the #pragma as defined by the standard, and the implementation-defined default rules that apply to code outside of any #pragma. GCC currently has the following two related command-line options:

  -frounding-math: Do not assume default rounding mode
  -ftrapping-math: Assume FP operations may trap

clang accepts but (basically) ignores those options. As a first step, it might make sense to have the FENV_ACCESS default behavior triggered by these options, even while the front end does not yet support the actual #pragma.

The front end then needs to transmit the information about FENV_ACCESS regions to later passes. However, I believe that we do not actually have to implement "regions" as such at the IR level. Instead, it would be sufficient to track the following information:

- For each FP operation, whether it is within a FENV_ACCESS region.
- For each call site, whether it is within a FENV_ACCESS region.

The former requires new IR support; the approach currently under investigation uses the experimental "constrained FP" intrinsics instead of traditional floating-point operations for this. The latter can be done simply by annotating those call sites with an attribute.

In addition to that, the front end itself needs to disable any early optimizations that do not preserve strict FP semantics; in particular, it must not speculate FP operations if they may trap. (Currently, the front end transforms "? :" on floating-point types into a select IR statement; for trapping FP operations, an explicit branch must be used instead.)


B. LLVM IR and LLVM common optimizations

As mentioned in the previous section, we need some IR to annotate FP instructions and call sites within FENV_ACCESS regions. All common optimizations then need to respect the strict FP semantics associated with those regions.

The current approach uses experimental intrinsics. This has the advantage that most optimizations never trigger since they don't even recognize those new intrinsics. Also, the intrinsics can be marked as having side effects and/or being non-speculatable.

The overall effect is that more optimizations are suppressed than would be strictly necessary. But this may still be a good first step, since the result is now safe but maybe not optimal -- which can be improved upon over time by teaching the specific semantics of those intrinsics to optimization passes.

However, some open questions remain. If at some point we want to model the constrained FP semantics more precisely than just as "unmodeled side effects", this may have to be reflected at the IR level directly. For example, to model rounding mode behavior, at some point we might require explicit tracking of data dependencies on the rounding mode by representing the rounding mode as SSA values defined by function calls and used by FP intrinsics. Similarly, to track exception status flags, they might be modeled as SSA values set by FP intrinsics and used by function calls.

(There is a possibly related question of how to optimally model the property of many math library routines that they may access the "errno" variable but no other memory ... It might also be possible to model e.g.
exception status as a thread-local "memory" location that is modified by FP operations, just like errno.)

Another currently unresolved issue is that at the moment nothing prevents *standard* floating-point operations from being moved *inside* FENV_ACCESS regions. This may also be invalid, since those operations now may cause unexpected traps etc. (More specifically, what is invalid is moving any standard FP operation across a *call site* within a FENV_ACCESS region.) Note that this is an issue even if we only support changing the default (and no actual #pragma), as soon as multiple object files using different default settings are linked together using LTO.

This last issue could in theory be solved by having all optimization passes respect the requirement that floating-point operations may not be moved across call sites marked with the strict FP attribute. But that does not appear to be straightforward since it would introduce a "new" type of dependency that would have to be added throughout LLVM code. If this must be avoided, we'd have to find a way to explicitly track dependencies at the IR level. In the extreme, this could end up equivalent to just always using the constrained intrinsics for everything ...


C. Code generation

In the back end, effects of strict FP mode have to be passed through to lower-level representations including SelectionDAG and MI.

Currently, the "unmodeled side effect" logic of the constrained intrinsics is modeled by putting them on the chain during SelectionDAG. (If we ever model semantics more precisely at the IR level, that would need to be reflected on SelectionDAG accordingly.)

At the MI level, there is no representation at all. One option to fix this would be to model target-specific registers that implement the IEEE semantics.
Most platforms have registers (or parts of registers) that hold:
  - the current rounding mode
  - the exception status flags
  - the exception masks (which enable traps)
Marking FP instructions as using and/or defining these registers would enforce ordering requirements. It may be too strict in some cases (e.g. two instructions setting exception status flags may still be reordered). On the other hand, I believe that if instructions may actually *trap*, we need the hasSideEffects flag even if register dependencies are modeled.

If we do need hasSideEffects, there is a separate discussion on whether this can be implemented without each back end having to duplicate all FP instruction patterns (one with hasSideEffects and one without), e.g. by having a new feature that allows describing the side-effect status using an MI operand.


Next steps
==========

I believe it is important to break up the full amount of work into incremental steps that provide some useful benefits on their own. At first, we should be able to get to a state where clang can be used to build programs that use some (maybe not all) strict FP features, where the generated code is always correct but may not always be optimal. To get there, I think we need at a minimum:

- Implement clang support for the default flags, e.g. GCC's -frounding-math and -ftrapping-math, and always generate the constrained intrinsics. clang should also mark all call sites then (as mentioned above).

- For now, add the requirement that LTO is not supported if this would cause mixing of strict and non-strict FP code. In the alternative, have the LTO pass automatically transform any floating-point operation into a constrained intrinsic if *any* (other) module already uses the latter.

- At the IR level, complete the set of supported constrained FP intrinsics (there are still some missing, see e.g. https://reviews.llvm.org/D43515). Also, it seems not all variants (e.g. for vector types) are supported correctly through codegen (see e.g.
https://reviews.llvm.org/D46967).

- Allow targets to correctly reflect constrained intrinsics semantics at the MI level and final machine code generation (see e.g. https://reviews.llvm.org/D45576).

- Review all optimization and codegen passes to verify they fully respect strict FP semantics.

Once this is done, we can improve on the solution by:

- Supporting mixing strict and non-strict FP operations (would lift the LTO restriction). (Note: there still seems to be some "invention required" here, see above.)

- Actually implementing the #pragma supporting different regions within a compilation unit (prereq: support for mixing strict and non-strict FP operations).

- Adding more optimization of constrained FP intrinsics in common optimizers and/or target back ends.

Does this look reasonable? Please let me know if there's anything I overlooked, or if you have any additional comments or questions.


Best Regards

Ulrich Weigand

--
Dr. Ulrich Weigand
STSM, GNU/Linux compilers and toolchain
IBM Deutschland Research & Development GmbH

Hi Ulrich,

I am interested in knowing if the current proposals also take into account the FP_CONTRACT pragma and the ability to implement options that imply a specific value for the FLT_EVAL_METHOD macro.

Additionally, I am not aware of the IR being able to represent the potentially deferred loss of precision that the C language semantics provide; in particular, applying such semantics to the existing IR would hit an issue that the limits of such deferment would need an agreed representation.

As for the mixing of strict and non-strict modes, I would be interested in where LLVM is in its handling of non-SSA (pseudo-memory?) dependencies. I have a vague impression that it is very coarse-grained in that respect, but I admit to not being particularly informed in that space. If there is a good model for such dependencies, then I think it could be used to handle the strict/non-strict mixing.

-- Hubert Tong, IBM

PS A nitpick on wording: The idea of being inside or outside of FENV_ACCESS regions is instead expressed in terms of the state of the FENV_ACCESS pragma within the C Standard.

On 05/23/2018 11:06 AM, Hubert Tong via llvm-dev wrote:
> Hi Ulrich,
>
> I am interested in knowing if the current proposals also take into
> account the FP_CONTRACT pragma

We should already do this (we turn relevant operations into the @llvm.fmuladd intrinsic when FP_CONTRACT is set to on during IR generation).

> and the ability to implement options that imply a specific value for
> the FLT_EVAL_METHOD macro.

What do you mean by this?

-Hal
> > In addition to that, the front-end itself needs to disable any > early optimizations that do not preserve strict FP semantics, > in particular it must not speculate FP operations if they may > trap. (Currently, the front end transforms "? :" on floating- > point types into a select IR statement; for trapping FP > operations, an explicit branch must be used instead.) > > > B. LLVM IR and LLVM common optimizations > > As mentioned in the previous section, we need some IR to annotate > FP instructions and call sites within FENV_ACCESS regions. All > common optimizations then need to respect the strict FP semantics > associated with those regions. > > The current approach uses experimental intrinsics. This has the > advantage that most optimizations never trigger since they don't > even recognize those new intrinsics. Also, the intrinsics can > be marked as having side-effects and/or being non-speculatable. > > The overall effect is that more optimizations are suppressed > than would be strictly necessary. But this may still be a good > first step, since the result is now safe but maybe not optimal > -- which can be improved upon over time by teaching the specific > semantics of those intrinsics to optimization passes. > > However, some open questions remain. If at some point we want > to model the constrained FP semantics more precisely than just > as "unmodeled side effects", this may have to be reflected at > the IR level directly. For example, to model rounding mode > behavior, at some point we might require explicit tracking of > data dependencies on the rounding mode by representing the > rounding mode as SSA values defined by function calls and used > by FP intrinsics. Similarly, to track exception status flags, > they might be modeled as SSA values set by FP intrinsics and > used by function calls. 
> > (There is a possibly related question of how to optimally model > the property of many math library routines that they may access > the "errno" variable but no other memory ... It might also be > possible to model e.g. exception status as a thread-local "memory" > location that is modified by FP operations, just like errno.) > > Another currently unresolved issue is that at the moment nothing > prevents *standard* floating-point operations from being moved > *inside* FENV_ACCESS regions. This may also be invalid, since > those operations now may cause unexpected traps etc. (More > specifically, what is invalid is moving any standard FP operation > across a *call site* within a FENV_ACCESS region.) Note that > this is even an issue if we only support changing the default > (and no actual #pragma) if mutiple object files using different > default settings are being linked together using LTO. > > This last issue could in theory be solved by having all optimization > passes respect the requirement that floating-point operations may > not be moved across call sites marked with the strict FP attribute. > But that does not appear to be straightforward since it would > introduce a "new" type of dependeny that would have to be added > throughout LLVM code. If this must be avoided, we'd have to > find a way to explicity track dependencies at the IR level. In > the extreme, this could end up equivalent to just always using > the constrained intrinsics for everything ... > > > C. Code generation > > In the back end, effects of strict FP mode have to passed through > to lower-level representations including SelectionDAG and MI. > > Currently, the "unmodeled side effect" logic of the constrained > intrinsics is modeled by putting them on the chain during > SelectionDAG. > (If we ever model semantics more precisely at the IR level, that > would need to be reflected on SelectionDAG accordingly.) > > At the MI level, there is no representation at all. 
One option to > fix this would be to model target-specific registers that implement > the IEEE semantics. Most platforms have registers (or parts of > registers) that hold: > - the current rounding mode > - the exception status flags > - the exception masks (which enable traps) > Marking FP instructions as using and/or defining these registers > would enforce ordering requirements. It may be too strict in some > cases (e.g. two instructions setting exception status flags may > still be reordered). On the other hand, I believe if instructions > may actually *trap*, we actually need the hasSideEffects flag even > if register dependencies are modeled. > > If we do need hasSideEffects, there is a separate discussion on > whether this can be implemented without each back end having to > duplicate all FP instruction patterns (one with hasSideEffects > and one without), e.g. by having a new feature that allows to > describe the side-effect status using an MI operand. > > > Next steps > =========> > I believe it is important to break up the full amount of work > into incremental steps that provide some useful benefits on their > own. At first, we should be able to get to a state where clang > can be used to build programs that use some (maybe not all) strict > FP features, where the generated code is always correct but may > not always be optimal. To get there, I think we need at a > minimum: > > - Implement clang support for the default flags, e.g. GCC's > -frounding-math and -ftrapping-math, and generate always > the constrained intrinsics. clang should also mark all > call sites then (as mentioned above). > > - For now, add the requirement that LTO is not supported if > this would cause mixing of strict and non-strict FP code. > In the alternative, have the LTO pass automatically transform > and floating-point operation into a constrained intrinsic > if *any* (other) module already uses the latter. 
> > - At the IR level, complete the set of supported constrained > FP intrinsics (there are still some missing, see e.g > https://reviews.llvm.org/D43515 <https://reviews.llvm.org/D43515>). > Also, it seems not all variants (e.g. for vector types) are > supported correctly through codegen (see e.g. > https://reviews.llvm.org/D46967 <https://reviews.llvm.org/D46967>). > > - Allow targets to correctly reflect constrained intrinsics > semantics at the MI level and final machine code generation > (see e.g. https://reviews.llvm.org/D45576 > <https://reviews.llvm.org/D45576>). > > - Review all optimization and codegen passes to verify they > fully respect strict FP semantics. > > Once this is done, we can improve on the solution by: > > - Supporting mixing strict and non-strict FP operations > (would lift the LTO restriction). (Note: there seems > to be still some "invention required" here, see above.) > > - Actually implementing the #pragma supporting different > regions within a compilation unit (prereq: support for > mixing strict and non-strict FP operations). > > - Add more optimization of constrained FP intrinsics in > common optimizers and/or target back ends. > > Does this look reasonable? Please let me know if there's > anything I overlooked, or you have any additional comments > or questions. > > > > Mit freundlichen Gruessen / Best Regards > > Ulrich Weigand > > -- > Dr. 
Ulrich Weigand | Phone: +49-7031/16-3727 > STSM, GNU/Linux compilers and toolchain > IBM Deutschland Research & Development GmbH > Vorsitzende des Aufsichtsrats: Martina Koederitz | > Geschäftsführung: Dirk Wittkopp > Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht > Stuttgart, HRB 243294 > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180523/430e68b2/attachment-0001.html>
Hi Hubert,
I had not really thought about FP_CONTRACT. As Hal mentioned, LLVM uses
FP_CONTRACT only to allow use of floating-point multiply-and-add, and this
would remain valid even when FENV_ACCESS is on.
However, it seems that FP_CONTRACT on would also allow constant folding to
be performed even when FENV_ACCESS is on. This is what you're referring to,
right? This should certainly be considered as an option when implementing
clang front-end support for FENV_ACCESS.
I have thought even less about using different values for FLT_EVAL_METHOD.
However, one concern we found with this in the past (on s390x GCC) is that
changing that value on an existing target may have ABI impact: glibc header
files choose the definition of float_t and double_t based on the current
value of FLT_EVAL_METHOD, and changing those types could result in an ABI
break for applications using them in interfaces.
Finding a good model for non-SSA dependencies is indeed the main problem
here. See e.g. the recent discussion here https://reviews.llvm.org/D45576
about possibly modeling FP status flags as "memory" at the MI level. I
guess it might be possible to do the same at the IR level, but I'm less
familiar with that part of LLVM.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU/Linux compilers and toolchain
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
From: Hubert Tong <hubert.reinterpretcast at gmail.com>
To: Ulrich Weigand <Ulrich.Weigand at de.ibm.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>, Ahmed Bougacha
<abougacha at apple.com>, Kevin Neal <Kevin.Neal at sas.com>
Date: 23.05.2018 18:09
Subject: Re: [llvm-dev] Update on strict FP status
Hi Ulrich,
I am interested in knowing if the current proposals also take into account
the FP_CONTRACT pragma and the ability to implement options that imply a
specific value for the FLT_EVAL_METHOD macro.
Additionally, I am not aware of the IR being able to represent the
potentially deferred loss of precision that the C language semantics
provide; in particular, applying such semantics to the existing IR would
hit an issue that the limits of such deferment would need an agreed
representation.
As for the mixing of strict and non-strict modes, I would be interested in
where LLVM is in its handling of non-SSA (pseudo-memory?) dependencies. I
have a vague impression that it is very coarse-grained in that respect, but
I admit to not being particularly informed in that space. If there is a
good model for such dependencies, then I think it could be used to handle
the strict/non-strict mixing.
-- Hubert Tong, IBM
PS A nitpick on wording: The idea of being inside or outside of FENV_ACCESS
regions is instead expressed in terms of the state of the FENV_ACCESS
pragma within the C Standard.
On Wed, May 23, 2018 at 10:48 AM, Ulrich Weigand via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
Hello,
at the recent EuroLLVM developer meeting in Bristol I held a BoF
session on the topic "Towards implementing #pragma STDC
FENV_ACCESS".
I've also had a number of follow-on discussions both on-site in
Bristol and online since. This post is intended as a summary of
my current understanding of the set of requirements and implementation
details covering the overall topic.
I'm posting this here in the hope this can serve as a basis for
the various more detailed discussions that are still ongoing
(e.g. in various Phabricator proposals right now). Any comments
are welcome!
Semantics of #pragma STDC FENV_ACCESS
====================================
To provide a baseline for the implementation discussion, first an
overview of the features required to handle the strict floating-point
mode defined by the C and IEEE standards:
1. Floating-point rounding modes
2. Default floating-point exception handling
3. Trapping floating-point exception handling
Each of these separate features imposes different constraints on the
optimizations that LLVM may perform involving FP expressions:
1. Floating-point rounding modes
Outside of FENV_ACCESS regions, all FP operations are supposed to be
performed in the "default" rounding mode.
But inside FENV_ACCESS regions, FP operations implicitly depend on
a "current" rounding mode setting, which may be changed by certain
C library calls (plus some platform-specific intrinsics). In addition,
those calls may be performed within subroutines (as long as those are
also within FENV_ACCESS), so *any* function call within a FENV_ACCESS
region must be considered as potentially changing the rounding mode.
In effect, this means the compiler may not move or combine FP
operations across function call sites.
2. Default floating-point exception handling
Inside FENV_ACCESS regions, every floating-point operation that
causes an exception must be considered to set a "status flag"
associated with this exception type. Those flags can be queried
using C library calls (plus some platform-specific intrinsics),
and there are other such calls to explicitly set or clear those
flags as well. As with the rounding modes, those calls may be
performed in subroutines as well, so any function call within a
FENV_ACCESS region must be considered as potentially *using* and
changing the floating-point exception status flags.
The values of the status flags on entry to a FENV_ACCESS region are to
be considered undefined according to the C standard.
Compiler optimizations are supposed to preserve the values of
all exception status bits at any point where they can be
(potentially) inspected by the program, i.e. at all call sites
within FENV_ACCESS regions. This still allows a number of
optimizations, e.g. to reorder FP operations or combine two
identical operations within a region uninterrupted by calls.
But other optimizations should be avoided, e.g. optimizing
away an unused FP operation may result in an exception flag
now being unset that would otherwise have been set. The same
applies to floating-point constant folding.
3. Trapping floating-point exception handling
Within a FENV_ACCESS region, library calls may be used to switch
exception handling semantics to a "trapping" mode by setting
corresponding mask bits. Any subsequent FP instruction that
raises an exception with the associated mask bit set will cause
a trap. Usually, this will be a hardware trap that is translated
by the operating system into some form of software exception that
can be handled by the application; on Linux systems this takes the
form of a SIGFPE signal.
As above, those mask bits can be set and reset via (operating-
system specific) library calls and/or platform-specific intrinsics,
all of which may also be done within subroutine calls.
In effect, this requires the compiler to treat any floating-point
operation within a FENV_ACCESS region as potentially trapping,
which means the same restrictions apply as with e.g. memory accesses
(cannot be speculated etc.). However, according to the C standard,
the implementation is not required to preserve the *number* of
different traps, so identical operations may still be combined
(unless there is an intervening function call).
The C standard requires all user code to explicitly switch back
to non-trapping mode for all exceptions whenever leaving a
FENV_ACCESS region (both by "falling off the end" of the region
and by calling a subroutine defined outside of FENV_ACCESS).
Implementation requirements on parts of the compiler
===================================================
A. clang front end
The front end needs to determine which instructions are part of
FENV_ACCESS regions and which are not. This takes into account
both the semantics of the #pragma as defined by the standard,
and the implementation-defined default rules that apply to code
outside of any #pragma. GCC currently has the following two
related command-line options:
-frounding-math: Do not assume default rounding mode
-ftrapping-math: Assume FP operations may trap
clang accepts but (basically) ignores those options. As a first
step, it might make sense to have the FENV_ACCESS default
behavior triggered by these options, even while the front end
does not yet support the actual #pragma.
The front end then needs to transmit the information about
FENV_ACCESS regions to later passes. However, I believe that
we do not actually have to implement "regions" as such at the
IR level. Instead, it would be sufficient to track the following
information:
- For each FP operation, whether it is within a FENV_ACCESS region.
- For each call site, whether it is within a FENV_ACCESS region.
The former requires new IR support; the approach currently under
investigation uses the experimental "constrained FP" intrinsics
instead of traditional floating-point operations for this. The
latter can be done simply by annotating those call sites with an
attribute.
In addition to that, the front-end itself needs to disable any
early optimizations that do not preserve strict FP semantics,
in particular it must not speculate FP operations if they may
trap. (Currently, the front end transforms "? :" on floating-
point types into a select IR statement; for trapping FP
operations, an explicit branch must be used instead.)
B. LLVM IR and LLVM common optimizations
As mentioned in the previous section, we need some IR to annotate
FP instructions and call sites within FENV_ACCESS regions. All
common optimizations then need to respect the strict FP semantics
associated with those regions.
The current approach uses experimental intrinsics. This has the
advantage that most optimizations never trigger since they don't
even recognize those new intrinsics. Also, the intrinsics can
be marked as having side-effects and/or being non-speculatable.
The overall effect is that more optimizations are suppressed
than would be strictly necessary. But this may still be a good
first step, since the result is now safe but maybe not optimal
-- which can be improved upon over time by teaching the specific
semantics of those intrinsics to optimization passes.
However, some open questions remain. If at some point we want
to model the constrained FP semantics more precisely than just
as "unmodeled side effects", this may have to be reflected at
the IR level directly. For example, to model rounding mode
behavior, at some point we might require explicit tracking of
data dependencies on the rounding mode by representing the
rounding mode as SSA values defined by function calls and used
by FP intrinsics. Similarly, to track exception status flags,
they might be modeled as SSA values set by FP intrinsics and
used by function calls.
(There is a possibly related question of how to optimally model
the property of many math library routines that they may access
the "errno" variable but no other memory ... It might also be
possible to model e.g. exception status as a thread-local "memory"
location that is modified by FP operations, just like errno.)
Another currently unresolved issue is that at the moment nothing
prevents *standard* floating-point operations from being moved
*inside* FENV_ACCESS regions. This may also be invalid, since
those operations now may cause unexpected traps etc. (More
specifically, what is invalid is moving any standard FP operation
across a *call site* within a FENV_ACCESS region.) Note that
this is even an issue if we only support changing the default
(and no actual #pragma) if multiple object files using different
default settings are being linked together using LTO.
This last issue could in theory be solved by having all optimization
passes respect the requirement that floating-point operations may
not be moved across call sites marked with the strict FP attribute.
But that does not appear to be straightforward since it would
introduce a "new" type of dependency that would have to be added
throughout LLVM code. If this must be avoided, we'd have to
find a way to explicitly track dependencies at the IR level. In
the extreme, this could end up equivalent to just always using
the constrained intrinsics for everything ...
C. Code generation
In the back end, effects of strict FP mode have to be passed through
to lower-level representations including SelectionDAG and MI.
Currently, the "unmodeled side effect" logic of the constrained
intrinsics is modeled by putting them on the chain during SelectionDAG.
(If we ever model semantics more precisely at the IR level, that
would need to be reflected on SelectionDAG accordingly.)
At the MI level, there is no representation at all. One option to
fix this would be to model target-specific registers that implement
the IEEE semantics. Most platforms have registers (or parts of
registers) that hold:
- the current rounding mode
- the exception status flags
- the exception masks (which enable traps)
Marking FP instructions as using and/or defining these registers
would enforce ordering requirements. It may be too strict in some
cases (e.g. two instructions setting exception status flags may
still be reordered). On the other hand, I believe if instructions
may actually *trap*, we actually need the hasSideEffects flag even
if register dependencies are modeled.
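The register-based ordering idea, and why it can be too strict, can be
sketched with a toy model in C. None of this is LLVM's actual MI data
structure; the types and function names below are invented for the
example, and the "accumulates" field is my own assumption for how
sticky status flags might be distinguished from a plain overwrite:

```c
#include <stdbool.h>

/* Toy stand-ins for the pieces of FP state most platforms keep in
   registers. */
enum fp_state_reg {
    FPREG_ROUNDING = 1u << 0,  /* current rounding mode          */
    FPREG_STATUS   = 1u << 1,  /* exception status flags         */
    FPREG_MASKS    = 1u << 2   /* exception masks (enable traps) */
};

struct toy_instr {
    unsigned uses;         /* state read                            */
    unsigned defs;         /* state overwritten                     */
    unsigned accumulates;  /* state only ever OR-ed into (sticky)   */
};

/* Plain register dependencies: every write counts as a def, so any
   read/write or write/write overlap forbids reordering. */
bool may_reorder_strict(struct toy_instr a, struct toy_instr b)
{
    unsigned wa = a.defs | a.accumulates;
    unsigned wb = b.defs | b.accumulates;
    return (wa & b.uses) == 0 && (a.uses & wb) == 0 && (wa & wb) == 0;
}

/* Relaxed rule: two instructions that merely OR bits into the status
   flags commute; a conflict needs at least one real overwrite. */
bool may_reorder_sticky(struct toy_instr a, struct toy_instr b)
{
    unsigned wa = a.defs | a.accumulates;
    unsigned wb = b.defs | b.accumulates;
    unsigned write_conflict = (a.defs & wb) | (wa & b.defs);
    return (wa & b.uses) == 0 && (a.uses & wb) == 0 && write_conflict == 0;
}
```

With an "fadd" and an "fmul" that both use the rounding mode and masks
and accumulate into the status flags, may_reorder_strict says no while
may_reorder_sticky says yes. Neither rule captures trapping; that is
the part that still seems to need hasSideEffects.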
If we do need hasSideEffects, there is a separate discussion on
whether this can be implemented without each back end having to
duplicate all FP instruction patterns (one with hasSideEffects
and one without), e.g. by having a new feature that allows
describing the side-effect status using an MI operand.
Next steps
=========
I believe it is important to break up the full amount of work
into incremental steps that provide some useful benefits on their
own. At first, we should be able to get to a state where clang
can be used to build programs that use some (maybe not all) strict
FP features, where the generated code is always correct but may
not always be optimal. To get there, I think we need at a
minimum:
- Implement clang support for the default flags, e.g. GCC's
-frounding-math and -ftrapping-math, and always generate
the constrained intrinsics. clang should also mark all
call sites then (as mentioned above).
- For now, add the requirement that LTO is not supported if
this would cause mixing of strict and non-strict FP code.
In the alternative, have the LTO pass automatically transform
any floating-point operation into a constrained intrinsic
if *any* (other) module already uses the latter.
- At the IR level, complete the set of supported constrained
FP intrinsics (there are still some missing, see e.g.
https://reviews.llvm.org/D43515).
Also, it seems not all variants (e.g. for vector types) are
supported correctly through codegen (see e.g.
https://reviews.llvm.org/D46967).
- Allow targets to correctly reflect constrained intrinsics
semantics at the MI level and final machine code generation
(see e.g. https://reviews.llvm.org/D45576).
- Review all optimization and codegen passes to verify they
fully respect strict FP semantics.
Once this is done, we can improve on the solution by:
- Supporting mixing strict and non-strict FP operations
(would lift the LTO restriction). (Note: there still seems
to be some "invention required" here, see above.)
- Actually implementing the #pragma supporting different
regions within a compilation unit (prereq: support for
mixing strict and non-strict FP operations).
- Add more optimization of constrained FP intrinsics in
common optimizers and/or target back ends.
Does this look reasonable? Please let me know if there's
anything I overlooked, or you have any additional comments
or questions.