On 05/23/2018 11:06 AM, Hubert Tong via llvm-dev wrote:
> Hi Ulrich,
>
> I am interested in knowing if the current proposals also take into
> account the FP_CONTRACT pragma

We should already do this (we turn relevant operations into
@llvm.fmuladd.* when FP_CONTRACT is set to on during IR generation).

> and the ability to implement options that imply a specific value for
> the FLT_EVAL_METHOD macro.

What do you mean by this?

 -Hal

>
> Additionally, I am not aware of the IR being able to represent the
> potentially deferred loss of precision that the C language semantics
> provide; in particular, applying such semantics to the existing IR
> would hit an issue that the limits of such deferment would need an
> agreed representation.
>
> As for the mixing of strict and non-strict modes, I would be
> interested in where LLVM is in its handling of non-SSA
> (pseudo-memory?) dependencies. I have a vague impression that it is
> very coarse-grained in that respect, but I admit to not being
> particularly informed in that space. If there is a good model for such
> dependencies, then I think it could be used to handle the
> strict/non-strict mixing.
>
> -- Hubert Tong, IBM
>
> PS A nitpick on wording: The idea of being inside or outside of
> FENV_ACCESS regions is instead expressed in terms of the state of
> the FENV_ACCESS pragma within the C Standard.
>
> On Wed, May 23, 2018 at 10:48 AM, Ulrich Weigand via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     Hello,
>
>     at the recent EuroLLVM developer meeting in Bristol I held a BoF
>     session on the topic "Towards implementing #pragma STDC FENV_ACCESS".
>     I've also had a number of follow-on discussions both on-site in
>     Bristol and online since. This post is intended as a summary of
>     my current understanding of the set of requirements and
>     implementation details covering the overall topic.
>
>     I'm posting this here in the hope this can serve as a basis for
>     the various more detailed discussions that are still ongoing
>     (e.g. in various Phabricator proposals right now). Any comments
>     are welcome!
>
>
>     Semantics of #pragma STDC FENV_ACCESS
>     =====================================
>
>     To provide a baseline for the implementation discussion, first an
>     overview of the features required to handle the strict floating-point
>     mode defined by the C and IEEE standards:
>
>     1. Floating-point rounding modes
>     2. Default floating-point exception handling
>     3. Trapping floating-point exception handling
>
>     Each of these separate features imposes different constraints on the
>     optimizations that LLVM may perform involving FP expressions:
>
>     1. Floating-point rounding modes
>
>     Outside of FENV_ACCESS regions, all FP operations are supposed to be
>     performed in the "default" rounding mode.
>
>     But inside FENV_ACCESS regions, FP operations implicitly depend on
>     a "current" rounding mode setting, which may be changed by certain
>     C library calls (plus some platform-specific intrinsics). In addition,
>     those calls may be performed within subroutines (as long as those are
>     also within FENV_ACCESS), so *any* function call within a FENV_ACCESS
>     region must be considered as potentially changing the rounding mode.
>
>     In effect, this means the compiler may not move or combine FP
>     operations across function call sites.
>
>     2. Default floating-point exception handling
>
>     Inside FENV_ACCESS regions, every floating-point operation that
>     causes an exception must be considered to set a "status flag"
>     associated with this exception type. Those flags can be queried
>     using C library calls (plus some platform-specific intrinsics),
>     and there are other such calls to explicitly set or clear those
>     flags as well.
>     As with the rounding modes, those calls may be
>     performed in subroutines as well, so any function call within a
>     FENV_ACCESS region must be considered as potentially *using* and
>     changing the floating-point exception status flags.
>
>     The values of the status flags on entry to a FENV_ACCESS region are
>     to be considered undefined according to the C standard.
>
>     Compiler optimizations are supposed to preserve the values of
>     all exception status bits at any point where they can be
>     (potentially) inspected by the program, i.e. at all call sites
>     within FENV_ACCESS regions. This still allows a number of
>     optimizations, e.g. to reorder FP operations or combine two
>     identical operations within a region uninterrupted by calls.
>     But other optimizations should be avoided, e.g. optimizing
>     away an unused FP operation may result in an exception flag
>     now being unset that would otherwise have been set. The same
>     applies to floating-point constant folding.
>
>     3. Trapping floating-point exception handling
>
>     Within a FENV_ACCESS region, library calls may be used to switch
>     exception handling semantics to a "trapping" mode by setting
>     corresponding mask bits. Any subsequent FP instruction that
>     raises an exception with the associated mask bit set will cause
>     a trap. Usually, this will be a hardware trap that is translated
>     by the operating system into some form of software exception that
>     can be handled by the application; on Linux systems this takes the
>     form of a SIGFPE signal.
>
>     As above, those mask bits can be set and reset via
>     (operating-system specific) library calls and/or platform-specific
>     intrinsics, all of which may also be done within subroutine calls.
>
>     In effect, this requires the compiler to treat any floating-point
>     operation within a FENV_ACCESS region as potentially trapping,
>     which means the same restrictions apply as with e.g. memory accesses
>     (cannot be speculated etc.)
>     However, according to the C standard,
>     the implementation is not required to preserve the *number* of
>     different traps, so identical operations may still be combined
>     (unless there is an intervening function call).
>
>     The C standard requires all user code to explicitly switch back
>     to non-trapping mode for all exceptions whenever leaving a
>     FENV_ACCESS region (both by "falling off the end" of the region
>     and by calling a subroutine defined outside of FENV_ACCESS).
>
>
>     Implementation requirements on parts of the compiler
>     ====================================================
>
>     A. clang front end
>
>     The front end needs to determine which instructions are part of
>     FENV_ACCESS regions and which are not. This takes into account
>     both the semantics of the #pragma as defined by the standard,
>     and the implementation-defined default rules that apply to code
>     outside of any #pragma. GCC currently has the following two
>     related command-line options:
>
>     -frounding-math: Do not assume default rounding mode
>     -ftrapping-math: Assume FP operations may trap
>
>     clang accepts but (basically) ignores those options. As a first
>     step, it might make sense to have the FENV_ACCESS default
>     behavior triggered by these options, even while the front end
>     does not yet support the actual #pragma.
>
>     The front end then needs to transmit the information about
>     FENV_ACCESS regions to later passes. However, I believe that
>     we do not actually have to implement "regions" as such at the
>     IR level. Instead, it would be sufficient to track the following
>     information:
>
>     - For each FP operation, whether it is within a FENV_ACCESS region.
>     - For each call site, whether it is within a FENV_ACCESS region.
>
>     The former requires new IR support; the approach currently under
>     investigation uses the experimental "constrained FP" intrinsics
>     instead of traditional floating-point operations for this. The
>     latter can be done simply by annotating those call sites with an
>     attribute.
>
>     In addition to that, the front end itself needs to disable any
>     early optimizations that do not preserve strict FP semantics;
>     in particular, it must not speculate FP operations if they may
>     trap. (Currently, the front end transforms "? :" on
>     floating-point types into a select IR statement; for trapping FP
>     operations, an explicit branch must be used instead.)
>
>
>     B. LLVM IR and LLVM common optimizations
>
>     As mentioned in the previous section, we need some IR to annotate
>     FP instructions and call sites within FENV_ACCESS regions. All
>     common optimizations then need to respect the strict FP semantics
>     associated with those regions.
>
>     The current approach uses experimental intrinsics. This has the
>     advantage that most optimizations never trigger since they don't
>     even recognize those new intrinsics. Also, the intrinsics can
>     be marked as having side-effects and/or being non-speculatable.
>
>     The overall effect is that more optimizations are suppressed
>     than would be strictly necessary. But this may still be a good
>     first step, since the result is now safe but maybe not optimal
>     -- which can be improved upon over time by teaching the specific
>     semantics of those intrinsics to optimization passes.
>
>     However, some open questions remain. If at some point we want
>     to model the constrained FP semantics more precisely than just
>     as "unmodeled side effects", this may have to be reflected at
>     the IR level directly. For example, to model rounding mode
>     behavior, at some point we might require explicit tracking of
>     data dependencies on the rounding mode by representing the
>     rounding mode as SSA values defined by function calls and used
>     by FP intrinsics. Similarly, to track exception status flags,
>     they might be modeled as SSA values set by FP intrinsics and
>     used by function calls.
>
>     (There is a possibly related question of how to optimally model
>     the property of many math library routines that they may access
>     the "errno" variable but no other memory ... It might also be
>     possible to model e.g. exception status as a thread-local "memory"
>     location that is modified by FP operations, just like errno.)
>
>     Another currently unresolved issue is that at the moment nothing
>     prevents *standard* floating-point operations from being moved
>     *inside* FENV_ACCESS regions. This may also be invalid, since
>     those operations now may cause unexpected traps etc. (More
>     specifically, what is invalid is moving any standard FP operation
>     across a *call site* within a FENV_ACCESS region.) Note that
>     this is an issue even if we only support changing the default
>     (and no actual #pragma), if multiple object files using different
>     default settings are being linked together using LTO.
>
>     This last issue could in theory be solved by having all optimization
>     passes respect the requirement that floating-point operations may
>     not be moved across call sites marked with the strict FP attribute.
>     But that does not appear to be straightforward since it would
>     introduce a "new" type of dependency that would have to be added
>     throughout LLVM code. If this must be avoided, we'd have to
>     find a way to explicitly track dependencies at the IR level. In
>     the extreme, this could end up equivalent to just always using
>     the constrained intrinsics for everything ...
>
>
>     C. Code generation
>
>     In the back end, effects of strict FP mode have to be passed through
>     to lower-level representations including SelectionDAG and MI.
>
>     Currently, the "unmodeled side effect" logic of the constrained
>     intrinsics is modeled by putting them on the chain during
>     SelectionDAG.
>     (If we ever model semantics more precisely at the IR level, that
>     would need to be reflected on SelectionDAG accordingly.)
>
>     At the MI level, there is no representation at all.
>     One option to
>     fix this would be to model target-specific registers that implement
>     the IEEE semantics. Most platforms have registers (or parts of
>     registers) that hold:
>     - the current rounding mode
>     - the exception status flags
>     - the exception masks (which enable traps)
>     Marking FP instructions as using and/or defining these registers
>     would enforce ordering requirements. It may be too strict in some
>     cases (e.g. two instructions setting exception status flags may
>     still be reordered). On the other hand, I believe if instructions
>     may actually *trap*, we need the hasSideEffects flag even
>     if register dependencies are modeled.
>
>     If we do need hasSideEffects, there is a separate discussion on
>     whether this can be implemented without each back end having to
>     duplicate all FP instruction patterns (one with hasSideEffects
>     and one without), e.g. by having a new feature that allows
>     describing the side-effect status using an MI operand.
>
>
>     Next steps
>     ==========
>
>     I believe it is important to break up the full amount of work
>     into incremental steps that provide some useful benefits on their
>     own. At first, we should be able to get to a state where clang
>     can be used to build programs that use some (maybe not all) strict
>     FP features, where the generated code is always correct but may
>     not always be optimal. To get there, I think we need at a
>     minimum:
>
>     - Implement clang support for the default flags, e.g. GCC's
>     -frounding-math and -ftrapping-math, and always generate
>     the constrained intrinsics. clang should also mark all
>     call sites then (as mentioned above).
>
>     - For now, add the requirement that LTO is not supported if
>     this would cause mixing of strict and non-strict FP code.
>     In the alternative, have the LTO pass automatically transform
>     any floating-point operation into a constrained intrinsic
>     if *any* (other) module already uses the latter.
>
>     - At the IR level, complete the set of supported constrained
>     FP intrinsics (there are still some missing, see e.g.
>     https://reviews.llvm.org/D43515).
>     Also, it seems not all variants (e.g. for vector types) are
>     supported correctly through codegen (see e.g.
>     https://reviews.llvm.org/D46967).
>
>     - Allow targets to correctly reflect constrained intrinsics
>     semantics at the MI level and final machine code generation
>     (see e.g. https://reviews.llvm.org/D45576).
>
>     - Review all optimization and codegen passes to verify they
>     fully respect strict FP semantics.
>
>     Once this is done, we can improve on the solution by:
>
>     - Supporting mixing strict and non-strict FP operations
>     (would lift the LTO restriction). (Note: there seems
>     to be still some "invention required" here, see above.)
>
>     - Actually implementing the #pragma supporting different
>     regions within a compilation unit (prereq: support for
>     mixing strict and non-strict FP operations).
>
>     - Adding more optimization of constrained FP intrinsics in
>     common optimizers and/or target back ends.
>
>     Does this look reasonable? Please let me know if there's
>     anything I overlooked, or you have any additional comments
>     or questions.
>
>
>     Mit freundlichen Gruessen / Best Regards
>
>     Ulrich Weigand
>
>     --
>     Dr.
>     Ulrich Weigand | Phone: +49-7031/16-3727
>     STSM, GNU/Linux compilers and toolchain
>     IBM Deutschland Research & Development GmbH

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
On Wed, May 23, 2018 at 12:19 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> On 05/23/2018 11:06 AM, Hubert Tong via llvm-dev wrote:
>
> Hi Ulrich,
>
> I am interested in knowing if the current proposals also take into account
> the FP_CONTRACT pragma
>
>
> We should already do this (we turn relevant operations into
> @llvm.fmuladd.* when FP_CONTRACT is set to on during IR generation).
>

I am not sure we have the same interpretation of what the FP_CONTRACT
pragma does. Subclause 6.5 paragraph 8 of C11 implies (for example) that
even where the FENV_ACCESS pragma is "on", folding a constant
subexpression with an exactly representable result on an implementation
where FLT_EVAL_METHOD is 0 is within the range of acceptable
implementation-defined behaviour despite intermediate overflow under
non-contracted evaluation.

Which is to say that the current proposal reads as what needs to be done
when FP_CONTRACT is "off" and FENV_ACCESS is "on". The note from Ulrich
implies that the requirements are imposed by the Standard, but the range
of implementation-defined behaviour where FP_CONTRACT is "on" and
FENV_ACCESS is also "on" is possibly a discussion to be had.

> and the ability to implement options that imply a specific value for the
> FLT_EVAL_METHOD macro.
>
>
> What do you mean by this?
>

I admit that modes where FLT_EVAL_METHOD, respectively, is 0 (no extra
range and precision), 1 (float in double range and precision), and 2
(float and double in long double range and precision) are all
straightforward for the IR producer to implement by fixing the types used
in the IR emitted (implying the value of FLT_EVAL_METHOD is not constant
within a program). So, this is more about implementing meaningful cases of
FLT_EVAL_METHOD being -1.
My point below (in my previous note) is that allowing IR passes or the
back end to choose the range and precision in a manner conforming to
Standard C (for a FLT_EVAL_METHOD of -1)--perhaps for speed where multiple
sets of floating-point operations/registers are available with differing
"preferred types"--is a use case that the IR does not seem to support
well.

As for why a FLT_EVAL_METHOD of -1 is on-topic for this thread: The
language semantics allow the case of the constant subexpression folding I
mentioned above even when FP_CONTRACT is "off" and FENV_ACCESS is "on",
because the evaluation format used for the evaluation of that
subexpression can be said to have infinite range and precision.

> -Hal
>
>
>
> Additionally, I am not aware of the IR being able to represent the
> potentially deferred loss of precision that the C language semantics
> provide; in particular, applying such semantics to the existing IR would
> hit an issue that the limits of such deferment would need an agreed
> representation.
>
> As for the mixing of strict and non-strict modes, I would be interested in
> where LLVM is in its handling of non-SSA (pseudo-memory?) dependencies. I
> have a vague impression that it is very coarse-grained in that respect, but
> I admit to not being particularly informed in that space. If there is a
> good model for such dependencies, then I think it could be used to handle
> the strict/non-strict mixing.
>
> -- Hubert Tong, IBM
>
> PS A nitpick on wording: The idea of being inside or outside of
> FENV_ACCESS regions is instead expressed in terms of the state of the
> FENV_ACCESS pragma within the C Standard.
>
> On Wed, May 23, 2018 at 10:48 AM, Ulrich Weigand via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hello,
>>
>> at the recent EuroLLVM developer meeting in Bristol I held a BoF
>> session on the topic "Towards implementing #pragma STDC FENV_ACCESS".
On 05/23/2018 04:04 PM, Hubert Tong wrote:
> On Wed, May 23, 2018 at 12:19 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> > On 05/23/2018 11:06 AM, Hubert Tong via llvm-dev wrote:
> > > Hi Ulrich,
> > >
> > > I am interested in knowing if the current proposals also take
> > > into account the FP_CONTRACT pragma
> >
> > We should already do this (we turn relevant operations into
> > @llvm.fmuladd.* intrinsics when FP_CONTRACT is set to on during IR
> > generation).
>
> I am not sure we have the same interpretation of what the FP_CONTRACT
> pragma does. Subclause 6.5 paragraph 8 of C11 implies (for example)
> that even where the FENV_ACCESS pragma is "on", folding a constant
> subexpression with an exactly representable result on an
> implementation where FLT_EVAL_METHOD is 0 is within the range of
> acceptable implementation-defined behaviour despite intermediate
> overflow under non-contracted evaluation. Which is to say that the
> current proposal reads as what needs to be done when FP_CONTRACT is
> "off" and FENV_ACCESS is "on". The note from Ulrich implies that the
> requirements are imposed by the Standard, but the range of
> implementation-defined behaviour where FP_CONTRACT is "on" and
> FENV_ACCESS is also "on" is possibly a discussion to be had.

Thanks for explaining. Yes, I agree, this is certainly worth discussing.
Do you have thoughts on what we should do? I think it makes sense to
fold where possible, as the user has requested the extra intermediate
precision available from FMA formation. Also, to what extent can we
change our minds later? For example, with C++ constexpr and the like,
does this have ABI implications?

> > > and the ability to implement options that imply a specific value
> > > for the FLT_EVAL_METHOD macro.
> >
> > What do you mean by this?
>
> I admit that modes where FLT_EVAL_METHOD, respectively, is 0 (no extra
> range and precision), 1 (float in double range and precision), and 2
> (float and double in long double range and precision) are all
> straightforward for the IR producer to implement by fixing the types
> used in the IR emitted (implying the value FLT_EVAL_METHOD is not
> constant within a program).
>
> So, this is more about implementing meaningful cases of
> FLT_EVAL_METHOD being -1. My point below (in my previous note) is that
> allowing IR passes or the back-end to choose the range and precision
> in a manner conforming to Standard C (for a FLT_EVAL_METHOD of
> -1)--perhaps for speed where multiple sets of floating-point
> operations/registers are available with differing "preferred
> types"--appears to be a use case that the IR does not seem to support
> well.

Yes. In the LangRef we do have fpmath metadata
(http://llvm.org/docs/LangRef.html#fpmath-metadata), which might be
useful in this space, but I don't think we actually use it for anything.

> As for why a FLT_EVAL_METHOD of -1 is on-topic for this thread: The
> language semantics allow the case of the constant subexpression
> folding I mentioned above even when FP_CONTRACT is "off" and
> FENV_ACCESS is "on", because the evaluation format used for the
> evaluation of that subexpression can be said to have infinite range
> and precision.

Ah, interesting. FLT_EVAL_METHOD is a constant chosen (globally) by the
implementation, correct? Do you know of platforms that set
FLT_EVAL_METHOD to -1?

 -Hal
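
For readers who want to check what their own toolchain reports, here is a small C sketch around the FLT_EVAL_METHOD values discussed above. It is an illustration only. The function names are hypothetical, and the descriptions paraphrase the meanings C11 assigns to 0, 1, 2 and -1; other negative values are implementation-defined.

```c
#include <float.h>

/* Map a FLT_EVAL_METHOD value to a short description of the
   evaluation format it denotes. */
const char *eval_method_description(int method)
{
    switch (method) {
    case 0:
        return "evaluate to the range and precision of each type";
    case 1:
        return "evaluate float and double as double";
    case 2:
        return "evaluate everything as long double";
    case -1:
        return "indeterminable";
    default:
        return "implementation-defined";
    }
}

/* The value advertised by the implementation compiling this file. */
int current_eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```

Printing eval_method_description(current_eval_method()) shows which case a given compiler and target combination claims. The -1 case is the one the exchange above is concerned with.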
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory