Kaylor, Andrew via llvm-dev
2020-Jan-18 00:30 UTC
[llvm-dev] Combining fast math flags with constrained intrinsics
Hi all,

A question came up in a code review (https://reviews.llvm.org/D72820) about whether or not to allow fast-math flags to be applied to constrained floating point intrinsics (http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics). This has come up several times before, but I don't think we've ever made a decision about it.

By default, the optimizer assumes that floating point operations have no side effects and use the default rounding mode (round to nearest, ties to even). The constrained intrinsics are meant to prevent the optimizer from making these assumptions when the user wants to access the floating point environment: to change the rounding mode, to check floating point status bits, or to unmask floating point exceptions. The intrinsics have one argument that either specifies a rounding mode that may be assumed or specifies that the rounding mode is unknown (this argument is omitted if it doesn't apply to the operation), and another argument that specifies whether the user wants precise exception semantics to be preserved, wants to prevent syntactically spurious exceptions from being raised, or doesn't care about floating point exceptions.

Because the constrained mode can be localized to a sub-region within a function, we also need to support the case where a constrained intrinsic is used but requests the default behavior (default rounding mode, exceptions ignored). For this reason, I think our IR definition must allow fast-math flags to be applied to constrained intrinsics. That makes this primarily a question about which combinations front ends should permit and how constructs like pragmas should affect the various states.

For example, I might have source code like this:

-=-=-=-=-=-=-=-
#include <cfenv>
#include <iostream>

extern const double SomeThreshold;  // placeholder, defined elsewhere
double doSomethingPrecise(double X, double Y, double Z);

double doSomethingVague(double X, double Y, double Z) {
  // Some operation that doesn't need to be precise.
  if (X / Y > SomeThreshold)
    return doSomethingPrecise(X, Y, Z);
  else
    return Z;
}

#pragma STDC FENV_ACCESS ON
double doSomethingPrecise(double X, double Y, double Z) {
  int SaveRM = std::fegetround();
  std::fesetround(FE_DOWNWARD);
  std::feclearexcept(FE_ALL_EXCEPT);
  double Temp = X * Y + Z;
  if (std::fetestexcept(FE_ALL_EXCEPT))
    std::cerr << "Something happened.\n";
  std::fesetround(SaveRM);
  return Temp;
}
-=-=-=-=-=-=-=-

Now suppose I compile that with "-O2 -ffp-contract=fast". We will need to generate constrained intrinsics for the X*Y+Z expression in doSomethingPrecise. The question is, should clang:

(a) generate a constrained version of the llvm.fmuladd intrinsic,
(b) generate separate constrained.fmul and constrained.fadd intrinsics with the contract fast-math flag set, or
(c) generate separate constrained.fmul and constrained.fadd intrinsics with no fast-math flags?

I would argue for (b), because I think it's reasonable for a user who cares about precision and FP exceptions to still want FMA, which is theoretically more precise. I would rule out (a) because clang doesn't usually generate the fmuladd intrinsic with -ffp-contract=fast. On the other hand, if the code also contained an FP_CONTRACT pragma around doSomethingPrecise(), I think clang should do (a). Supporting the FP_CONTRACT case is the point of the D72820 patch.

But let's make this more interesting. Suppose I compile with "-O2 -ffp-model=strict -ffp-contract=fast -fno-honor-nans -fno-honor-infinities -fno-signed-zeros" instead, and the code does not contain the FENV_ACCESS pragma (I'll get back to that). Should the nnan, ninf, and nsz fast-math flags be applied to the constrained intrinsics? I lean toward "yes". The way I see it, these command-line options are a way for the user to tell the compiler that their data will not contain NaNs or infinities and that their algorithms do not depend on the sign of zero.
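To make concrete what the nsz and nnan assumptions give the optimizer, here is a minimal C++ sketch of two folds that are only legal under those flags: x + 0.0 to x changes the result when x is -0.0, and x == x to true changes it when x is a NaN. The function names are made up for illustration. Compiled without fast-math options, the compiler has to keep both operations, so the checks observe the IEEE 754 behavior that the flags allow the optimizer to ignore.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper. Unless told that signed zeros don't matter (nsz),
// the compiler may not fold x + 0.0 to x: under round-to-nearest,
// -0.0 + 0.0 is +0.0.
double addZero(double x) { return x + 0.0; }

// Hypothetical helper. Unless told there are no NaNs (nnan), the compiler
// may not fold x == x to true: a NaN compares unequal to everything,
// including itself.
bool equalsSelf(double x) { return x == x; }

void checkSignedZeroAndNaN() {
  assert(std::signbit(-0.0));            // the input really is -0.0
  assert(!std::signbit(addZero(-0.0)));  // ...but the sum is +0.0
  assert(equalsSelf(1.0));
  assert(!equalsSelf(std::nan("")));     // NaN != NaN
}
```

If the user's data never contains -0.0 or NaN, both folds are harmless, which is the promise the command-line options make.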
These flags enable us to make some optimizations that will not affect rounding or exception semantics, as long as the data is as the user claimed. This will be particularly useful for strict exception semantics, because there are cases where we have to execute additional instructions just to preserve the exception semantics when one of the operands is a NaN. If the user knows that will never happen, we can produce better code.

Now, back to why I wanted to consider that case without the pragma. Consider my code above with the pragma again, and now imagine I compile it with "-O2 -ffp-model=fast". In this case, the pragma almost certainly intends to remove some fast-math flags. For instance, I don't think it makes sense to say you care about the rounding mode but want to allow reassociation (because reassociation has a much bigger potential to change results than the rounding mode does). The flags discussed above could make sense, but we have no clear way to know what the user intended, so I think in this case we must clear all the fast-math flags.

This somewhat conflicts with what I said about the contract flag above, and I obviously have mixed feelings about that. In the presence of a pragma, we should probably block the contract flag too, but I don't like having to do that for that specific case.

There's my opinion. I'd like to hear what others think.

Thanks,
Andy
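Andy's comparison of reassociation with rounding mode, and the contract flag behind option (b), can be checked numerically. Switching the rounding mode moves the result of each individual operation by at most one unit in the last place, whereas reassociation can change a result completely, and contraction removes one rounding step. The sketch below uses made-up helper names and volatile temporaries so that each intermediate result is rounded to double on its own and the compiler cannot contract or reassociate behind our back; it assumes the default round-to-nearest mode.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers. Each volatile temporary forces the intermediate
// result to be rounded to double before the next operation.
double sumLeftFirst(double a, double b, double c) {
  volatile double ab = a + b;
  return ab + c;  // (a + b) + c
}

double sumRightFirst(double a, double b, double c) {
  volatile double bc = b + c;
  return a + bc;  // a + (b + c)
}

// Two roundings: one for the product, one for the sum.
double mulThenAdd(double x, double y, double z) {
  volatile double prod = x * y;
  return prod + z;
}

// One rounding: std::fma computes x * y + z exactly, then rounds once.
double fusedMulAdd(double x, double y, double z) { return std::fma(x, y, z); }

void checkReassociationAndContraction() {
  // Reassociation: the grouping decides between 1.0 and 0.0, because
  // -1e16 + 1.0 rounds back to -1e16 (doubles near 1e16 are 2 apart,
  // and the tie rounds to even).
  assert(sumLeftFirst(1e16, -1e16, 1.0) == 1.0);
  assert(sumRightFirst(1e16, -1e16, 1.0) == 0.0);

  // Contraction: (1 + 2^-30) * (1 - 2^-30) = 1 - 2^-60 exactly, which rounds
  // to 1.0 on its own; fma keeps the -2^-60 that the separate ops lose.
  const double x = 1.0 + std::ldexp(1.0, -30);
  const double y = 1.0 - std::ldexp(1.0, -30);
  assert(mulThenAdd(x, y, -1.0) == 0.0);
  assert(fusedMulAdd(x, y, -1.0) == -std::ldexp(1.0, -60));
}
```

For comparison, evaluating mulThenAdd under FE_DOWNWARD would round the product to 1 - 2^-53 instead of 1.0, a one-ulp change, which is the scale of difference a rounding-mode switch introduces per operation.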
Finkel, Hal J. via llvm-dev
2020-Jan-18 01:09 UTC
On 1/17/20 6:30 PM, Kaylor, Andrew wrote:
> [...]

Andy, thanks for writing this up. A few thoughts:

1. The mental model that I have is that there is always an FP_CONTRACT pragma: there's some default (implicit) pragma at the beginning, and what it says (off/on/fast) is controlled by the command-line flags (or the driver's default if no flags are explicitly provided). Thus, unless there's some reason my model doesn't really work, I lean against differentiating between the there-is-a-pragma and there-is-not-a-pragma cases in some fundamental way.

2. I'm inclined to go with your choice (b) above because I think that we should treat these concepts as orthogonal (to the extent that is reasonable: by design, we don't want to reassociate constrained operations, so that flag might just have no effect on those intrinsics). This lets the later optimization passes decide how to treat the various combinations of flags and intrinsics (just as with all other intrinsics that might be present).

3. Your examples with the various fast-math flags speak to a granularity mismatch that we may wish to resolve: pragmas allow for local control, but the command-line parameters provide global settings, and for fast-math flags we only have the global settings. I'd be very much in favor of pragmas that allow local control of those too. There have been plenty of times that I've wanted that for fast-math flags.

-Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
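Hal's mental model, in which the command line simply sets an implicit pragma at the start of the file, can be sketched as a scope stack whose bottom entry is seeded from the command-line options. The C++ below is a minimal illustration under that assumption; ContractMode, FPState and FPPragmaStack are made-up names, not clang's actual classes. Because code generation only queries the current state, it cannot tell an explicit pragma from the implicit one. Keeping the contract setting and FENV_ACCESS as independent fields also mirrors the "orthogonal" treatment behind choice (b): entering an FENV_ACCESS region leaves the inherited contract mode alone.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration only; not clang's real classes.
enum class ContractMode { Off, On, Fast };

struct FPState {
  ContractMode contract;
  bool fenvAccess;
};

class FPPragmaStack {
 public:
  // The command-line options become the implicit, outermost "pragma".
  explicit FPPragmaStack(FPState commandLine) : scopes_{commandLine} {}

  // An explicit FP_CONTRACT pragma overrides only the contract setting.
  void pushContract(ContractMode mode) {
    FPState next = scopes_.back();
    next.contract = mode;
    scopes_.push_back(next);
  }

  // An explicit FENV_ACCESS pragma overrides only the environment setting.
  void pushFenvAccess(bool on) {
    FPState next = scopes_.back();
    next.fenvAccess = on;
    scopes_.push_back(next);
  }

  // Leaving a pragma's scope; the command-line entry is never popped.
  void pop() {
    if (scopes_.size() > 1) scopes_.pop_back();
  }

  // Code generation asks only for the current state, so it cannot tell
  // (and does not need to know) whether that state came from a pragma.
  const FPState& current() const { return scopes_.back(); }

 private:
  std::vector<FPState> scopes_;
};

void checkPragmaStack() {
  FPPragmaStack s({ContractMode::Fast, false});       // e.g. -ffp-contract=fast
  assert(s.current().contract == ContractMode::Fast);

  s.pushFenvAccess(true);                              // #pragma STDC FENV_ACCESS ON
  assert(s.current().fenvAccess);
  assert(s.current().contract == ContractMode::Fast);  // still inherited

  s.pushContract(ContractMode::Off);                   // #pragma STDC FP_CONTRACT OFF
  assert(s.current().contract == ContractMode::Off);

  s.pop();
  s.pop();
  assert(!s.current().fenvAccess);
  assert(s.current().contract == ContractMode::Fast);
}
```

Per-flag pragmas for nnan, ninf or nsz, as suggested in point 3, would fit the same shape as additional fields in the state.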
Cameron McInally via llvm-dev
2020-Jan-18 15:25 UTC
On Fri, Jan 17, 2020 at 8:09 PM Finkel, Hal J. <hfinkel at anl.gov> wrote:
> Andy, thanks for writing this up. A few thoughts:
>
> 1. The mental model that I have is that there is always an FP_CONTRACT pragma: there's some default (implicit) pragma at the beginning, and what it says (off/on/fast) is controlled by the command-line flags (or the driver's default if no flags are explicitly provided). Thus, unless there's some reason my model doesn't really work, I lean against differentiating between the there-is-a-pragma and there-is-not-a-pragma cases in some fundamental way.
>
> 2. I'm inclined to go with your choice (b) above because I think that we should treat these concepts as orthogonal

Agreed.

> (to the extent that is reasonable: by design, we don't want to reassociate constrained operations, so that flag might just have no effect on those intrinsics). This lets the later optimization passes decide how to treat the various combinations of flags and intrinsics (just as with all other intrinsics that might be present).

I think I agree, but this needs clarification. My view is that we don't want to reassociate constrained operations under `-ffp-model=strict`. Under `-ffp-model=fast`, we should reassociate and do pretty much all the reasonably safe FMF transformations, with the caveat that I don't think NNAN and NINF make sense for any trap-safe mode. We may want to trap on the NaNs and Infs we'd otherwise optimize away.
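Cameron's caveat is that the NaN- and Inf-producing operations that nnan and ninf let the optimizer assume away are exactly the ones that raise floating point exceptions. A minimal C++ sketch of the observable events at stake: 0.0/0.0 produces a NaN and raises FE_INVALID, and 1.0/0.0 produces an infinity and raises FE_DIVBYZERO. The helper name is made up for illustration. The volatile operands and result are there as an assumption-light way to keep the division at run time, since not every compiler honors FENV_ACCESS.

```cpp
#include <cassert>
#include <cfenv>

#pragma STDC FENV_ACCESS ON

// Hypothetical helper: performs a / b at run time and reports whether any
// of the exception flags in `excepts` were raised by that division.
bool divisionRaises(double a, double b, int excepts) {
  volatile double va = a, vb = b;  // keep the division out of constant folding
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double q = va / vb;     // store the result so the division is kept
  (void)q;
  return std::fetestexcept(excepts) != 0;
}

void checkDivisionFlags() {
  assert(divisionRaises(0.0, 0.0, FE_INVALID));                  // 0/0 -> NaN
  assert(divisionRaises(1.0, 0.0, FE_DIVBYZERO));                // 1/0 -> +inf
  assert(!divisionRaises(1.0, 2.0, FE_INVALID | FE_DIVBYZERO));  // 0.5, exact
}
```

If an nnan-based fold deleted such a division because its result was assumed never to be NaN, the flag (or, with exceptions unmasked, the trap) would silently disappear, which is why those flags sit uneasily with trap-safe modes.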