Kaylor, Andrew via llvm-dev
2020-Jan-18 00:30 UTC
[llvm-dev] Combining fast math flags with constrained intrinsics
Hi all,
A question came up in a code review (https://reviews.llvm.org/D72820) about
whether or not to allow fast-math flags to be applied to constrained floating
point intrinsics
(http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics). This
has come up several times before, but I don't think we've ever made a
decision about it.
By default, the optimizer assumes that floating point operations have no side
effects and use the default rounding mode (round to nearest, ties to even). The
constrained intrinsics are meant to prevent the optimizer from making these
assumptions when the user wants to access the floating point environment -- to
change the rounding mode, to check floating point status bits, or to unmask
floating point exceptions. The intrinsics have an argument that either specifies
a rounding mode that may be assumed or indicates that the rounding mode is
unknown (this argument is omitted if it doesn't apply to the operation), and an
argument that specifies whether the user wants precise exception semantics to be
preserved, wants to prevent syntactically spurious exceptions from being raised,
or doesn't care about floating point exceptions.
Because the constrained mode can be localized to a sub-region within a function,
we also need to support the case where a constrained intrinsic is used but the
default behavior (default rounding mode, exceptions ignored) is used. For this
reason, I think our IR definition must allow fast math flags to be applied to
constrained intrinsics. That makes this primarily a question about what
combinations should be permitted by front ends and how constructs like pragmas
should affect the various states. For example, I might have source code like
this:
-=-=-=-=-=-=-=-
#include <cfenv>
#include <iostream>

extern const double SomeThreshold; // defined elsewhere

double doSomethingPrecise(double X, double Y, double Z);

double doSomethingVague(double X, double Y, double Z) {
  // Some operation that doesn't need to be precise.
  if (X / Y > SomeThreshold)
    return doSomethingPrecise(X, Y, Z);
  else
    return Z;
}

#pragma STDC FENV_ACCESS ON
double doSomethingPrecise(double X, double Y, double Z) {
  int SaveRM = fegetround();
  fesetround(FE_DOWNWARD);
  feclearexcept(FE_ALL_EXCEPT);
  double Temp = X * Y + Z;
  if (fetestexcept(FE_ALL_EXCEPT))
    std::cerr << "Something happened.\n";
  fesetround(SaveRM);
  return Temp;
}
-=-=-=-=-=-=-=-
Now suppose I compile that with "-O2 -ffp-contract=fast". We will need
to generate constrained intrinsics for the X*Y+Z expression in
doSomethingPrecise. The question is, should clang (a) generate a constrained
version of the llvm.fmuladd intrinsic, (b) generate separate constrained.fmul
and constrained.fadd intrinsics with the contract fast math flag set, or (c)
generate separate constrained.fmul and constrained.fadd intrinsics with no fast
math flags? I would argue for (b), because I think it's reasonable for a user
who cares about precision and FP exceptions to still want FMA, which is
theoretically more precise. I think not (a) because clang doesn't usually
generate the fmuladd intrinsic with -ffp-contract=fast. On the other hand, if
the code also contained an FP_CONTRACT pragma around doSomethingPrecise() I
think clang should do (a).
Supporting the FP_CONTRACT case is the point of the D72820 patch.
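For concreteness, the two candidate lowerings might look roughly like this in IR
(a sketch only -- the operand values are placeholders, and attaching fast-math
flags such as contract to these calls is exactly the question under discussion):

```llvm
; Option (b): separate constrained operations with the contract flag set.
; Rounding mode is dynamic and exceptions are strict, per FENV_ACCESS ON.
%mul = call contract double @llvm.experimental.constrained.fmul.f64(
           double %x, double %y,
           metadata !"round.dynamic", metadata !"fpexcept.strict")
%add = call contract double @llvm.experimental.constrained.fadd.f64(
           double %mul, double %z,
           metadata !"round.dynamic", metadata !"fpexcept.strict")

; Option (a): a single constrained fmuladd, which leaves the decision of
; whether to fuse into an FMA to the backend.
%fma = call double @llvm.experimental.constrained.fmuladd.f64(
           double %x, double %y, double %z,
           metadata !"round.dynamic", metadata !"fpexcept.strict")
```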
But let's make this more interesting. Suppose I compile with "-O2
-fp-model=strict -ffp-contract=fast -fno-honor-nans -fno-honor-infinities
-fno-signed-zeros" instead and the code does not contain the FENV_ACCESS
pragma (I'll get back to that). Should the nnan, ninf, and nsz fast math
flags be applied to the constrained intrinsics? I lean toward "yes".
The way I see it, these command line options are a way for the user to tell the
compiler that their data will not contain NaNs or infinities and that their
algorithms do not depend on the sign of zero. These flags enable us to make some
optimizations that will not affect rounding or exception semantics as long as
the data is as the user claimed. This will be particularly useful for the strict
exception semantics because there are cases where we have to execute additional
instructions just to preserve the exception semantics in the case where one of
the operands is a NaN. If the user knows that will never happen, we can produce
better code.
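If fast-math flags are permitted on constrained intrinsics, the user's claim
could be expressed directly in the IR, along these lines (a sketch; whether
these flags may legally appear on constrained calls is the proposal at hand):

```llvm
; nnan/ninf tell the backend it may drop the extra instructions that exist
; only to get strict exception behavior right when an operand is a NaN or Inf.
%r = call nnan ninf double @llvm.experimental.constrained.fdiv.f64(
         double %a, double %b,
         metadata !"round.dynamic", metadata !"fpexcept.strict")
```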
Now back to why I wanted to consider the case without the pragma. Take my code
above with the pragma again, and imagine compiling it with "-O2
-fp-model=fast". In this case, the pragma almost certainly intends to remove
some fast math flags. For instance, I don't think it makes sense to say you
care about the rounding mode but want to allow reassociation, because
reassociation has a much bigger potential to change results than the rounding
mode does. The flags discussed above could make sense, but we have no clear way
to know what the user intended, so I think in this case we must clear all the
fast math flags. That somewhat conflicts with what I said about the contract
flag above, and I admit to mixed feelings. In the presence of the pragma we
should probably block the contract flag too, but I don't like having to do that
for that specific case.
There's my opinion. I'd like to hear what others think.
Thanks,
Andy
Finkel, Hal J. via llvm-dev
2020-Jan-18 01:09 UTC
[llvm-dev] Combining fast math flags with constrained intrinsics
On 1/17/20 6:30 PM, Kaylor, Andrew wrote:
[Andrew's message, quoted in full above, snipped.]
Andy, thanks for writing this up. A few thoughts:
1. The mental model that I have is that there is always an FP_CONTRACT pragma:
there's some default (implicit) pragma at the beginning, and what it says
(off/on/fast) is controlled by the command-line flags (or the driver's
default if no flags are explicitly provided). Thus, unless there's some
reason my model doesn't really work, I lean against differentiating between
the there-is-a-pragma and there-is-not-a-pragma cases in some fundamental way.
2. I'm inclined to go with your choice (b) above because I think that we
should treat these concepts as orthogonal (to the extent that is reasonable: by
design, we don't want to reassociate constrained operations, so that flag
just might have no effect on those intrinsics). This lets the later optimization
passes decide how to treat the various combinations of flags and intrinsics
(just as with all other intrinsics that might be present).
3. Your examples with the various fast-math flags speak to a granularity
mismatch that we may wish to resolve: pragmas allow for local control, but for
fast-math flags all we have are global command-line settings. I'd be very much
in favor of pragmas allowing local control of those too. There have been plenty
of times that I've wanted that for fast-math flags.
-Hal
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Cameron McInally via llvm-dev
2020-Jan-18 15:25 UTC
[llvm-dev] Combining fast math flags with constrained intrinsics
On Fri, Jan 17, 2020 at 8:09 PM Finkel, Hal J. <hfinkel at anl.gov> wrote:
> Andy, thanks for writing this up. A few thoughts:
>
> 1. The mental model that I have is that there is always an FP_CONTRACT
> pragma: there's some default (implicit) pragma at the beginning, and what it
> says (off/on/fast) is controlled by the command-line flags (or the driver's
> default if no flags are explicitly provided). Thus, unless there's some
> reason my model doesn't really work, I lean against differentiating between
> the there-is-a-pragma and there-is-not-a-pragma cases in some fundamental way.
>
> 2. I'm inclined to go with your choice (b) above because I think that we
> should treat these concepts as orthogonal

Agreed.

> (to the extent that is reasonable: by design, we don't want to reassociate
> constrained operations, so that flag just might have no effect on those
> intrinsics). This lets the later optimization passes decide how to treat the
> various combinations of flags and intrinsics (just as with all other
> intrinsics that might be present).

I think I agree, but this needs clarification. My view is that we don't want to
reassociate constrained operations when `-fp-model=strict`. When
`-fp-model=fast`, we should reassociate and do pretty much all the reasonably
safe FMF transformations, with the caveat that I don't think NNAN and NINF make
sense for any trap-safe mode. We may want to trap on those NaNs and Infs we'd
otherwise optimize away.