Kaylor, Andrew via llvm-dev

2018-Jan-09 18:53 UTC

[llvm-dev] [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

I think we're going to need to create a new mechanism to communicate strict
FP modes to the backend. I think we need to avoid doing anything that will
require re-inventing or duplicating all of the pattern matching that goes on in
instruction selection (which is the reason we're currently dropping that
information). I'm out of my depth on this transition, but I think maybe we
could handle it with some kind of attribute on the MBB.

In C/C++, at least, it's my understanding that the pragmas always apply at
the scope-level (as opposed to having the possibility of being
instruction-specific), and we've previously agreed that our implementation
will really need to apply the rules across entire functions in the sense that if
any part of a function uses the constrained intrinsics all FP operations in the
function will need to use them (though different metadata arguments may be used
in different scopes). So I think that opens our options a bit.
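
As a rough illustration of that function-level rule, here is a minimal Python sketch. It is a toy model only: a function body is reduced to a list of operation names, and the helper names are hypothetical, not anything in LLVM.

```python
# Toy model: a function body is just a list of operation names.
PLAIN_FP_OPS = {"fadd", "fsub", "fmul", "fdiv", "frem"}
CONSTRAINED_PREFIX = "llvm.experimental.constrained."


def uses_constrained(ops):
    """True if any operation is a constrained FP intrinsic."""
    return any(op.startswith(CONSTRAINED_PREFIX) for op in ops)


def violates_function_level_rule(ops):
    """True if the body mixes constrained intrinsics with plain FP operations."""
    return uses_constrained(ops) and any(op in PLAIN_FP_OPS for op in ops)


mixed = ["llvm.experimental.constrained.fadd", "fmul"]
all_strict = [
    "llvm.experimental.constrained.fadd",
    "llvm.experimental.constrained.fmul",
]
all_plain = ["fadd", "fmul"]
```

Under this model only `mixed` breaks the rule; a body that is entirely constrained or entirely plain is fine.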

Regarding constant folding, I think you are correct that it isn't happening
anywhere in the backends at the moment. There is some constant folding done
during instruction selection, but the existing mechanism prevents that. My
concern is that given LLVM's development model, if there is nothing in place
to prevent constant folding and no consensus that it shouldn't be allowed
then we should probably believe that someone will eventually do it.
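
To make the constant folding concern concrete, here is a minimal sketch that uses Python's decimal module as a stand-in for the hardware FP environment. This is only an analogy: the six-digit precision and the names `divide`, `folded` and `at_run_time` are assumptions for illustration.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_HALF_EVEN


def divide(a, b, rounding):
    """Divide under an explicit rounding mode with 6 significant digits."""
    ctx = Context(prec=6, rounding=rounding)
    return ctx.divide(Decimal(a), Decimal(b))


# What a folder would bake in if it assumed the default round-to-nearest mode.
folded = divide(1, 3, ROUND_HALF_EVEN)

# What the program would compute at run time after switching to round-upward.
at_run_time = divide(1, 3, ROUND_CEILING)
```

The two values differ in the last digit, so a fold performed under the default mode silently changes the result of code that runs under a different mode.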

-Andy

From: Ulrich Weigand [mailto:Ulrich.Weigand at de.ibm.com]
Sent: Tuesday, January 09, 2018 9:59 AM
To: Kaylor, Andrew <andrew.kaylor at intel.com>; kpn at neutralgood.org
Cc: Hal Finkel <hfinkel at anl.gov>; Richard Smith <richard at
metafoo.co.uk>; bob.huemmer at sas.com; bumblebritches57 at gmail.com;
wei.ding2 at amd.com; cfe-dev at lists.llvm.org; llvm-dev <llvm-dev at
lists.llvm.org>
Subject: Re: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

Andrew Kaylor wrote:
>In general, the current "strict FP" handling stops at instruction
>selection. At the MachineIR level we don't currently have a mechanism
>to prevent inappropriate optimizations based on floating point
>constraints, or indeed to convey such constraints to the backend.
>Implicit register use modeling may provide some restriction on some
>architectures, but this is definitely lacking for X86 targets. On the
>other hand, I'm not aware of any specific current problems, so in many
>cases we may "get lucky" and have the correct thing happen by chance.
>Obviously that's not a viable long term solution. I have a rough plan
>for adding improved register modeling to the X86 backend, which should
>take care of instruction scheduling issues, but we'd still need a
>mechanism to prevent constant folding optimizations and such.

Given that Kevin intends to target SystemZ, I'll be happy to work on the
SystemZ back-end support for this feature. I agree that we should be using
implicit control register dependencies, which will at least prevent moving
floating-point operations across instructions that e.g. change rounding modes.
However, the main property we need to model is that floating-point operations
may *trap*. I guess this can be done using UnmodeledSideEffects, but I'm not
quite clear on how to make this dependent on whether or not a "strict"
operation is requested (without duplicating all the instruction patterns ...).

Once we do use something like UnmodeledSideEffects, I think MachineIR passes
should handle everything correctly; in the end, the requirements are not really
different from those of other trapping instructions. B.t.w. I don't think
anybody does constant folding on floating-point constants at the MachineIR level
anyway ... have you seen this anywhere?

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU/Linux compilers and toolchain
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
Wittkopp
Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB
243294

Kaylor, Andrew via llvm-dev

2018-Feb-09 23:13 UTC


[llvm-dev] [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

I think you have described the backend issues very well.

You are correct that Intel architecture machines behave roughly as you describe.
There are some wrinkles in that status and control bits are kept in the same
register and there are two such registers, one for MMX/SSE/AVX instructions and
one for X87 instructions. But that is all a matter of detail; conceptually it
is just as you have described.

It is my understanding that some LLVM backends are already modeling the FP
control and status registers. The X86 backend does not. I attempted to add it
last year, but I ran into some complications and backed it out. I think I know
how to fix those problems now.

Everyone I’ve talked to up until now is happy to live with performance
degradations when using non-default FP modes. The sticking point is that we’d
really like to avoid doing anything that would restrict performance in the
default case, which we expect to be used in the vast majority of programs. I’m
not sure how much impact restricting FP scheduling in the backend would have. My
intuition is that it wouldn’t be particularly significant, but it would
certainly be something worth measuring.

You’re correct that we currently have no means of communicating the rounding
mode and exception behavior to the back end. I’m reluctant to rely on Selection
DAG pattern matching for the STRICT nodes because the existing pattern matching
has a large number of variations. If we can re-use those patterns, I definitely
want to. That’s the reason that the current implementation was written the way
it was.

To answer your other question, I am not going to be attending the LLVM
developers meeting in Bristol. I would, however, be happy to have some sort of
virtual meeting to discuss this with anyone who is interested.

Thanks,
Andy

From: Ulrich Weigand [mailto:Ulrich.Weigand at de.ibm.com]
Sent: Friday, February 09, 2018 6:42 AM
To: Kaylor, Andrew <andrew.kaylor at intel.com>
Cc: bob.huemmer at sas.com; bumblebritches57 at gmail.com; cfe-dev at
lists.llvm.org; Hal Finkel <hfinkel at anl.gov>; kpn at neutralgood.org;
llvm-dev <llvm-dev at lists.llvm.org>; Richard Smith <richard at
metafoo.co.uk>
Subject: RE: [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

Hi Andrew,

sorry for the delay, I only now got some time to look into this a bit more. But
I still have a number of questions about how to actually implement this in the back
end. Looking at this bottom-up, starting with the behavior of the actual machine
instructions, we have (at least on SystemZ) the following things to consider:

A) Rounding mode

Most FP arithmetic instructions use the "current rounding mode" as
indicated in the floating-point control register. This is currently assumed to
never change. To fix this, we need to avoid scheduling FP arithmetic
instructions across instructions that modify the rounding mode. This may also
imply avoiding scheduling instructions across function calls, since those may
also modify the rounding mode. This can probably be done by modeling the
floating-point control register as an LLVM register (or maybe modeling just the
rounding mode bits as their own "register"), having all FP arithmetic
instructions in question take this new register as an implicit input, and having
the register be clobbered by the instructions that change the rounding mode (and
also function calls).
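
A minimal sketch of that modeling, as a toy in Python rather than actual MachineIR: each instruction lists the registers it reads and writes, and two adjacent instructions may be swapped only if neither writes something the other touches. The `Instr` class, the `rounding_mode` pseudo-register and the register names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    uses: frozenset = frozenset()
    defs: frozenset = frozenset()


def may_swap(a, b):
    """Adjacent instructions may be reordered only if neither writes a
    register that the other reads or writes."""
    return not (a.defs & (b.uses | b.defs) or b.defs & (a.uses | a.defs))


RM = "rounding_mode"  # hypothetical pseudo-register for the rounding-mode bits

# FP arithmetic takes the rounding mode as an implicit input.
fadd = Instr("fadd", uses=frozenset({"f0", "f1", RM}), defs=frozenset({"f2"}))
fmul = Instr("fmul", uses=frozenset({"f3", "f4", RM}), defs=frozenset({"f5"}))

# Mode-changing instructions and calls clobber it.
set_rm = Instr("set_rounding_mode", defs=frozenset({RM}))
call = Instr("call", defs=frozenset({RM}))
```

With these definitions `fadd` and `fmul` can still be freely reordered against each other, since both only read the pseudo-register, but neither can move across `set_rm` or `call`.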

B) Floating-point status flags

FP instructions set a flag bit in the floating-point status register whenever an
IEEE exception condition is recognized. If these flag bits are later tested by
application code, we should ensure their value is unchanged by compiler
optimization. Naively modeling the status register is probably overkill here:
since every FP instruction would need to be considered to modify (i.e. use and
def) that register, this simply has the effect of creating a dependency chain
across *all* FP instructions and makes any kind of instruction scheduling
impossible. But this isn't really necessary since the flag bits actually
simply accumulate. So it would suffice to have special dependencies from each FP
instruction separately directly to the next instruction (or routine) that reads
the status flags. However, I don't really see any easy way to model this
type of dependency in the back-end (in particular on the MI level).
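
The accumulation argument can be checked with a small Python sketch. The flag encodings and the `run` helper are hypothetical; the only property relied on is that each operation ORs its bits into the status.

```python
from functools import reduce
from itertools import permutations

INEXACT, OVERFLOW, DIVBYZERO = 0x1, 0x2, 0x4  # hypothetical flag encodings


def run(flags_raised_by_each_op, initial=0):
    """Accumulate sticky flags: each operation ORs its bits into the status."""
    return reduce(lambda status, raised: status | raised,
                  flags_raised_by_each_op, initial)


ops = [INEXACT, 0, DIVBYZERO, INEXACT | OVERFLOW]

# Every ordering of the FP operations leaves the same final status.
results = {run(order) for order in permutations(ops)}

# A read placed between the operations is order-sensitive, which is the one
# dependency that still has to be preserved.
read_after_two = run(ops[:2])
read_after_two_reordered = run([ops[2], ops[0]])
```

Here `results` collapses to a single value, while the two mid-sequence reads differ. That matches the point above: ordering among the FP instructions is irrelevant, only their position relative to the next reader of the flags matters.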

C) Floating-point exceptions

If a mask bit in the floating-point status register is set, then all FP
instructions will *trap* whenever an IEEE exception condition is recognized.
This means that we need to treat those instructions as having unmodelled side
effects, so that they cannot be speculatively executed. Also, we cannot schedule
FP instructions across instructions that set (those bits in) the FP status
register -- but the latter is probably automatically done as long as those
latter instructions are described as having unmodeled side effects. Note that
this will in effect again create a dependency chain across all FP instructions,
so that B) should be implicitly covered as well here.
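
To illustrate why speculation is the problem once traps are enabled, here is a toy Python sketch. The `FPTrap` exception stands in for a hardware trap, and `fdiv`, `guarded` and `speculated` are hypothetical names; the division model is deliberately simplified.

```python
import math


class FPTrap(Exception):
    """Stands in for a hardware trap on an IEEE exception condition."""


def fdiv(n, d, trap_enabled):
    """Toy division: traps on divide-by-zero when the mask bit is set."""
    if d == 0.0:
        if trap_enabled:
            raise FPTrap("divide by zero")
        return math.inf  # simplified: ignores operand signs and 0/0
    return n / d


def guarded(n, d, trap_enabled):
    """Original program order: the division only runs when d is nonzero."""
    if d != 0.0:
        return fdiv(n, d, trap_enabled)
    return 0.0


def speculated(n, d, trap_enabled):
    """After hoisting the division above the guard (speculative execution)."""
    q = fdiv(n, d, trap_enabled)
    return q if d != 0.0 else 0.0
```

With traps disabled the two versions agree for `d == 0.0`. With traps enabled, `guarded` still returns normally but `speculated` raises, so the hoist is no longer a valid transformation.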

Did I miss anything here? I'm assuming that the behavior on FP instructions
on Intel (and other architectures) will be roughly similar, given that this
behavior is mostly defined by the IEEE standard.

Now the question in my mind is, how does this all map onto the experimental
constrained intrinsics? They do have "rounding mode" and
"exception behavior" metadata, but I don't really see how that
maps onto the behavior of instructions as described above. Also, right now the
back-end doesn't even *get* at that data in the first place, since it is
just thrown away when lowering the intrinsics to STRICT_... nodes. In fact,
I'm also not sure how the front-end is even supposed to be *setting* those
metadata flags -- is the compiler supposed to track calls to fesetround and the
like, and thereby determine which rounding and exception modes apply to any
given block of code? In fact, was the original intention even that the back-end
actually implements different behavior based on this level of detail, or was the
back-end supposed to support only two modes, the default behavior of today and a
fully strict implementation always satisfying all three of A), B), and C) above?

Looking again at a possible implementation in the back-end, I'm now
wondering if it wouldn't after all be better to just treat the STRICT_
opcodes like all other DAG nodes. That is, have them be associated with an
action (Legal, Expand, or Custom); set the default action to Expand, with a
default expander that just replaces them by the "normal" FP nodes; and
allow a back-end to set the action to Legal and/or Custom and then just handle
them in the back-end as it sees fit. This might indeed require multiple patterns
to match them, but it should be possible to generate those via multiclass
instantiations so it might not be all that big a deal. The benefit would be that
it allows the back-end the greatest freedom in how to handle things (e.g.
interactions with target-specific control registers).
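
A minimal sketch of that scheme, written as a toy in Python and not using LLVM's actual TargetLowering API: `ToyTarget`, `legalize` and the `TARGET_` prefix are hypothetical, and opcodes are plain strings.

```python
from enum import Enum


class Action(Enum):
    LEGAL = "Legal"
    EXPAND = "Expand"
    CUSTOM = "Custom"


class ToyTarget:
    """Holds per-opcode actions; anything not set explicitly is Expand."""

    def __init__(self):
        self._actions = {}

    def set_action(self, opcode, action):
        self._actions[opcode] = action

    def action_for(self, opcode):
        return self._actions.get(opcode, Action.EXPAND)


def legalize(opcode, target):
    """Return the opcode the back-end would see for a given DAG node."""
    if not opcode.startswith("STRICT_"):
        return opcode
    action = target.action_for(opcode)
    if action is Action.EXPAND:
        # Default expander: replace the strict node by the "normal" FP node.
        return opcode[len("STRICT_"):]
    if action is Action.CUSTOM:
        # Hypothetical target-specific node chosen by custom lowering.
        return "TARGET_" + opcode
    return opcode  # Legal: the strict node is passed through to selection


generic = ToyTarget()
strict_aware = ToyTarget()
strict_aware.set_action("STRICT_FADD", Action.LEGAL)
strict_aware.set_action("STRICT_FDIV", Action.CUSTOM)
```

A back-end that does nothing keeps today's behavior, because every strict node expands to its normal counterpart, while a back-end that opts in sees the strict nodes and can handle them as it sees fit.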

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand



Kevin P. Neal via llvm-dev

2018-Mar-06 14:01 UTC


[llvm-dev] [cfe-dev] Why is #pragma STDC FENV_ACCESS not supported?

I'm working with Andrew on D43515 right now, and some of these unanswered
questions are directly relevant to that patch. So....

On Fri, Feb 09, 2018 at 03:42:20PM +0100, Ulrich Weigand wrote:
>    C) Floating-point exceptions
>    If a mask bit in the floating-point status register is set, then all FP
>    instructions will *trap* whenever an IEEE exception condition is
>    recognized. This means that we need to treat those instructions as
>    having unmodelled side effects, so that they cannot be speculatively
>    executed. Also, we cannot schedule FP instructions across instructions

Does this mean that the problems with the default expansion of ISD::FP_TO_UINT
would be solved by the backend knowing that it should model traps?

In D43515 the issue of what to do with STRICT_FP_TO_UINT is still unsolved.

>    that set (those bits in) the FP status register -- but the latter is
>    probably automatically done as long as those latter instructions are
>    described as having unmodeled side effects. Note that this will in
>    effect again create a dependency chain across all FP instructions, so
>    that B) should be implicitly covered as well here.
>    Did I miss anything here? I'm assuming that the behavior on FP
>    instructions on Intel (and other architectures) will be roughly
>    similar, given that this behavior is mostly defined by the IEEE
>    standard.
>    Now the question in my mind is, how does this all map onto the
>    experimental constrained intrinsics? They do have "rounding mode" and
>    "exception behavior" metadata, but I don't really see how that maps
>    onto the behavior of instructions as described above. Also, right now
>    the back-end doesn't even *get* at that data in the first place, since
>    it is just thrown away when lowering the intrinsics to STRICT_... nodes.
>    In fact, I'm also not sure how the front-end is even supposed to be
>    *setting* those metadata flags -- is the compiler supposed to track
>    calls to fesetround and the like, and thereby determine which rounding
>    and exception modes apply to any given block of code? In fact, was the
>    original intention even that the back-end actually implements different
>    behavior based on this level of detail, or was the back-end supposed to
>    support only two modes, the default behavior of today and a fully
>    strict implementation always satisfying all three of A), B), and C)
>    above?
>    Looking again at a possible implementation in the back-end, I'm now
>    wondering if it wouldn't after all be better to just treat the STRICT_
>    opcodes like all other DAG nodes. That is, have them be associated with
>    an action (Legal, Expand, or Custom); set the default action to Expand,
>    with a default expander that just replaces them by the "normal" FP
>    nodes; and allow a back-end to set the action to Legal and/or Custom
>    and then just handle them in the back-end as it sees fit. This might
>    indeed require multiple patterns to match them, but it should be
>    possible to generate those via multiclass instantiations so it might
>    not be all that big a deal. The benefit would be that it allows the
>    back-end the greatest freedom how to handle things (e.g. interactions
>    with target-specific control registers).

Was there a consensus on what to do here?

Are we exposing the strict SDAG nodes to the backend or not? Obviously,
if we are, this is going to take a while to implement, but it would
still be useful to know when coding the layers above the backend.

If we're not exposing the strict nodes to the backend, would using the
chain on expansions like ISD::FP_TO_UINT solve speculative execution issues?
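
For readers following along, here is a toy Python sketch of the concern. It assumes the default expansion builds the unsigned conversion out of signed conversions plus a select, so that both conversions are evaluated; that assumption, the 32-bit width and all helper names are illustrative only and are not taken from D43515. Whether chaining alone is enough is exactly the open question; the sketch only shows where a spurious flag could come from.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def fp_to_sint(x, flags):
    """Signed 32-bit conversion; records 'invalid' when x is out of range."""
    value = int(x)  # truncates toward zero
    if not (INT_MIN <= value <= INT_MAX):
        flags.add("invalid")
        return INT_MIN  # arbitrary placeholder result
    return value


def fp_to_uint_select(x, flags):
    """Select-style expansion: both conversions run, then one is chosen."""
    small = fp_to_sint(x, flags)
    large = fp_to_sint(x - 2.0**31, flags) + 2**31
    return small if x < 2.0**31 else large


def fp_to_uint_branch(x, flags):
    """Branch-style form: only the conversion that is needed runs."""
    if x < 2.0**31:
        return fp_to_sint(x, flags)
    return fp_to_sint(x - 2.0**31, flags) + 2**31


select_flags, branch_flags = set(), set()
select_result = fp_to_uint_select(3e9, select_flags)
branch_result = fp_to_uint_branch(3e9, branch_flags)
```

Both forms produce the same value for 3e9, which fits in an unsigned 32-bit integer, but only the select-style form records an invalid condition, because it also performs the out-of-range signed conversion whose result is then discarded.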

-- 
Kevin P. Neal                                http://www.pobox.com/~kpn/

                    "A pig's gotta fly." - Crimson Pig
