Finkel, Hal J. via llvm-dev
2019-Oct-03 00:01 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
On 10/2/19 5:12 PM, Hal Finkel wrote:

> On 10/1/19 12:35 AM, Serge Pavlov via llvm-dev wrote:
>
>> Hi all,
>>
>> This proposal is aimed at supporting a floating point environment in
>> which some properties, such as rounding mode or exception behavior,
>> differ from the defaults. This includes in particular support for
>> '#pragma STDC FENV_ACCESS' and '#pragma STDC FENV_ROUND', as well as
>> some other related facilities.
>>
>> Problem
>>
>> On many processors, use of a non-default floating point mode requires
>> modification of global state by writing to some register. This presents
>> a difficulty for the implementation, as a floating point instruction
>> must not be moved into code that executes with a different floating
>> point state. To prevent such moves, the current solution represents FP
>> operations with special (constrained) instructions, which do not
>> participate in optimizations
>> (http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html). It
>> is important that the constrained FP operations be used throughout the
>> entire function, including inlined calls, if they are used in any part
>> of it.
>>
>> The main concern about this approach is the performance drop. Using
>> constrained FP operations means that optimizations on FP operations are
>> turned off; that is the very reason for using them. Even if a
>> non-default FP environment is used only in a small piece of a function,
>> optimizations are turned off in the entire function. For many practical
>> applications this is unacceptable.
>
> The reason, as you're likely aware, that the constrained FP operations
> must be used within the entire function is that, if you mix the
> constrained FP operations with the normal ones, there's no way to
> prevent code motion from intermixing them.
>
> The solution I recall being discussed to this problem of a function
> which requires constrained operations only in part is outlining in
> Clang - this does introduce function-call overhead (although perhaps
> some MI-level inlining pass could mitigate that in part), but otherwise
> permits normal optimization of the normal FP operations.

Johannes and I discussed the outlining here offline, and two notes:

 1. The outlining itself will prevent the undesired code motion today,
    but in the future we'll have IPO transformations that will need to be
    specifically taught to avoid moving FP operations into these outlined
    functions.

 2. The outlined functions will need to be marked with noinline and also
    noimplicitfloat. In fact, all functions using the constrained
    intrinsics might need to be marked with noimplicitfloat. The
    above-mentioned restrictions on IPO passes might be conditioned on
    the noimplicitfloat attribute.

 -Hal

>> Although this approach prevents instructions from being moved, it does
>> not prevent basic blocks from being moved. Code that uses a non-default
>> FP environment must at some point set the appropriate state registers,
>> perform the necessary operations, and then restore the original mode.
>> If this activity is scattered across several basic blocks, block-level
>> optimizations can break this arrangement; for instance, a basic block
>> with default FP operations can be moved after the block that sets the
>> non-default FP environment.
>
> Can you please provide some pseudocode to illustrate this problem?
> Moving basic blocks moves the instructions within them, and I don't see
> how our current semantics would prevent illegal reorderings of the
> instructions but not prevent illegal reorderings of groups of those
> same instructions.
> At the LLVM level, we currently model the FP-environment state as a
> kind of memory, and so the operations which adjust the FP-environment
> state must also be marked as writing to memory, but that's true with
> essentially all external program state, and that should prevent all
> illegal reordering.
>
> Thanks,
>
> Hal
>
>> Solution
>>
>> The proposed approach is based on an extension of basic blocks. It is
>> assumed that the code in a basic block is executed in the same FP
>> environment. This assumption is consistent with the rules for using
>> '#pragma STDC FENV_ACCESS' and similar facilities. If the environment
>> differs from the default, such a block has a pointer to an object that
>> keeps the block attributes, including the FP settings. All basic blocks
>> obtained from the same block where '#pragma STDC FENV_ACCESS' is
>> specified share the same attribute object. In the IR these attributes
>> are represented by metadata attached to the basic blocks.
>>
>> With basic block attributes the compiler can assert the validity of an
>> instruction move by comparing the attributes of the source and
>> destination BBs. An instruction should keep a pointer to the BB
>> attributes even if it is detached from its BB, to support the common
>> technique of moving instructions. Similarly, the compiler can verify
>> the validity of BB movement.
>>
>> Such an approach allows an implementation in which constrained FP
>> operations are 'jailed' in their basic blocks. The other parts of the
>> function can still use ordinary FP operations and profit from
>> optimizations. Depending on the target hardware, some FP operations
>> may be allowed to cross the 'jail' boundary, for instance if they
>> correspond to instructions that directly encode the rounding mode and
>> the FP environment change affects only the rounding mode.
>>
>> Is this solution feasible? What are the obstacles, difficulties or
>> drawbacks? Are there any improvements to it? Any feedback is welcome.
>>
>> Thanks,
>> --Serge

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
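To make the outlining idea a bit more concrete, here is a rough
source-level sketch of the pattern under discussion. The names are
illustrative only; the real outlining would be done by the front end on
the constrained intrinsics, noimplicitfloat is an IR-level attribute
with no direct source spelling, __attribute__((noinline)) is the
GCC/Clang extension, and how faithfully the compiler honors the pragma
is exactly the question of this thread:

    #include <cfenv>

    // Illustrative stand-in for the function Clang would outline: only
    // this body runs in the non-default environment.  At the IR level it
    // would be emitted with constrained FP intrinsics and carry the
    // noinline and noimplicitfloat attributes discussed above.
    __attribute__((noinline))
    static double sum_rounded_up(double a, double b) {
    #pragma STDC FENV_ACCESS ON
      const int old_mode = std::fegetround();
      std::fesetround(FE_UPWARD);          // enter the non-default environment
      double r = a + b;                    // evaluated under FE_UPWARD
      std::fesetround(old_mode);           // restore the caller's environment
      return r;
    }

    double mixed(double a, double b, double c) {
      // Ordinary FP code: remains free for the usual optimizations.
      double fast = a * c + b * c;
      // Only the outlined call pays the constrained-operation cost.
      return fast + sum_rounded_up(a, b);
    }

The function-call overhead this introduces is the cost weighed against
the approach in the follow-ups below.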
Serge Pavlov via llvm-dev
2019-Oct-03 13:26 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
On Thu, Oct 3, 2019 at 7:01 AM Finkel, Hal J. <hfinkel at anl.gov> wrote:

> On 10/2/19 5:12 PM, Hal Finkel wrote:
>> On 10/1/19 12:35 AM, Serge Pavlov via llvm-dev wrote:
>>> The main concern about this approach is the performance drop. Using
>>> constrained FP operations means that optimizations on FP operations
>>> are turned off; that is the very reason for using them. Even if a
>>> non-default FP environment is used only in a small piece of a
>>> function, optimizations are turned off in the entire function. For
>>> many practical applications this is unacceptable.
>>
>> The reason, as you're likely aware, that the constrained FP operations
>> must be used within the entire function is that, if you mix the
>> constrained FP operations with the normal ones, there's no way to
>> prevent code motion from intermixing them.

This proposal presents a way to prevent such intermixing. In some basic
blocks we use normal FP operations, in others constrained ones; the BB
attributes allow checking the validity of instruction moves.

>> The solution I recall being discussed to this problem of a function
>> which requires constrained operations only in part is outlining in
>> Clang - this does introduce function-call overhead (although perhaps
>> some MI-level inlining pass could mitigate that in part), but
>> otherwise permits normal optimization of the normal FP operations.
>
> Johannes and I discussed the outlining here offline, and two notes:
>
> 1. The outlining itself will prevent the undesired code motion today,
>    but in the future we'll have IPO transformations that will need to
>    be specifically taught to avoid moving FP operations into these
>    outlined functions.
>
> 2. The outlined functions will need to be marked with noinline and also
>    noimplicitfloat. In fact, all functions using the constrained
>    intrinsics might need to be marked with noimplicitfloat. The
>    above-mentioned restrictions on IPO passes might be conditioned on
>    the noimplicitfloat attribute.

Outlining is an interesting solution, but unfortunately it is not an
option for machine learning processors. Branching is expensive on them,
and some of them do not have a call instruction, so all function calls
must eventually be inlined. On the other hand, rounding control is
especially important on such processors: they usually operate on short
data types, and using a proper rounding mode can gain precision. They
often allow encoding the rounding mode in an instruction, and making a
call just to execute a couple of instructions is not acceptable.

>>> Although this approach prevents instructions from being moved, it
>>> does not prevent basic blocks from being moved. Code that uses a
>>> non-default FP environment must at some point set the appropriate
>>> state registers, perform the necessary operations, and then restore
>>> the original mode. If this activity is scattered across several basic
>>> blocks, block-level optimizations can break this arrangement; for
>>> instance, a basic block with default FP operations can be moved after
>>> the block that sets the non-default FP environment.
>>
>> Can you please provide some pseudocode to illustrate this problem?
>> Moving basic blocks moves the instructions within them, and I don't
>> see how our current semantics would prevent illegal reorderings of the
>> instructions but not prevent illegal reorderings of groups of those
>> same instructions.
>> At the LLVM level, we currently model the FP-environment state as a
>> kind of memory, and so the operations which adjust the FP-environment
>> state must also be marked as writing to memory, but that's true with
>> essentially all external program state, and that should prevent all
>> illegal reordering.

Let's consider a transformation like LoopUnswitch. The source:

    for (int i = 0; i < N; ++i) {
      #pragma STDC FENV_ACCESS ON
      set_fp_environment(X);
      if (i > K)
        some_func();
      // Basic block that calculates the condition starts here.
      bool f = float_a < float_b;
      if (f)
        do_1(i);
      else
        do_2(i);
    }

As the basic block that calculates the condition `f` does not depend on
values calculated in the loop, it can be hoisted:

    bool f = float_a < float_b;
    if (f) {
      for (int i = 0; i < N; ++i) {
        #pragma STDC FENV_ACCESS ON
        set_fp_environment(X);
        if (i > K)
          some_func();
        do_1(i);
      }
    } else {
      for (int i = 0; i < N; ++i) {
        #pragma STDC FENV_ACCESS ON
        set_fp_environment(X);
        if (i > K)
          some_func();
        do_2(i);
      }
    }

Nothing prevents the BB that calculates the condition from being moved.
It has no data dependencies that prohibit such relocation, and its code
does not adjust the FP-environment state, so it may be moved ahead of
`set_fp_environment`. But the transformed code has different semantics,
as `f` is now calculated in a different FP environment. To prevent such
transformations we would need to treat all FP operations as accessing
the FP state modeled as memory, which would prevent any code reordering
and result in a performance drop.
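To see concretely why evaluating an FP expression on the other side of
`set_fp_environment` can change program behavior, here is a minimal
standalone sketch. It assumes `set_fp_environment` in the pseudocode
above is essentially a wrapper around `fesetround`, and that the
compiler actually honors FP-environment access (for example GCC built
with -frounding-math); under those assumptions the same expression
yields different values before and after the mode switch:

    // Minimal sketch: the value of the same FP expression depends on the
    // current rounding mode, so evaluating it before the mode switch is
    // not equivalent to evaluating it after.  FE_UPWARD stands in for
    // the non-default environment X in the pseudocode above.
    #include <cfenv>
    #include <cstdio>

    int main() {
    #pragma STDC FENV_ACCESS ON
      volatile float one = 1.0f, three = 3.0f;  // volatile: block constant folding

      float before = one / three;               // default mode: round to nearest

      std::fesetround(FE_UPWARD);               // stand-in for set_fp_environment(X)
      float after = one / three;                // now rounded toward +infinity

      std::fesetround(FE_TONEAREST);            // restore the default environment

      // The two results differ by one ULP, so a computation hoisted above
      // the mode switch observes a different value than one placed after.
      std::printf("before=%.9g after=%.9g differ=%d\n",
                  before, after, (int)(before != after));
      return 0;
    }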
Doerfert, Johannes via llvm-dev
2019-Oct-03 15:45 UTC
[llvm-dev] [RFC] Using basic block attributes to implement non-default floating point environment
On 10/03, Serge Pavlov wrote:
> On Thu, Oct 3, 2019 at 7:01 AM Finkel, Hal J. <hfinkel at anl.gov> wrote:
>> On 10/2/19 5:12 PM, Hal Finkel wrote:
>>> On 10/1/19 12:35 AM, Serge Pavlov via llvm-dev wrote:
>>>> The main concern about this approach is the performance drop. Using
>>>> constrained FP operations means that optimizations on FP operations
>>>> are turned off; that is the very reason for using them. Even if a
>>>> non-default FP environment is used only in a small piece of a
>>>> function, optimizations are turned off in the entire function. For
>>>> many practical applications this is unacceptable.
>>>
>>> The reason, as you're likely aware, that the constrained FP
>>> operations must be used within the entire function is that, if you
>>> mix the constrained FP operations with the normal ones, there's no
>>> way to prevent code motion from intermixing them.
>
> This proposal presents a way to prevent such intermixing. In some basic
> blocks we use normal FP operations, in others constrained ones; the BB
> attributes allow checking the validity of instruction moves.

I'm really unsure how feasible it is to look at basic block annotations
all the time. It might also interfere with CFG simplifications, e.g.,
block merging. Having "implicit" dependences is generally bad (IMHO).

>> Johannes and I discussed the outlining here offline, and two notes:
>>
>> 1. The outlining itself will prevent the undesired code motion today,
>>    but in the future we'll have IPO transformations that will need to
>>    be specifically taught to avoid moving FP operations into these
>>    outlined functions.
>>
>> 2. The outlined functions will need to be marked with noinline and
>>    also noimplicitfloat. In fact, all functions using the constrained
>>    intrinsics might need to be marked with noimplicitfloat. The
>>    above-mentioned restrictions on IPO passes might be conditioned on
>>    the noimplicitfloat attribute.
>
> Outlining is an interesting solution, but unfortunately it is not an
> option for machine learning processors. Branching is expensive on them,
> and some of them do not have a call instruction, so all function calls
> must eventually be inlined.

Would "really late" inlining be an option?
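To make the block-merging concern concrete, here is a rough sketch of
the kind of check the proposal implies, written as self-contained
C++-style pseudocode rather than against the real LLVM API: LLVM basic
blocks cannot carry metadata today (that is what the RFC proposes to
add), so the FPEnvAttrs type and the Block field below are hypothetical.

    // Hypothetical sketch of the checks implied by the proposal; these
    // types do not exist in LLVM -- basic blocks currently cannot carry
    // metadata, which is exactly what the RFC proposes to add.
    struct FPEnvAttrs {
      int RoundingMode;      // e.g. a value mirroring FE_TONEAREST, FE_UPWARD, ...
      bool TrapsOnFPExcept;  // whether FP exceptions may trap in this block
      bool operator==(const FPEnvAttrs &O) const {
        return RoundingMode == O.RoundingMode &&
               TrapsOnFPExcept == O.TrapsOnFPExcept;
      }
    };

    struct Block {
      const FPEnvAttrs *FPEnv = nullptr;  // null means the default environment
    };

    // An FP instruction may only move between blocks whose environments match.
    static bool canMoveFPInstBetween(const Block &From, const Block &To) {
      if (From.FPEnv == To.FPEnv)
        return true;                      // same attribute object (or both default)
      if (From.FPEnv && To.FPEnv)
        return *From.FPEnv == *To.FPEnv;  // distinct objects, identical settings
      return false;                       // default vs. non-default: reject
    }

    // Block merging (the case raised above) would need the same guard:
    // merging is only sound if both blocks agree on the FP environment.
    static bool canMergeBlocks(const Block &A, const Block &B) {
      return canMoveFPInstBetween(A, B);
    }

Whether checks like these can be consulted "all the time" without
burdening every block-level transformation is the open question raised
above.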