thr3ads.net - llvm dev - [llvm-dev] Floating point operations with specific rounding and exception properties [Aug 2019]


Serge Pavlov via llvm-dev

2019-Aug-21 01:13 UTC

[llvm-dev] Floating point operations with specific rounding and exception properties

Which optimization did you find unsafe?

Thanks,
--Serge


On Wed, Aug 21, 2019 at 05:12, Cameron McInally <cameron.mcinally at nyu.edu> wrote:
> On Tue, Aug 20, 2019 at 1:02 PM Serge Pavlov via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> During the review of https://reviews.llvm.org/D65997 an issue was
>> revealed that relates to the decision of how the compiler should
>> represent constrained floating point operations.
>>
>> If a floating point operation requires a rounding mode or exception
>> behavior different from the default, it should be represented by a
>> constrained intrinsic (
>> http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics).
>> An important point is that, according to the current design decision, if
>> some part of a function contains such an intrinsic, all floating point
>> operations in the function must be represented by constrained intrinsics
>> as well. This decision is meant to prevent undesired moves of fp
>> operations. The discussion is in the thread
>> http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html
>> and the relevant example is:
>>
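>> #define _GNU_SOURCE  /* feenableexcept/fedisableexcept are glibc extensions */
>> #include <fenv.h>
>>
>> double f(double a, double b, double c) {
>>   double d;  /* declared outside the block so it is visible at the return */
>>   {
>> #pragma STDC FENV_ACCESS ON
>>     feenableexcept(FE_OVERFLOW);   /* overflow now traps */
>>     d = a * b;
>>     fedisableexcept(FE_OVERFLOW);  /* overflow no longer traps */
>>   }
>>   return c * d;
>> }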
>>
>>
>> The second fmul must not be hoisted above the fedisableexcept call.
>> Using constrained intrinsics is expected to help in this case, as they
>> are not handled by optimization passes.
>>
>> The concern is that using constrained intrinsics in a small region of a
>> function results in using such intrinsics everywhere in the function,
>> including in functions that inline it. As constrained intrinsics prevent
>> optimizations, this can result in performance degradation.
>>
>> A couple of examples:
>> 1. There is a performance-critical function that makes most of its
>> calculations in the default fp mode, but at some points it enables fp
>> exceptions and performs an action that can trigger such an exception.
>> Using constrained intrinsics would result in performance loss, although
>> the code that actually needs them is very compact.
>> 2. Cores that are used for machine learning usually work with short data
>> (half, bfloat16 or even shorter). Rounding control in this case is much
>> more important than for big cores; using proper rounding in different
>> parts of an algorithm can gain precision. Constrained intrinsics are the
>> only way to enforce a particular rounding mode. However, using them
>> results in poor optimization, which is intolerable. In such cores the
>> rounding mode may be encoded in instructions, so code movements cannot
>> break semantics.
>>
>> Representation of fp operations could be more flexible, so that a user
>> would not pay for rounding/exception control with performance
>> degradation. For that we need to be able to mix constrained intrinsics
>> and regular fp operations in a function.
>>
>> The question is: how can we prevent fp operations from moving across the
>> boundaries of a region where specific rounding and/or exception behavior
>> is applied? Any ideas?
>>
>
> Okay, I'll bite...
>
> Preventing the hoisting of FP arithmetic was one of the driving factors in
> creating the constrained intrinsics. If we could solve that problem, then
> the constrained intrinsics would be *less* necessary (I say "less" since
> there are other problems, but hoisting is one of the significant ones).
>
> That said, our out-of-tree FPEnv mode attempts to do just that --
> selectively throttle unsafe optimizations. Barring any YDKWYDK's, I intend
> to blow the doors off of the constrained intrinsics, performance-wise. :P
>

Cameron McInally via llvm-dev

2019-Aug-21 14:57 UTC


On Tue, Aug 20, 2019 at 9:15 PM Serge Pavlov <sepavloff at gmail.com> wrote:
> Which optimization did you find unsafe?
>
> Thanks,
> --Serge
>
>

Oh, there are quite a lot. I mentioned Hoisting already. Constant Folding
is a big one. InstCombine and DAGCombine have some issues, like preserving
masks (op+select masks -- this may be less of a problem with true
predication). The LoopVectorizer also needed proper masks (not just masked
loads/stores) for targets that support them. APFloat has some issues (I'm
intending to upstream fixes for signaling NaNs, if I ever have time). And a
host of others.

Stepping back a little, the goal of FPEnv-safe compilation is just that...
to avoid unsafe FP transformations. The constrained intrinsics
implementation seeks to prevent almost all FP optimizations at first, safe
and unsafe, and then later add safe optimizations back in. My alternative
implementation is to find and *very* selectively throttle unsafe
optimizations -- my intuition says that there are far fewer unsafe
optimizations than there are safe optimizations. I believe this is the much
shorter path. So, the two competing implementations are really attacking
the problem from two different ends. Who gets to the goal first is TBD...

To be completely fair, is my alternative solution the best path for
upstream LLVM? Maybe, maybe not. The constrained intrinsics will be far
less buggy in the early stages, since essentially all optimizations are
quashed. But in the same breath, safe code running at the equivalent of -O0
is fairly useless (at least to our customers).

Serge Pavlov via llvm-dev

2019-Aug-21 16:46 UTC


Thank you for sharing your experience. It seems that LICM may also be
unsafe.

> my intuition says that there are far fewer unsafe optimizations than
> there are safe optimizations.
...
> safe code running at the equivalent of -O0 is fairly useless


I believe this is the right viewpoint. There must be a way to do this
without sacrificing performance.

Thanks,
--Serge


