Serge Pavlov via llvm-dev
2019-Aug-20 17:00 UTC
[llvm-dev] Floating point operations with specific rounding and exception properties
Hi all,

During the review of https://reviews.llvm.org/D65997 an issue came up that relates to how the compiler should represent constrained floating point operations.

If a floating point operation requires a rounding mode or exception behavior different from the default, it must be represented by a constrained intrinsic (http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics). An important point: under the current design, if any part of a function contains such an intrinsic, all floating point operations in the function must be represented by constrained intrinsics as well. This decision is meant to prevent undesired movement of fp operations. The discussion is in the thread http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html, the relevant example is:

    double f(double a, double b, double c) {
      double d;
      {
        #pragma STDC FENV_ACCESS ON
        feenableexcept(FE_OVERFLOW);
        d = a * b;
        fedisableexcept(FE_OVERFLOW);
      }
      return c * d;
    }

The second fmul must not be hoisted above the fedisableexcept call. Using constrained intrinsics is expected to help here because optimization passes do not touch them. (A rough IR sketch of this example appears below this message.)

The concern is that using constrained intrinsics in a small region of a function forces their use everywhere in that function, and in any function that inlines it. Because constrained intrinsics block optimizations, this can cause performance degradation.

A couple of examples:
1. A performance-critical function does most of its calculations in the default fp mode, but at a few points it enables fp exceptions and performs an action that can trigger such an exception. Using constrained intrinsics would cause a performance loss, even though the code that actually needs them is very compact.
2. Cores used for machine learning usually work with short data (half, bfloat16 or even shorter). Rounding control matters much more here than on big cores; using the proper rounding in different parts of an algorithm can gain precision. Constrained intrinsics are the only way to enforce a particular rounding mode, yet using them results in poor optimization, which is intolerable. On such cores the rounding mode may be encoded in the instructions themselves, so code motion cannot break semantics.

The representation of fp operations could be more flexible, so that a user does not pay for rounding/exception control with performance degradation. For that we need to be able to mix constrained intrinsics and regular fp operations in a single function.

The question is: how can we prevent fp operations from moving across the boundaries of a region where specific rounding and/or exception behavior applies? Any ideas?

Thanks,
--Serge
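For concreteness, a rough sketch of what f might look like in IR under the current all-or-nothing rule, where both multiplications, including the one outside the pragma region, become constrained intrinsic calls. The intrinsic name and metadata arguments follow the LangRef; the feenableexcept/fedisableexcept declarations and the FE_OVERFLOW value are assumptions (x86 glibc), not part of the original example.

    ; Sketch only: once the function contains one constrained operation, every
    ; fp operation in it must be constrained.
    declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
    declare i32 @feenableexcept(i32)    ; glibc extension: int argument, int result
    declare i32 @fedisableexcept(i32)

    define double @f(double %a, double %b, double %c) {
    entry:
      call i32 @feenableexcept(i32 8)   ; assumption: FE_OVERFLOW == 8 (x86 glibc)
      %d = call double @llvm.experimental.constrained.fmul.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
      call i32 @fedisableexcept(i32 8)
      ; This multiply needs no FP-environment access, but under the current rule
      ; it must still be emitted as an intrinsic call (here with default-mode
      ; metadata), so later passes will not simplify or reschedule it:
      %mul = call double @llvm.experimental.constrained.fmul.f64(double %c, double %d, metadata !"round.tonearest", metadata !"fpexcept.ignore")
      ret double %mul
    }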
Cameron McInally via llvm-dev
2019-Aug-20 22:12 UTC
[llvm-dev] Floating point operations with specific rounding and exception properties
On Tue, Aug 20, 2019 at 1:02 PM Serge Pavlov via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [...]
> The question is: how can we prevent fp operations from moving across
> the boundaries of a region where specific rounding and/or exception
> behavior applies? Any ideas?

Okay, I'll bite...

Preventing the hoisting of FP arithmetic was one of the driving factors in creating the constrained intrinsics. If we could solve that problem, then the constrained intrinsics would be *less* necessary (I say "less" since there are other problems, but hoisting is one of the significant ones).

That said, our out-of-tree FPEnv mode attempts to do just that -- selectively throttle unsafe optimizations. Barring any YDKWYDK's, I intend to blow the doors off of the constrained intrinsics, performance-wise. :P
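To make the hoisting hazard concrete, a minimal hand-written sketch (not the output of any pass), reusing the assumed glibc declarations and FE_OVERFLOW value from the sketch above:

    ; A plain fmul has no side effects and may be speculated; nothing in the IR
    ; ties it to the surrounding calls, so a pass may legally move it past
    ; @fedisableexcept:
    call i32 @feenableexcept(i32 8)
    %d = fmul double %a, %b
    call i32 @fedisableexcept(i32 8)

    ; The constrained form is an intrinsic call that optimizers treat
    ; conservatively (it may read and write the FP environment), so it stays
    ; between the two calls:
    %d.strict = call double @llvm.experimental.constrained.fmul.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict")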
Serge Pavlov via llvm-dev
2019-Aug-21 01:13 UTC
[llvm-dev] Floating point operations with specific rounding and exception properties
Which optimization did you find unsafe?

Thanks,
--Serge

Wed, Aug 21, 2019 at 05:12, Cameron McInally <cameron.mcinally at nyu.edu>:
> [...]
> That said, our out-of-tree FPEnv mode attempts to do just that --
> selectively throttle unsafe optimizations. Barring any YDKWYDK's, I
> intend to blow the doors off of the constrained intrinsics,
> performance-wise. :P
Simon Moll via llvm-dev
2019-Aug-21 08:52 UTC
[llvm-dev] Floating point operations with specific rounding and exception properties
Hi,

The LLVM-VP extension (https://reviews.llvm.org/D57504) generalizes PatternMatch.h to match FP intrinsics as well as regular fp (vector) instructions with the same pattern. We use this to lift the pattern rewrites in InstSimplify and InstCombine to predicated vector instructions. The same logic could be applied to "scalar" constrained FP intrinsics. Hal has requested that the VP intrinsics model fp exception/rounding too.

So the suggestion is to keep using fp exception/rounding mode arguments but to teach LLVM to handle them in its optimizations and analyses.

Example
-----------
PatternMatch.h changes: https://reviews.llvm.org/D57504#change-cWgJ3XBlLNvs
AddSub code in InstCombine: https://reviews.llvm.org/D57504#change-24P4gqRF9sNj

Note that "visitPredicatedFSub" will match either the regular FSub instruction or the llvm.vp.fsub intrinsic.

- Simon

On 8/20/19 7:00 PM, Serge Pavlov via llvm-dev wrote:
> [...]
> The question is: how can we prevent fp operations from moving across
> the boundaries of a region where specific rounding and/or exception
> behavior applies? Any ideas?
>
> Thanks,
> --Serge

--
Simon Moll
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll
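For readers who have not looked at the review: a rough sketch of the two shapes such a generalized matcher would treat as the same fsub. The llvm.vp.fsub operand layout here (two data operands, a mask, and an explicit vector length) follows the D57504 proposal and may differ in detail from the final patch; the vector width is arbitrary.

    ; Regular vector instruction:
    %r0 = fsub <8 x double> %x, %y

    ; Predicated form proposed in D57504; a matcher generalized as in the
    ; review would recognize both of these as an fsub pattern:
    %r1 = call <8 x double> @llvm.vp.fsub.v8f64(<8 x double> %x, <8 x double> %y, <8 x i1> %mask, i32 %evl)

    declare <8 x double> @llvm.vp.fsub.v8f64(<8 x double>, <8 x double>, <8 x i1>, i32)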