thr3ads.net - llvm dev - [llvm-dev] what does -ffp-contract=fast allow? [Nov 2016]


Hal Finkel via llvm-dev

2016-Nov-18 18:19 UTC

[llvm-dev] what does -ffp-contract=fast allow?

----- Original Message -----
> From: "Sanjay Patel" <spatel at rotateright.com>
> To: "Hal J. Finkel" <hfinkel at anl.gov>
> Cc: "Mehdi Amini" <mehdi.amini at apple.com>, "llvm-dev"
> <llvm-dev at lists.llvm.org>, "cfe-dev" <cfe-dev at lists.llvm.org>,
> "andrew kaylor" <andrew.kaylor at intel.com>, "Nicolai Hähnle"
> <nhaehnle at gmail.com>, "Warren Ristow" <warren.ristow at sony.com>
> Sent: Friday, November 18, 2016 10:37:08 AM
> Subject: Re: what does -ffp-contract=fast allow?
>
> fp-contract is confusing, so let me try to summarize that and the
> underlying implementation:
>
> 1. -ffp-contract=on means honor the compiler's default FP_CONTRACT
> setting or any FP_CONTRACT pragmas in the source. Currently, clang
> defaults to "OFF". The shouting is not an accident; this is not the
> same as the flag's "off" setting. This is described nicely here:
> https://reviews.llvm.org/D24481
>
> If we set "on" in the invocation *and* we set "ON" in the source,
> clang will generate @llvm.fmuladd intrinsics for expressions like
> x*y+z. If you split that into 2 lines in C with a temp variable
> assignment, it's no longer a single expression, so no FMA for you.
> The @llvm.fmuladd intrinsic is our way of preserving the C source
> information through the optimizer. If we don't end up producing an
> FMA instruction for the target in this case, it's a bug.

This is not correct. 

First, the behavior of -ffp-contract=on/off should just set the default state of
the pragma. Once we finish fixing up the test suite to allow us to actually flip
the default, this will actually be the case (the review description referenced
above is not clear on the desired end state in this regard). Hopefully, this
work will be done soon.

Second, it is specifically *not* a bug if @llvm.fmuladd does not become an FMA
on the target. It only represents an allowable place to form an FMA. The LangRef
specifically states, "Fusion is not guaranteed, even if the target platform
supports it." The @llvm.fma intrinsic should become an FMA if the target
supports it.
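
To make the source-level rule concrete, here is a minimal C sketch of the two cases discussed above. The function names are made up for illustration. Whether a fused instruction is actually emitted depends on the compiler, its flags, and the target; as the LangRef quote says, fusion is permitted here, not guaranteed. Some compilers ignore this pragma, possibly with a warning.

```c
/* Request contraction for the rest of this translation unit. */
#pragma STDC FP_CONTRACT ON

/* Multiply and add in ONE expression: under the C rule the compiler
 * may (but need not) contract this into a single fused multiply-add. */
float muladd_one_expr(float x, float y, float z) {
    return x * y + z;
}

/* Same arithmetic split across two statements: the multiply and the add
 * are now separate expressions, so the C rule does not permit
 * contraction (a mode such as -ffp-contract=fast may still fuse them). */
float muladd_two_stmts(float x, float y, float z) {
    float t = x * y;
    return t + z;
}
```

For inputs such as (2, 3, 4) both functions return 10. The two forms can differ only on inputs where the product x*y is not exactly representable in float, because that is the only case in which skipping the intermediate rounding changes the result.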

> 2. -ffp-contract=fast means override the compiler's default "OFF"
> setting and override source pragmas to generate FMA when possible,
> even across C expressions. The "fast" naming is unfortunate because
> this does *not* enable most fast-math. I.e., as everyone in this
> thread agrees so far, we are not allowed to do the reassociation in
> the example. It's not strict math though because of that trailing
> clause that lets us generate FMA across expressions.
>
> Here's where it gets more complicated and possibly buggy. Clang does
> not generate llvm.fmuladd intrinsics with this setting. In this
> mode, clang generates individual fmul and fadd instructions and
> relies on the backend to fuse those back together.

This is definitely not a bug. The problem with the C rules for contraction,
which only allow fusion within a C-language statement, is that they don't allow
fusion opportunities that appear only after function inlining (or, obviously,
across statements in any other sense). This is a real problem, especially in C++
code, where there are a lot of small inline functions in abstraction layers that
users expect the compiler to see through before deciding on fusion. Even within
a function, the fusions allowed by the C rules are not necessarily
performance-optimal.
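
A small C sketch of the inlining case, with hypothetical helper names (`scale`, `offset`, `axpy`) standing in for the kind of abstraction layer described above. It only illustrates where the multiply and add end up; it makes no claim about what any particular compiler version emits.

```c
/* Hypothetical abstraction layer: each operation is its own tiny function. */
static inline float scale(float a, float x) { return a * x; }
static inline float offset(float p, float y) { return p + y; }

/* The multiply and the add live in different functions, so no single C
 * expression contains both. A front end following the C rule has nothing
 * to mark as contractible here. Only after inlining do the fmul and fadd
 * sit next to each other, which is where a mode like -ffp-contract=fast
 * lets the backend consider fusing them. */
float axpy(float a, float x, float y) {
    return offset(scale(a, x), y);
}
```

Calling `axpy(2.0f, 3.0f, 4.0f)` yields 10 with or without fusion, since the product is exact.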

> More background here:
> https://llvm.org/bugs/show_bug.cgi?id=17211
>
> I don't know if it's possible, but if we're in this mode and some IR
> transform pass managed to move/kill an fmul or fadd that was
> destined to be part of an FMA, I think that would be a bug.

No, this also would not be a bug (although it could be bad for performance on
some architectures).

> This mode is also completely broken with LTO because we're using a
> TargetOption to communicate the FMA mode to the backend; there is no
> instruction-level or function-level attribute/metadata for FMA-ness:
> https://llvm.org/bugs/show_bug.cgi?id=25721

Interesting; we should at least have a function attribute for this that Clang
uses.

Thanks again, 
Hal 
> To tie this back to the earlier thread about changes to IR FMF, the
> possibility of adding FMA bits to FMF (as well as storing all FMF in
> metadata) was discussed here:
> https://llvm.org/bugs/show_bug.cgi?id=13118
> 3. The backend needs a thread of its own. We have at least these
> mechanisms to handle FMA codegen:
> a. TargetOptions for LessPreciseFPMADOption, UnsafeFPMath,
> NoInfsFPMath, NoNaNsFPMath, AllowFPOpFusion (Fast, Standard, Strict)
> b. SDNodeFlags for UnsafeAlgebra, NoNaNs, NoInfs, NoSignedZeros (but
> nothing for FMA since IR FMF has nothing for FMA)
> c. SelectionDAGTargetInfo::generateFMAsInMachineCombiner()
> d. TargetLoweringBase::isFMAFasterThanFMulAndFAdd()
> e. TargetLoweringBase::enableAggressiveFMAFusion()
> f. ISD::FMA (no intermediate rounding step) and ISD::FMAD (has
> intermediate rounding) nodes
>
> On Thu, Nov 17, 2016 at 6:03 PM, Finkel, Hal J. <hfinkel at anl.gov> wrote:
> > Sent from my Verizon Wireless 4G LTE DROID
> > On Nov 17, 2016 5:53 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> > >
> > >> On Nov 17, 2016, at 4:33 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > >>
> > >> ________________________________
> > >>>
> > >>> From: "Warren Ristow" <warren.ristow at sony.com>
> > >>> To: "Sanjay Patel" <spatel at rotateright.com>, "cfe-dev"
> > >>> <cfe-dev at lists.llvm.org>, "llvm-dev" <llvm-dev at lists.llvm.org>
> > >>> Cc: "Nicolai Hähnle" <nhaehnle at gmail.com>, "Hal Finkel"
> > >>> <hfinkel at anl.gov>, "Mehdi Amini" <mehdi.amini at apple.com>,
> > >>> "andrew kaylor" <andrew.kaylor at intel.com>
> > >>> Sent: Thursday, November 17, 2016 5:58:58 PM
> > >>> Subject: RE: what does -ffp-contract=fast allow?
> > >>>
> > >>> > Is this a bug? We transformed the original expression into:
> > >>> > x * y + x
> > >>>
> > >>> I'd say yes, it's a bug.
> > >>>
> > >>> Unless -ffast-math is used (or some appropriate subset that gives
> > >>> us leeway, like -fno-honor-infinities or -fno-honor-nans, or
> > >>> somesuch), the re-association isn't allowed, and that blocks the
> > >>> madd contraction.
> > >>
> > >> I agree. FP contraction alone only allows us to do x*y+z ->
> > >> fma(x,y,z).
> > >
> > > I agree too, but the more difficult question is "which flags are
> > > needed here?"
> > > Would FPContract + no-inf be enough? If not why and how to
> > > document it?
> >
> > I think that the relevant question is: Is the contracted form more
> > precise for all inputs (or the same precision as the original)? If
> > so, then this should be allowed with just fp-contract+no-inf.
> > Otherwise, more is required.
> >
> > -Hal
> >
> > > --
> > > Mehdi
> > >
> > >>> From: Sanjay Patel [mailto: spatel at rotateright.com]
> > >>> Sent: Thursday, November 17, 2016 3:22 PM
> > >>> To: cfe-dev <cfe-dev at lists.llvm.org>; llvm-dev
> > >>> <llvm-dev at lists.llvm.org>
> > >>> Cc: Nicolai Hähnle <nhaehnle at gmail.com>; Hal Finkel
> > >>> <hfinkel at anl.gov>; Mehdi Amini <mehdi.amini at apple.com>;
> > >>> Ristow, Warren <warren.ristow at sony.com>;
> > >>> andrew.kaylor at intel.com
> > >>> Subject: what does -ffp-contract=fast allow?
> > >>>
> > >>> This is just paraphrasing from D26602, so credit to Nicolai for
> > >>> first raising the issue there.
> > >>>
> > >>> float foo(float x, float y) {
> > >>>   return x * (y + 1);
> > >>> }
> > >>>
> > >>> $ ./clang -O2 xy1.c -S -o - -target aarch64 -ffp-contract=fast | grep fm
> > >>>     fmadd s0, s1, s0, s0
> > >>>
> > >>> Is this a bug? We transformed the original expression into:
> > >>> x * y + x
> > >>>
> > >>> When x=INF and y=0, the code returns INF if we don't reassociate.
> > >>> With reassociation to FMA, it returns NAN because 0 * INF = NAN.
> > >>>
> > >>> 1. I used aarch64 as the example target, but this is not
> > >>> target-dependent (as long as the target has FMA).
> > >>>
> > >>> 2. This is *not* -ffast-math...or is it? The C standard only
> > >>> shows on/off settings for the associated FP_CONTRACT pragma.
> > >>>
> > >>> 3. AFAIK, clang has no documentation for -ffp-contract:
> > >>> http://clang.llvm.org/docs/UsersManual.html
> > >>>
> > >>> 4. GCC says:
> > >>> https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/Optimize-Options.html#Optimize-Options
> > >>> "-ffp-contract=fast enables floating-point expression
> > >>> contraction such as forming of fused multiply-add operations if
> > >>> the target has native support for them."
> > >>>
> > >>> 5. The LLVM backend (where this reassociation currently happens)
> > >>> shows:
> > >>> FPOpFusion::Fast - Enable fusion of FP ops wherever it's
> > >>> profitable.
> > >>
> > >> --
> > >> Hal Finkel
> > >> Lead, Compiler Technology and Programming Languages
> > >> Leadership Computing Facility
> > >> Argonne National Laboratory

-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 
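
The x=INF, y=0 case from the original question can be checked numerically with a short C sketch. This is only an illustration: the helper names are hypothetical, and the volatile temporaries are an assumption added here so that the compiler evaluates each form as written instead of rewriting it.

```c
#include <float.h>

/* Original form from the thread: x * (y + 1). The volatile temporary
 * keeps the compiler from rewriting the expression in this sketch. */
float original_form(float x, float y) {
    volatile float t = y + 1.0f;
    return x * t;
}

/* Distributed form x * y + x, i.e. what "fmadd s0, s1, s0, s0" computes
 * (here without fusion, which does not matter for the INF case). */
float distributed_form(float x, float y) {
    volatile float p = x * y;
    return p + x;
}

/* Hypothetical helpers that avoid depending on <math.h>. */
int is_nan(float v) { volatile float w = v; return w != w; }
int is_pos_inf(float v) { return v > FLT_MAX; }

/* Produces +infinity at run time: FLT_MAX * 2 overflows. */
float make_inf(void) {
    volatile float big = FLT_MAX;
    volatile float r = big * 2.0f;
    return r;
}
```

With x = `make_inf()` and y = 0, `original_form` gives INF * 1 = INF, while `distributed_form` computes INF * 0 = NaN first, and NaN + INF stays NaN. For finite inputs such as (2, 3) both forms agree.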

Sanjay Patel via llvm-dev

2016-Nov-18 19:35 UTC


On Fri, Nov 18, 2016 at 11:19 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> This is not correct.
>
> First, the behavior of -ffp-contract=on/off should just set the default
> state of the pragma. Once we finish fixing up the test suite to allow us to
> actually flip the default, this will actually be the case (the review
> description referenced above is not clear on the desired end state in this
> regard). Hopefully, this work will be done soon.
>
> Second, it is specifically *not* a bug if @llvm.fmuladd does not become an
> FMA on the target. It only represents an allowable place to form an FMA.
> The LangRef specifically states, "Fusion is not guaranteed, even if the
> target platform supports it." The @llvm.fma intrinsic should become an FMA
> if the target supports it.

Ah, I mixed up llvm.fma and llvm.fmuladd. The FP_CONTRACT ON setting allows
- but does not require - FMA codegen within a C statement. So the use of
llvm.fmuladd is our way of preserving the C statement boundary and is the
"blessed" op that the backend recognizes when operating in
FPOpFusionMode::Standard.





Hal Finkel via llvm-dev

2016-Nov-18 22:53 UTC


----- Original Message -----
> From: "Sanjay Patel" <spatel at rotateright.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Mehdi Amini" <mehdi.amini at apple.com>, "llvm-dev"
> <llvm-dev at lists.llvm.org>, "cfe-dev" <cfe-dev at lists.llvm.org>,
> "andrew kaylor" <andrew.kaylor at intel.com>, "Nicolai Hähnle"
> <nhaehnle at gmail.com>, "Warren Ristow" <warren.ristow at sony.com>
> Sent: Friday, November 18, 2016 1:35:44 PM
> Subject: Re: what does -ffp-contract=fast allow?
>
> On Fri, Nov 18, 2016 at 11:19 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > This is not correct.
> >
> > First, the behavior of -ffp-contract=on/off should just set the
> > default state of the pragma. Once we finish fixing up the test
> > suite to allow us to actually flip the default, this will actually
> > be the case (the review description referenced above is not clear
> > on the desired end state in this regard). Hopefully, this work will
> > be done soon.
> >
> > Second, it is specifically *not* a bug if @llvm.fmuladd does not
> > become an FMA on the target. It only represents an allowable place
> > to form an FMA. The LangRef specifically states, "Fusion is not
> > guaranteed, even if the target platform supports it." The @llvm.fma
> > intrinsic should become an FMA if the target supports it.
>
> Ah, I mixed up llvm.fma and llvm.fmuladd. The FP_CONTRACT ON setting
> allows - but does not require - FMA codegen within a C statement. So
> the use of llvm.fmuladd is our way of preserving the C statement
> boundary and is the "blessed" op that the backend recognizes when
> operating in FPOpFusionMode::Standard.

That's correct. 

Thanks again, 
Hal 
-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 