thr3ads.net - llvm dev - [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags [Nov 2016]

If this information is useful, please help other people find it:
Share via:

Ristow, Warren via llvm-dev

2016-Nov-16 01:15 UTC

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Hi all,

This is about https://reviews.llvm.org/D26708

Currently when the command-line switch '-ffast-math' is specified, the
IR-level fast-math-flag 'fast' gets attached to appropriate FP math
instructions.  That flag acts as an "umbrella" to implicitly turn on
all the
other fast-math-flags ('nnan', 'ninf', 'nsz' and
'arcp'):

http://llvm.org/docs/LangRef.html#fast-math-flags

This approach has the shortcoming that when there is a desire to disable one
of these fast-math-flags, if the 'fast' flag remains, it implicitly
re-enables the one being disabled.  For example, compiling this test-case:

    extern void use(float x, float y);
    void test(float a, float b, float c) {
      float q1 = a / c;
      float q2 = b / c;
      use(q1, q2);
    }

at '-O2 -ffast-math' does a reciprocal-transformation, so only one
division
is done (as desired with fast-math).  Compiling it with:

  -O2 -ffast-math -fno-reciprocal-math

should disable the reciprocal transformations (the flag 'arcp'), but
leave
all the other fast-math transformations enabled.  The current implementation
doesn't do that, since the 'fast' IR-level flag still gets set.

Motivation of this discussion: https://llvm.org/bugs/show_bug.cgi?id=27372#c2

As an aside, when '-ffast-math' is specified on the command-line, the
following six switches are all passed to cc1:

    -menable-no-infs
    -menable-no-nans
    -fno-signed-zeros
    -freciprocal-math
    -fno-trapping-math
    -ffp-contract=fast

and '-ffast-math' itself is also passed cc1 (the act of passing
'-ffast-math'
to cc1 results in the macro '__FAST_MATH__' being defined).  When (for
example) '-fno-reciprocal-math' is passed in addition to
'-ffast-math', then
'-freciprocal-math' is no longer passed to cc1 (and the other five
listed
above still are passed, along with '-ffast-math' still being passed). 
It
seems like the intention was that these individual switches were to enable
the individual floating-point transformations (and so the lack of any of
those switches would suppress the relevant transformations), but the
'-ffast-math' "umbrella" is over-riding the attempted
suppression.

The change proposed at https://reviews.llvm.org/D26708 deals with this issue
just for the reciprocal-transformation case, but it changes the semantics of
the 'fast' IR-level flag so that it no longer implies all the others. 
With
that proposed approach, rather than an "umbrella" flag such as
'fast' being
checked in the back-end (along with an individual flag like 'arcp'),
just
checking the individual flag ('arcp') would be done.  Any
fast-math-related
transformation that doesn't have an individual flag (e.g., re-association
currently doesn't), should eventually have an individual flag defined for
it, and then that individual flag should be checked.

What do people think?

Thanks,
-Warren
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161116/9337607e/attachment.html>

Mehdi Amini via llvm-dev

2016-Nov-16 05:10 UTC

head link

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Hi,
> On Nov 15, 2016, at 5:15 PM, Ristow, Warren via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi all,
>  
> This is about https://reviews.llvm.org/D26708
<https://reviews.llvm.org/D26708>
>  
> Currently when the command-line switch '-ffast-math' is specified,
the
> IR-level fast-math-flag 'fast' gets attached to appropriate FP math
> instructions.  That flag acts as an "umbrella" to implicitly turn
on all the
> other fast-math-flags ('nnan', 'ninf', 'nsz' and
'arcp'):
>  
> http://llvm.org/docs/LangRef.html#fast-math-flags
<http://llvm.org/docs/LangRef.html#fast-math-flags>
>  
> This approach has the shortcoming that when there is a desire to disable
one
> of these fast-math-flags, if the 'fast' flag remains, it implicitly
> re-enables the one being disabled.  For example, compiling this test-case:
>  
>     extern void use(float x, float y);
>     void test(float a, float b, float c) {
>       float q1 = a / c;
>       float q2 = b / c;
>       use(q1, q2);
>     }
>  
> at '-O2 -ffast-math' does a reciprocal-transformation, so only one
division
> is done (as desired with fast-math).  Compiling it with:
>  
>   -O2 -ffast-math -fno-reciprocal-math
>  
> should disable the reciprocal transformations (the flag 'arcp'),
but leave
> all the other fast-math transformations enabled.  The current
implementation
> doesn't do that, since the 'fast' IR-level flag still gets set.
>  
> Motivation of this discussion:
https://llvm.org/bugs/show_bug.cgi?id=27372#c2
<https://llvm.org/bugs/show_bug.cgi?id=27372#c2>
>  
> As an aside, when '-ffast-math' is specified on the command-line,
the
> following six switches are all passed to cc1:
>  
>     -menable-no-infs
>     -menable-no-nans
>     -fno-signed-zeros
>     -freciprocal-math
>     -fno-trapping-math
>     -ffp-contract=fast
>  
> and '-ffast-math' itself is also passed cc1 (the act of passing
'-ffast-math'
> to cc1 results in the macro '__FAST_MATH__' being defined).  When
(for
> example) '-fno-reciprocal-math' is passed in addition to
'-ffast-math', then
> '-freciprocal-math' is no longer passed to cc1 (and the other five
listed
> above still are passed, along with '-ffast-math' still being
passed).  It
> seems like the intention was that these individual switches were to enable
> the individual floating-point transformations (and so the lack of any of
> those switches would suppress the relevant transformations), but the
> '-ffast-math' "umbrella" is over-riding the attempted
suppression.
Sure, this looks like a bug, disable an individual fast-math flag on the command
line should be possible and override a prior -ffast-math (usually the last one
on the command line “wins”/override).

The Cfe-dev mailing list would be more appropriate to discuss the behavior of
clang command line flags though.

>  
> The change proposed at https://reviews.llvm.org/D26708
<https://reviews.llvm.org/D26708> deals with this issue
This patch seems to modify on LLVM, it does not deal at all with the issue you
describe above.
I don’t see why the issue with the clang command line flags need to be dealt
with at the LLVM level.

> just for the reciprocal-transformation case, but it changes the semantics
of
> the 'fast' IR-level flag so that it no longer implies all the
others.
The starting point for any change is:
http://llvm.org/docs/LangRef.html#fast-math-flags
You would need to write a new definition for what “fast” would mean.

However I don’t need anything need to be changed here to address the use-case
you want to fix.

> With
> that proposed approach, rather than an "umbrella" flag such as
'fast' being
> checked in the back-end (along with an individual flag like
'arcp'), just
> checking the individual flag ('arcp') would be done. 
There is already no need to check the “fast” *and* arcp flag, if a
transformation is about reciprocal, then you only need to check arcp (fast
implies arcp, checking for fast would be redundant).

Be careful also that the fast-math flags are mainly an IR level definition, the
backend only inherited these per instruction flag very recently. It has been
entirely converted to use these, and it still uses a global flag in some places.
The line you’re touching in your patch for instance is about this legacy:

  if (!UnsafeMath && !Flags->hasAllowReciprocal())

The first flag is the global “fast-math” mode on the backend, which is not as
fine grain as the per-instruction model.
The second flag is the “per instruction” flag, which is the model we aim at.

We should get rid of the “global” UnsafeMath in the backend, but that does not
require any change to the IR or the individual fast-math flags.

> Any fast-math-related
> transformation that doesn't have an individual flag (e.g.,
re-association
> currently doesn't), should eventually have an individual flag defined
for
> it, and then that individual flag should be checked.
>  
> What do people think?
I think these are valuable problems to solve, but you should tackle them piece
by piece:

1) the clang part of overriding the individual FMF and emitting the right IR is
the first thing to fix.
2) the backend is still using the global UnsafeFPMath and it should be killed.

Hope this makes sense.

— 
Mehdi

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161115/770d324b/attachment.html>

Ristow, Warren via llvm-dev

2016-Nov-16 07:59 UTC

head link

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Hi,

Thanks for the quick feedback. I see your points, but I have a few
questions/comments. I'll start at the end of the previous post:
> ...
> I think these are valuable problems to solve, but you should tackle them
piece by piece:
>
> 1) the clang part of overriding the individual FMF and emitting the right
IR is the first thing to fix.
> 2) the backend is still using the global UnsafeFPMath and it should be
killed.
I addressed this point (2) for the reciprocal aspect in the patch, but of course
that wasn't useful without doing something about (1).

Regarding (1), over at https://reviews.llvm.org/D26708#596610, David made the
same point that it should be done in Clang. I can understand that, but I wonder
whether having the concept of the 'fast' flag in the IR that implies all
the other FMF makes sense? I'm not seeing a good reason for it, but since
this is very new to me, I can easily imagine I'm missing the big picture.

For example, in the LLVM IR (http://llvm.org/docs/LangRef.html#fast-math-flags)
the fast-math flags 'nnan', 'ninf', 'nsz',
'arcp' and 'fast’ are defined. Except for 'fast', each of
these has a fairly specific definition of what they mean. For example, for
'arcp':

arcp => "Allow optimizations to use the reciprocal of an argument
rather
than perform division."

'fast' is unusual, in that it describes a fairly generic set of
aggressive floating-point optimizations:

fast => "Allow algebraically equivalent transformations that may
dramatically
change results in floating point (e.g. reassociate). This flag
implies
all the others."

Very loosely, 'fast' means "all the aggressive FP-transformations
that are not controlled by one of the other 4, plus it implies all the other
4". If for terminology, we call those additional aggressive optimizations
'aggr', then we have:

'fast' == 'aggr' + 'nnan' + 'ninf' +
'nsz' + 'arcp'

So as I see it, if we want to disable only one of the other ones (like
'arcp', in my case), there isn't any way to express that with these
IR flags defined this way. In short, we cannot turn on all the flags besides
'arcp'. To do that, what we want is that somehow for the Clang
switches:

'-ffast-math -fno-reciprocal-math'

to ultimately result in LLVM IR that has the following flags on in appropriate
FP ops:

'aggr' + 'nnan' + 'ninf' + 'nsz'

But I don't see a way to express 'aggr' in the IR. We could do
this, if we change the definition of the IR 'fast' flag to remove that
sentence about implying all the others:

fast => "Allow algebraically equivalent transformations that may
dramatically
change results in floating point (e.g. reassociate).

(If we do something like that, we may want to change the name from
'fast' to something else (like 'aggr'), to avoid tying it too
closely to the concept of the '-ffast-math' switch.)

As an aside, I don't know if the "reassociate" example is the only
other transformation that's allowed by 'fast' (I presume it
isn't), but I think reassociation would be better expressed by a separate
flag, which could then be controlled independently via
'-f[no]-associative-math' switch. Not having that flag exist separately
in the FMF is the origin of PR27372. But creating that flag and using it in the
appropriate places would still run into these problems of 'fast'
implying all the others, which would make it impossible to disable reassociation
while leaving all the other FMF transformations enabled.

To ask a concrete question using the current definition of 'fast' (which
includes enabling reassociation, as the LLVM IR documentation of FMF says), how
can we express in the IR that reciprocal-transformations are not allowed, but
reassociation is allowed?

So the bottom line is that I do see there are issues in Clang that are relevant.
But as long as 'fast' means "'aggr' plus all the other FMF
transformations", I don't see how we can effectively disable a subset
of those other FMF transformations (while leaving 'aggr'
transformations, such as reassociation, enabled). With that in mind, my patch
took one step in having 'fast' no longer imply all the others.

Thanks,
-Warren
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161116/fc2754a5/attachment-0001.html>

Hal Finkel via llvm-dev

2016-Nov-16 07:59 UTC

head link

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

----- Original Message -----
> From: "Mehdi Amini via llvm-dev" <llvm-dev at
lists.llvm.org>
> To: "Warren Ristow" <warren.ristow at sony.com>
> Cc: llvm-dev at lists.llvm.org
> Sent: Tuesday, November 15, 2016 11:10:48 PM
> Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
> 'fast' flag implying all fast-math-flags
> Hi,
> > On Nov 15, 2016, at 5:15 PM, Ristow, Warren via llvm-dev <
> > llvm-dev at lists.llvm.org > wrote:
> 
> > Hi all,
> 
> > This is about https://reviews.llvm.org/D26708
> 
> > Currently when the command-line switch '-ffast-math' is
specified,
> > the
> 
> > IR-level fast-math-flag 'fast' gets attached to appropriate FP
math
> 
> > instructions. That flag acts as an "umbrella" to implicitly
turn on
> > all the
> 
> > other fast-math-flags ('nnan', 'ninf', 'nsz'
and 'arcp'):
> 
> > http://llvm.org/docs/LangRef.html#fast-math-flags
> 
> > This approach has the shortcoming that when there is a desire to
> > disable one
> 
> > of these fast-math-flags, if the 'fast' flag remains, it
implicitly
> 
> > re-enables the one being disabled. For example, compiling this
> > test-case:
> 
> > extern void use(float x, float y);
> 
> > void test(float a, float b, float c) {
> 
> > float q1 = a / c;
> 
> > float q2 = b / c;
> 
> > use(q1, q2);
> 
> > }
> 
> > at '-O2 -ffast-math' does a reciprocal-transformation, so only
one
> > division
> 
> > is done (as desired with fast-math). Compiling it with:
> 
> > -O2 -ffast-math -fno-reciprocal-math
> 
> > should disable the reciprocal transformations (the flag
'arcp'),
> > but
> > leave
> 
> > all the other fast-math transformations enabled. The current
> > implementation
> 
> > doesn't do that, since the 'fast' IR-level flag still gets
set.
> 
> > Motivation of this discussion:
> > https://llvm.org/bugs/show_bug.cgi?id=27372#c2
> 
> > As an aside, when '-ffast-math' is specified on the
command-line,
> > the
> 
> > following six switches are all passed to cc1:
> 
> > -menable-no-infs
> 
> > -menable-no-nans
> 
> > -fno-signed-zeros
> 
> > -freciprocal-math
> 
> > -fno-trapping-math
> 
> > -ffp-contract=fast
> 
> > and '-ffast-math' itself is also passed cc1 (the act of
passing
> > '-ffast-math'
> 
> > to cc1 results in the macro '__FAST_MATH__' being defined).
When
> > (for
> 
> > example) '-fno-reciprocal-math' is passed in addition to
> > '-ffast-math', then
> 
> > '-freciprocal-math' is no longer passed to cc1 (and the other
five
> > listed
> 
> > above still are passed, along with '-ffast-math' still being
> > passed).
> > It
> 
> > seems like the intention was that these individual switches were to
> > enable
> 
> > the individual floating-point transformations (and so the lack of
> > any
> > of
> 
> > those switches would suppress the relevant transformations), but
> > the
> 
> > '-ffast-math' "umbrella" is over-riding the
attempted suppression.
> 
> Sure, this looks like a bug, disable an individual fast-math flag on
> the command line should be possible and override a prior -ffast-math
> (usually the last one on the command line “wins”/override).
> The Cfe-dev mailing list would be more appropriate to discuss the
> behavior of clang command line flags though.
> > The change proposed at https://reviews.llvm.org/D26708 deals with
> > this issue
> 
> This patch seems to modify on LLVM, it does not deal at all with the
> issue you describe above.
> I don’t see why the issue with the clang command line flags need to
> be dealt with at the LLVM level.
> > just for the reciprocal-transformation case, but it changes the
> > semantics of
> 
> > the 'fast' IR-level flag so that it no longer implies all the
> > others.
> 
> The starting point for any change is:
> http://llvm.org/docs/LangRef.html#fast-math-flags
> You would need to write a new definition for what “fast” would mean.
> However I don’t need anything need to be changed here to address the
> use-case you want to fix.I suspect that we want to start by getting rid of 'fast' on the IR level
and replacing it with individual flags for the various optimization classes - Do
we have only allowing reassociation and libm optimizations? Then we can readjust
the Clang flags in a straightforward way.

-Hal 
> > With
> 
> > that proposed approach, rather than an "umbrella" flag such
as
> > 'fast'
> > being
> 
> > checked in the back-end (along with an individual flag like
> > 'arcp'),
> > just
> 
> > checking the individual flag ('arcp') would be done.
> 
> There is already no need to check the “fast” *and* arcp flag, if a
> transformation is about reciprocal, then you only need to check arcp
> (fast implies arcp, checking for fast would be redundant).
> Be careful also that the fast-math flags are mainly an IR level
> definition, the backend only inherited these per instruction flag
> very recently. It has been entirely converted to use these, and it
> still uses a global flag in some places.
> The line you’re touching in your patch for instance is about this
> legacy:
> if (!UnsafeMath && !Flags->hasAllowReciprocal())
> The first flag is the global “fast-math” mode on the backend, which
> is not as fine grain as the per-instruction model.
> The second flag is the “per instruction” flag, which is the model we
> aim at.
> We should get rid of the “global” UnsafeMath in the backend, but that
> does not require any change to the IR or the individual fast-math
> flags.
> > Any fast-math-related
> 
> > transformation that doesn't have an individual flag (e.g.,
> > re-association
> 
> > currently doesn't), should eventually have an individual flag
> > defined
> > for
> 
> > it, and then that individual flag should be checked.
> 
> > What do people think?
> 
> I think these are valuable problems to solve, but you should tackle
> them piece by piece:
> 1) the clang part of overriding the individual FMF and emitting the
> right IR is the first thing to fix.
> 2) the backend is still using the global UnsafeFPMath and it should
> be killed.
> Hope this makes sense.
> —
> Mehdi
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161116/d3a077b2/attachment.html>

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Nov 2016 - RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Reasonably Related Threads