[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags [Nov 2016]


Mehdi Amini via llvm-dev

2016-Nov-17 06:45 UTC

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

> On Nov 16, 2016, at 10:04 PM, Ristow, Warren <warren.ristow at sony.com> wrote:
> 
> > Can you elaborate what kind of runtime failure is the reciprocal
> > transformation triggering?
>  
> Yes.  It was along the lines of:
>  
>     {
>       float x = a / c;
>       float y = b / c;
>  
>       if (y == 1.0f) {
>         // do some processing for when 'b' and 'c' are equal
>       } else {
>         // do other processing
>       }
>  
>       use(x, y);
>     }
>  
> Of course, they understood they could easily change this code once they
> understood the issue.
>  
> But because it "failed" for non-edge-case values of 'c', they were
> worried.  As an example of the non-edge-case aspect, when 'c' is 41.0f
> (and so of course 'b' is 41.0f), they intuitively felt that this “would
> work precisely”, even with fast-math.  Once they understood more, they
> agreed this was reasonable with fast-math, but they had the underlying
> concern that if they encountered one case where 'num' and 'den' (here 'b'
> and 'c') were equal (and non-edge-case), yet 'num / den' wasn't precisely
> 1.0f, then even if they fixed the situation where they encountered it, it
> might be lurking elsewhere in their code, and so they wanted to disable
> that transformation.
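To see why 41.0f is enough to trigger this, here is a minimal C sketch that performs the 'arcp' rewrite of 'b / c' into 'b * (1.0f / c)' by hand instead of relying on any compiler flag. It assumes IEEE-754 single precision with no excess precision (the 'volatile' temporaries force each intermediate to be rounded to float), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* y = b / c as written in the source: IEEE division is correctly
   rounded, so for finite, nonzero b == c the result is exactly 1.0f. */
static bool ratio_is_one_direct(float b, float c)
{
    volatile float y = b / c;
    return y == 1.0f;
}

/* What an 'arcp' rewrite may produce: compute 1/c once (it can then be
   shared with x = a / c) and multiply.  Two roundings instead of one. */
static bool ratio_is_one_reciprocal(float b, float c)
{
    volatile float r = 1.0f / c;   /* first rounding: the reciprocal */
    volatile float y = b * r;      /* second rounding: the product */
    return y == 1.0f;
}

/* A version of the customer's test that does not depend on how the
   division is evaluated: compare the operands themselves. */
static bool operands_equal(float b, float c)
{
    return b == c;
}
```

Under those assumptions, ratio_is_one_direct(41.0f, 41.0f) is true while ratio_is_one_reciprocal(41.0f, 41.0f) is false: the rounded reciprocal of 41, multiplied by 41, lands one unit in the last place below 1.0f. Testing 'b == c' directly, as operands_equal does, is the kind of easy code change mentioned above.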
Thanks for elaborating.

I’d be reluctant to call this situation a real use-case, though.
Does the distinction on reciprocal really make sense here? This user can hit
the same “surprising” behavior anywhere in their code base with reassociation
as well:

void foo(float a, float b) {
  float x = a - b;
  if (x == 0) {
    /* … */  // reached only if a == b
  }
}

That sounds totally reasonable, unless foo is inlined and reassociation leads
to a non-zero value for x even when the a and b that would have been passed to
foo (had it not been inlined) are identical!
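A hedged C sketch of the inlining scenario: it assumes that after inlining, both arguments to foo come from the same source expression x + y + z, but reassociation groups one copy as (x + y) + z and the other as x + (y + z). The groupings are written out by hand, with 'volatile' temporaries so every step is rounded to float, and the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: the value foo() would see for 'a - b' if one copy
   of x + y + z were evaluated as (x + y) + z and the other as x + (y + z),
   as reassociation is permitted to do. */
static float reassociated_difference(float x, float y, float z)
{
    volatile float xy = x + y;
    volatile float a  = xy + z;    /* (x + y) + z */
    volatile float yz = y + z;
    volatile float b  = x + yz;    /* x + (y + z) */
    volatile float d  = a - b;
    return d;
}
```

With x = 1e8f, y = -1e8f and z = 1.0f, the first grouping gives 1.0f and the second gives 0.0f (adding 1.0f to -1e8f is lost to rounding), so the difference is 1.0f and the 'x == 0' branch in foo would not be taken.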

(This somehow reminds me of a client that was bitten by nnan: their assumption
was that as long as they didn’t introduce NaNs in the program, everything was
fine. However, with fast-math some transformations were introducing NaNs where
there weren’t any before, and these propagated into other computations that
had been transformed under the assumption that no NaN would show up. It also
turns out that making code both safe against NaN and efficient is hard,
especially when the code itself is compiled with fast-math.)

— 
Mehdi



> 
>  
> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
> Sent: Wednesday, November 16, 2016 7:11 PM
> To: Ristow, Warren <warren.ristow at sony.com>
> Cc: Kaylor, Andrew <andrew.kaylor at intel.com>; llvm-dev at lists.llvm.org; Nicolai Hähnle <nhaehnle at gmail.com>
> Subject: Re: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags
>  
>  
> On Nov 16, 2016, at 6:22 PM, Ristow, Warren <warren.ristow at sony.com> wrote:
>  
> > ... except that Warren’s proposal that started this discussion seems to
> > imply that he has a use case that requires reciprocals to be turned off
> > separately.
>  
> Just to close this loose end, yes, I have a use case.
>  
> Specifically, we have a customer that turns on '-ffast-math', but was
> getting a runtime failure due to the reciprocal transformation being done.
>  
> Can you elaborate what kind of runtime failure is the reciprocal
> transformation triggering?
>  
> — 
> Mehdi
>  
> They don't want to turn off fast-math because they like the performance
> improvement, and can live with the imprecision in most cases.  So they wanted
> to suppress just the reciprocal transformation.  I intended to tell them the
> solution was simple: use '-ffast-math -fno-reciprocal-math'.  But on trying
> it myself, I ran into the issue here.
>  
> Thanks,
> -Warren
>  
> From: Kaylor, Andrew [mailto:andrew.kaylor at intel.com]
> Sent: Wednesday, November 16, 2016 4:13 PM
> To: Mehdi Amini <mehdi.amini at apple.com>; Ristow, Warren <warren.ristow at sony.com>; llvm-dev at lists.llvm.org; Nicolai Hähnle <nhaehnle at gmail.com>
> Subject: RE: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags
>  
> I don’t really like the idea of updating checks of UnsafeAlgebra() to depend
> on all of the other flags.  It seems like it would be preferable to look at
> each optimization and figure out which flags it actually requires.  I suspect
> that in many cases the “new” flag (i.e. allowing reassociation, etc.) will be
> what is actually needed anyway.
>  
> I would be inclined to agree with Nicolai’s suggestion of combining all the
> flags related to value safety, except that Warren’s proposal that started
> this discussion seems to imply that he has a use case that requires
> reciprocals to be turned off separately.
>  
> -Andy
>  
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Mehdi Amini via llvm-dev
> Sent: Wednesday, November 16, 2016 8:55 AM
> To: Ristow, Warren <warren.ristow at sony.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags
>  
>  
> On Nov 15, 2016, at 11:59 PM, Ristow, Warren <warren.ristow at sony.com> wrote:
>  
> Hi,
>  
> Thanks for the quick feedback.  I see your points, but I have a few
> questions/comments.  I'll start at the end of the previous post:
>  
> > ...
> > I think these are valuable problems to solve, but you should tackle them
> > piece by piece:
> > 
> > 1) the clang part of overriding the individual FMF and emitting the right
> >    IR is the first thing to fix.
> > 2) the backend is still using the global UnsafeFPMath and it should be
> >    killed.
>  
> I addressed point (2) for the reciprocal aspect in the patch, but of course
> that wasn't useful without doing something about (1).
>  
> Regarding (1), over at https://reviews.llvm.org/D26708#596610, David made the
> same point that it should be done in Clang.  I can understand that, but I
> wonder whether having the concept of a 'fast' flag in the IR that implies
> all the other FMF makes sense.  I'm not seeing a good reason for it, but
> since this is very new to me, I can easily imagine I'm missing the big
> picture.
>  
> For example, in the LLVM IR (http://llvm.org/docs/LangRef.html#fast-math-flags),
> the fast-math flags 'nnan', 'ninf', 'nsz', 'arcp' and 'fast' are defined.
> Except for 'fast', each of these has a fairly specific definition.  For
> example, for 'arcp':
>  
>     arcp => "Allow optimizations to use the reciprocal of an argument rather
>              than perform division."
>  
> 'fast' is unusual, in that it describes a fairly generic set of aggressive
> floating-point optimizations:
>  
>     fast => "Allow algebraically equivalent transformations that may dramatically
>             change results in floating point (e.g. reassociate). This flag implies
>             all the others."
>  
> Very loosely, 'fast' means "all the aggressive FP transformations that are
> not controlled by one of the other 4, plus it implies all the other 4".  If,
> for terminology, we call those additional aggressive optimizations 'aggr',
> then we have:
>  
>     'fast' == 'aggr' + 'nnan' + 'ninf' + 'nsz' + 'arcp'
>  
> So as I see it, if we want to disable only one of the other ones (like
> 'arcp', in my case), there isn't any way to express that with the IR flags
> defined this way.  In short, we cannot turn on all the flags besides 'arcp'.
> To do that, what we want is for the Clang switches:
>  
>   '-ffast-math -fno-reciprocal-math'
>  
> to somehow ultimately result in LLVM IR that has the following flags on in
> the appropriate FP ops:
>  
>   'aggr' + 'nnan' + 'ninf' + 'nsz'
>  
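The flag arithmetic above can be modeled with a few bits. This is a hypothetical C sketch, not LLVM's FastMathFlags API: the FMF_* names and the two effective_* functions are made up for illustration, with FMF_AGGR standing in for the proposed 'aggr'. Under the current rule that 'fast' implies all the others, clearing 'arcp' from a set that still contains 'fast' has no effect; with an independent 'aggr' bit, the subtraction sticks.

```c
#include <assert.h>

/* Hypothetical bit encoding of the fast-math flags discussed above. */
enum {
    FMF_NNAN = 1u << 0,
    FMF_NINF = 1u << 1,
    FMF_NSZ  = 1u << 2,
    FMF_ARCP = 1u << 3,
    FMF_FAST = 1u << 4,   /* current IR 'fast': implies all the others */
    FMF_AGGR = 1u << 5    /* proposed 'aggr': implies nothing else */
};

#define FMF_ALL_SPECIFIC (FMF_NNAN | FMF_NINF | FMF_NSZ | FMF_ARCP)

/* Current semantics: if 'fast' is present, every specific flag is on. */
static unsigned effective_current(unsigned flags)
{
    return (flags & FMF_FAST) ? (flags | FMF_ALL_SPECIFIC) : flags;
}

/* Proposed semantics: each flag, including 'aggr', means only itself. */
static unsigned effective_proposed(unsigned flags)
{
    return flags;
}

/* Would an optimization guarded by 'arcp' be allowed? */
static int allows_arcp(unsigned effective)
{
    return (effective & FMF_ARCP) != 0;
}
```

For example, a 'fast' set with the 'arcp' bit cleared still answers yes to allows_arcp(effective_current(...)), which is why '-ffast-math -fno-reciprocal-math' cannot be expressed under the current definition.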
> Makes sense; I missed that we can’t *subtract* from fast at the IR level.
>  
> I wouldn’t be opposed to having something along the lines of “aggr”, but
> there is a tradeoff: some transformations may be harder to guard with this
> model.
>  
> Maybe that could be a starting point: change the “UnsafeAlgebra” bit in the
> FMF to be the “aggr” you mention, and make all the queries to
> FastMathFlags::UnsafeAlgebra() return true if all the bits are set in the
> Flags.  This alone should be nothing more than a mechanical change, I
> believe.
> The important part is then auditing all the users of UnsafeAlgebra() in the
> middle end and checking whether they can be “downgraded” to aggr safely,
> i.e. whether they need aggr alone rather than aggr *and* another flag.
>  
> — 
> Mehdi
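Mehdi's suggested first step might look roughly like this, again as a hypothetical C model rather than the real FastMathFlags class. The old UnsafeAlgebra bit is reinterpreted as 'aggr', the legacy query reports true only when every bit is set (so unaudited callers keep their old, conservative behavior), and an audited transformation checks just the flag it needs. Which flag a real transformation needs is exactly what the proposed audit would determine; the two guards below are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed layout: 'aggr' replaces the old
   UnsafeAlgebra bit and no longer implies the other flags. */
struct fmf {
    bool aggr, nnan, ninf, nsz, arcp;
};

/* Mechanical step: the legacy query now means "everything is allowed",
   i.e. the old 'fast' == aggr + nnan + ninf + nsz + arcp. */
static bool unsafe_algebra(struct fmf f)
{
    return f.aggr && f.nnan && f.ninf && f.nsz && f.arcp;
}

/* Audit step (assumed outcomes): a reciprocal rewrite x / c -> x * (1 / c)
   needs only 'arcp'; a transformation found to need only 'aggr' (such as
   plain reassociation in Warren's example) checks only that bit. */
static bool may_use_reciprocal(struct fmf f) { return f.arcp; }
static bool may_reassociate(struct fmf f)    { return f.aggr; }
```

Modeling '-ffast-math -fno-reciprocal-math' as every flag except arcp, unsafe_algebra() and may_use_reciprocal() both return false, while may_reassociate() still returns true.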
>  
> But I don't see a way to express 'aggr' in the IR.  We could do this if we
> change the definition of the IR 'fast' flag to remove the sentence about
> implying all the others:
>  
>     fast => "Allow algebraically equivalent transformations that may dramatically
>             change results in floating point (e.g. reassociate)."
>  
> (If we do something like that, we may want to change the name from 'fast'
> to something else (like 'aggr'), to avoid tying it too closely to the
> concept of the '-ffast-math' switch.)
>  
> As an aside, I don't know if the "reassociate" example is the only other
> transformation that's allowed by 'fast' (I presume it isn't), but I think
> reassociation would be better expressed by a separate flag, which could then
> be controlled independently via the '-f[no-]associative-math' switch.  Not
> having that flag exist separately in the FMF is the origin of PR27372.  But
> creating that flag and using it in the appropriate places would still run
> into these problems of 'fast' implying all the others, which would make it
> impossible to disable reassociation while leaving all the other FMF
> transformations enabled.
>  
> To ask a concrete question using the current definition of 'fast' (which
> includes enabling reassociation, as the LLVM IR documentation of FMF says):
> how can we express in the IR that reciprocal transformations are not
> allowed, but reassociation is allowed?
>  
> So the bottom line is that I do see there are issues in Clang that are
> relevant.  But as long as 'fast' means "'aggr' plus all the other FMF
> transformations", I don't see how we can effectively disable a subset of
> those other FMF transformations (while leaving 'aggr' transformations, such
> as reassociation, enabled).  With that in mind, my patch took one step
> toward having 'fast' no longer imply all the others.
>  
> Thanks,
> -Warren

Ristow, Warren via llvm-dev

2016-Nov-17 08:51 UTC


Those are all good points.  Your reassociation point in the context of inlining
is particularly interesting.

FWIW, we also have a case where a customer wants '-fno-associative-math'
to suppress reassociation under '-ffast-math'.  It would take me a while
to find the specifics of the issue, but it was (if my memory is right) more of
a real use-case.  (That is to say, the code that was "failing" due to
reassociation didn't have an obvious fix like the reciprocal situation here,
other than to turn off fast-math.)  In fact, the request to suppress
reassociation was the motivation for creating PR27372 in the first place (which
eventually fed into this thread).  I have to say that on the reassociation
point, my concern is that to really suppress it, we would have to suppress so
much that there would hardly be any point in using -ffast-math.

I'd say your comments here are very similar to what Nicolai said in another
subthread of this discussion:
>> I'd be really curious to know if there is anybody who really needs arcp
>> without fp-contract=fast or vice versa, or who needs both of these but
>> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
>> ...
>> [1] One case I _can_ think of (and which may have been a reason for the
>> proliferation of flags in the first place) is somebody who enables fast
>> math, but then doesn't want their results to change when they update the
>> compiler and get a new set of optimizations. But IMO that's a use case
>> that should be explicitly rejected.

I think those are all really good points, and an argument can be made that when
-ffast-math gives you results you don't want, then you just have to turn it
off.  Essentially, the user can't "have his cake and eat it too".

All that said, I think we (the company I work for, Sony) will have to implement
support for these switches.  It comes down to this: GCC has these switches
(e.g., -fno-reciprocal-math and -fno-associative-math), and they do suppress the
transformations for our customers.  When they switch to Clang/LLVM and use the
same switches, it doesn't "work".  So as a practical matter, I think we will
support them.  Whether the LLVM community in general feels that that's
required is another question.  Until your recent comments here, and
Nicolai's comments above, I would have thought the answer was clearly yes.
But maybe that's not the case.

In summary, irrespective of any (subjective?) assessment of how legitimate a
particular use-case is, do we want switches like:

    -ffast-math -fno-reciprocal-math
    -ffast-math -fno-associative-math

to work?

For me, the answer is yes, because I have multiple customers that tell me they
really want to leave -ffast-math on, but they want to be able to disable these
sub-categories.  I've been approaching this under the assumption that the
answer is yes for the Clang/LLVM community in general.
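For such switches to work, the driver has to fold them, in command-line order, into individual per-operation flags before any IR is emitted. Below is a hypothetical C sketch of that folding; the F_* bits and fold_fp_switches() are invented for illustration and are not Clang's driver code. It models reassociation as its own bit, as proposed earlier in the thread, and ignores the other options '-ffast-math' controls as well as any interactions between them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-operation flags; F_REASSOC plays the role of 'aggr'. */
enum {
    F_NNAN    = 1u << 0,
    F_NINF    = 1u << 1,
    F_NSZ     = 1u << 2,
    F_ARCP    = 1u << 3,
    F_REASSOC = 1u << 4
};

#define F_ALL (F_NNAN | F_NINF | F_NSZ | F_ARCP | F_REASSOC)

/* Fold GCC-style switches in command-line order: a later switch
   overrides an earlier one.  Unknown arguments are ignored. */
static unsigned fold_fp_switches(int argc, const char *const argv[])
{
    unsigned flags = 0;
    for (int i = 0; i < argc; ++i) {
        const char *s = argv[i];
        if (strcmp(s, "-ffast-math") == 0)
            flags |= F_ALL;
        else if (strcmp(s, "-freciprocal-math") == 0)
            flags |= F_ARCP;
        else if (strcmp(s, "-fno-reciprocal-math") == 0)
            flags &= ~(unsigned)F_ARCP;
        else if (strcmp(s, "-fassociative-math") == 0)
            flags |= F_REASSOC;
        else if (strcmp(s, "-fno-associative-math") == 0)
            flags &= ~(unsigned)F_REASSOC;
    }
    return flags;
}
```

Under this model, '-ffast-math -fno-reciprocal-math' yields every bit except F_ARCP, '-ffast-math -fno-associative-math' yields every bit except F_REASSOC, and putting '-fno-reciprocal-math' before '-ffast-math' has no effect because the later switch wins.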

Thanks,
-Warren


Nicolai Hähnle via llvm-dev

2016-Nov-17 09:31 UTC


On 17.11.2016 09:51, Ristow, Warren wrote:
> Those are all good points.  Your reassociation point in the context of
> inlining is particularly interesting.
>
> FWIW, we also have a case where a customer wants '-fno-associative-math'
> to suppress reassociation under '-ffast-math'.  It would take me a while
> to find the specifics of the issue, but it was (if my memory is right)
> more of a real use-case.  (That is to say, the code that was "failing"
> due to reassociation didn't have an obvious fix like the reciprocal
> situation here, other than to turn off fast-math.)  In fact, the
> request to suppress reassociation was the motivation for creating
> PR27372 in the first place (which eventually fed into this thread).  I
> have to say that on the reassociation point, my concern is that to
> really suppress it, we would have to suppress so much that there would
> hardly be any point in using -ffast-math.
>
> I'd say your comments here are very similar to what Nicolai said in
> another subthread of this discussion:
>
>>> I'd be really curious to know if there is anybody who really needs arcp
>>> without fp-contract=fast or vice versa, or who needs both of these but
>>> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
>>> ...
>>> [1] One case I _can_ think of (and which may have been a reason for the
>>> proliferation of flags in the first place) is somebody who enables fast
>>> math, but then doesn't want their results to change when they update the
>>> compiler and get a new set of optimizations. But IMO that's a use case
>>> that should be explicitly rejected.
>
> I think those are all really good points, and an argument can be made
> that when -ffast-math gives you results you don't want, then you just
> have to turn it off.  Essentially, the user can't "have his cake and eat
> it too".
>
> All that said, I think we (the company I work for, Sony) will have to
> implement support for these switches.  It comes down to this: GCC has
> these switches (e.g., -fno-reciprocal-math and -fno-associative-math), and
> they do suppress the transformations for our customers.  When they switch
> to Clang/LLVM and use the same switches, it doesn't "work".  So as a
> practical matter, I think we will support them.  Whether the LLVM
> community in general feels that that's required is another question.
> Until your recent comments here, and Nicolai's comments above, I
> would have thought the answer was clearly yes.  But maybe that's not the
> case.
>
> In summary, irrespective of any (subjective?) assessment of how
> legitimate a particular use-case is, do we want switches like:
>
>     -ffast-math -fno-reciprocal-math
>     -ffast-math -fno-associative-math
>
> to work?
>
> For me, the answer is yes, because I have multiple customers that tell
> me they really want to leave -ffast-math on, but they want to be able to
> disable these sub-categories.  I've been approaching this under the
> assumption that the answer is yes for the Clang/LLVM community in general.
I feel your pain, but I'm not convinced yet that this is really the 
right approach.

It sounds like the customers (a) want fast-math in general but (b) have
some specific parts of the code where it breaks things. What about
having them disable fast-math at a more fine-grained scope, e.g. via
something like an __attribute__((no_fast_math)) function attribute at the
C++ source level?

Then the problematic piece of code might be slower (since all of 
fast-math is disabled), but the rest of the code would likely be faster 
(since it benefits from all of fast-math instead of just a subset).

Cheers,
Nicolai

Kaylor, Andrew via llvm-dev

2016-Nov-17 18:54 UTC


>All that said, I think we (the company I work for, Sony) will have to implement support
>for these switches.  It comes down to this: GCC has these switches (e.g., -fno-reciprocal-math
>and -fno-associative-math), and they do suppress the transformations for our customers.
>When they switch to Clang/LLVM and use the same switches, it doesn't "work".  So as a
>practical matter, I think we will support them.  Whether the LLVM community in general
>feels that that's required is another question.  Until your recent comments here, and
>Nicolai's comments above, I would have thought the answer was clearly yes.  But maybe
>that's not the case.
I think this is a very good point.  You (Sony) are not the only ones who are
concerned with GCC command-line compatibility.  It definitely should hold some
weight.  Given that this is something we could do with just a little more
effort, I’m not sure mere simplicity is enough reason not to do it.

Also, on a slight tangent...
>> I'd be really curious to know if there is anybody who really needs arcp
>> without fp-contract=fast or vice versa, or who needs both of these but
>> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
I just wanted to mention that fp-contract relates to things like FMA and
shouldn’t be confused with fast-math.
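To illustrate the difference, here is a small C sketch, assuming IEEE-754 single precision and no excess precision. Contracting a*b + c into a fused multiply-add rounds once instead of twice, so it can change results without any fast-math flag being involved. To avoid depending on libm or on FMA hardware, the single-rounding result is computed in double: the product of two floats is exact in double, and for the operands used here the addition is exact too. The function names are hypothetical.

```c
#include <assert.h>

/* a*b rounded to float, then added to c: two roundings.  The volatile
   temporary stops the compiler from contracting this into an FMA. */
static float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;
    volatile float r = p + c;
    return r;
}

/* One rounding, as a fused multiply-add would do.  Not a general fmaf
   replacement: it is exact only because a 24-bit x 24-bit product fits
   in double's 53-bit significand and, for the operands used below, the
   sum is exact as well. */
static float fused_mul_add(float a, float b, float c)
{
    volatile double exact = (double)a * (double)b + (double)c;
    return (float)exact;
}
```

With a = b = 0x1.001p0f (1 + 2^-12) and c = -0x1.002p0f, mul_then_add() returns 0.0f because the product's lowest bit (2^-24) is rounded away, while fused_mul_add() returns 0x1p-24f.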

-Andy

From: Ristow, Warren [mailto:warren.ristow at sony.com]
Sent: Thursday, November 17, 2016 12:51 AM
To: mehdi.amini at apple.com
Cc: Kaylor, Andrew <andrew.kaylor at intel.com>; llvm-dev at
lists.llvm.org; Nicolai Hähnle <nhaehnle at gmail.com>
Subject: RE: [llvm-dev] RFC: Consider changing the semantics of 'fast'
flag implying all fast-math-flags

Those are all good points.  Your reassociation point in the context of inlining
is particularly interesting.

FWIW, we also have a case where a customer wants '-fno-associative-math'
to suppress reassociation under '-ffastmath'.  It would take me a while
to find the specifics of the issue, but it was (if my memory is right) more of a
real use-case.  (That is to say, the code that was "failing" due to
reassociation didn't have an obvious fix like the reciprocal situation,
here, other than to turn off fast-math.)  In fact, the request to suppress
reassociation was the motivation for creating PR27372 in the first place (which
eventually fed into this thread).  I have to say that on the reassociation
point, my concern is that to really suppress that, we will have to suppress so
much, that there will hardly be any point in using -ffast-math.

I'd say your comments here are very similar to what Nicolai said in another
subthread of this discussion:
>> I'd be really curious to know if there is anybody who really needs arcp
>> without fp-contract=fast or vice versa, or who needs both of these but
>> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
>> ...
>> [1] One case I _can_ think of (and which may have been a reason for the
>> proliferation of flags in the first place) is somebody who enables fast
>> math, but then doesn't want their results to change when they update the
>> compiler and get a new set of optimizations. But IMO that's a use case
>> that should be explicitly rejected.
I think those are all really good points, and an argument can be made that when
-ffast-math gives you results you don't want, then you just have to turn it
off.  Essentially, the user can't "have his cake and eat it too".

All that said, I think we (the company I work for, Sony) will have to implement
support for these switches.  It comes down to this: GCC has these switches (e.g.,
-fno-reciprocal-math and -fno-associative-math), and they do suppress the
transformations for our customers.  When they switch to Clang/LLVM and use the
same switches, it doesn't "work".  So as a practical matter, I think we will
support them.  Whether the LLVM community in general feels that's required is
another question.  Until your recent comments here, and Nicolai's comments
above, I would have thought the answer was clearly yes.  But maybe that's not
the case.

In summary, irrespective of any (subjective?) assessment of how legitimate a
particular use-case is, do we want switches like:

    -ffast-math -fno-reciprocal-math
    -ffast-math -fno-associative-math

to work?

For me, the answer is yes, because I have multiple customers that tell me they
really want to leave -ffast-math on, but they want to be able to disable these
sub-categories.  I've been approaching this under the assumption that the
answer is yes for the Clang/LLVM community in general.

Thanks,
-Warren

From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Wednesday, November 16, 2016 10:46 PM
To: Ristow, Warren <warren.ristow at sony.com>
Cc: Kaylor, Andrew <andrew.kaylor at intel.com>; llvm-dev at lists.llvm.org; Nicolai Hähnle <nhaehnle at gmail.com>
Subject: Re: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

On Nov 16, 2016, at 10:04 PM, Ristow, Warren <warren.ristow at sony.com> wrote:
> Can you elaborate what kind of runtime failure is the reciprocal transformation triggering?
Yes.  It was along the lines of:

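    void use(float x, float y);  // consumer of the results, defined elsewhere

    void example(float a, float b, float c)
    {
      float x = a / c;
      float y = b / c;  // with arcp, may become b * (1.0f / c)

      if (y == 1.0f) {
        // do some processing for when 'b' and 'c' are equal
      } else {
        // do other processing
      }

      use(x, y);
    }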

Of course they understood they could easily change this code once they
understood the issue.

But because it "failed" for non-edge-case values of 'c',
they were worried.  As an example of the non-edge-case aspect, when 'c'
is 41.0f (and so of course 'b' is 41.0f), intuitively they felt that
this “would work precisely”, even with fast-math.  Once they understood more,
they agreed this was reasonable with fast-math, but they had the underlying
concern that if they encountered one case where 'num' and 'den'
were equal (and non-edge-case), yet 'num / den' wasn't precisely
1.0f, then even if they fixed this situation where they encountered it, it might
be lurking elsewhere in their code, and so they wanted to disable that
transformation.

Thanks for elaborating.

I’d be reluctant to call this situation a real use-case though.
Does the distinction on reciprocal really make sense here? This user can hit
the same “surprising” behavior anywhere in their code-base with reassociation as
well:

void foo (float a, float b) {
  float x = a - b;
  if (x == 0)
     … // only if a == b
}

That sounds totally reasonable, unless foo is inlined and reassociation
leads to a non-zero value for x even though the a and b that would have been
passed to foo "if it wasn’t inlined" are identical!

(This reminds me of a client that was bitten by nnan: they assumed that as long
as they didn’t introduce NaN in the program, everything was fine. However, with
fast-math some transformations introduced NaNs where there weren’t any before,
and these propagated to other computations that had been transformed under the
assumption that no NaN would show up. It also turns out that making the code
both safe against NaN and efficient is hard, especially when the code itself is
compiled with fast-math.)

—
Mehdi

From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Wednesday, November 16, 2016 7:11 PM
To: Ristow, Warren <warren.ristow at sony.com>
Cc: Kaylor, Andrew <andrew.kaylor at intel.com>; llvm-dev at lists.llvm.org; Nicolai Hähnle <nhaehnle at gmail.com>
Subject: Re: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

On Nov 16, 2016, at 6:22 PM, Ristow, Warren <warren.ristow at sony.com> wrote:
> ... except that Warren’s proposal that started this discussion seems to imply that he
> has a use case that requires reciprocals to be turned off separately.
Just to close this loose end, yes I have a use case.

Specifically, we have a customer that turns on '-ffast-math', but was
getting a runtime failure due to the reciprocal-transformation being done.

Can you elaborate what kind of runtime failure is the reciprocal transformation
triggering?

—
Mehdi

They don't want to turn off fast-math because they like the performance
improvement, and can live with the imprecision in most cases.  So they wanted to
suppress just the reciprocal-transformation.  I intended to tell them the
solution was simple: use '-ffast-math -fno-reciprocal-math'.  But on
trying it myself, I ran into the issue here.

Thanks,
-Warren

From: Kaylor, Andrew [mailto:andrew.kaylor at intel.com]
Sent: Wednesday, November 16, 2016 4:13 PM
To: Mehdi Amini <mehdi.amini at apple.com>; Ristow, Warren <warren.ristow at sony.com>; llvm-dev at lists.llvm.org; Nicolai Hähnle <nhaehnle at gmail.com>
Subject: RE: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

I don’t really like the idea of updating checks of UnsafeAlgebra() to depend on
all of the other flags.  It seems like it would be preferable to look at each
optimization and figure out which flags it actually requires.  I suspect that in
many cases the “new” flag (i.e. allowing reassociation, etc.) will be what is
actually needed anyway.

I would be inclined to agree with Nicolai’s suggestion of combining all the flags
related to value safety, except that Warren’s proposal that started this
discussion seems to imply that he has a use case that requires reciprocals to be
turned off separately.

-Andy

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Mehdi Amini via llvm-dev
Sent: Wednesday, November 16, 2016 8:55 AM
To: Ristow, Warren <warren.ristow at sony.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

On Nov 15, 2016, at 11:59 PM, Ristow, Warren <warren.ristow at sony.com> wrote:

Hi,

Thanks for the quick feedback.  I see your points, but I have a few
questions/comments.  I'll start at the end of the previous post:
> ...
> I think these are valuable problems to solve, but you should tackle them piece by piece:
>
> 1) the clang part of overriding the individual FMF and emitting the right IR is the first thing to fix.
> 2) the backend is still using the global UnsafeFPMath and it should be killed.
I addressed this point (2) for the reciprocal aspect in the patch, but of course
that wasn't useful without doing something about (1).

Regarding (1), over at https://reviews.llvm.org/D26708#596610, David made the
same point that it should be done in Clang.  I can understand that, but I wonder
whether having the concept of the 'fast' flag in the IR that implies all
the other FMF makes sense?  I'm not seeing a good reason for it, but since
this is very new to me, I can easily imagine I'm missing the big picture.

For example, in the LLVM IR (http://llvm.org/docs/LangRef.html#fast-math-flags)
the fast-math flags 'nnan', 'ninf', 'nsz',
'arcp' and 'fast' are defined.  Except for 'fast', each of
these has a fairly specific definition of what they mean.  For example, for
'arcp':

    arcp => "Allow optimizations to use the reciprocal of an argument rather
             than perform division."

'fast' is unusual, in that it describes a fairly generic set of
aggressive floating-point optimizations:

    fast => "Allow algebraically equivalent transformations that may dramatically
             change results in floating point (e.g. reassociate). This flag implies
             all the others."

Very loosely, 'fast' means "all the aggressive FP-transformations
that are not controlled by one of the other 4, plus it implies all the other
4".  If for terminology, we call those additional aggressive optimizations
'aggr', then we have:

    'fast' == 'aggr' + 'nnan' + 'ninf' + 'nsz' + 'arcp'

So as I see it, if we want to disable only one of the other ones (like
'arcp', in my case), there isn't any way to express that with these
IR flags defined this way.  In short, we cannot turn on all the flags besides
'arcp'.  To do that, what we want is that somehow for the Clang
switches:

  '-ffast-math -fno-reciprocal-math'

to ultimately result in LLVM IR that has the following flags on in appropriate
FP ops:

  'aggr' + 'nnan' + 'ninf' + 'nsz'

Makes sense; I missed that we can’t *subtract* from fast at the IR level.

I wouldn’t be opposed to having something along the lines of “aggr”, but there is a
tradeoff: some transformations may be harder to guard with this model.

Maybe that could be a starting point: change the “UnsafeAlgebra” bit in the
FMF to be the “aggr” you mention, and change every query to
FastMathFlags::UnsafeAlgebra() to return true if all the bits are set in the
Flags. This alone should be nothing more than a mechanical change, I believe.
The important part is then auditing all the users of UnsafeAlgebra() in the
middle end and checking whether they can be “downgraded” to aggr safely, i.e.
whether they don’t need aggr *and* another flag.

—
Mehdi

But I don't see a way to express 'aggr' in the IR.  We could do
this, if we change the definition of the IR 'fast' flag to remove that
sentence about implying all the others:

    fast => "Allow algebraically equivalent transformations that may dramatically
             change results in floating point (e.g. reassociate)."

(If we do something like that, we may want to change the name from
'fast' to something else (like 'aggr'), to avoid tying it too
closely to the concept of the '-ffast-math' switch.)

As an aside, I don't know if the "reassociate" example is the only
other transformation that's allowed by 'fast' (I presume it
isn't), but I think reassociation would be better expressed by a separate
flag, which could then be controlled independently via the
'-f[no-]associative-math' switch.  Not having that flag exist separately
in the FMF is the origin of PR27372.  But creating that flag and using it in the
appropriate places would still run into these problems of 'fast'
implying all the others, which would make it impossible to disable reassociation
while leaving all the other FMF transformations enabled.

To ask a concrete question using the current definition of 'fast' (which
includes enabling reassociation, as the LLVM IR documentation of FMF says), how
can we express in the IR that reciprocal-transformations are not allowed, but
reassociation is allowed?

So the bottom line is that I do see there are issues in Clang that are relevant.
But as long as 'fast' means "'aggr' plus all the other FMF
transformations", I don't see how we can effectively disable a subset
of those other FMF transformations (while leaving 'aggr'
transformations, such as reassociation, enabled).  With that in mind, my patch
took one step in having 'fast' no longer imply all the others.

Thanks,
-Warren


Mehdi Amini via llvm-dev

2016-Nov-17 19:38 UTC

[llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

> On Nov 17, 2016, at 12:51 AM, Ristow, Warren <warren.ristow at sony.com> wrote:
>
> Those are all good points.  Your reassociation point in the context of inlining is particularly interesting.
>
> FWIW, we also have a case where a customer wants '-fno-associative-math' to suppress reassociation under '-ffast-math'.  It would take me a while to find the specifics of the issue, but it was (if my memory is right) more of a real use-case.  (That is to say, the code that was "failing" due to reassociation didn't have an obvious fix like the reciprocal situation, here, other than to turn off fast-math.)  In fact, the request to suppress reassociation was the motivation for creating PR27372 in the first place (which eventually fed into this thread).  I have to say that on the reassociation point, my concern is that to really suppress that, we will have to suppress so much, that there will hardly be any point in using -ffast-math.
>  
> I'd say your comments here are very similar to what Nicolai said in another subthread of this discussion:
>
> >> I'd be really curious to know if there is anybody who really needs arcp
> >> without fp-contract=fast or vice versa, or who needs both of these but
> >> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
> >> ...
> >> [1] One case I _can_ think of (and which may have been a reason for the
> >> proliferation of flags in the first place) is somebody who enables fast
> >> math, but then doesn't want their results to change when they update the
> >> compiler and get a new set of optimizations. But IMO that's a use case
> >> that should be explicitly rejected.
>  
> I think those are all really good points, and an argument can be made that when -ffast-math gives you results you don't want, then you just have to turn it off.  Essentially, the user can't "have his cake and eat it too".
>
> All that said, I think we (the company I work for, Sony) will have to implement support for these switches.  It comes down to this: GCC has these switches (e.g., -fno-reciprocal-math and -fno-associative-math), and they do suppress the transformations for our customers.  When they switch to Clang/LLVM and use the same switches, it doesn't "work".  So as a practical matter, I think we will support them.
My point was that supporting these switches is not a guarantee for a fast-math
user that their code will work, even if the same command-line flags are enough
to make it work with GCC.
If you are providing these and telling your users that we are “compatible” with
GCC, in the sense that their code will continue to work, that seems incorrect
to me.
What are you gonna tell them when they use such a flag but it isn't enough
for their code to work with clang, even though it works with GCC? (Possibly
because reassociation messes up another part of the code that GCC didn’t,
because of different inlining decisions, for instance.)

>   Whether the LLVM community in general feels that's required is another question.  Until your recent comments here, and Nicolai's comments above, I would have thought the answer was clearly yes.  But maybe that's not the case.
>  
> In summary, irrespective of any (subjective?) assessment of how legitimate a particular use-case is, do we want switches like:
>
>     -ffast-math -fno-reciprocal-math
>     -ffast-math -fno-associative-math
>
> to work?
>
> For me, the answer is yes, because I have multiple customers that tell me they really want to leave -ffast-math on, but they want to be able to disable these sub-categories.  I've been approaching this under the assumption that the answer is yes for the Clang/LLVM community in general.
Multiple customers may want a pony; we’re not gonna try to give them one
just because they ask. I’d push back on such customer requests for the reason I
gave earlier.
If what they want does not make sense, or we can’t provide the guarantee they
really want, it is also our job to *not* provide it and to guide them toward
an alternative model that is more controlled and better understood, and that
solves the underlying problem they have.
As an example of a “pony” request: I had a customer that wanted their
floating-point “conformance test” to pass with fast-math: “float
test_div(float a, float b) { return a/b; }”; they didn’t see any reason why the
compiler would do anything wrong on such a simple test (except that the HW
didn’t have a division instruction…).

That being said, even though I’m not convinced by your “pony” use case, I don’t
see any reason not to preserve the arcp flag in the IR at this point (Nicolai
may disagree; let’s see his opinion), and it still makes sense to me to try to
change the *fast* flag to “reassociation” (or similar) in the IR (provided that
we don’t find clients of the API that want “more” than reassociation + a
combination of the other flags).

This should be enough to provide these command line switches at the clang level,
and it should spare you (Sony) from having to maintain any out-of-tree support
for this.

Hope this clarifies where I see the direction going, and even if you don’t agree
with my reasoning, the conclusion should be satisfactory on your side :)


— 
Mehdi

