thr3ads.net - llvm dev - [llvm-dev] Trouble when suppressing a portion of fast-math-transformations [Oct 2017]


Ristow, Warren via llvm-dev

2017-Oct-02 13:45 UTC

[llvm-dev] Trouble when suppressing a portion of fast-math-transformations

I'm not aware of any additional bits needed.  But putting us right at the
edge leaves me uncomfortable.  So an implementation that isn't limited by
the 7 bits in SubclassOptionalData seems sensible.

Thanks,
-Warren

From: Sanjay Patel [mailto:spatel at rotateright.com]
Sent: Monday, October 2, 2017 12:06 AM
To: Ristow, Warren
Cc: Hal Finkel; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Trouble when suppressing a portion of
fast-math-transformations

Are we confident that we just need those 7 bits to represent all of the relaxed
FP states that we need/want to support?

I'm asking because FMF in IR is currently mapped onto the
SubclassOptionalData of Value...and we have exactly 7 bits there. :)
If we're redoing the definitions, I'm wondering if we can share the
struct with the backend's SDNodeFlags, but that already has one extra bit
for vector reduction. Should we give up on SubclassOptionalData for FMF? We have
a MD_fpmath enum value for metadata, so we could move things over there?
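
As a rough illustration of the space constraint raised here, the following Python sketch models the seven proposed flags as single bits in a 7-bit field. The class name `ProposedFMF`, the bit order, and the helper `fits_in_field` are hypothetical and chosen only for illustration; this is not LLVM's actual layout.

```python
from enum import IntFlag

FIELD_BITS = 7  # width of the optional-data field mentioned in the thread


class ProposedFMF(IntFlag):
    """Hypothetical one-bit-per-flag encoding of the proposed flag set."""
    REASSOC = 1 << 0
    LIBM = 1 << 1
    NNAN = 1 << 2
    NINF = 1 << 3
    NSZ = 1 << 4
    ARCP = 1 << 5
    CONTRACT = 1 << 6


def all_flags() -> int:
    """Return the integer with every proposed flag set."""
    value = 0
    for flag in ProposedFMF:
        value |= flag.value
    return value


def fits_in_field(value: int, bits: int = FIELD_BITS) -> bool:
    """True if `value` can be stored in an unsigned field of `bits` bits."""
    return 0 <= value < (1 << bits)


if __name__ == "__main__":
    full = all_flags()
    print(f"all seven flags: {full:#b}")
    print("fits in 7 bits:", fits_in_field(full))
    # Any eighth flag would need bit 7, which lies outside the field.
    print("eighth flag fits:", fits_in_field(full | (1 << 7)))
```

With one bit per flag, the seven proposed names use the field completely, so any further flag would need a different storage scheme.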

On Fri, Sep 29, 2017 at 8:16 PM, Ristow, Warren via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
Hi Hal,
>> 4. To fix this, I think that additional fast-math-flags are likely
>> needed in the IR.  Instead of the following set:
>>
>> 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>>
>> something like this:
>>
>> 'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'
>>
>> would be more useful.  Related to this, the current 'fast' flag which acts
>> as an umbrella (enabling 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract') may
>> not be needed.  A discussion on this point was raised last November on the
>> mailing list:
>>
>> http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
>
> I agree. I'm happy to help review the patches. It will be best to have
> only the finer-grained flags where there's no "fast" flag that implies
> all of the others.
Thanks for the quick response, and for the willingness to review.  I won't let
this languish so long, like the post from last November.

Happy to hear that you feel it's best not to have the umbrella "fast" flag.

Thanks again,
-Warren
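
To illustrate why reassociation might merit a flag of its own, here is a small Python sketch (Python floats are IEEE 754 doubles) showing that regrouping a floating-point sum can change the result. It assumes that the proposed 'reassoc' flag would govern this kind of regrouping; the function names are illustrative only.

```python
def sum_left_to_right(a: float, b: float, c: float) -> float:
    """Evaluate (a + b) + c, the order written in the source."""
    return (a + b) + c


def sum_reassociated(a: float, b: float, c: float) -> float:
    """Evaluate a + (b + c), one regrouping an optimizer might choose."""
    return a + (b + c)


if __name__ == "__main__":
    a, b, c = 1e16, -1e16, 1.0
    # The large terms cancel first, so the 1.0 survives.
    print(sum_left_to_right(a, b, c))
    # The 1.0 is absorbed into -1e16 before the cancellation happens.
    print(sum_reassociated(a, b, c))
```

The two groupings give 1.0 and 0.0 respectively for these inputs, which is why a transformation of this kind needs explicit permission.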
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Bruce Hoult via llvm-dev

2017-Oct-02 16:10 UTC


Is there anything that means, in particular, "go fast, even if it means not
all bits are significant"?

I'm currently working on an llvm-based compiler for a GPU that is optimised
for OpenGL, where 16 bit FP may not be quite accurate enough (or may be in
some cases), but 32 bit FP is overkill. A lot of the fast, built in,
operations end up with a few junk bits at the end (not add/sub/mul, but
divide is available *only* using reciprocal).

When implementing OpenCL, the specs and conformance tests require full IEEE
accuracy. In some cases this requires a round of Newton-Raphson to clean up
the accuracy, which is a significant though maybe not crippling performance
penalty. But in other cases we need to do a lot of range reduction, some
polynomial, and then generalise the result again. This can be an order of
magnitude or more slower than using the not-quite-accurate-enough built in
instruction.
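
The "round of Newton-Raphson" mentioned above can be sketched as follows for the reciprocal case. This is a minimal illustration in Python, not a model of any particular GPU: the starting value stands in for a low-precision hardware estimate and is a hand-picked constant, and the function names are hypothetical.

```python
def refine_reciprocal(d: float, estimate: float, steps: int) -> float:
    """Apply `steps` Newton-Raphson iterations x <- x * (2 - d * x)."""
    x = estimate
    for _ in range(steps):
        x = x * (2.0 - d * x)
    return x


def relative_error(d: float, x: float) -> float:
    """Distance of d * x from 1, i.e. the relative error of x as 1/d."""
    return abs(d * x - 1.0)


if __name__ == "__main__":
    d = 3.0
    rough = 0.33  # stand-in for a low-precision hardware estimate of 1/3
    for steps in range(4):
        x = refine_reciprocal(d, rough, steps)
        print(steps, x, relative_error(d, x))
```

Each iteration roughly squares the relative error, so a single step is often enough to clean up a few junk bits, at the cost of extra multiplies per division.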

The OpenCL spec defines a number of compile flags controlling
optimizations. Some seem to map well onto the flags already discussed here:

-cl-mad-enable
-cl-no-signed-zeros
-cl-finite-math-only

However it looks to me that the following ones don't presently map well to
LLVM:

-cl-unsafe-math-optimizations
Allow optimizations for floating-point arithmetic that (a) assume that
arguments and results are valid, (b) may violate IEEE 754 standard and (c)
may violate the OpenCL numerical compliance requirements as defined in the
SPIR-V OpenCL environment specification for single precision and double
precision floating-point, and edge case behavior in the SPIR-V OpenCL
environment specification. This option includes the -cl-no-signed-zeros and
-cl-mad-enable options.

-cl-fast-relaxed-math
Sets the optimization options -cl-finite-math-only and
-cl-unsafe-math-optimizations. This allows optimizations for floating-point
arithmetic that may violate the IEEE 754 standard and the OpenCL numerical
compliance requirements for single precision and double precision
floating-point, as well as floating point edge case behavior. This option
also relaxes the precision of commonly used math functions. This option
causes the preprocessor macro __FAST_RELAXED_MATH__ to be defined in the
OpenCL program. The original and modified values are defined in the SPIR-V
OpenCL environment specification.

I'd like to emphasise in the latter one: "This option also relaxes the
precision of commonly used math functions."



Hal Finkel via llvm-dev

2017-Oct-03 01:03 UTC


On 10/02/2017 11:10 AM, Bruce Hoult via llvm-dev wrote:
> Is there anything that means, in particular, "go fast, even if it
> means not all bits are significant"?
>
> I'm currently working on an llvm-based compiler for a GPU that is
> optimised for OpenGL, where 16 bit FP may not be quite accurate enough
> (or may be in some cases), but 32 bit FP is overkill. A lot of the
> fast, built in, operations end up with a few junk bits at the end (not
> add/sub/mul, but divide is available *only* using reciprocal).
>
> When implementing OpenCL, the specs and conformance tests require full 
> IEEE accuracy. In some cases this requires a round of Newton-Raphson 
> to clean up the accuracy, which is a significant though maybe not 
> crippling performance penalty. But in other cases we need to do a lot 
> of range reduction, some polynomial, and then generalise the result 
> again. This can be an order of magnitude or more slower than using the 
> not-quite-accurate-enough built in instruction.

This is what arcp is for (implying that you can use the reciprocal
estimate and not worry about getting the exact answer). Now there's a
separate question about how many Newton iterations to use, and we have a
separate flag for that (-mrecip=...). Check out the implementation of
TargetLoweringBase::getRecipEstimateSqrtEnabled to see how it's set up in
the backend. This is, however, per function, so we don't currently have a
per-operation control on this.
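
As a small illustration of what "not worrying about the exact answer" means, the sketch below compares x / y with x * (1 / y) in Python doubles, assuming 'arcp' permits a rewrite of the first form into the second. The helper names are hypothetical, and an exact 1 / y is used here, whereas a hardware estimate would be less accurate still.

```python
from typing import List, Tuple


def divide(x: float, y: float) -> float:
    """Correctly rounded division, as IEEE 754 requires."""
    return x / y


def divide_via_reciprocal(x: float, y: float) -> float:
    """Division rewritten as a multiply by the reciprocal (two roundings)."""
    return x * (1.0 / y)


def find_mismatches(limit: int) -> List[Tuple[int, int]]:
    """Return integer pairs (x, y) below `limit` where the two forms differ."""
    return [
        (x, y)
        for x in range(1, limit)
        for y in range(1, limit)
        if divide(float(x), float(y)) != divide_via_reciprocal(float(x), float(y))
    ]


if __name__ == "__main__":
    for x, y in find_mismatches(11):
        print(x, y, divide(x, y), divide_via_reciprocal(x, y))
```

Even with small integer operands the two forms disagree in the last bit for some pairs, such as 3 and 10, so the rewrite is only valid when the flag allows it.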
>
> The OpenCL spec defines a number of compile flags controlling
> optimizations. Some seem to map well onto the flags already discussed
> here:
>
> -cl-mad-enable
> -cl-no-signed-zeros
> -cl-finite-math-only
>
> However it looks to me that the following ones don't presently map 
> well to LLVM:
>
> -cl-unsafe-math-optimizations
> Allow optimizations for floating-point arithmetic that (a) assume that 
> arguments and results are valid, (b) may violate IEEE 754 standard and 
> (c) may violate the OpenCL numerical compliance requirements as 
> defined in the SPIR-V OpenCL environment specification for single 
> precision and double precision floating-point, and edge case behavior 
> in the SPIR-V OpenCL environment specification. This option includes 
> the -cl-no-signed-zeros and -cl-mad-enable options.

I think the idea is that this flag, like -funsafe-math-optimizations,
gets mapped to an appropriate collection of finer-grained flags internally.
>
> -cl-fast-relaxed-math
> Sets the optimization options -cl-finite-math-only and 
> -cl-unsafe-math-optimizations. This allows optimizations for 
> floating-point arithmetic that may violate the IEEE 754 standard and 
> the OpenCL numerical compliance requirements for single precision and 
> double precision floating-point, as well as floating point edge case 
> behavior. This option also relaxes the precision of commonly used math 
> functions. This option causes the preprocessor macro 
> __FAST_RELAXED_MATH__ to be defined in the OpenCL program. The 
> original and modified values are defined in the SPIR-V OpenCL 
> environment specification
>
> I'd like to emphasise in the latter one: "This option also relaxes the
> precision of commonly used math functions."

Isn't this the "libm" flag that is proposed in this thread?
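
The idea of mapping each OpenCL option onto a collection of finer-grained flags can be sketched as below. This is a hypothetical table in Python, not what any front end does: only the "includes" and "sets" relationships come from the quoted OpenCL text, and 'libm' for -cl-fast-relaxed-math follows the suggestion above. The individual choices in `DIRECT`, and 'reassoc' and 'arcp' for -cl-unsafe-math-optimizations, are assumptions.

```python
# Fine-grained flag names as proposed in the thread.
FINE_GRAINED = frozenset(
    {"reassoc", "libm", "nnan", "ninf", "nsz", "arcp", "contract"}
)

# Assumed direct equivalents for the simple options.
DIRECT = {
    "-cl-mad-enable": {"contract"},
    "-cl-no-signed-zeros": {"nsz"},
    "-cl-finite-math-only": {"nnan", "ninf"},
}

# Relationships stated in the quoted OpenCL option descriptions.
IMPLIES = {
    "-cl-unsafe-math-optimizations": [
        "-cl-no-signed-zeros",
        "-cl-mad-enable",
    ],
    "-cl-fast-relaxed-math": [
        "-cl-finite-math-only",
        "-cl-unsafe-math-optimizations",
    ],
}

# Flags an umbrella option adds beyond what it implies.
EXTRA = {
    "-cl-unsafe-math-optimizations": {"reassoc", "arcp"},  # assumption
    "-cl-fast-relaxed-math": {"libm"},  # relaxed math-function precision
}


def expand(option: str) -> frozenset:
    """Return the fine-grained flags implied by one OpenCL option."""
    flags = set(DIRECT.get(option, set()))
    flags |= EXTRA.get(option, set())
    for implied in IMPLIES.get(option, []):
        flags |= expand(implied)
    return frozenset(flags)


if __name__ == "__main__":
    for option in list(DIRECT) + list(IMPLIES):
        print(option, sorted(expand(option)))
```

Under these assumptions -cl-fast-relaxed-math expands to all seven proposed flags, so no umbrella flag is needed in the IR itself.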

  -Hal
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
