thr3ads.net - llvm dev - [llvm-dev] The most efficient way to implement an integer based power function pow in LLVM [Jan 2017]

If this information is useful, please help other people find it:
Share via:

Steve (Numerics) Canon via llvm-dev

2017-Jan-12 19:04 UTC

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

> On Jan 12, 2017, at 12:58 PM, Friedman, Eli via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On 1/12/2017 9:33 AM, Mehdi Amini via llvm-dev wrote:
>>> On Jan 12, 2017, at 5:03 AM, Antoine Pitrou via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>>> 
>>> On Mon, 9 Jan 2017 11:43:17 -0600
>>> Wei Ding via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>> Hi,
>>>> 
>>>> I want an efficient way to implement function pow in LLVM
instead of
>>>> invoking pow() math built-in. For algorithm part, I am clear
for the logic.
>>>> But I am not quite sure for which parts of LLVM should I
replace built-in
>>>> pow with another efficient pow implementation. Any comments and
feedback
>>>> are appreciated. Thanks!
>>> In Numba, we have decided to optimize some usages of the power
function
>>> in our own front-end, so that LLVM IR gets an already optimized
form,
>>> as we have found that otherwise LLVM may miss some optimization
>>> opportunities. YMMV.
>> It seems to me that it would be more interesting to gather these
misoptimization and fix LLVM to catch them.
>> 
>>> (e.g. we detect that the exponent is a compile-time constant and
>>> transform `x**3` into `x*x*x`)
>> This seems definitely in the scope of what LLVM could do, potentially
with TTI.
> 
> LLVM already does this... but only if the pow() call is marked
"fast".  IEEE 754 pow() is supposed to be correctly rounded, but
(x*x)*x has an extra rounding step.
pow( ) is not supposed to be correctly rounded.

IEEE 754 recommends that a correctly rounded power function exist [9.2], but
does not recommend or require that this be the default pow( ) function. [Note:
it was not actually shown until quite late in the IEEE 754 revision process that
it was even possible to have a reasonably efficient correctly-rounded pow( )
function, and such a function is at minimum an order of magnitude slower than a
good sub-ulp-accurate-but-not-correctly-rounded implementation. Most 754
committee members would recommend that the default pow( ) function *not* be
correctly rounded.]

In a good math library, however, pow( ) should absolutely be sub-ulp accurate,
which means that it should *not* be implemented via log( ) and exp( ), and
indeed, most math libraries don’t do that.

Just to provide some example data, for single-precision x in [1,2) on current OS
X:

	The worst-case error of x*x*x is 1.28736 ulp
	The worst-case error of powf(x, 3) is 0.500013 ulp

	The RMS error of x*x*x is 0.585066 ulp
	The RMS error of powf(x,3) is 0.499984 ulp

– Steve

Mehdi Amini via llvm-dev

2017-Jan-12 19:21 UTC

head link

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

> On Jan 12, 2017, at 11:04 AM, Steve (Numerics) Canon <scanon at
apple.com> wrote:
> 
>> 
>> On Jan 12, 2017, at 12:58 PM, Friedman, Eli via llvm-dev <llvm-dev
at lists.llvm.org> wrote:
>> 
>> On 1/12/2017 9:33 AM, Mehdi Amini via llvm-dev wrote:
>>>> On Jan 12, 2017, at 5:03 AM, Antoine Pitrou via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> On Mon, 9 Jan 2017 11:43:17 -0600
>>>> Wei Ding via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>> Hi,
>>>>> 
>>>>> I want an efficient way to implement function pow in LLVM
instead of
>>>>> invoking pow() math built-in. For algorithm part, I am
clear for the logic.
>>>>> But I am not quite sure for which parts of LLVM should I
replace built-in
>>>>> pow with another efficient pow implementation. Any comments
and feedback
>>>>> are appreciated. Thanks!
>>>> In Numba, we have decided to optimize some usages of the power
function
>>>> in our own front-end, so that LLVM IR gets an already optimized
form,
>>>> as we have found that otherwise LLVM may miss some optimization
>>>> opportunities. YMMV.
>>> It seems to me that it would be more interesting to gather these
misoptimization and fix LLVM to catch them.
>>> 
>>>> (e.g. we detect that the exponent is a compile-time constant
and
>>>> transform `x**3` into `x*x*x`)
>>> This seems definitely in the scope of what LLVM could do,
potentially with TTI.
>> 
>> LLVM already does this... but only if the pow() call is marked
"fast".  IEEE 754 pow() is supposed to be correctly rounded, but
(x*x)*x has an extra rounding step.
> 
> pow( ) is not supposed to be correctly rounded.
> 
> IEEE 754 recommends that a correctly rounded power function exist [9.2],
but does not recommend or require that this be the default pow( ) function.
[Note: it was not actually shown until quite late in the IEEE 754 revision
process that it was even possible to have a reasonably efficient
correctly-rounded pow( ) function, and such a function is at minimum an order of
magnitude slower than a good sub-ulp-accurate-but-not-correctly-rounded
implementation. Most 754 committee members would recommend that the default pow(
) function *not* be correctly rounded.]
So we should use x*x*x even without fast-math...
> In a good math library, however, pow( ) should absolutely be sub-ulp
accurate, which means that it should *not* be implemented via log( ) and exp( ),
and indeed, most math libraries don’t do that.
> 
> Just to provide some example data, for single-precision x in [1,2) on
current OS X:
> 
> 	The worst-case error of x*x*x is 1.28736 ulp
> 	The worst-case error of powf(x, 3) is 0.500013 ulp
> 
> 	The RMS error of x*x*x is 0.585066 ulp
> 	The RMS error of powf(x,3) is 0.499984 ulp
… or maybe not! :)

I’m not sure what to think of these results. For instance what’s the perf impact
of calling into powf(x, 3) on OSX vs using x*x*x?
At the source level, what could the user do to decide that 1.28ULP is acceptable
for his use-case?

— 
Mehdi
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170112/7c91e2e4/attachment.html>

Steve (Numerics) Canon via llvm-dev

2017-Jan-12 19:31 UTC

head link

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

> On Jan 12, 2017, at 2:21 PM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
> 
> 
>> On Jan 12, 2017, at 11:04 AM, Steve (Numerics) Canon <scanon at
apple.com <mailto:scanon at apple.com>> wrote:
>> 
>>> 
>>> On Jan 12, 2017, at 12:58 PM, Friedman, Eli via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>> 
>>> On 1/12/2017 9:33 AM, Mehdi Amini via llvm-dev wrote:
>>>>> On Jan 12, 2017, at 5:03 AM, Antoine Pitrou via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>>>> 
>>>>> On Mon, 9 Jan 2017 11:43:17 -0600
>>>>> Wei Ding via llvm-dev <llvm-dev at lists.llvm.org
<mailto:llvm-dev at lists.llvm.org>> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I want an efficient way to implement function pow in
LLVM instead of
>>>>>> invoking pow() math built-in. For algorithm part, I am
clear for the logic.
>>>>>> But I am not quite sure for which parts of LLVM should
I replace built-in
>>>>>> pow with another efficient pow implementation. Any
comments and feedback
>>>>>> are appreciated. Thanks!
>>>>> In Numba, we have decided to optimize some usages of the
power function
>>>>> in our own front-end, so that LLVM IR gets an already
optimized form,
>>>>> as we have found that otherwise LLVM may miss some
optimization
>>>>> opportunities. YMMV.
>>>> It seems to me that it would be more interesting to gather
these misoptimization and fix LLVM to catch them.
>>>> 
>>>>> (e.g. we detect that the exponent is a compile-time
constant and
>>>>> transform `x**3` into `x*x*x`)
>>>> This seems definitely in the scope of what LLVM could do,
potentially with TTI.
>>> 
>>> LLVM already does this... but only if the pow() call is marked
"fast".  IEEE 754 pow() is supposed to be correctly rounded, but
(x*x)*x has an extra rounding step.
>> 
>> pow( ) is not supposed to be correctly rounded.
>> 
>> IEEE 754 recommends that a correctly rounded power function exist
[9.2], but does not recommend or require that this be the default pow( )
function. [Note: it was not actually shown until quite late in the IEEE 754
revision process that it was even possible to have a reasonably efficient
correctly-rounded pow( ) function, and such a function is at minimum an order of
magnitude slower than a good sub-ulp-accurate-but-not-correctly-rounded
implementation. Most 754 committee members would recommend that the default pow(
) function *not* be correctly rounded.]
> 
> So we should use x*x*x even without fast-math...
> 
>> In a good math library, however, pow( ) should absolutely be sub-ulp
accurate, which means that it should *not* be implemented via log( ) and exp( ),
and indeed, most math libraries don’t do that.
>> 
>> Just to provide some example data, for single-precision x in [1,2) on
current OS X:
>> 
>> 	The worst-case error of x*x*x is 1.28736 ulp
>> 	The worst-case error of powf(x, 3) is 0.500013 ulp
>> 
>> 	The RMS error of x*x*x is 0.585066 ulp
>> 	The RMS error of powf(x,3) is 0.499984 ulp
> 
> … or maybe not! :)
> 
> I’m not sure what to think of these results. For instance what’s the perf
impact of calling into powf(x, 3) on OSX vs using x*x*x?
Large! x*x*x has reciprocal throughput of 1 cycle, plus the opportunity for the
compiler to autovectorize. powf(x, 3) is much harder to vectorize, and has
reciprocal throughput of ~40 cycles on OS X (which is on the faster side; it’s
more like 100 cycles on some platforms).
> At the source level, what could the user do to decide that 1.28ULP is
acceptable for his use-case?
Write x*x*x. =)

I can see an argument for adding a flag that allows a few ulp of error in math
functions (roughly what Intel’s LA versions in MKL or some of the YEPPP
implementations provide), but there is a question of how many users would
actually find it and use it.

– Steve

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170112/5ad886d5/attachment.html>

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Jan 2017 - The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

Apparently Analagous Threads