Steve (Numerics) Canon via llvm-dev
2017-Jan-12 19:04 UTC
[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM
> On Jan 12, 2017, at 12:58 PM, Friedman, Eli via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > On 1/12/2017 9:33 AM, Mehdi Amini via llvm-dev wrote: >>> On Jan 12, 2017, at 5:03 AM, Antoine Pitrou via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>> >>> On Mon, 9 Jan 2017 11:43:17 -0600 >>> Wei Ding via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>> Hi, >>>> >>>> I want an efficient way to implement function pow in LLVM instead of >>>> invoking pow() math built-in. For algorithm part, I am clear for the logic. >>>> But I am not quite sure for which parts of LLVM should I replace built-in >>>> pow with another efficient pow implementation. Any comments and feedback >>>> are appreciated. Thanks! >>> In Numba, we have decided to optimize some usages of the power function >>> in our own front-end, so that LLVM IR gets an already optimized form, >>> as we have found that otherwise LLVM may miss some optimization >>> opportunities. YMMV. >> It seems to me that it would be more interesting to gather these misoptimization and fix LLVM to catch them. >> >>> (e.g. we detect that the exponent is a compile-time constant and >>> transform `x**3` into `x*x*x`) >> This seems definitely in the scope of what LLVM could do, potentially with TTI. > > LLVM already does this... but only if the pow() call is marked "fast". IEEE 754 pow() is supposed to be correctly rounded, but (x*x)*x has an extra rounding step.pow( ) is not supposed to be correctly rounded. IEEE 754 recommends that a correctly rounded power function exist [9.2], but does not recommend or require that this be the default pow( ) function. [Note: it was not actually shown until quite late in the IEEE 754 revision process that it was even possible to have a reasonably efficient correctly-rounded pow( ) function, and such a function is at minimum an order of magnitude slower than a good sub-ulp-accurate-but-not-correctly-rounded implementation. Most 754 committee members would recommend that the default pow( ) function *not* be correctly rounded.] In a good math library, however, pow( ) should absolutely be sub-ulp accurate, which means that it should *not* be implemented via log( ) and exp( ), and indeed, most math libraries don’t do that. Just to provide some example data, for single-precision x in [1,2) on current OS X: The worst-case error of x*x*x is 1.28736 ulp The worst-case error of powf(x, 3) is 0.500013 ulp The RMS error of x*x*x is 0.585066 ulp The RMS error of powf(x,3) is 0.499984 ulp – Steve
Mehdi Amini via llvm-dev
2017-Jan-12 19:21 UTC
[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM
> On Jan 12, 2017, at 11:04 AM, Steve (Numerics) Canon <scanon at apple.com> wrote: > >> >> On Jan 12, 2017, at 12:58 PM, Friedman, Eli via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> On 1/12/2017 9:33 AM, Mehdi Amini via llvm-dev wrote: >>>> On Jan 12, 2017, at 5:03 AM, Antoine Pitrou via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>> >>>> On Mon, 9 Jan 2017 11:43:17 -0600 >>>> Wei Ding via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>>> Hi, >>>>> >>>>> I want an efficient way to implement function pow in LLVM instead of >>>>> invoking pow() math built-in. For algorithm part, I am clear for the logic. >>>>> But I am not quite sure for which parts of LLVM should I replace built-in >>>>> pow with another efficient pow implementation. Any comments and feedback >>>>> are appreciated. Thanks! >>>> In Numba, we have decided to optimize some usages of the power function >>>> in our own front-end, so that LLVM IR gets an already optimized form, >>>> as we have found that otherwise LLVM may miss some optimization >>>> opportunities. YMMV. >>> It seems to me that it would be more interesting to gather these misoptimization and fix LLVM to catch them. >>> >>>> (e.g. we detect that the exponent is a compile-time constant and >>>> transform `x**3` into `x*x*x`) >>> This seems definitely in the scope of what LLVM could do, potentially with TTI. >> >> LLVM already does this... but only if the pow() call is marked "fast". IEEE 754 pow() is supposed to be correctly rounded, but (x*x)*x has an extra rounding step. > > pow( ) is not supposed to be correctly rounded. > > IEEE 754 recommends that a correctly rounded power function exist [9.2], but does not recommend or require that this be the default pow( ) function. [Note: it was not actually shown until quite late in the IEEE 754 revision process that it was even possible to have a reasonably efficient correctly-rounded pow( ) function, and such a function is at minimum an order of magnitude slower than a good sub-ulp-accurate-but-not-correctly-rounded implementation. Most 754 committee members would recommend that the default pow( ) function *not* be correctly rounded.]So we should use x*x*x even without fast-math...> In a good math library, however, pow( ) should absolutely be sub-ulp accurate, which means that it should *not* be implemented via log( ) and exp( ), and indeed, most math libraries don’t do that. > > Just to provide some example data, for single-precision x in [1,2) on current OS X: > > The worst-case error of x*x*x is 1.28736 ulp > The worst-case error of powf(x, 3) is 0.500013 ulp > > The RMS error of x*x*x is 0.585066 ulp > The RMS error of powf(x,3) is 0.499984 ulp… or maybe not! :) I’m not sure what to think of these results. For instance what’s the perf impact of calling into powf(x, 3) on OSX vs using x*x*x? At the source level, what could the user do to decide that 1.28ULP is acceptable for his use-case? — Mehdi -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170112/7c91e2e4/attachment.html>
Steve (Numerics) Canon via llvm-dev
2017-Jan-12 19:31 UTC
[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM
> On Jan 12, 2017, at 2:21 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > >> On Jan 12, 2017, at 11:04 AM, Steve (Numerics) Canon <scanon at apple.com <mailto:scanon at apple.com>> wrote: >> >>> >>> On Jan 12, 2017, at 12:58 PM, Friedman, Eli via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>> >>> On 1/12/2017 9:33 AM, Mehdi Amini via llvm-dev wrote: >>>>> On Jan 12, 2017, at 5:03 AM, Antoine Pitrou via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>>> >>>>> On Mon, 9 Jan 2017 11:43:17 -0600 >>>>> Wei Ding via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>>>> Hi, >>>>>> >>>>>> I want an efficient way to implement function pow in LLVM instead of >>>>>> invoking pow() math built-in. For algorithm part, I am clear for the logic. >>>>>> But I am not quite sure for which parts of LLVM should I replace built-in >>>>>> pow with another efficient pow implementation. Any comments and feedback >>>>>> are appreciated. Thanks! >>>>> In Numba, we have decided to optimize some usages of the power function >>>>> in our own front-end, so that LLVM IR gets an already optimized form, >>>>> as we have found that otherwise LLVM may miss some optimization >>>>> opportunities. YMMV. >>>> It seems to me that it would be more interesting to gather these misoptimization and fix LLVM to catch them. >>>> >>>>> (e.g. we detect that the exponent is a compile-time constant and >>>>> transform `x**3` into `x*x*x`) >>>> This seems definitely in the scope of what LLVM could do, potentially with TTI. >>> >>> LLVM already does this... but only if the pow() call is marked "fast". IEEE 754 pow() is supposed to be correctly rounded, but (x*x)*x has an extra rounding step. >> >> pow( ) is not supposed to be correctly rounded. >> >> IEEE 754 recommends that a correctly rounded power function exist [9.2], but does not recommend or require that this be the default pow( ) function. [Note: it was not actually shown until quite late in the IEEE 754 revision process that it was even possible to have a reasonably efficient correctly-rounded pow( ) function, and such a function is at minimum an order of magnitude slower than a good sub-ulp-accurate-but-not-correctly-rounded implementation. Most 754 committee members would recommend that the default pow( ) function *not* be correctly rounded.] > > So we should use x*x*x even without fast-math... > >> In a good math library, however, pow( ) should absolutely be sub-ulp accurate, which means that it should *not* be implemented via log( ) and exp( ), and indeed, most math libraries don’t do that. >> >> Just to provide some example data, for single-precision x in [1,2) on current OS X: >> >> The worst-case error of x*x*x is 1.28736 ulp >> The worst-case error of powf(x, 3) is 0.500013 ulp >> >> The RMS error of x*x*x is 0.585066 ulp >> The RMS error of powf(x,3) is 0.499984 ulp > > … or maybe not! :) > > I’m not sure what to think of these results. For instance what’s the perf impact of calling into powf(x, 3) on OSX vs using x*x*x?Large! x*x*x has reciprocal throughput of 1 cycle, plus the opportunity for the compiler to autovectorize. powf(x, 3) is much harder to vectorize, and has reciprocal throughput of ~40 cycles on OS X (which is on the faster side; it’s more like 100 cycles on some platforms).> At the source level, what could the user do to decide that 1.28ULP is acceptable for his use-case?Write x*x*x. =) I can see an argument for adding a flag that allows a few ulp of error in math functions (roughly what Intel’s LA versions in MKL or some of the YEPPP implementations provide), but there is a question of how many users would actually find it and use it. – Steve -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170112/5ad886d5/attachment.html>