thr3ads.net - llvm dev - [llvm-dev] The most efficient way to implement an integer based power function pow in LLVM [Jan 2017]

If this information is useful, please help other people find it:
Share via:

Wei Ding via llvm-dev

2017-Jan-09 17:43 UTC

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

Hi,

I want an efficient way to implement function pow in LLVM instead of
invoking pow() math built-in. For algorithm part, I am clear for the logic.
But I am not quite sure for which parts of LLVM should I replace built-in
pow with another efficient pow implementation. Any comments and feedback
are appreciated. Thanks!

-- 
Wei Ding
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170109/69fc4143/attachment.html>

Friedman, Eli via llvm-dev

2017-Jan-09 18:19 UTC

head link

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

On 1/9/2017 9:43 AM, Wei Ding via llvm-dev wrote:> Hi,
>
> I want an efficient way to implement function pow in LLVM instead of 
> invoking pow() math built-in. For algorithm part, I am clear for the 
> logic. But I am not quite sure for which parts of LLVM should I 
> replace built-in pow with another efficient pow implementation. Any 
> comments and feedback are appreciated. Thanks!
The llvm.pow() intrinsic just calls pow() from libm; if you want to 
replace it with a different implementation, you can just write a 
function called pow() and link it into your program.

If you're looking for the part of LLVM which optimizes pow() calls, see 
LibCallSimplifier::optimizePow in lib/Transforms/Utils/SimplifyLibCalls.cpp.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Antoine Pitrou via llvm-dev

2017-Jan-12 13:03 UTC

head link

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

On Mon, 9 Jan 2017 11:43:17 -0600
Wei Ding via llvm-dev <llvm-dev at lists.llvm.org>
wrote:> Hi,
> 
> I want an efficient way to implement function pow in LLVM instead of
> invoking pow() math built-in. For algorithm part, I am clear for the logic.
> But I am not quite sure for which parts of LLVM should I replace built-in
> pow with another efficient pow implementation. Any comments and feedback
> are appreciated. Thanks!
In Numba, we have decided to optimize some usages of the power function
in our own front-end, so that LLVM IR gets an already optimized form,
as we have found that otherwise LLVM may miss some optimization
opportunities. YMMV.

(e.g. we detect that the exponent is a compile-time constant and
transform `x**3` into `x*x*x`)

Note that not only `pow(x, 3)` can be slower than `x*x*x`, but it may
also have some precision issues, as it goes through log() and exp()
calls.

Regards

Antoine.

Mehdi Amini via llvm-dev

2017-Jan-12 17:33 UTC

head link

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

> On Jan 12, 2017, at 5:03 AM, Antoine Pitrou via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On Mon, 9 Jan 2017 11:43:17 -0600
> Wei Ding via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Hi,
>> 
>> I want an efficient way to implement function pow in LLVM instead of
>> invoking pow() math built-in. For algorithm part, I am clear for the
logic.
>> But I am not quite sure for which parts of LLVM should I replace
built-in
>> pow with another efficient pow implementation. Any comments and
feedback
>> are appreciated. Thanks!
> 
> In Numba, we have decided to optimize some usages of the power function
> in our own front-end, so that LLVM IR gets an already optimized form,
> as we have found that otherwise LLVM may miss some optimization
> opportunities. YMMV.
It seems to me that it would be more interesting to gather these misoptimization
and fix LLVM to catch them.
> 
> (e.g. we detect that the exponent is a compile-time constant and
> transform `x**3` into `x*x*x`)
This seems definitely in the scope of what LLVM could do, potentially with TTI.

— 
Mehdi

> 
> Note that not only `pow(x, 3)` can be slower than `x*x*x`, but it may
> also have some precision issues, as it goes through log() and exp()
> calls.
> 
> Regards
> 
> Antoine.
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Francois Fayard via llvm-dev

2017-Jan-13 08:32 UTC

head link

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

Bare in mind that writing a very efficient x^n is not that straightforward. Even
if it’s a corner cases, the example of x^7 is interesting. If you apply the
divide and rule algorithm, you get the following algorithm:

double pow_7(double x) {
  const double x2 = x * x;
  const double x3 = x * x2;
  const double x6 = x3 * x3;
  const double x7 = x * x6;

  return x7;
}

But you can also write

double pow_7(double x) {
  const double x2 = x * x;
  const double x3 = x * x2;
  const double x4 = x2 * x2;
  const double x7 = x3 * x4;

  return x7;
}

which is a different algorithm using the same number of application. But,
because of instruction level parallelism, they don’t get the same performance.
In the first implementation every single instruction has to wait for the
previous instruction to start. In the second one x3 and x4 could be computed
using instruction level parallelism as they don’t depend upon each other.

The performance difference in between the two are (compiled with -O3 and
-mavx2):
- clang 3.9.1: The first one is 25% slower than the second one
- intel 17.0.1: The first one is 05% slower than the second one
- gcc 6.2.0: The first one is 3% slower than the first one (but both are 2 times
slower than clang and intel)

Attached is the code asserting my claim.

François Fayard

====
#include <chrono>
#include <iostream>

int main() {
  const int n = 1000000000;
  const int length = 10;
  double* x = new double[length];
  for (int i = 0; i < length; ++i) {
    x[i] = 1.0;
  }

  asm volatile("" : : "g"(x) : "memory");

  auto begin = std::chrono::high_resolution_clock::now();
  for (int k = 0; k < n; ++k) {
    for (int i = 0; i < length; ++i) {
      const double x1 = x[i];
      const double x2 = x1 * x1;
      const double x3 = x1 * x2;
      const double x4 = x2 * x2;
      const double x7 = x3 * x4;
      x[i] = x7;
    }
  }
  auto end = std::chrono::high_resolution_clock::now();
  double time_fast       1.0e-9 *
      std::chrono::duration_cast<std::chrono::nanoseconds>(end -
begin).count();

  begin = std::chrono::high_resolution_clock::now();
  for (int k = 0; k < n; ++k) {
    for (int i = 0; i < length; ++i) {
      const double x1 = x[i];
      const double x2 = x1 * x1;
      const double x3 = x1 * x2;
      const double x6 = x3 * x3;
      const double x7 = x1 * x6;
      x[i] = x7;
    }
  }
  end = std::chrono::high_resolution_clock::now();
  double time_divide_and_rule       1.0e-9 *
      std::chrono::duration_cast<std::chrono::nanoseconds>(end -
begin).count();

  std::cout << "Optimized x^7: "
            << 1.0e9 * time_fast / (static_cast<double>(n) * length)
<< " ns"
            << std::endl;
  std::cout << "  Classic x^7: "
            << 1.0e9 * time_divide_and_rule /
(static_cast<double>(n) * length)
            << " ns" << std::endl;

  delete[] x;

  return 0;
}

====> On Jan 9, 2017, at 6:43 PM, Wei Ding via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi,
> 
> I want an efficient way to implement function pow in LLVM instead of
invoking pow() math built-in. For algorithm part, I am clear for the logic. But
I am not quite sure for which parts of LLVM should I replace built-in pow with
another efficient pow implementation. Any comments and feedback are appreciated.
Thanks!

mats petersson via llvm-dev

2017-Jan-13 11:39 UTC

head link

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

Is the original question here not about pow(int, int), rather than
pow(float, int)? In which case the problem about rounding or precision
becomes less of an issue (of course, for larger values, there will be
overflows instead - but that's still a problem if you do `int a = pow(x, y)
where x**y is out of range for the int value).

In my experience, it's not at all unlikely to find code like `int a pow(2,
3);` in code, often with the resulting questions about "why is my
result 7, not 8", because even with 0.5 ulp, it's not (always) EXACTLY
the
integer value that comes out of the function...

--
Mats

On 13 January 2017 at 08:32, Francois Fayard via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Bare in mind that writing a very efficient x^n is not that
> straightforward. Even if it’s a corner cases, the example of x^7 is
> interesting. If you apply the divide and rule algorithm, you get the
> following algorithm:
>
> double pow_7(double x) {
>   const double x2 = x * x;
>   const double x3 = x * x2;
>   const double x6 = x3 * x3;
>   const double x7 = x * x6;
>
>   return x7;
> }
>
> But you can also write
>
> double pow_7(double x) {
>   const double x2 = x * x;
>   const double x3 = x * x2;
>   const double x4 = x2 * x2;
>   const double x7 = x3 * x4;
>
>   return x7;
> }
>
> which is a different algorithm using the same number of application. But,
> because of instruction level parallelism, they don’t get the same
> performance. In the first implementation every single instruction has to
> wait for the previous instruction to start. In the second one x3 and x4
> could be computed using instruction level parallelism as they don’t depend
> upon each other.
>
> The performance difference in between the two are (compiled with -O3 and
> -mavx2):
> - clang 3.9.1: The first one is 25% slower than the second one
> - intel 17.0.1: The first one is 05% slower than the second one
> - gcc 6.2.0: The first one is 3% slower than the first one (but both are 2
> times slower than clang and intel)
>
> Attached is the code asserting my claim.
>
> François Fayard
>
> ====>
> #include <chrono>
> #include <iostream>
>
> int main() {
>   const int n = 1000000000;
>   const int length = 10;
>   double* x = new double[length];
>   for (int i = 0; i < length; ++i) {
>     x[i] = 1.0;
>   }
>
>   asm volatile("" : : "g"(x) : "memory");
>
>   auto begin = std::chrono::high_resolution_clock::now();
>   for (int k = 0; k < n; ++k) {
>     for (int i = 0; i < length; ++i) {
>       const double x1 = x[i];
>       const double x2 = x1 * x1;
>       const double x3 = x1 * x2;
>       const double x4 = x2 * x2;
>       const double x7 = x3 * x4;
>       x[i] = x7;
>     }
>   }
>   auto end = std::chrono::high_resolution_clock::now();
>   double time_fast >       1.0e-9 *
>       std::chrono::duration_cast<std::chrono::nanoseconds>(end -
> begin).count();
>
>   begin = std::chrono::high_resolution_clock::now();
>   for (int k = 0; k < n; ++k) {
>     for (int i = 0; i < length; ++i) {
>       const double x1 = x[i];
>       const double x2 = x1 * x1;
>       const double x3 = x1 * x2;
>       const double x6 = x3 * x3;
>       const double x7 = x1 * x6;
>       x[i] = x7;
>     }
>   }
>   end = std::chrono::high_resolution_clock::now();
>   double time_divide_and_rule >       1.0e-9 *
>       std::chrono::duration_cast<std::chrono::nanoseconds>(end -
> begin).count();
>
>   std::cout << "Optimized x^7: "
>             << 1.0e9 * time_fast / (static_cast<double>(n) *
length) << "
> ns"
>             << std::endl;
>   std::cout << "  Classic x^7: "
>             << 1.0e9 * time_divide_and_rule /
(static_cast<double>(n) *
> length)
>             << " ns" << std::endl;
>
>   delete[] x;
>
>   return 0;
> }
>
> ====>
> > On Jan 9, 2017, at 6:43 PM, Wei Ding via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > I want an efficient way to implement function pow in LLVM instead of
> invoking pow() math built-in. For algorithm part, I am clear for the logic.
> But I am not quite sure for which parts of LLVM should I replace built-in
> pow with another efficient pow implementation. Any comments and feedback
> are appreciated. Thanks!
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170113/51caf9be/attachment.html>

llvm dev - Jan 2017 - The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM

[llvm-dev] The most efficient way to implement an integer based power function pow in LLVM