thr3ads.net - llvm dev - [llvm-dev] RFC: SIMD math-function library [Jul 2016]

Naoki Shibata via llvm-dev

2016-Jul-13 11:45 UTC

[llvm-dev] RFC: SIMD math-function library

Dear LLVM contributors,

I am Naoki Shibata, an associate professor at Nara Institute of Science 
and Technology.

Hal Finkel and I would like to jointly propose adding my vectorized math 
library to LLVM.

The library has been available as public domain software for years; I am 
going to dual-license the library if necessary.

********

Below is a proposal to add my vectorized math library, SLEEF [1], for
evaluating elementary functions (trigonometry, log, exp, etc.) to LLVM. 
The library can be used directly, or can be targeted by an 
autovectorization infrastructure. Patches to tie SLEEF into LLVM's 
autovectorizer have been developed by Hal Finkel as part of the bgclang 
project (which provides LLVM/Clang ported to the IBM BG/Q supercomputer 
architecture). Hal has also developed a user-facing header for the 
library, in the style of Clang's intrinsics headers, which we can use as 
part of this project. SLEEF has been used as part of bgclang in this way 
for several years.

The library currently supports several architectures:
  * x86 - SSE2, FMA4, AVX, AVX2+FMA3
  * ARM - NEON (single-precision only)
  * A pure C (scalar) version
  * Hal's version supports PowerPC/QPX.

It is fairly easy to port to other architectures. The library provides 
similar functionality to Intel's Short Vector Math Library (available 
with Intel's Compiler).

Roadmap:
--------
1) Get agreement on incorporating the library.
2) Renaming the public interface to use only the
    implementation-reserved namespace (i.e. names starting with
    underscores), as is appropriate for a compiler runtime library.
3) Convert the functions to use LLVM's naming conventions (including, if
    desired, converting the source files to C++ allowing the use of function
    overloading).
4) Create and document a public interface to the library.
5) Add support for targeting the library to LLVM's autovectorizer.
6) Work with the community to port the library to other architectures.

Motivation:

Recent CPUs and GPUs have vectorized FP multipliers and adders for 
improving throughput of FP computation. In order to extract the maximum 
computation power from processors with vectorized ALUs, the software has 
to be vectorized to use SIMD data structures. It is also preferred that 
conditional branches and scatter/gather memory access are eliminated as 
much as possible. However, rewriting existing software in this fashion 
is a very hard and time consuming task that involves converting data 
structures. Thus, realization of efficient libraries and automatic 
vectorization is desired.

In this proposal, we are going to incorporate a vectorized math library,
currently named SLEEF, into LLVM runtime library. By doing this, 
elementary functions can be directly evaluated using SIMD data types. We 
can also expect extra performance improvements by allowing LLVM to 
automatically target the functions (and inline them with LTO).

Functionality of the library:

For each elementary function, the library contains subroutines for 
evaluation in single precision and double precision. For a subset of the 
elementary functions, the accuracy of the results can be chosen: 
for this subset there are versions with up to 1 ulp error and versions 
with a few ulp error. Obviously, the less accurate versions are faster. 
Please note that merely rounding a real number to the nearest floating 
point number already gives a maximum error of 0.5 ulp. In Hal's bgclang 
port, the less accurate versions are used with -ffast-math, and the 
more-accurate ones otherwise.

For non-finite inputs and outputs, the library should return the same 
results as libm. The library is tested to confirm that the evaluation error 
is within the designed limit, by comparison against high-precision 
evaluation using the libmpfr library. In particular, we rigorously checked 
the error of the trigonometric functions when the arguments are close to 
an integral multiple of PI/2.

The size of the functions is very small.

Implementation of the library:

Basically, each function consists of a reduction and a kernel. For the 
kernel, a polynomial approximation is used. The coefficients are 
carefully set to minimize the number of multiplications and additions 
while reducing the error. The reduction is devised so that the same 
kernel can be used over the whole range of the input arguments. In order to 
improve the accuracy in the functions with 1-ulp error, double-double 
calculations are used. Use of fused multiply-add operations, which have 
become quite common recently, can further improve performance of these 
functions. Some of the implementation techniques used in the library are
explained in [3].

[1] https://github.com/shibatch/sleef
[2] https://github.com/hfinkel/sleef-bgq/blob/master/simd/qpxmath.h
[3] http://ito-lab.naist.jp/~n-sibata/pdfs/isc10simd.pdf


********

Regards,

Naoki Shibata

Vedant Kumar via llvm-dev

2016-Jul-13 16:45 UTC

[llvm-dev] RFC: SIMD math-function library

Hi Naoki,

SLEEF looks very promising!

Are SLEEF routines validated against libm, in addition to libmpfr? Are
performance tracking tests in place to detect execution time or code size
regressions? If these are missing, IMO it would be good to add them to the
roadmap.

best,
vedant

Naoki Shibata via llvm-dev

2016-Jul-14 08:18 UTC

[llvm-dev] RFC: SIMD math-function library

Hi Vedant,

Thank you for your comment.

For checking accuracy of finite outputs and correctness of handling 
non-finite inputs and outputs, I believe validating against libmpfr is 
enough. Please tell me the kind of regressions we need to detect. Do you 
have concerns about the correctness of libmpfr?

What kind of execution time or code size regressions are we going to 
check? Since SLEEF is completely branch-free, there should be no serious 
execution time and code size regression unless branches are introduced.
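
For readers unfamiliar with the term, a minimal sketch of what "branch-free" can look like: a choice between two values is made with a bit mask instead of an if/else, which is the scalar analogue of a per-lane vector select. The helper names below are hypothetical and this is not code from SLEEF; a compiler is also free to lower it however it likes.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Returns a if cond is nonzero, otherwise b, without an if/else:
   the mask is all ones or all zeros. */
static float select_f32(int cond, float a, float b) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return bits_float((float_bits(a) & mask) | (float_bits(b) & ~mask));
}

/* Example use: max(x, 0) written with a select instead of a branch. */
static float relu_branch_free(float x) {
    return select_f32(x > 0.0f, x, 0.0f);
}
```

Code written this way does the same work for every input, which is why execution time is not expected to vary unless branches are introduced.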

It is of course okay for me to add additional regression checking, but I 
just want to understand the necessity.

Regards,

Naoki Shibata



Hal Finkel via llvm-dev

2016-Jul-15 03:53 UTC

[llvm-dev] RFC: SIMD math-function library

Hi again,

As this RFC implies, I've been using the SLEEF library proposed here with
Clang/LLVM for many years, and fully support its adoption into the LLVM project.

I'm CC'ing Matt and Xinmin from Intel, who have started working on
contributing support for their SVML library to LLVM
(http://reviews.llvm.org/D19544) and who, I understand, plan to contribute
(some subset of) the vector math functions themselves. I'm also excited about
Intel's planned contributions.

Here's how I currently see the situation: Regardless of what Intel
contributes, we need a solution in this space for many different architectures.
From personal experience, SLEEF is relatively easy to port to different
architectures (i.e. different vector ISAs), and has already been ported to
several. The performance is good as is the accuracy. I think it would make a
great foundation for a vector-math-function runtime library for the LLVM
project. I don't know what routines Intel is planning to contribute, or for
what architectures they're tuned, but I expect we'll want to use those
implementations on x86 platforms where appropriate.

Matt, Xinmin, what do you think?

Thanks again,
Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Naoki Shibata via llvm-dev

2016-Jul-15 04:37 UTC

[llvm-dev] RFC: SIMD math-function library

Hi all,

Okay, the point is whether Intel will publish the source code for their 
SVML. If Intel makes SVML open-source, there would not be much 
advantage in incorporating SLEEF into LLVM, since it would also be 
fairly easy to port SVML to other architectures. If Intel does not 
open-source SVML, then there could be an advantage in using SLEEF for x86 
by inlining the functions.

Is it possible to ask the person in charge what exactly Intel is going 
to contribute?

Naoki Shibata



Tian, Xinmin via llvm-dev

2016-Jul-15 04:39 UTC

[llvm-dev] RFC: SIMD math-function library

I agree with Hal. 

Since the SLEEF library is portable across many different architectures, it
will be a great addition to the LLVM community for SIMD support on all
architectures.

Currently, Intel has open-sourced 6 functions (sin, cos, pow, exp, log, and
sincos) for GCC and LLVM on x86 ({SSE2, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, MIC,
AVX512} x {mask, non-mask}; the AVX512 open-sourcing is still to be done). The
plan is to open-source most of the Intel SVML library for LLVM x86 support.

For achieving "close to metal performance for x86", I assume Intel
SVML would provide better performance and more control over accuracy for the
time being, given that the SVML team has tuned SVML for many years for all x86
architectures. We have not yet done performance and accuracy comparisons
between the SLEEF and SVML libraries.

In any case, I would suggest moving this RFC forward and starting this project.
I think Intel's SVML code for x86 can be integrated into this project for x86.
I will talk to the Intel SVML library owners/stakeholders and ask them to take
a look at SLEEF and provide their recommendations/suggestions related to x86
and in general.

Thanks,
Xinmin


Naoki Shibata via llvm-dev

2016-Jul-15 04:45 UTC

[llvm-dev] RFC: SIMD math-function library

Hi Martin,

Thank you for your comment.

It is of course possible to rewrite SLEEF in a more generic way, and 
I actually once tried to do that using the vector data type in GCC. But 
the code generated from such source code was far less efficient than the 
version with explicit SIMD intrinsics.

Adding typedefs to specify the exact types is possible.

Regards,

Naoki Shibata
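
The typedef approach mentioned above could look roughly like the following C sketch. The names (sketch_float32_t, sketch_float64_t, sketch_halve) are hypothetical and are not part of SLEEF or LLVM; the selection logic is an assumption about how one might bind exact-width names using only the <float.h> characteristics of each C type.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <limits.h>

/* Bind a 32-bit binary floating-point name, or fail at compile time. */
#if FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128
typedef float sketch_float32_t;
#else
#error "no 32-bit binary floating-point type found"
#endif

/* Bind a 64-bit name to whichever C type has that format. On a platform
   where 'double' is 32-bit, this falls through to 'long double'. */
#if FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024
typedef double sketch_float64_t;
#elif FLT_RADIX == 2 && LDBL_MANT_DIG == 53 && LDBL_MAX_EXP == 1024
typedef long double sketch_float64_t;
#else
#error "no 64-bit binary floating-point type found"
#endif

static_assert(sizeof(sketch_float32_t) * CHAR_BIT == 32, "expected a 32-bit type");
static_assert(sizeof(sketch_float64_t) * CHAR_BIT == 64, "expected a 64-bit type");

/* Library code is then written against the exact-width names only. */
sketch_float64_t sketch_halve(sketch_float64_t x) {
    return x * (sketch_float64_t)0.5;
}
```

With this arrangement, the mapping from C types to formats lives in one place, and a port to a platform with different bindings only needs to touch the selection block.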


On 2016/07/14 18:25, Martin J. O'Riordan wrote:
> Having support for vector equivalents to the ISO C math functions is very
> valuable, and this kind of work of great benefit.
>
> There are a couple things though that concern me about this proposal:
>
> 1.  OpenCL C already provides a vector math binding that for the most
>     part provides this equivalence.  It also supports vectors of
>     multiple types through overloading.  Perhaps it might be possible
>     to align SLEEF with OpenCL C?
>
> 2.  There are hard assumptions about how 'float', 'double' and 'long
>     double' are implemented.  Libraries with these kinds of hard-wired
>     assumptions (including 'compiler-rt') cause me a lot of trouble to
>     port to our platform which is at variance with these common
>     assumptions.
>
>     So I would suggest that the implementation uses typedefs to
>     specifically bind to the type that provides the specific FP
>     precision required.
>
>     Clang supports the IEEE FP16, FP32, FP64, FP128 types, which can
>     be bound to each of the higher level C types.  Our architecture
>     binds these as FP16 for '__fp16' aka 'half', FP32 for 'float' AND
>     for 'double' and FP64 for 'long double'.  There is no hardware
>     support for FP64, so having 'float' and 'double' be FP32 is
>     important to avoid the costly consequences of usual arithmetic
>     conversions in C.
>
>     Using specific synonyms would greatly enhance the portability of
>     the library implementation.  For example 'float32_t' instead of
>     'float' - pity C/C++ don't have these as Standard yet.
>
> Thanks,
>
> 	MartinO
>
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
> Naoki Shibata via llvm-dev
> Sent: 14 July 2016 09:18
> To: Vedant Kumar <vsk at apple.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] RFC: SIMD math-function library
>
>
> Hi Vedant,
>
> Thank you for your comment.
>
> For checking accuracy of finite outputs and correctness of handling
> non-finite inputs and outputs, I believe validating against libmpfr is
> enough. Please tell me the kind of regressions we need to detect. Do you
> have concern on correctness of libmpfr?
>
> What kind of execution time or code size regressions are we going to
> check? Since SLEEF is completely branch-free, there should be no serious
> execution time and code size regression unless branches are introduced.
>
> It is of course okay for me to add additional regression checking, but I
> just want to understand the necessity.
>
> Regards,
>
> Naoki Shibata
>
>
> On 2016/07/14 1:45, Vedant Kumar wrote:
>> Hi Naoki,
>>
>> SLEEF looks very promising!
>>
>> Are SLEEF routines validated against libm, in addition to libmpfr? Are
>> performance tracking tests in place to detect execution time or code
>> size regressions? If these are missing, IMO it would be good to add
>> them to the roadmap.
>>
>> best,
>> vedant

Martin J. O'Riordan via llvm-dev

2016-Jul-15 06:09 UTC

[llvm-dev] RFC: SIMD math-function library

I am looking forward to porting it to our platform; I know that this will be
of significant benefit.

We support 'v8f16' and 'v4f32' FP vector types natively, and having
this library provide the optimised math functions for them will definitely be
very useful.

All the best,

	MartinO
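
A 4-lane single-precision vector such as the 'v4f32' type mentioned above can be written generically with the GCC/Clang vector_size attribute, which is the style of code referred to earlier in the thread as "the vector data type in GCC". The sketch below is illustrative only: the names sketch_v4f32 and sketch_poly are hypothetical, the polynomial is arbitrary, and no claim is made about the quality of the generated code compared with explicit intrinsics.

```c
#include <assert.h>   /* static_assert (C11) */

/* GCC/Clang vector extension: four 32-bit floats in one 16-byte vector.
   The type name is hypothetical, chosen after the 'v4f32' notation. */
typedef float sketch_v4f32 __attribute__((vector_size(16)));

static_assert(sizeof(sketch_v4f32) == 16, "expected a 16-byte vector");

/* Lane-wise Horner evaluation of 1 + x + x*x/2.
   The arithmetic operators apply to every lane at once, so the same
   source can be compiled for any target that supports the extension. */
sketch_v4f32 sketch_poly(sketch_v4f32 x) {
    const sketch_v4f32 c0 = {1.0f, 1.0f, 1.0f, 1.0f};
    const sketch_v4f32 c1 = {1.0f, 1.0f, 1.0f, 1.0f};
    const sketch_v4f32 c2 = {0.5f, 0.5f, 0.5f, 0.5f};
    return (c2 * x + c1) * x + c0;
}
```

Individual lanes can be read back with subscripting (for example out[2]) in both GCC and Clang, which is convenient for testing such kernels against a scalar reference.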

-----Original Message-----
From: Naoki Shibata [mailto:shibatch.sf.net at gmail.com] 
Sent: 15 July 2016 05:46
To: Martin.ORiordan at Movidius.com; 'Vedant Kumar' <vsk at apple.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] RFC: SIMD math-function library


Hi Martin,

Thank you for your comment.

It is of course possible to rewrite SLEEF in a more generic way, and I actually
once tried to do that using the vector data type in GCC. But the code generated
from such source code was far less efficient than the version with explicit SIMD
intrinsics.

Adding typedefs to specify the exact types is possible.

Regards,

Naoki Shibata


