Francesco Petrogalli via llvm-dev
2018-Nov-30  05:26 UTC
[llvm-dev] [RFC] Re-implementing -fveclib with OpenMP
Hi all,
I am submitting the following RFC [1] to re-implement -fveclib via OpenMP constructs. The RFC was discussed during a round table at the last LLVM developer meeting, and presented during the BoF [2].
The proposal is published on Phabricator, for the purpose of keeping track of the comments, and it is now ready for a review from a wider audience after being polished by Hal Finkel and Hideki Saito (thank you!).
Kind regards,
Francesco
[1] https://reviews.llvm.org/D54412
[2] https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7
Francesco Petrogalli via llvm-dev
2018-Dec-12  03:47 UTC
[llvm-dev] [RFC] Re-implementing -fveclib with OpenMP
Hi all, I have been asked to include the RFC into the email message.
Here it goes.
Kind regards,
Francesco
—————————————————————————————————————————————————————————
Introduction
===========
This RFC proposes replacing the current `TargetLibraryInfo` (TLI) based
implementation of the command line option `-fveclib` with an OpenMP
based one.
With this change, `-fveclib` will maintain its current behavior in terms
of user experience, but the new implementation will additionally:
1.  Decouple the compiler front-end that knows about the availability
    of vectorized routines from the back-end that knows how to make use
    of them.
2.  Enable support for a developer's own vector libraries without
    requiring changes to the compiler, via the new `-fveclib-include`
    command line option.
3.  Enable other frontends and languages to add scalar-to-vector
    function mappings as relevant for their own runtime libraries, etc.
The implementation of the proposal will consist of the following
components:
1.  [Changes in LLVM IR](#llvmIR) to provide information about the
    availability of vector math functions via metadata attached to an
    `llvm::CallInst`.
2.  [An infrastructure](#infrastructure) that can be queried to retrieve
    information about the available vector functions associated with a
    `llvm::CallInst`.
3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
    metadata.
4.  [Changes in clang](#mathdoth) to add the metadata in the IR via two
    mechanisms:
    1.  A custom `math.h` header file shipped with the compiler.
    2.  A user header file distributed with the library, to be used with
        the command line option `-fveclib-include`.
5.  [Changes in the clang driver](#driver) to translate `-fveclib` into a
    combination of flags that enable the generation of the
    library-specific flags needed to select the list of available vector
    functions specified in any of the header files.
Current status of `-fveclib`
===========================
User interface
--------------
At the moment, a user can invoke `-fveclib` to generate vector calls
from two libraries, SVML and Accelerate, as follows:
    $> clang -fveclib=[SVML|Accelerate]
Interface with the loop vectorizer
----------------------------------
The TLI exposes an interface that enables querying the list of available
mappings by scalar name and number of lanes needed. The TLI interface is
currently used by the InnerLoopVectorizer to plant vector calls in
auto-vectorized loops.
Extending `-fveclib`
--------------------
Adding a new library requires listing its mappings in
`<llvm>/lib/Analysis/TargetLibraryInfo.cpp`, plus modifying the clang
front-end to handle the new value for the option - see for example the
two patches to add SLEEF (<http://sleef.org>) as a target library for
AArch64: <https://reviews.llvm.org/D53927> (LLVM code-base) and
<https://reviews.llvm.org/D53928> (clang code-base).
Limitations of the current implementation
-----------------------------------------
The mapping from the scalar to the vector version of a function is
defined by the backend, specifically within the TLI. For this reason the
frontend's `-fveclib` option is tied to the backend's support for the
(often language-dependent) library. In particular, an IR file generated
with a version of clang that knows about the availability of library
`X` must be processed by a backend that also knows about the
availability of library `X`.
Proposed changes
===============
We propose an implementation of `-fveclib` that makes use of a *veclib
specific* pragma, based on the OpenMP `declare simd` and
`declare variant` mechanisms, to inform the backend components about the
availability of vector versions of scalar functions found in IR. The
mechanism relies on storing such information in IR metadata, and
therefore makes the auto-vectorization of function calls a mid-end
(`opt`) process that is independent of the front-end that generated such
IR metadata.
Moreover, this implementation enhances the extensibility and portability
of `-fveclib` to other libraries and front-ends, and it provides a
generic mechanism that users of the LLVM compiler will be able to use
for interfacing their own vector routines with generic code.
The proposed implementation can also be used to expose
vectorization-specific descriptors -- for example, the `linear` and
`uniform` clauses of the OpenMP `declare simd` directive -- that could
be used to finely tune the automatic vectorization of some functions
(consider, for example, the vectorization of
`double sincos(double , double *, double *)`, where `linear` can be used
to give extra information about the memory layout of the two pointer
parameters in the vector version).
The newly proposed `#pragma` directives are:
1.  `#pragma veclib declare simd`.
2.  `#pragma veclib declare variant`.
Both directives follow the syntax of the `declare simd` and the
`declare variant` directives of OpenMP, with the exception that
`declare variant` is used only for the `simd` context.
We define a new `veclib`-only directive instead of using the `omp` ones
of OpenMP for the following reasons:
1.  Allow the compiler to perform auto-vectorization outside of an
    OpenMP SIMD context.
2.  Allow library vendors to provide a standard mechanism, based on
    OpenMP, to inform the compiler about the availability of vector
    functions that can be used for auto-vectorization.
A new compiler option, `-fparse-veclib`, is added to clang to enable
parsing of the `veclib` directive outside an OpenMP context.
OpenMP compatibility
--------------------
Note that the `veclib` pragma can be converted to the standard OpenMP
one by the following pre-processor test.
    #ifdef _OPENMP
    #define veclib omp
    #endif
Notice also that the `veclib simd` and `veclib variant` directives can be
parsed with the same infrastructure used for their OpenMP counterparts.
In the rest of this RFC, we describe how the compiler behaves when
parsing a `veclib` pragma. The same behavior is obtained when parsing
the OpenMP-based one when the compiler is invoked with the command line
options that enable OpenMP (`-fopenmp[-simd]`).
Changes in LLVM IR {#llvmIR}
------------------
The IR is enriched with metadata that details the availability of vector
versions of an associated scalar function. This metadata is attached to
the call site of the scalar function.
The metadata takes the form of an attribute containing a comma separated
list of vector function mappings. Each entry has a unique name that
follows the Vector Function ABI[^1] and a real name that is used when
generating calls to this vector function.
    vfunc_name1(real_name1), vfunc_name2(real_name2)
The Vector Function ABI name describes the signature of the vector
function so that properties like vectorisation factor can be queried
during compilation.
The real name is optional and assumed to match the vector function ABI
name when omitted.
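To make the entry format concrete, here is a minimal sketch of how such a string could be split into mappings. It is not part of the RFC: the names `VariantMapping` and `parseVectorVariants` are hypothetical, and the sketch assumes that neither name contains commas or parentheses.

```cpp
#include <string>
#include <vector>

// One scalar-to-vector mapping taken from a `vector-variant` string.
struct VariantMapping {
  std::string AbiName;  // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName; // symbol to call; equals AbiName when omitted
};

// Strip leading and trailing whitespace, since entries may be wrapped
// across several lines.
static std::string trim(const std::string &S) {
  const char *WS = " \t\r\n";
  const std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  const std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Hypothetical helper: split "a(b), c" into the mappings {a, b} and {c, c}.
std::vector<VariantMapping> parseVectorVariants(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    const std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    const std::size_t Open = Entry.find('(');
    const std::size_t Close = Entry.rfind(')');
    if (Open != std::string::npos && Close != std::string::npos &&
        Close > Open) {
      // "abi_name(real_name)": explicit redirection to a library symbol.
      Result.push_back({trim(Entry.substr(0, Open)),
                        trim(Entry.substr(Open + 1, Close - Open - 1))});
    } else {
      // No real name given: it is assumed to match the ABI name.
      Result.push_back({Entry, Entry});
    }
  }
  return Result;
}
```

Calling `parseVectorVariants("_ZGVcN2v_sin(__svml_sin2), _ZGVcN2v_cos")` would yield two mappings, the second one with identical ABI and real names.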
For example, the availability of a 2-lane double precision `sin`
function via SVML when targeting AVX on x86 is provided by the following
IR.
    // ...
    ... = call double @sin(double) #0
    // ...
    #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
                              _ZGVdN4v_sin(__svml_sin4),
                              ..."} }
The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
attribute provides information on the shape of the vector function via
the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
for Intel, and remaps the standard Vector Function ABI name to the
non-standard name `__svml_sin2`.
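As an illustration of what "shape" means here, the following sketch decodes the leading fields of a mangled name, assuming the `_ZGV<isa><mask><vlen><parameters>_<scalar name>` layout used by the Vector Function ABI. The names `VectorShape` and `decodeVectorAbiName` are hypothetical, the parameter tokens are kept as an opaque string, and scalable vector lengths are not handled.

```cpp
#include <cctype>
#include <string>

// Fields recovered from a Vector Function ABI mangled name.
struct VectorShape {
  char Isa;               // ISA letter, 'c' in "_ZGVcN2v_sin"
  bool IsMasked;          // 'M' means masked, 'N' means unmasked
  unsigned VF;            // number of lanes
  std::string Params;     // parameter tokens, "v" for one vector argument
  std::string ScalarName; // name of the scalar function, e.g. "sin"
};

// Hypothetical helper: returns false when Name does not follow the
// assumed layout; fixed-length vectors only.
bool decodeVectorAbiName(const std::string &Name, VectorShape &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t I = Prefix.size();
  if (I + 2 > Name.size())
    return false;
  Out.Isa = Name[I++];
  const char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M')
    return false;
  Out.IsMasked = (Mask == 'M');
  // The vector length is a run of decimal digits.
  const std::size_t DigitsBegin = I;
  unsigned VF = 0;
  while (I < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[I]))) {
    VF = VF * 10 + static_cast<unsigned>(Name[I] - '0');
    ++I;
  }
  if (I == DigitsBegin || VF == 0)
    return false;
  Out.VF = VF;
  // Parameter tokens run up to the first underscore; the scalar name
  // follows it and may itself contain underscores.
  const std::size_t Under = Name.find('_', I);
  if (Under == std::string::npos || Under == I || Under + 1 >= Name.size())
    return false;
  Out.Params = Name.substr(I, Under - I);
  Out.ScalarName = Name.substr(Under + 1);
  return true;
}
```

For `_ZGVcN2v_sin` this yields ISA letter `c`, unmasked, two lanes, one vector parameter, and the scalar name `sin`.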
This metadata is compatible with the proposal "Proposal for function
vectorization and loop vectorization with function calls",[^2] that uses
Vector Function ABI mangled names to inform the vectorizer about the
availability of vector functions. The proposal extends the original by
allowing the explicit mapping of the Vector Function ABI mangled name to
a non-standard name, which allows the use of existing vector libraries.
The `vector-variant` attribute needs to be attached on a per-call basis
to avoid conflicts when merging modules with different vector variants.
The query infrastructure: SVFS {#infrastructure}
------------------------------
The Search Vector Function System (SVFS) is constructed from an
`llvm::Module` instance so it can create function definitions. The SVFS
exposes an API with two methods.
### `SVFS::isFunctionVectorizable`
This method queries the availability of a vectorized version of a
function. The signature of the method is as follows.
    bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeSet Params);
The method determines the availability of a vector version of the
function invoked by the `Call` parameter by looking at the
`vector-variant` metadata.
The `Params` argument is a set mapping the position of a parameter in
the CallInst to its `ParameterType` descriptor. The `ParameterType`
descriptor holds information about the shape of the corresponding
parameter in the signature of the vector function. This `ParameterType`
is used to query the SVFS about the availability of vector versions that
have `linear` or `uniform` parameters (in the sense of OpenMP 4.0 and
onwards).
The method we propose, when invoked with an empty `ParTypeSet`, is
equivalent to the `TargetLibraryInfo` method
`isFunctionVectorizable(StringRef Name)`.
### `SVFS::getVectorizedFunction`
This method returns the vector function declaration that corresponds to
the needs of the vectorization technique that is being run.
The signature of the function is as follows.
    std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
      llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
The `Call` parameter is the call instance that is being vectorized, the
`VF` parameter represents the vectorization factor (how many lanes), the
`IsMasked` parameter decides whether or not the signature of the vector
function is required to have a mask parameter, and the `Params`
parameter describes the shape of the vector function as in the
`isFunctionVectorizable` method.
The method uses the `vector-variant` metadata and returns the function
signature and the name of the function based on the input parameters.
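The selection logic can be pictured with a toy model that works on an already decoded list of variants instead of an `llvm::CallInst`. This is only a sketch under that assumption: `VectorVariant`, `hasVectorVariant` and `findVectorVariant` are hypothetical names, and the `ParTypeSet` argument is left out, which corresponds to the empty-set case.

```cpp
#include <string>
#include <vector>

// Simplified record for one decoded entry of the `vector-variant` metadata.
struct VectorVariant {
  unsigned VF;      // number of lanes
  bool IsMasked;    // whether the vector function takes a mask
  std::string Name; // symbol to call
};

// Toy counterpart of isFunctionVectorizable: is any variant listed at all?
bool hasVectorVariant(const std::vector<VectorVariant> &Variants) {
  return !Variants.empty();
}

// Toy counterpart of getVectorizedFunction: return the name of the variant
// matching the requested VF and masking, or an empty string if none does.
std::string findVectorVariant(const std::vector<VectorVariant> &Variants,
                              unsigned VF, bool IsMasked) {
  for (const VectorVariant &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.Name;
  return "";
}
```

With the two SVML entries from the IR example, a request for `VF = 4` unmasked would select `__svml_sin4`, while a request for `VF = 8` would find nothing.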
The SVFS can add new function definitions, in the same module as the
`Call`, to provide vector functions that are not present within the
vector-variant metadata. For example, if a library provides a vector
version of a function with a vectorization factor of 2, but the
vectorizer is requesting a vectorization factor of 4, the SVFS is
allowed to create a definition that calls the 2-lane version (provided
by the library) twice. This capability applies similarly for providing
masked and unmasked versions when the request doesn't match what is
available in the library.
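The 2-lane to 4-lane case can be sketched in plain C++ as follows. Everything here is illustrative: `lib_sin2` stands in for a routine provided by a vector library, `std::array` stands in for vector registers, and `widened_sin4` is the kind of definition the SVFS could synthesize.

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a 2-lane routine provided by the library (hypothetical name).
Vec2 lib_sin2(Vec2 X) {
  return Vec2{{std::sin(X[0]), std::sin(X[1])}};
}

// 4-lane definition built on top of the 2-lane one: split the input into
// halves, call the library routine once per half, and join the results.
Vec4 widened_sin4(Vec4 X) {
  const Vec2 Lo = lib_sin2(Vec2{{X[0], X[1]}});
  const Vec2 Hi = lib_sin2(Vec2{{X[2], X[3]}});
  return Vec4{{Lo[0], Lo[1], Hi[0], Hi[1]}};
}
```

The vectorizer would then call the synthesized 4-lane function as if the library had provided it directly.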
This method is equivalent to the TLI method
`StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
Notice that to fully support OpenMP vectorization we need to think about
a fuzzy matching mechanism that is able to select a candidate in the
calling context. However, this is not needed for `-fveclib` because the
scalar-to-vector mappings of `-fveclib` are such that for every scalar
function there is only one possible vector function associated.
Therefore, extending this behavior to a generic one is an aspect of the
implementation that will be treated in a separate RFC about the
vectorization pass.
### Scalable vectorization
Both methods of the SVFS API will be extended with a boolean parameter
to specify whether scalable signatures are needed by the user of the
SVFS.
Changes in the LoopVectorizer {#LV}
-----------------------------
The LoopVectorizer and the related analysis passes will have to replace
the TLI version of `isFunctionVectorizable` and `getVectorizedFunction`
with the SVFS ones.
Changes in clang: shipping `math.h` with the compiler {#mathdoth}
-----------------------------------------------------
We use clang to generate the metadata described above. The functions
available in library `X` are listed in a custom `math.h` file that is
shipped with the compiler in `<clang>/lib/Headers/math.h`. The header
file is implemented by including "once" the system `math.h` file,
followed by `#ifdef` guarded re-declarations of the functions enriched
with `#pragma veclib declare simd` directives.
    #include_once <math.h>
    // ... cpp extern "C" guards omitted
    #ifdef _CLANG_USE_LIBRARY_X
    #pragma veclib declare simd simdlen(4) notinbranch
    extern double sin(double);
    #endif
This generates the Vector Function ABI mangled name to be used in the
`vector-variant` attribute, for example `_ZGVcN2v_sin`, when targeting
AVX code generation.
The part of the vector-variant attribute that redirects the call to
`__svml_sin2` is also added via the header file `math.h`, by using the
OpenMP 5.0 directive `declare variant`,[^3] guarded by SVML specific
preprocessor macros:
    #ifdef _CLANG_USE_SVML
    #pragma veclib declare simd simdlen(4) notinbranch
    extern double sin(double);
    #pragma veclib declare variant(double sin(double)) \
    match(construct=simd(simdlen(4),notinbranch), device={isa(avx2)})
    __m256d __svml_sin4(__m256d x);
    #endif
Note that the list of if-guarded function declarations does not need to
live in the same `math.h` file, but can be included in `math.h` from
library-specific header files.
Changes in the clang driver {#driver}
---------------------------
To enable the information provided via `math.h`, the clang driver will
translate the `-fveclib=X` option into `-D_CLANG_USE_LIBRARY_X -lX` to
turn on the correct section of the header file and the flag for the
linker.
Note that the `veclib` directives are loaded even when *not* compiling
for an OpenMP target.
Extending auto-vectorization capabilities of LLVM
================================================
When compared to the TLI-based auto-vectorization mechanism, the
OpenMP-based mechanism has the advantage of enabling users to provide
their own vector routines (not just the math ones) by adding
`veclib declare simd` and `veclib declare variant` definitions in their
source.
For this specific functionality, the following command line option is
added to clang:
    -fveclib-include=path/to/header/file.h
This option enables clang to recognize the `veclib declare simd` and
`veclib declare variant` directives listed in the header file of the
library.
Summary
======
New `veclib` directives in clang
--------------------------------
1.  `#pragma veclib declare simd [clause, ]`, same as
    `#pragma omp declare simd` from OpenMP 4.0+.
2.  `#pragma veclib declare variant`, same as `#pragma omp declare variant`
    restricted to the `simd` context selector, from OpenMP 5.0+.
New `math.h` header file
------------------------
Shipped in `<clang>/lib/Headers/math.h`, it contains all the declarations
of the functions available in the vector library `X`, `ifdef` guarded by
the macro `__CLANG_ENABLE_LIBRARY_X`.
Option behavior, and interaction with OpenMP
--------------------------------------------
The behavior described below makes sure that `-fveclib` function
vectorization and OpenMP function vectorization are orthogonal.
No options
:   No function vectorization via a vector library, neither user provided
    nor shipped via an internal `math.h`.
`-fveclib=X`
:   The driver transforms this into
    `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X=1 -lX`. This is used only
    for users that want to vectorize `math.h` functions.
`-fveclib-include=path/to/user/provided/header/file.h`
:   The driver transforms this into
    `-fparse-veclib -include=path/to/user/provided/header/file.h`. The
    user has to provide the correct linker flag for both the scalar
    version and the vector version of whatever function they have
    defined in the header file. The header file must use the `veclib`
    directive to inform the compiler about the available vector
    functions.
`-fopenmp[-simd]`
:   No vectorization happens other than for those functions that are
    marked with OpenMP declare simd. The header `math.h` is loaded, but
    the `veclib` decorated declarations are invisible to the compiler
    instance because hidden behind the `__CLANG_ENABLE_LIBRARY_X`
    macros, which are not defined.
`-fopenmp[-simd] -fveclib=X` or
`-fopenmp[-simd] -fveclib-include=path/to/user/provided/header/file.h`
: Same behavior as without the `-fopenmp[-simd]` option.
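The two driver translations listed above can be sketched as a plain string rewrite. This is not clang driver code: `expandVeclibFlag` is a hypothetical helper, the library name is substituted verbatim for `X`, and the resulting flags are spelled exactly as they are written in this RFC.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: expand one command line argument the way the RFC
// describes. Arguments that are not veclib options are passed through.
std::vector<std::string> expandVeclibFlag(const std::string &Arg) {
  const std::string LibOpt = "-fveclib=";
  const std::string IncOpt = "-fveclib-include=";
  if (Arg.compare(0, LibOpt.size(), LibOpt) == 0) {
    const std::string X = Arg.substr(LibOpt.size());
    // -fveclib=X: enable parsing, select the guarded section of math.h,
    // and add the linker flag for the library.
    return {"-fparse-veclib", "-D__CLANG_ENABLE_LIBRARY_" + X + "=1",
            "-l" + X};
  }
  if (Arg.compare(0, IncOpt.size(), IncOpt) == 0) {
    const std::string Path = Arg.substr(IncOpt.size());
    // -fveclib-include=<path>: enable parsing and include the user header.
    // Linker flags remain the user's responsibility.
    return {"-fparse-veclib", "-include=" + Path};
  }
  return {Arg};
}
```

Under these assumptions, `-fveclib=X` expands to `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X=1 -lX`, and `-fopenmp-simd` is left unchanged.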
[^1]: Vector Function ABI for x86:
    <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
    Vector Function ABI for AArch64:
    <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>.
[^2]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
[^3]:
<https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
Renato Golin via llvm-dev
2018-Dec-12  21:58 UTC
[llvm-dev] [RFC] Re-implementing -fveclib with OpenMP
Hi Francesco,

This is a huge RFC and I don't think we can discuss all of it at the same time, at least not in a constructive manner. What ends up happening is that people ignore the thread and developers get upset.

So, I'll start with the summary, to make sure the overall assumptions in the RFC match the ones I have about it, then we can delve into details.

I also think we should not discuss user-include files now. Whatever we define for the standard ones will work for user driven ones, but user driven have additional complexities that will only get in the way of the standard discussion.

Comments inline...

On Wed, 12 Dec 2018 at 03:47, Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Summary
> ======
>
> New `veclib` directives in clang

I know this is not new but, why "fveclib"? From the review, I take this is the same as GCC's "mveclibabi", and if it is, why come up with a new name for the same thing? If it's not, what justifies implementing a different way of handling the same concept (vector math libraries), which is surely going to confuse a lot of users.

In some reviews, it was said that some proprietary compilers already use "fveclib", but between being coherent with other OSS compilers and closed source compilers, I think the answer is clear.

I'm not against the name, I'm just making sure we're not creating problem for ourselves.

> --------------------------------
>
> 1. `#pragma veclib declare simd [clause, ]`, same as
> `#pragma omp declare simd` from OpenMP 4.0+.

Why not just use "pragma omp simd"? If I recall correctly, there's an option to allow OMP SIMD pragmas without enabling full OMP, so that we can use it without needing all the headers and libraries, just to control vectorisation.

Creating new pragmas should be seen with extreme prejudice, as these things tend to simplify the life of the compiler developers but create nightmares for application developers, especially if they want to use multiple compilers.

> 2. `#pragma omp declare variant`, same as `#pragma omp declare variant`
> restricted to the `simd` context selector, from OpenMP 5.0+.

Is this just for the user-driven stuff? If so, let's look at it later.

> New `math.h` header file
> ------------------------
>
> Shipped in `<clang>/lib/Headers/math.h`, contains all the declaration of
> the functions available in the vector library `X`, `ifdef` guarded by
> the macro `__CLANG_ENABLE_LIBRARY_X`.

So, the compiler will have the header files and the libraries will be in charge of implementing them, to avoid linkage errors?

If this is a standard ABI that multiple libraries follow, I'm in favour. If we'll end up with one (or more) header(s) per library or worse, need to update the header every time the library changes something, then I'm completely against.

The latter will generate the compatibility issue I mentioned in one of the reviews, where the compiler has different header files but the implementations are slightly off-base. Keeping multiple copies of those libraries in the same file system (for different users in the same clusters) is even worse.

That's the kind of thing that is better left for the libraries themselves. If they have both headers and objects, keeping all together into one directory is enough.

> Option behavior, and interaction with OpenMP
> --------------------------------------------
>
> `-fveclib=X`
>
> : The driver transform this into
> `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X=1 -lX`. This is used only
> for users that want to vectorize `math.h` functions.

Why not just include the header when you use it, instead of include and guard for all cases?

> `-fopenmp[-simd]`
>
> : No vectorization happens other then for those functions that are
> marked with OpenMP declare simd. The header `math.h` is loaded, but
> the `veclib` decorated declarations are invisible to the compiler
> instance because hidden behind the `__CLANG_ENABLE_LIBRARY_X`
> macros, which are not defined.

"No vectorisation" you mean, no "function" vectorisation. Other vectorisation (from -O3 etc) will still happen.

> `-fopenmp[-simd] -fveclib=X` or
>
> : Same behavior as without the `-fopenmp[-simd]` option.

So, fveclib will enable OMP SIMD by default? I think that's what some of the reviews (particularly on certification) were against.

This is not correct. The only way this can work is without including OMP dependencies when using vector libraries.

If the omp-simd option does not add OMP deps (as I hinted above, there may be a way), then this is fine. But if veclib flags force OMP dependencies, than this cannot work.

--
cheers,
--renato