Francesco Petrogalli via llvm-dev
2018-Nov-30 05:26 UTC
[llvm-dev] [RFC] Re-implementing -fveclib with OpenMP
Hi all,

I am submitting the following RFC [1] to re-implement -fveclib via OpenMP constructs. The RFC was discussed during a round table at the last LLVM developer meeting, and presented during the BoF [2].

The proposal is published on Phabricator, for the purpose of keeping track of the comments, and it is now ready for a review from a wider audience after being polished by Hal Finkel and Hideki Saito (thank you!).

Kind regards,

Francesco

[1] https://reviews.llvm.org/D54412
[2] https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7
Francesco Petrogalli via llvm-dev
2018-Dec-12 03:47 UTC
[llvm-dev] [RFC] Re-implementing -fveclib with OpenMP
Hi all,

I have been asked to include the RFC in the email message. Here it goes.

Kind regards,

Francesco

Introduction
===========

This RFC encompasses the proposal of replacing the current `TargetLibraryInfo` (TLI) based implementation of the command line option `-fveclib` with an OpenMP based one. With this change, `-fveclib` will maintain its current behavior in terms of user experience, but the new implementation will additionally:

1. Decouple the compiler front-end that knows about the availability of vectorized routines, from the back-end that knows how to make use of them.
2. Enable support for a developer's own vector libraries without requiring changes to the compiler, via the new `-fveclib-include` command line option.
3. Enable other frontends and languages to add scalar-to-vector function mappings as relevant for their own runtime libraries, etc.

The implementation of the proposal will consist of the following components:

1. [Changes in LLVM IR](#llvmIR) to provide information about the availability of vector math functions via metadata attached to an `llvm::CallInst`.
2. [An infrastructure](#infrastructure) that can be queried to retrieve information about the available vector functions associated with an `llvm::CallInst`.
3. [Changes in the LoopVectorizer](#LV) to use the API to query the metadata.
4. [Changes in clang](#mathdoth) to add the metadata in the IR via two mechanisms:
    1. A custom `math.h` header file shipped with the compiler.
    2. A user header file distributed with the library, to be used with the command line option `-fveclib-include`.
5. [Changes in the clang driver](#driver) to translate `-fveclib` into a combination of flags that enable the generation of the library-specific flags needed to select the list of available vector functions specified in any of the header files.

Current status of `-fveclib`
===========================

User interface
--------------

At the moment, a user can invoke `-fveclib` to generate vector calls from two libraries, SVML and Accelerate, as follows:

    $> clang -fveclib=[SVML|Accelerate]

Interface with the loop vectorizer
----------------------------------

The TLI exposes an interface that enables querying the list of available mappings by scalar name and number of lanes needed. The TLI interface is currently used by the InnerLoopVectorizer to plant vector calls in auto-vectorized loops.

Extending `-fveclib`
--------------------

Adding a new library requires listing the mappings in `<llvm>/lib/Analysis/TargetLibraryInfo.cpp`, plus modifying the clang front-end to handle the new value for the option - see for example the two patches to add SLEEF (<http://sleef.org>) as a target library for AArch64: <https://reviews.llvm.org/D53927> (LLVM code-base) and <https://reviews.llvm.org/D53928> (clang code-base).

Limitations of the current implementation
-----------------------------------------

The mapping from the scalar to the vector version of a function is defined by the backend, within the TLI specifically. For this reason the frontend's `-fveclib` option is tied to the backend's support for the, often language dependent, library. In particular, an IR file that is generated with a version of clang that knows about the availability of library `X` needs to be processed by a backend that also knows about the availability of library `X`.

Proposed changes
===============

We propose an implementation of `-fveclib` that makes use of a *veclib specific* pragma that is based on the OpenMP `declare simd` and `declare variant` mechanism to inform the backend components about the availability of vector versions of scalar functions found in IR.

The mechanism relies on storing such information in IR metadata, and therefore makes the auto-vectorization of function calls a mid-end (`opt`) process that is independent of the front-end that generated such IR metadata.

Moreover, this implementation enhances the extensibility and portability of `-fveclib` to other libraries and front-ends, and it provides a generic mechanism that the users of the LLVM compiler will be able to use for interfacing their own vector routines for generic code.

The proposed implementation can also be used to expose vectorization-specific descriptors -- for example, like the `linear` and `uniform` clauses of the OpenMP `declare simd` directive -- that could be used to finely tune the automatic vectorization of some functions (think for example of the vectorization of `double sincos(double , double *, double *)`, where `linear` can be used to give extra information about the memory layout of the two pointer parameters in the vector version).

The new proposed `#pragma` directives are:

1. `#pragma veclib declare simd`.
2. `#pragma veclib declare variant`.

Both directives follow the syntax of the `declare simd` and the `declare variant` directives of OpenMP, with the exception that `declare variant` is used only for the `simd` context.

We define a new `veclib`-only directive instead of using the `omp` ones of OpenMP for the following reasons:

1. Allow the compiler to perform auto-vectorization outside of an OpenMP SIMD context.
2. Allow library vendors to provide a standard mechanism, based on OpenMP, to inform the compiler about the availability of vector functions that can be used for auto-vectorization.

A new compiler option, `-fparse-veclib`, is added to clang to enable parsing of the `veclib` directive outside an OpenMP context.

OpenMP compatibility
--------------------

Note that the `veclib` pragma can be converted to the standard OpenMP one by the following pre-processor test.

    #ifdef _OPENMP
    #define veclib omp
    #endif

Notice also that the `veclib simd` and `veclib variant` directives can be parsed with the same infrastructure used for their OpenMP counterparts.

In the following RFC, we will describe how the compiler behaves when parsing a `veclib` pragma. The same behavior is obtained when parsing the OpenMP based one when the compiler is invoked with the command line options that enable OpenMP (`-fopenmp[-simd]`).

Changes in LLVM IR {#llvmIR}
------------------

The IR is enriched with metadata that details the availability of vector versions of an associated scalar function. This metadata is attached to the call site of the scalar function.

The metadata takes the form of an attribute containing a comma separated list of vector function mappings. Each entry has a unique name that follows the Vector Function ABI[^1] and a real name that is used when generating calls to this vector function.

    vfunc_name1(real_name1), vfunc_name2(real_name2)

The Vector Function ABI name describes the signature of the vector function so that properties like the vectorization factor can be queried during compilation.

The real name is optional and is assumed to match the Vector Function ABI name when omitted.

For example, the availability of a 2-lane double precision `sin` function via SVML when targeting AVX on x86 is provided by the following IR.

    // ...
    ... = call double @sin(double) #0
    // ...

    #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
                              _ZGVdN4v_sin(__svml_sin4),
                              ..."} }

The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant attribute provides information on the shape of the vector function via the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI for Intel, and remaps the standard Vector Function ABI name to the non-standard name `__svml_sin2`.
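
As an illustration of the attribute format described above, the following is a minimal sketch of how a consumer might split the comma separated list into (Vector Function ABI name, real name) pairs. The names `VariantMapping` and `parseVectorVariant` are hypothetical and are not part of the RFC or of LLVM; the sketch only assumes the `abi_name(real_name)` syntax given in the text, with the real name defaulting to the ABI name when omitted.

```cpp
#include <string>
#include <vector>

// One scalar-to-vector mapping taken from a vector-variant attribute string.
// Hypothetical type, used only for this illustration.
struct VariantMapping {
  std::string AbiName;  // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName; // symbol to call; equals AbiName when omitted
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string &S) {
  const std::string::size_type First = S.find_first_not_of(" \t");
  if (First == std::string::npos)
    return "";
  const std::string::size_type Last = S.find_last_not_of(" \t");
  return S.substr(First, Last - First + 1);
}

// Split "abi1(real1), abi2, ..." into mappings. Empty entries are skipped.
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  std::string::size_type Pos = 0;
  while (Pos <= Attr.size()) {
    std::string::size_type Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    const std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    const std::string::size_type Open = Entry.find('(');
    const std::string::size_type Close = Entry.rfind(')');
    if (Open != std::string::npos && Close != std::string::npos &&
        Close > Open) {
      // Entry of the form abi_name(real_name).
      Result.push_back({trim(Entry.substr(0, Open)),
                        trim(Entry.substr(Open + 1, Close - Open - 1))});
    } else {
      // No explicit real name: it is assumed to match the ABI name.
      Result.push_back({Entry, Entry});
    }
  }
  return Result;
}
```

Calling `parseVectorVariant("_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)")` would yield two mappings, one per entry, which a query layer could then filter by vectorization factor.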

This metadata is compatible with the proposal "Proposal for function vectorization and loop vectorization with function calls",[^2] that uses Vector Function ABI mangled names to inform the vectorizer about the availability of vector functions. This proposal extends the original by allowing the explicit mapping of the Vector Function ABI mangled name to a non-standard name, which allows the use of existing vector libraries.

The `vector-variant` attribute needs to be attached on a per-call basis to avoid conflicts when merging modules with different vector variants.

The query infrastructure: SVFS {#infrastructure}
------------------------------

The Search Vector Function System (SVFS) is constructed from an `llvm::Module` instance so it can create function definitions. The SVFS exposes an API with two methods.

### `SVFS::isFunctionVectorizable`

This method queries the availability of a vectorized version of a function. The signature of the method is as follows.

    bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeSet Params);

The method determines the availability of vector versions of the function invoked by the `Call` parameter by looking at the `vector-variant` metadata.

The `Params` argument is a set mapping the position of a parameter in the `CallInst` to its `ParameterType` descriptor. The `ParameterType` descriptor holds information about the shape of the corresponding parameter in the signature of the vector function. This `ParameterType` is used to query the SVFS about the availability of vector versions that have `linear` or `uniform` parameters (in the sense of OpenMP 4.0 and onwards).

The method we propose, when invoked with an empty `ParTypeSet`, is equivalent to the `TargetLibraryInfo` method `isFunctionVectorizable(StringRef Name)`.

### `SVFS::getVectorizedFunction`

This method returns the vector function declaration that corresponds to the needs of the vectorization technique that is being run.

The signature of the function is as follows.

    std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
        llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);

The `Call` parameter is the call instance that is being vectorized, the `VF` parameter represents the vectorization factor (how many lanes), the `IsMasked` parameter decides whether or not the signature of the vector function is required to have a mask parameter, and the `Params` parameter describes the shape of the vector function as in the `isFunctionVectorizable` method.

The method uses the `vector-variant` metadata and returns the function signature and the name of the function based on the input parameters.

The SVFS can add new function definitions, in the same module as the `Call`, to provide vector functions that are not present within the vector-variant metadata. For example, if a library provides a vector version of a function with a vectorization factor of 2, but the vectorizer is requesting a vectorization factor of 4, the SVFS is allowed to create a definition that calls the 2-lane version (provided by the library) twice. This capability applies similarly for providing masked and unmasked versions when the request doesn't match what is available in the library.

This method is equivalent to the TLI method `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.

Notice that to fully support OpenMP vectorization we need to think about a fuzzy matching mechanism that is able to select a candidate in the calling context. However, this is not needed for `-fveclib`, because the scalar-to-vector mappings of `-fveclib` are such that for every scalar function there is only one possible vector function associated. Therefore, extending this behavior to a generic one is an aspect of the implementation that will be treated in a separate RFC about the vectorization pass.
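
To make the lookup described above more concrete, here is a rough sketch of an exact-match query over a list of mappings. It is not the SVFS API: `VectorShape`, `Candidate`, `decodeAbiName` and `selectVectorFunction` are hypothetical names, the decoder only assumes the `_ZGV<isa><mask><vlen><parameters>_<scalar name>` layout visible in names such as `_ZGVcN2v_sin` (with `N` for unmasked and `M` for masked), and it does not model `linear` or `uniform` parameters, scalable vectors, or the synthesis of wrapper definitions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Decoded fields of a Vector Function ABI name (hypothetical type).
struct VectorShape {
  char ISA = 0;           // ISA letter, e.g. 'c' in "_ZGVcN2v_sin"
  bool IsMasked = false;  // 'M' means masked, 'N' means unmasked
  unsigned VF = 0;        // number of lanes
  std::string Params;     // raw parameter tokens, e.g. "v"
  std::string ScalarName; // e.g. "sin"
};

// A mapping as found in the vector-variant attribute (hypothetical type).
struct Candidate {
  std::string AbiName;
  std::string RealName;
};

// Decode "_ZGV<isa><mask><vlen><params>_<scalar>"; false on malformed input.
bool decodeAbiName(const std::string &Name, VectorShape &Out) {
  if (Name.size() < 7 || Name.compare(0, 4, "_ZGV") != 0)
    return false;
  if (Name[5] != 'N' && Name[5] != 'M')
    return false;
  std::size_t I = 6;
  unsigned VF = 0;
  while (I < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[I]))) {
    VF = VF * 10 + static_cast<unsigned>(Name[I] - '0');
    ++I;
  }
  if (VF == 0)
    return false;
  // The first underscore after the lane count ends the parameter tokens.
  const std::size_t Sep = Name.find('_', I);
  if (Sep == std::string::npos || Sep + 1 >= Name.size())
    return false;
  Out.ISA = Name[4];
  Out.IsMasked = Name[5] == 'M';
  Out.VF = VF;
  Out.Params = Name.substr(I, Sep - I);
  Out.ScalarName = Name.substr(Sep + 1);
  return true;
}

// Return the real name of the first candidate that matches the scalar
// function, the requested lane count and the requested masking exactly.
bool selectVectorFunction(const std::vector<Candidate> &Candidates,
                          const std::string &Scalar, unsigned VF,
                          bool IsMasked, std::string &RealName) {
  for (const Candidate &C : Candidates) {
    VectorShape S;
    if (!decodeAbiName(C.AbiName, S))
      continue;
    if (S.ScalarName == Scalar && S.VF == VF && S.IsMasked == IsMasked) {
      RealName = C.RealName;
      return true;
    }
  }
  return false;
}
```

An exact match is enough for the `-fveclib` case discussed in the text, where each scalar function has a single associated vector function; the fuzzy matching needed for general OpenMP support is deliberately left out of this sketch.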

### Scalable vectorization

Both methods of the SVFS API will be extended with a boolean parameter to specify whether scalable signatures are needed by the user of the SVFS.

Changes in the LoopVectorizer {#LV}
-----------------------------

The LoopVectorizer and the related analysis passes will have to replace the TLI versions of `isFunctionVectorizable` and `getVectorizedFunction` with the SVFS ones.

Changes in clang: shipping `math.h` with the compiler {#mathdoth}
-----------------------------------------------------

We use clang to generate the metadata described above.

The functions available in library `X` are listed in a custom `math.h` file that is shipped with the compiler in `<clang>/lib/Headers/math.h`.

The header file is implemented by including "once" the system `math.h` file, followed by `#ifdef` guarded re-declarations of the functions enriched with `#pragma veclib declare simd` directives.

    #include_once <math.h>
    // ... cpp extern "C" guards omitted
    #ifdef _CLANG_USE_LIBRARY_X
    #pragma veclib declare simd simdlen(4) notinbranch
    extern double sin(double);
    #endif

This generates the Vector Function ABI mangled name to be used in the `vector-variant` attribute, for example `_ZGVcN2v_sin`, when targeting AVX code generation.

The part of the vector-variant attribute that redirects the call to `__svml_sin2` is also added via the header file `math.h`, by using the OpenMP 5.0 directive `declare variant`,[^3] guarded by SVML specific preprocessor macros:

    #ifdef _CLANG_USE_SVML
    #pragma veclib declare simd simdlen(4) notinbranch
    extern double sin(double);
    #pragma veclib declare variant(double sin(double)) \
        match(construct=simd(simdlen(4),notinbranch), device={isa(avx2)})
    __m256d __svml_sin4(__m256d x);
    #endif

Note that the list of `#ifdef` guarded function declarations does not need to live in the same `math.h` file, but can be included in `math.h` from library-specific header files.
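
The step from a `declare simd` directive to an entry of the `vector-variant` attribute can be sketched as follows. This is only an illustration for the simplest case, where every parameter is passed as a plain vector (`v`) and the clause is either `notinbranch` (unmasked, `N`) or `inbranch` (masked, `M`). The helper names `mangleVectorName` and `makeVariantEntry` are hypothetical, and the ISA letter is taken as an input rather than derived from the target, since that derivation is defined by the Vector Function ABI documents and not by this RFC.

```cpp
#include <string>

// Build a Vector Function ABI style name for a function whose parameters
// are all plain vectors. Hypothetical helper, for illustration only.
std::string mangleVectorName(char ISA, bool Masked, unsigned SimdLen,
                             unsigned NumParams,
                             const std::string &ScalarName) {
  std::string Name = "_ZGV";
  Name += ISA;                        // ISA letter, e.g. 'c' or 'd'
  Name += Masked ? 'M' : 'N';         // inbranch vs. notinbranch
  Name += std::to_string(SimdLen);    // number of lanes
  Name.append(NumParams, 'v');        // one 'v' per vector parameter
  Name += '_';
  Name += ScalarName;
  return Name;
}

// Format one attribute entry: "abi_name(real_name)", or just "abi_name"
// when no distinct real name is given.
std::string makeVariantEntry(const std::string &AbiName,
                             const std::string &RealName) {
  if (RealName.empty() || RealName == AbiName)
    return AbiName;
  return AbiName + "(" + RealName + ")";
}
```

For instance, `makeVariantEntry(mangleVectorName('d', false, 4, 1, "sin"), "__svml_sin4")` produces the entry `_ZGVdN4v_sin(__svml_sin4)` that appears in the IR example earlier in the RFC.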

Changes in the clang driver {#driver}
---------------------------

To enable the information provided via `math.h`, the clang driver will translate the `-fveclib=X` option into `-D_CLANG_USE_LIBRARY_X -lX` to turn on the correct section of the header file and the flag for the linker.

Note that the `veclib` directives are loaded even when *not* compiling for an OpenMP target.

Extending auto-vectorization capabilities of LLVM
================================================

When compared to the TLI-based auto-vectorization mechanism, the OpenMP-based mechanism has the advantage of enabling users to provide their own vector routines (not just the math ones) by adding `veclib declare simd` and `veclib declare variant` definitions in their source.

For this specific functionality, the following command line option is added to clang:

    -fveclib-include=path/to/header/file.h

This option enables clang to recognize the `veclib declare simd` and `veclib declare variant` directives listed in the header file of the library.

Summary
======

New `veclib` directives in clang
--------------------------------

1. `#pragma veclib declare simd [clause, ]`, same as `#pragma omp declare simd` from OpenMP 4.0+.
2. `#pragma veclib declare variant`, same as `#pragma omp declare variant` restricted to the `simd` context selector, from OpenMP 5.0+.

New `math.h` header file
------------------------

Shipped in `<clang>/lib/Headers/math.h`, it contains all the declarations of the functions available in the vector library `X`, `ifdef` guarded by the macro `__CLANG_ENABLE_LIBRARY_X`.

Option behavior, and interaction with OpenMP
--------------------------------------------

The behavior described below makes sure that `-fveclib` function vectorization and OpenMP function vectorization are orthogonal.

No options

:   No function vectorization via vector library, neither user-provided nor shipped via an internal `math.h`.

`-fveclib=X`

:   The driver transforms this into `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X=1 -lX`. This is used only for users that want to vectorize `math.h` functions.

`-fveclib-include=path/to/user/provided/header/file.h`

:   The driver transforms this into `-fparse-veclib -include=path/to/user/provided/header/file.h`. The user has to provide the correct linker flag for both the scalar version and the vector version of whatever function they have defined in the header file. The header file must use the `veclib` directive to inform the compiler about the available vector functions.

`-fopenmp[-simd]`

:   No vectorization happens other than for those functions that are marked with OpenMP `declare simd`. The header `math.h` is loaded, but the `veclib` decorated declarations are invisible to the compiler instance because they are hidden behind the `__CLANG_ENABLE_LIBRARY_X` macros, which are not defined.

`-fopenmp[-simd] -fveclib=X` or `-fopenmp[-simd] -fveclib-include=path/to/user/provided/header/file.h`

:   Same behavior as without the `-fopenmp[-simd]` option.

[^1]: Vector Function ABI for x86: <https://software.intel.com/en-us/articles/vector-simd-function-abi>. Vector Function ABI for AArch64: <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>

[^2]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>

[^3]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
Renato Golin via llvm-dev
2018-Dec-12 21:58 UTC
[llvm-dev] [RFC] Re-implementing -fveclib with OpenMP
Hi Francesco,

This is a huge RFC and I don't think we can discuss all of it at the same time, at least not in a constructive manner. What ends up happening is that people ignore the thread and developers get upset.

So, I'll start with the summary, to make sure the overall assumptions in the RFC match the ones I have about it, then we can delve into details.

I also think we should not discuss user-include files now. Whatever we define for the standard ones will work for user driven ones, but user driven have additional complexities that will only get in the way of the standard discussion.

Comments inline...

On Wed, 12 Dec 2018 at 03:47, Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Summary
> ======
>
> New `veclib` directives in clang
> --------------------------------

I know this is not new but, why "fveclib"? From the review, I take this is the same as GCC's "mveclibabi", and if it is, why come up with a new name for the same thing?

If it's not, what justifies implementing a different way of handling the same concept (vector math libraries), which is surely going to confuse a lot of users.

In some reviews, it was said that some proprietary compilers already use "fveclib", but between being coherent with other OSS compilers and closed source compilers, I think the answer is clear.

I'm not against the name, I'm just making sure we're not creating problems for ourselves.

> 1. `#pragma veclib declare simd [clause, ]`, same as
> `#pragma omp declare simd` from OpenMP 4.0+.

Why not just use "pragma omp simd"? If I recall correctly, there's an option to allow OMP SIMD pragmas without enabling full OMP, so that we can use it without needing all the headers and libraries, just to control vectorisation.

Creating new pragmas should be seen with extreme prejudice, as these things tend to simplify the life of the compiler developers but create nightmares for application developers, especially if they want to use multiple compilers.

> 2. `#pragma omp declare variant`, same as `#pragma omp declare variant`
> restricted to the `simd` context selector, from OpenMP 5.0+.

Is this just for the user-driven stuff? If so, let's look at it later.

> New `math.h` header file
> ------------------------
>
> Shipped in `<clang>/lib/Headers/math.h`, contains all the declaration of
> the functions available in the vector library `X`, `ifdef` guarded by
> the macro `__CLANG_ENABLE_LIBRARY_X`.

So, the compiler will have the header files and the libraries will be in charge of implementing them, to avoid linkage errors?

If this is a standard ABI that multiple libraries follow, I'm in favour. If we'll end up with one (or more) header(s) per library or worse, need to update the header every time the library changes something, then I'm completely against.

The latter will generate the compatibility issue I mentioned in one of the reviews, where the compiler has different header files but the implementations are slightly off-base. Keeping multiple copies of those libraries in the same file system (for different users in the same clusters) is even worse.

That's the kind of thing that is better left for the libraries themselves. If they have both headers and objects, keeping all together into one directory is enough.

> Option behavior, and interaction with OpenMP
> --------------------------------------------
>
> `-fveclib=X`
>
> : The driver transform this into
> `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X=1 -lX`. This is used only
> for users that want to vectorize `math.h` functions.

Why not just include the header when you use it, instead of include and guard for all cases?

> `-fopenmp[-simd]`
>
> : No vectorization happens other then for those functions that are
> marked with OpenMP declare simd. The header `math.h` is loaded, but
> the `veclib` decorated declarations are invisible to the compiler
> instance because hidden behind the `__CLANG_ENABLE_LIBRARY_X`
> macros, which are not defined.

"No vectorisation" you mean, no "function" vectorisation. Other vectorisation (from -O3 etc) will still happen.

> `-fopenmp[-simd] -fveclib=X` or
>
> : Same behavior as without the `-fopenmp[-simd]` option.

So, fveclib will enable OMP SIMD by default? I think that's what some of the reviews (particularly on certification) were against. This is not correct.

The only way this can work is without including OMP dependencies when using vector libraries.

If the omp-simd option does not add OMP deps (as I hinted above, there may be a way), then this is fine. But if veclib flags force OMP dependencies, then this cannot work.

--
cheers,
--renato