Finkel, Hal J. via llvm-dev
2019-May-29 02:55 UTC
[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> I generally like the idea of having support in IR for vectorization of
> custom functions. I have several use cases which would benefit from this.
>
> I'd suggest a couple of reframings to the IR representation though.
>
> First, this should probably be specified as metadata/attribute on a
> function declaration. Allowing the callsite variant is fine, but it
> should primarily be a property of the called function, not of the call
> site. Being able to specify it once per declaration is much cleaner.

I agree. We should support this both on the function declaration and on the call sites.

>
> Second, I really don't like the mangling use here. We need a better way
> to specify the properties of the function than its mangled name. One
> thought to explore is to directly use the Value of the function
> declaration (since this is metadata and we can do that), and then tie
> the properties to the function declaration in some way? Sorry, I don't
> really have a specific suggestion here.

Is the problem the mangling or the fact that the mangling is ABI/target-specific? One option is to use LLVM's mangling scheme (the one we use for intrinsics) and then provide some backend infrastructure to translate later.

-Hal

>
> Philip
>
> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>> Dear all,
>>
>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
>>
>> The proposal is a modification of an RFC that I have sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
>>
>> The original RFC was proposing to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector function in user code that the compiler can pick up for auto-vectorization.
>> This narrower scope limits the impact of changes that are needed in both clang and LLVM.
>>
>> Please let me know what you think.
>>
>> Kind regards,
>>
>> Francesco
>>
>>
>> ================================================================================
>>
>> Introduction
>> ============
>>
>> This RFC encompasses the proposal of informing the vectorizer about the
>> availability of vector functions provided by the user. The mechanism is
>> based on the use of the directive `declare variant` introduced in OpenMP
>> 5.0 [^1].
>>
>> The mechanism proposed has the following properties:
>>
>> 1. Decouples the compiler front-end that knows about the availability
>>    of vectorized routines, from the back-end that knows how to make use
>>    of them.
>> 2. Enables support for a developer's own vector libraries without
>>    requiring changes to the compiler.
>> 3. Enables other frontends (e.g. f18) to add scalar-to-vector function
>>    mappings as relevant for their own runtime libraries, etc.
>>
>> The implementation consists of two separate sets of changes.
>>
>> The first set is a set of changes in `llvm`, and consists of:
>>
>> 1. [Changes in LLVM IR](#llvmIR) to provide information about the
>>    availability of user-defined vector functions via metadata attached
>>    to an `llvm::CallInst`.
>> 2. [An infrastructure](#infrastructure) that can be queried to retrieve
>>    information about the available vector functions associated to a
>>    `llvm::CallInst`.
>> 3. [Changes in the LoopVectorizer](#LV) to use the API to query the
>>    metadata.
>>
>> The second set consists of the [changes in clang](#clang) that
>> are needed to recognize the `#pragma clang declare variant`
>> directive.
>>
>> Proposed changes
>> ================
>>
>> We propose an implementation that uses `#pragma clang declare variant`
>> to inform the backend components about the availability of vector
>> versions of scalar functions found in IR.
>> The mechanism relies on storing
>> such information in IR metadata, and therefore makes the
>> auto-vectorization of function calls a mid-end (`opt`) process that is
>> independent of the front-end that generated such IR metadata.
>>
>> This implementation provides a generic mechanism that the users of the
>> LLVM compiler will be able to use for interfacing their own vector
>> routines for generic code.
>>
>> The implementation can also expose vectorization-specific descriptors --
>> for example, like the `linear` and `uniform` clauses of the OpenMP
>> `declare simd` directive -- that could be used to finely tune the
>> automatic vectorization of some functions (think for example the
>> vectorization of `double sincos(double , double *, double *)`, where
>> `linear` can be used to give extra information about the memory layout
>> of the two pointer parameters in the vector version).
>>
>> The directive `#pragma clang declare variant` follows the syntax of the
>> `#pragma omp declare variant` directive of OpenMP.
>>
>> We define the new directive in the `clang` namespace instead of using
>> the `omp` one of OpenMP to allow the compiler to perform
>> auto-vectorization outside of an OpenMP SIMD context.
>>
>> The mechanism is based on OpenMP to provide a uniform user experience
>> across the two mechanisms, and to maximise the number of shared
>> components of the infrastructure needed in the compiler frontend to
>> enable the feature.
>>
>> Changes in LLVM IR {#llvmIR}
>> ------------------
>>
>> The IR is enriched with metadata that details the availability of vector
>> versions of an associated scalar function. This metadata is attached to
>> the call site of the scalar function.
>>
>> The metadata takes the form of an attribute containing a comma separated
>> list of vector function mappings. Each entry has a unique name that
>> follows the Vector Function ABI[^2] and a real name that is used when
>> generating calls to this vector function.
>>
>>     vfunc_name1(real_name1), vfunc_name2(real_name2)
>>
>> The Vector Function ABI name describes the signature of the vector
>> function so that properties like vectorisation factor can be queried
>> during compilation.
>>
>> The `(real name)` token is optional and assumed to match the Vector
>> Function ABI name when omitted.
>>
>> For example, the availability of a 2-lane double precision `sin`
>> function via SVML when targeting AVX on x86 is provided by the following
>> IR.
>>
>>     // ...
>>     ... = call double @sin(double) #0
>>     // ...
>>
>>     #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>                               _ZGVdN4v_sin(__svml_sin4),
>>                               ..."} }
>>
>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>> attribute provides information on the shape of the vector function via
>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>> for Intel, and remaps the standard Vector Function ABI name to the
>> non-standard name `__svml_sin2`.
>>
>> This metadata is compatible with the proposal "Proposal for function
>> vectorization and loop vectorization with function calls",[^3] that uses
>> Vector Function ABI mangled names to inform the vectorizer about the
>> availability of vector functions. The proposal extends the original by
>> allowing the explicit mapping of the Vector Function ABI mangled name to
>> a non-standard name, which allows the use of existing vector libraries.
>>
>> The `vector-variant` attribute needs to be attached on a per-call basis
>> to avoid conflicts when merging modules with different vector variants.
>>
>> The query infrastructure: SVFS {#infrastructure}
>> ------------------------------
>>
>> The Search Vector Function System (SVFS) is constructed from an
>> `llvm::Module` instance so it can create function definitions. The SVFS
>> exposes an API with two methods.
>>
>> ### `SVFS::isFunctionVectorizable`
>>
>> This method queries the availability of a vectorized version of a
>> function.
>> The signature of the method is as follows.
>>
>>     bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>
>> The method determines the availability of a vector version of the function
>> invoked by the `Call` parameter by looking at the `vector-variant`
>> metadata.
>>
>> The `Params` argument is a map that associates the position of a
>> parameter in the `CallInst` to its `ParameterType` descriptor. The
>> `ParameterType` descriptor holds information about the shape of the
>> corresponding parameter in the signature of the vector function. This
>> `ParameterType` is used to query the SVFS about the availability of
>> vector versions that have `linear`, `uniform` or `align` parameters (in
>> the sense of OpenMP 4.0 and onwards).
>>
>> The method `isFunctionVectorizable`, when invoked with an empty
>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
>> `isFunctionVectorizable(StringRef Name)`.
>>
>> ### `SVFS::getVectorizedFunction`
>>
>> This method returns the vector function declaration that corresponds to
>> the needs of the vectorization technique that is being run.
>>
>> The signature of the function is as follows.
>>
>>     std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>       llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
>>
>> The `Call` parameter is the call instance that is being vectorized, the
>> `VF` parameter represents the vectorization factor (how many lanes), the
>> `IsMasked` parameter decides whether or not the signature of the vector
>> function is required to have a mask parameter, the `Params` parameter
>> describes the shape of the vector function as in the
>> `isFunctionVectorizable` method.
>>
>> The method uses the `vector-variant` metadata and returns the function
>> signature and the name of the function based on the input parameters.
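
A minimal sketch of how the `vector-variant` string could be decoded to answer such a query, assuming the comma-separated `abi_name(real_name)` layout quoted above and the `_ZGV<isa><mask><vlen>` prefix of the Vector Function ABI. The names `VariantEntry`, `parseVectorVariant` and `findVariant` are hypothetical; they are not part of the RFC or of LLVM, and the sketch handles fixed-length variants only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one entry of the `vector-variant` string.
struct VariantEntry {
  std::string AbiName;   // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName;  // name to call; equals AbiName when "(...)" is omitted
  char Isa = 0;          // ISA letter that follows "_ZGV"
  bool IsMasked = false; // 'M' = masked, 'N' = unmasked
  unsigned VF = 0;       // number of lanes
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string &S) {
  std::size_t B = 0, E = S.size();
  while (B < E && std::isspace(static_cast<unsigned char>(S[B]))) ++B;
  while (E > B && std::isspace(static_cast<unsigned char>(S[E - 1]))) --E;
  return S.substr(B, E - B);
}

// Decode "_ZGV<isa><mask><vlen>..." into Isa, IsMasked and VF.
// Returns false for names this sketch does not understand, including
// scalable variants, which carry no numeric lane count.
static bool decodePrefix(VariantEntry &E) {
  const std::string &N = E.AbiName;
  if (N.size() < 7 || N.compare(0, 4, "_ZGV") != 0) return false;
  if (N[5] != 'M' && N[5] != 'N') return false;
  E.Isa = N[4];
  E.IsMasked = (N[5] == 'M');
  unsigned VF = 0;
  for (std::size_t I = 6;
       I < N.size() && std::isdigit(static_cast<unsigned char>(N[I])); ++I)
    VF = VF * 10 + static_cast<unsigned>(N[I] - '0');
  E.VF = VF;
  return VF != 0;
}

// Split the attribute string on commas and decode every entry.
// Entries that cannot be decoded are skipped.
std::vector<VariantEntry> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantEntry> Out;
  std::size_t Pos = 0;
  while (Pos < Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos) Comma = Attr.size();
    std::string Tok = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Tok.empty()) continue;
    VariantEntry E;
    std::size_t Open = Tok.find('(');
    if (Open != std::string::npos && Tok.back() == ')') {
      E.AbiName = Tok.substr(0, Open);
      E.RealName = Tok.substr(Open + 1, Tok.size() - Open - 2);
    } else {
      E.AbiName = E.RealName = Tok; // "(real name)" omitted
    }
    if (decodePrefix(E)) Out.push_back(E);
  }
  return Out;
}

// Return the real name of the first variant with the requested lane count
// and masking, or an empty string when none matches.
std::string findVariant(const std::vector<VariantEntry> &Vs, unsigned VF,
                        bool IsMasked) {
  for (const VariantEntry &E : Vs)
    if (E.VF == VF && E.IsMasked == IsMasked) return E.RealName;
  return std::string();
}
```

With the two SVML entries from the IR example, a request for an unmasked 4-lane variant would resolve to `__svml_sin4`. Matching on `linear`, `uniform` or `align` parameters is left out of the sketch.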
>>
>> The SVFS can add new function definitions, in the same module as the
>> `Call`, to provide vector functions that are not present within the
>> vector-variant metadata. For example, if a library provides a vector
>> version of a function with a vectorization factor of 2, but the
>> vectorizer is requesting a vectorization factor of 4, the SVFS is
>> allowed to create a definition that calls the 2-lane version twice. This
>> capability applies similarly for providing masked and unmasked versions
>> when the request does not match what is available in the library.
>>
>> This method is equivalent to the TLI method
>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>
>> Notice that to fully support OpenMP vectorization we need to think about
>> a fuzzy matching mechanism that is able to select a candidate in the
>> calling context. However, this proposal is intended for scalar-to-vector
>> mappings of math-like functions that are most likely to associate a
>> unique vector candidate in most contexts. Therefore, extending this
>> behavior to a generic one is an aspect of the implementation that will
>> be treated in a separate RFC about the vectorization pass.
>>
>> ### Scalable vectorization
>>
>> Both methods of the SVFS API will be extended with a boolean parameter
>> to specify whether scalable signatures are needed by the user of the
>> SVFS.
>>
>> Changes in clang {#clang}
>> ----------------
>>
>> We use clang to generate the metadata described above.
>>
>> In the compilation unit, the vector function definition or declaration
>> must be visible and associated to the scalar version via the
>> `#pragma clang declare variant` according to the rule defined by the
>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
>> the following example.
>>
>>     #pragma clang declare variant(vector_sinf) \
>>     match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>     extern float sinf(float);
>>
>>     float32x4_t vector_sinf(float32x4_t x);
>>
>> The `construct` set in the directive, together with the `device` set, is
>> used to generate the vector mangled name to be used in the
>> `vector-variant` attribute, for example `_ZGVnN4v_sinf`, when targeting
>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
>> the scalar function in the vector name are defined in the Vector
>> Function ABI specification of the target.
>>
>> The part of the vector-variant attribute that redirects the call to
>> `vector_sinf` is derived from the `variant-id` specified in the
>> `variant` clause.
>>
>> Summary
>> =======
>>
>> New `clang` directive in clang
>> ------------------------------
>>
>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>
>> Option behavior, and interaction with OpenMP
>> --------------------------------------------
>>
>> The behavior described below makes sure that
>> `#pragma clang declare variant` function vectorization and OpenMP
>> function vectorization are orthogonal.
>>
>> `-fclang-declare-variant`
>>
>> :   The `#pragma clang declare variant` directives are parsed and used
>>     to populate the `vector-variant` attribute.
>>
>> `-fopenmp[-simd]`
>>
>> :   The `#pragma omp declare variant` directives are parsed and used to
>>     populate the `vector-variant` attribute.
>>
>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>
>> :   The directive `#pragma omp declare variant` is used to populate the
>>     `vector-variant` attribute in IR. The `#pragma clang declare variant`
>>     directives are ignored.
>>
>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>
>> [^2]: Vector Function ABI for x86:
>>     <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>     Vector Function ABI for AArch64:
>>     https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
>>
>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Philip Reames via llvm-dev
2019-May-29 18:52 UTC
[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>> I generally like the idea of having support in IR for vectorization of
>> custom functions. I have several use cases which would benefit from this.
>>
>> I'd suggest a couple of reframings to the IR representation though.
>>
>> First, this should probably be specified as metadata/attribute on a
>> function declaration. Allowing the callsite variant is fine, but it
>> should primarily be a property of the called function, not of the call
>> site. Being able to specify it once per declaration is much cleaner.
>
> I agree. We should support this both on the function declaration and on
> the call sites.
>
>
>> Second, I really don't like the mangling use here. We need a better way
>> to specify the properties of the function than its mangled name. One
>> thought to explore is to directly use the Value of the function
>> declaration (since this is metadata and we can do that), and then tie
>> the properties to the function declaration in some way? Sorry, I don't
>> really have a specific suggestion here.
>
> Is the problem the mangling or the fact that the mangling is
> ABI/target-specific? One option is to use LLVM's mangling scheme (the
> one we use for intrinsics) and then provide some backend infrastructure
> to translate later.

Well, both honestly. But mangling with a non-target specific scheme is a lot better, so I might be okay with that. Good idea.

>
>
> -Hal
>
>
>> Philip
>>
>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>> Dear all,
>>>
>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
>>>
>>> The proposal is a modification of an RFC that I have sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
Finkel, Hal J. via llvm-dev
2019-May-29 19:16 UTC
[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
On 5/29/19 1:52 PM, Philip Reames wrote:
> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>>> I generally like the idea of having support in IR for vectorization of
>>> custom functions. I have several use cases which would benefit from this.
>>>
>>> I'd suggest a couple of reframings to the IR representation though.
>>>
>>> First, this should probably be specified as metadata/attribute on a
>>> function declaration. Allowing the callsite variant is fine, but it
>>> should primarily be a property of the called function, not of the call
>>> site. Being able to specify it once per declaration is much cleaner.
>> I agree. We should support this both on the function declaration and on
>> the call sites.
>>
>>
>>> Second, I really don't like the mangling use here. We need a better way
>>> to specify the properties of the function than its mangled name. One
>>> thought to explore is to directly use the Value of the function
>>> declaration (since this is metadata and we can do that), and then tie
>>> the properties to the function declaration in some way? Sorry, I don't
>>> really have a specific suggestion here.
>> Is the problem the mangling or the fact that the mangling is
>> ABI/target-specific? One option is to use LLVM's mangling scheme (the
>> one we use for intrinsics) and then provide some backend infrastructure
>> to translate later.
> Well, both honestly. But mangling with a non-target specific scheme is
> a lot better, so I might be okay with that. Good idea.

I liked your idea of directly encoding the signature in the metadata, but I think that we want to continue to use attributes, and not metadata, and the options for attributes seem more limited - unless we allow attributes to take metadata arguments - maybe that's an enhancement worth considering.
-Hal

>>
>> -Hal
>>
>>
>>> Philip
>>>
>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>>> Dear all,
>>>>
>>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
>>>>
>>>> The proposal is a modification of an RFC that I sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
>>>>
>>>> The original RFC was proposing to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector functions in user code that the compiler can pick up for auto-vectorization. This narrower scope limits the impact of changes that are needed in both clang and LLVM.
>>>>
>>>> Please let me know what you think.
>>>>
>>>> Kind regards,
>>>>
>>>> Francesco
>>>>
>>>>
>>>> ================================================================================
>>>>
>>>> Introduction
>>>> ============
>>>>
>>>> This RFC encompasses the proposal of informing the vectorizer about the
>>>> availability of vector functions provided by the user. The mechanism is
>>>> based on the use of the directive `declare variant` introduced in OpenMP
>>>> 5.0 [^1].
>>>>
>>>> The mechanism proposed has the following properties:
>>>>
>>>> 1. Decouples the compiler front-end that knows about the availability
>>>>    of vectorized routines, from the back-end that knows how to make use
>>>>    of them.
>>>> 2. Enables support for a developer's own vector libraries without
>>>>    requiring changes to the compiler.
>>>> 3. Enables other frontends (e.g. f18) to add scalar-to-vector function
>>>>    mappings as relevant for their own runtime libraries, etc.
>>>>
>>>> The implementation consists of two separate sets of changes.
>>>>
>>>> The first set is a set of changes in `llvm`, and consists of:
>>>>
>>>> 1. [Changes in LLVM IR](#llvmIR) to provide information about the
>>>>    availability of user-defined vector functions via metadata attached
>>>>    to an `llvm::CallInst`.
>>>> 2. [An infrastructure](#infrastructure) that can be queried to retrieve
>>>>    information about the available vector functions associated with an
>>>>    `llvm::CallInst`.
>>>> 3. [Changes in the LoopVectorizer](#LV) to use the API to query the
>>>>    metadata.
>>>>
>>>> The second set consists of the [changes in clang](#clang) that
>>>> are needed to recognize the `#pragma clang declare variant`
>>>> directive.
>>>>
>>>> Proposed changes
>>>> ================
>>>>
>>>> We propose an implementation that uses `#pragma clang declare variant`
>>>> to inform the backend components about the availability of vector
>>>> versions of scalar functions found in IR. The mechanism relies on storing
>>>> such information in IR metadata, and therefore makes the
>>>> auto-vectorization of function calls a mid-end (`opt`) process that is
>>>> independent of the front-end that generated such IR metadata.
>>>>
>>>> This implementation provides a generic mechanism that the users of the
>>>> LLVM compiler will be able to use for interfacing their own vector
>>>> routines for generic code.
>>>>
>>>> The implementation can also expose vectorization-specific descriptors --
>>>> for example, like the `linear` and `uniform` clauses of the OpenMP
>>>> `declare simd` directive -- that could be used to finely tune the
>>>> automatic vectorization of some functions (think for example the
>>>> vectorization of `double sincos(double , double *, double *)`, where
>>>> `linear` can be used to give extra information about the memory layout
>>>> of the two pointer parameters in the vector version).
>>>>
>>>> The directive `#pragma clang declare variant` follows the syntax of the
>>>> `#pragma omp declare variant` directive of OpenMP.
>>>>
>>>> We define the new directive in the `clang` namespace instead of using
>>>> the `omp` one of OpenMP to allow the compiler to perform
>>>> auto-vectorization outside of an OpenMP SIMD context.
>>>>
>>>> The mechanism is based on OpenMP to provide a uniform user experience
>>>> across the two mechanisms, and to maximise the number of shared
>>>> components of the infrastructure needed in the compiler frontend to
>>>> enable the feature.
>>>>
>>>> Changes in LLVM IR {#llvmIR}
>>>> ------------------
>>>>
>>>> The IR is enriched with metadata that details the availability of vector
>>>> versions of an associated scalar function. This metadata is attached to
>>>> the call site of the scalar function.
>>>>
>>>> The metadata takes the form of an attribute containing a comma-separated
>>>> list of vector function mappings. Each entry has a unique name that
>>>> follows the Vector Function ABI[^2] and a real name that is used when
>>>> generating calls to this vector function.
>>>>
>>>>     vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>
>>>> The Vector Function ABI name describes the signature of the vector
>>>> function so that properties like vectorisation factor can be queried
>>>> during compilation.
>>>>
>>>> The `(real_name)` token is optional and assumed to match the Vector
>>>> Function ABI name when omitted.
>>>>
>>>> For example, the availability of a 2-lane double precision `sin`
>>>> function via SVML when targeting AVX on x86 is provided by the following
>>>> IR.
>>>>
>>>>     // ...
>>>>     ... = call double @sin(double) #0
>>>>     // ...
>>>>
>>>>     #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>>>                               _ZGVdN4v_sin(__svml_sin4),
>>>>                               ..."} }
>>>>
>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>> attribute provides information on the shape of the vector function via
>>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>>>> for Intel, and remaps the standard Vector Function ABI name to the
>>>> non-standard name `__svml_sin2`.
>>>>
>>>> This metadata is compatible with the proposal "Proposal for function
>>>> vectorization and loop vectorization with function calls",[^3] that uses
>>>> Vector Function ABI mangled names to inform the vectorizer about the
>>>> availability of vector functions. The proposal extends the original by
>>>> allowing the explicit mapping of the Vector Function ABI mangled name to
>>>> a non-standard name, which allows the use of existing vector libraries.
>>>>
>>>> The `vector-variant` attribute needs to be attached on a per-call basis
>>>> to avoid conflicts when merging modules with different vector variants.
>>>>
>>>> The query infrastructure: SVFS {#infrastructure}
>>>> ------------------------------
>>>>
>>>> The Search Vector Function System (SVFS) is constructed from an
>>>> `llvm::Module` instance so it can create function definitions. The SVFS
>>>> exposes an API with two methods.
>>>>
>>>> ### `SVFS::isFunctionVectorizable`
>>>>
>>>> This method queries the availability of a vectorized version of a
>>>> function. The signature of the method is as follows.
>>>>
>>>>     bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>>>
>>>> The method determines the availability of a vector version of the function
>>>> invoked by the `Call` parameter by looking at the `vector-variant`
>>>> metadata.
>>>>
>>>> The `Params` argument is a map that associates the position of a
>>>> parameter in the `CallInst` to its `ParameterType` descriptor. The
>>>> `ParameterType` descriptor holds information about the shape of the
>>>> corresponding parameter in the signature of the vector function. This
>>>> `ParameterType` is used to query the SVFS about the availability of
>>>> vector versions that have `linear`, `uniform` or `align` parameters (in
>>>> the sense of OpenMP 4.0 and onwards).
>>>>
>>>> The method `isFunctionVectorizable`, when invoked with an empty
>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
>>>> `isFunctionVectorizable(StringRef Name)`.
>>>>
>>>> ### `SVFS::getVectorizedFunction`
>>>>
>>>> This method returns the vector function declaration that corresponds to
>>>> the needs of the vectorization technique that is being run.
>>>>
>>>> The signature of the function is as follows.
>>>>
>>>>     std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>       llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
>>>>
>>>> The `Call` parameter is the call instance that is being vectorized, the
>>>> `VF` parameter represents the vectorization factor (how many lanes), the
>>>> `IsMasked` parameter decides whether or not the signature of the vector
>>>> function is required to have a mask parameter, the `Params` parameter
>>>> describes the shape of the vector function as in the
>>>> `isFunctionVectorizable` method.
>>>>
>>>> The method uses the `vector-variant` metadata and returns the function
>>>> signature and the name of the function based on the input parameters.
>>>>
>>>> The SVFS can add new function definitions, in the same module as the
>>>> `Call`, to provide vector functions that are not present within the
>>>> vector-variant metadata. For example, if a library provides a vector
>>>> version of a function with a vectorization factor of 2, but the
>>>> vectorizer is requesting a vectorization factor of 4, the SVFS is
>>>> allowed to create a definition that calls the 2-lane version twice. This
>>>> capability applies similarly for providing masked and unmasked versions
>>>> when the request does not match what is available in the library.
>>>>
>>>> This method is equivalent to the TLI method
>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>
>>>> Notice that to fully support OpenMP vectorization we need to think about
>>>> a fuzzy matching mechanism that is able to select a candidate in the
>>>> calling context. However, this proposal is intended for scalar-to-vector
>>>> mappings of math-like functions that are most likely to associate a
>>>> unique vector candidate in most contexts. Therefore, extending this
>>>> behavior to a generic one is an aspect of the implementation that will
>>>> be treated in a separate RFC about the vectorization pass.
>>>>
>>>> ### Scalable vectorization
>>>>
>>>> Both methods of the SVFS API will be extended with a boolean parameter
>>>> to specify whether scalable signatures are needed by the user of the
>>>> SVFS.
>>>>
>>>> Changes in clang {#clang}
>>>> ----------------
>>>>
>>>> We use clang to generate the metadata described above.
>>>>
>>>> In the compilation unit, the vector function definition or declaration
>>>> must be visible and associated to the scalar version via the
>>>> `#pragma clang declare variant` according to the rule defined by the
>>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
>>>> the following example.
>>>>
>>>>     #pragma clang declare variant(vector_sinf) \
>>>>     match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>     extern float sinf(float);
>>>>
>>>>     float32x4_t vector_sinf(float32x4_t x);
>>>>
>>>> The `construct` set in the directive, together with the `device` set, is
>>>> used to generate the vector mangled name to be used in the
>>>> `vector-variant` attribute, for example `_ZGVnN4v_sinf`, when targeting
>>>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
>>>> the scalar function in the vector name are defined in the Vector
>>>> Function ABI specification of the target.
>>>>
>>>> The part of the vector-variant attribute that redirects the call to
>>>> `vector_sinf` is derived from the `variant-id` specified in the
>>>> `variant` clause.
>>>>
>>>> Summary
>>>> =======
>>>>
>>>> New `clang` directive in clang
>>>> ------------------------------
>>>>
>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>>>
>>>> Option behavior, and interaction with OpenMP
>>>> --------------------------------------------
>>>>
>>>> The behavior described below makes sure that
>>>> `#pragma clang declare variant` function vectorization and OpenMP
>>>> function vectorization are orthogonal.
>>>>
>>>> `-fclang-declare-variant`
>>>>
>>>> :   The `#pragma clang declare variant` directives are parsed and used
>>>>     to populate the `vector-variant` attribute.
>>>>
>>>> `-fopenmp[-simd]`
>>>>
>>>> :   The `#pragma omp declare variant` directives are parsed and used to
>>>>     populate the `vector-variant` attribute.
>>>>
>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>>>
>>>> :   The directive `#pragma omp declare variant` is used to populate the
>>>>     `vector-variant` attribute in IR. The
>>>>     `#pragma clang declare variant` directives are ignored.
>>>>
>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>
>>>> [^2]: Vector Function ABI for x86:
>>>>     <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>     Vector Function ABI for AArch64:
>>>>     <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
>>>>
>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
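
For readers following the thread, the `vector-variant` string format quoted above (a comma-separated list of `abi_name(real_name)` entries, with the real name optional) can be illustrated with a small standalone C++ sketch. Everything in it is hypothetical: `VariantMapping`, `VariantShape`, `parseVectorVariant` and `decodeAbiName` are names made up for this illustration and are neither LLVM APIs nor part of the proposed SVFS. The decoder assumes the fixed-width name layout seen in the RFC's examples (`_ZGV`, one ISA letter, `N` or `M` for unmasked or masked, a decimal lane count, parameter tokens, `_`, scalar name) and does not attempt to handle scalable-vector names.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper types for illustration only; nothing here is an LLVM API.
struct VariantMapping {
  std::string AbiName;  // e.g. "_ZGVcN2v_sin"
  std::string RealName; // e.g. "__svml_sin2"; equals AbiName when omitted
};

struct VariantShape {
  char Isa;               // ISA letter copied from the mangled name
  bool IsMasked;          // 'M' means masked, 'N' means unmasked
  unsigned VF;            // number of lanes
  std::string Params;     // raw parameter tokens, e.g. "v"
  std::string ScalarName; // name of the scalar function
};

static std::string trim(const std::string &S) {
  size_t B = 0, E = S.size();
  while (B < E && std::isspace(static_cast<unsigned char>(S[B]))) ++B;
  while (E > B && std::isspace(static_cast<unsigned char>(S[E - 1]))) --E;
  return S.substr(B, E - B);
}

// Split "abi1(real1), abi2" into mappings; a missing "(real)" part
// defaults the real name to the ABI name, as the RFC text describes.
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Out;
  size_t Pos = 0;
  while (Pos <= Attr.size()) {
    size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos) Comma = Attr.size();
    std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty()) continue;
    size_t Open = Entry.find('(');
    if (Open == std::string::npos) {
      Out.push_back({Entry, Entry});
      continue;
    }
    size_t Close = Entry.find(')', Open);
    assert(Close != std::string::npos && "unterminated real name");
    Out.push_back({trim(Entry.substr(0, Open)),
                   trim(Entry.substr(Open + 1, Close - Open - 1))});
  }
  return Out;
}

// Decode a fixed-width name of the assumed form
// _ZGV<isa><N|M><lanes><params>_<scalar>. Returns false on mismatch.
bool decodeAbiName(const std::string &Name, VariantShape &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0) return false;
  size_t I = Prefix.size();
  if (I + 2 > Name.size()) return false; // need ISA and mask letters
  VariantShape S{};
  S.Isa = Name[I++];
  char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M') return false;
  S.IsMasked = (Mask == 'M');
  size_t DigitsBegin = I;
  while (I < Name.size() && std::isdigit(static_cast<unsigned char>(Name[I])))
    ++I;
  if (I == DigitsBegin) return false; // no lane count (e.g. scalable names)
  S.VF = static_cast<unsigned>(
      std::stoul(Name.substr(DigitsBegin, I - DigitsBegin)));
  size_t Sep = Name.find('_', I);
  if (Sep == std::string::npos || Sep == I || Sep + 1 >= Name.size())
    return false;
  S.Params = Name.substr(I, Sep - I);
  S.ScalarName = Name.substr(Sep + 1);
  Out = S;
  return true;
}
```

Feeding the RFC's example string `_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)` to `parseVectorVariant` yields two mappings, and `decodeAbiName` on the second ABI name reports an unmasked 4-lane variant of `sin` with a single `v` parameter. A query such as the proposed `getVectorizedFunction` would presumably filter these decoded shapes by `VF` and `IsMasked`, though the RFC does not spell out that logic.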