Finkel, Hal J. via llvm-dev
2019-May-29 19:16 UTC
[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
On 5/29/19 1:52 PM, Philip Reames wrote:
> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>>> I generally like the idea of having support in IR for vectorization of
>>> custom functions. I have several use cases which would benefit from this.
>>>
>>> I'd suggest a couple of reframings to the IR representation though.
>>>
>>> First, this should probably be specified as metadata/attribute on a
>>> function declaration. Allowing the callsite variant is fine, but it
>>> should primarily be a property of the called function, not of the call
>>> site. Being able to specify it once per declaration is much cleaner.
>> I agree. We should support this both on the function declaration and on
>> the call sites.
>>
>>
>>> Second, I really don't like the mangling use here. We need a better way
>>> to specify the properties of the function than its mangled name. One
>>> thought to explore is to directly use the Value of the function
>>> declaration (since this is metadata and we can do that), and then tie
>>> the properties to the function declaration in some way? Sorry, I don't
>>> really have a specific suggestion here.
>> Is the problem the mangling or the fact that the mangling is
>> ABI/target-specific? One option is to use LLVM's mangling scheme (the
>> one we use for intrinsics) and then provide some backend infrastructure
>> to translate later.
> Well, both honestly. But mangling with a non-target specific scheme is
> a lot better, so I might be okay with that. Good idea.

I liked your idea of directly encoding the signature in the metadata, but I think that we want to continue to use attributes, and not metadata, and the options for attributes seem more limited - unless we allow attributes to take metadata arguments - maybe that's an enhancement worth considering.
 -Hal

>>
>> -Hal
>>
>>
>>> Philip
>>>
>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>>> Dear all,
>>>>
>>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
>>>>
>>>> The proposal is a modification of an RFC that I sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
>>>>
>>>> The original RFC proposed to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector functions in user code that the compiler can pick up for auto-vectorization. This narrower scope limits the impact of the changes that are needed in both clang and LLVM.
>>>>
>>>> Please let me know what you think.
>>>>
>>>> Kind regards,
>>>>
>>>> Francesco
>>>>
>>>>
>>>> ================================================================================
>>>>
>>>> Introduction
>>>> ============
>>>>
>>>> This RFC proposes a way of informing the vectorizer about the
>>>> availability of vector functions provided by the user. The mechanism is
>>>> based on the use of the directive `declare variant` introduced in OpenMP
>>>> 5.0 [^1].
>>>>
>>>> The mechanism proposed has the following properties:
>>>>
>>>> 1. Decouples the compiler front-end that knows about the availability
>>>>    of vectorized routines, from the back-end that knows how to make use
>>>>    of them.
>>>> 2. Enables support for a developer's own vector libraries without
>>>>    requiring changes to the compiler.
>>>> 3. Enables other frontends (e.g. f18) to add scalar-to-vector function
>>>>    mappings as relevant for their own runtime libraries, etc.
>>>>
>>>> The implementation consists of two separate sets of changes.
>>>>
>>>> The first set is a set of changes in `llvm`, and consists of:
>>>>
>>>> 1. [Changes in LLVM IR](#llvmIR) to provide information about the
>>>>    availability of user-defined vector functions via metadata attached
>>>>    to an `llvm::CallInst`.
>>>> 2. [An infrastructure](#infrastructure) that can be queried to retrieve
>>>>    information about the available vector functions associated to a
>>>>    `llvm::CallInst`.
>>>> 3. [Changes in the LoopVectorizer](#LV) to use the API to query the
>>>>    metadata.
>>>>
>>>> The second set consists of the [changes in clang](#clang) that
>>>> are needed to recognize the `#pragma clang declare variant`
>>>> directive.
>>>>
>>>> Proposed changes
>>>> ================
>>>>
>>>> We propose an implementation that uses `#pragma clang declare variant`
>>>> to inform the backend components about the availability of vector
>>>> versions of scalar functions found in IR. The mechanism relies on storing
>>>> such information in IR metadata, and therefore makes the
>>>> auto-vectorization of function calls a mid-end (`opt`) process that is
>>>> independent of the front-end that generated such IR metadata.
>>>>
>>>> This implementation provides a generic mechanism that the users of the
>>>> LLVM compiler will be able to use for interfacing their own vector
>>>> routines for generic code.
>>>>
>>>> The implementation can also expose vectorization-specific descriptors --
>>>> for example, like the `linear` and `uniform` clauses of the OpenMP
>>>> `declare simd` directive -- that could be used to finely tune the
>>>> automatic vectorization of some functions (think for example the
>>>> vectorization of `double sincos(double , double *, double *)`, where
>>>> `linear` can be used to give extra information about the memory layout
>>>> of the 2 pointer parameters in the vector version).
>>>>
>>>> The directive `#pragma clang declare variant` follows the syntax of the
>>>> `#pragma omp declare variant` directive of OpenMP.
>>>>
>>>> We define the new directive in the `clang` namespace instead of using
>>>> the `omp` one of OpenMP to allow the compiler to perform
>>>> auto-vectorization outside of an OpenMP SIMD context.
>>>>
>>>> The mechanism is based on OpenMP to provide a uniform user experience
>>>> across the two mechanisms, and to maximise the number of shared
>>>> components of the infrastructure needed in the compiler frontend to
>>>> enable the feature.
>>>>
>>>> Changes in LLVM IR {#llvmIR}
>>>> ------------------
>>>>
>>>> The IR is enriched with metadata that details the availability of vector
>>>> versions of an associated scalar function. This metadata is attached to
>>>> the call site of the scalar function.
>>>>
>>>> The metadata takes the form of an attribute containing a comma separated
>>>> list of vector function mappings. Each entry has a unique name that
>>>> follows the Vector Function ABI[^2] and a real name that is used when
>>>> generating calls to this vector function.
>>>>
>>>>     vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>
>>>> The Vector Function ABI name describes the signature of the vector
>>>> function so that properties like vectorisation factor can be queried
>>>> during compilation.
>>>>
>>>> The `(real name)` token is optional and assumed to match the Vector
>>>> Function ABI name when omitted.
>>>>
>>>> For example, the availability of a 2-lane double precision `sin`
>>>> function via SVML when targeting AVX on x86 is provided by the following
>>>> IR.
>>>>
>>>>     // ...
>>>>     ... = call double @sin(double) #0
>>>>     // ...
>>>>
>>>>     #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>>>                               _ZGVdN4v_sin(__svml_sin4),
>>>>                               ..."} }
>>>>
>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>> attribute provides information on the shape of the vector function via
>>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>>>> for Intel, and remaps the standard Vector Function ABI name to the
>>>> non-standard name `__svml_sin2`.
>>>>
>>>> This metadata is compatible with the proposal "Proposal for function
>>>> vectorization and loop vectorization with function calls",[^3] that uses
>>>> Vector Function ABI mangled names to inform the vectorizer about the
>>>> availability of vector functions. This proposal extends the original by
>>>> allowing the explicit mapping of the Vector Function ABI mangled name to
>>>> a non-standard name, which allows the use of existing vector libraries.
>>>>
>>>> The `vector-variant` attribute needs to be attached on a per-call basis
>>>> to avoid conflicts when merging modules with different vector variants.
>>>>
>>>> The query infrastructure: SVFS {#infrastructure}
>>>> ------------------------------
>>>>
>>>> The Search Vector Function System (SVFS) is constructed from an
>>>> `llvm::Module` instance so it can create function definitions. The SVFS
>>>> exposes an API with two methods.
>>>>
>>>> ### `SVFS::isFunctionVectorizable`
>>>>
>>>> This method queries the availability of a vectorized version of a
>>>> function. The signature of the method is as follows.
>>>>
>>>>     bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>>>
>>>> The method determines the availability of a vector version of the function
>>>> invoked by the `Call` parameter by looking at the `vector-variant`
>>>> metadata.
>>>>
>>>> The `Params` argument is a map that associates the position of a
>>>> parameter in the `CallInst` to its `ParameterType` descriptor.
>>>> The `ParameterType` descriptor holds information about the shape of the
>>>> corresponding parameter in the signature of the vector function. This
>>>> `ParameterType` is used to query the SVFS about the availability of
>>>> vector versions that have `linear`, `uniform` or `align` parameters (in
>>>> the sense of OpenMP 4.0 and onwards).
>>>>
>>>> The method `isFunctionVectorizable`, when invoked with an empty
>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
>>>> `isFunctionVectorizable(StringRef Name)`.
>>>>
>>>> ### `SVFS::getVectorizedFunction`
>>>>
>>>> This method returns the vector function declaration that corresponds to
>>>> the needs of the vectorization technique that is being run.
>>>>
>>>> The signature of the function is as follows.
>>>>
>>>>     std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>       llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
>>>>
>>>> The `Call` parameter is the call instance that is being vectorized, the
>>>> `VF` parameter represents the vectorization factor (how many lanes), the
>>>> `IsMasked` parameter decides whether or not the signature of the vector
>>>> function is required to have a mask parameter, and the `Params` parameter
>>>> describes the shape of the vector function as in the
>>>> `isFunctionVectorizable` method.
>>>>
>>>> The method uses the `vector-variant` metadata and returns the function
>>>> signature and the name of the function based on the input parameters.
>>>>
>>>> The SVFS can add new function definitions, in the same module as the
>>>> `Call`, to provide vector functions that are not present within the
>>>> vector-variant metadata. For example, if a library provides a vector
>>>> version of a function with a vectorization factor of 2, but the
>>>> vectorizer is requesting a vectorization factor of 4, the SVFS is
>>>> allowed to create a definition that calls the 2-lane version twice.
>>>> This capability applies similarly for providing masked and unmasked versions
>>>> when the request does not match what is available in the library.
>>>>
>>>> This method is equivalent to the TLI method
>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>
>>>> Notice that to fully support OpenMP vectorization we need to think about
>>>> a fuzzy matching mechanism that is able to select a candidate in the
>>>> calling context. However, this proposal is intended for scalar-to-vector
>>>> mappings of math-like functions that are most likely to associate a
>>>> unique vector candidate in most contexts. Therefore, extending this
>>>> behavior to a generic one is an aspect of the implementation that will
>>>> be treated in a separate RFC about the vectorization pass.
>>>>
>>>> ### Scalable vectorization
>>>>
>>>> Both methods of the SVFS API will be extended with a boolean parameter
>>>> to specify whether scalable signatures are needed by the user of the
>>>> SVFS.
>>>>
>>>> Changes in clang {#clang}
>>>> ----------------
>>>>
>>>> We use clang to generate the metadata described above.
>>>>
>>>> In the compilation unit, the vector function definition or declaration
>>>> must be visible and associated to the scalar version via the
>>>> `#pragma clang declare variant` according to the rule defined by the
>>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
>>>> the following example.
>>>>
>>>>     #pragma clang declare variant(vector_sinf) \
>>>>     match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>     extern float sinf(float);
>>>>
>>>>     float32x4_t vector_sinf(float32x4_t x);
>>>>
>>>> The `construct` set in the directive, together with the `device` set, is
>>>> used to generate the vector mangled name to be used in the
>>>> `vector-variant` attribute, for example `_ZGVnN4v_sinf`, when targeting
>>>> AArch64 Advanced SIMD code generation.
>>>> The rules for mangling the name of
>>>> the scalar function in the vector name are defined in the Vector
>>>> Function ABI specification of the target.
>>>>
>>>> The part of the vector-variant attribute that redirects the call to
>>>> `vector_sinf` is derived from the `variant-id` specified in the
>>>> `variant` clause.
>>>>
>>>> Summary
>>>> =======
>>>>
>>>> New `clang` directive in clang
>>>> ------------------------------
>>>>
>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>>>
>>>> Option behavior, and interaction with OpenMP
>>>> --------------------------------------------
>>>>
>>>> The behavior described below makes sure that
>>>> `#pragma clang declare variant` function vectorization and OpenMP
>>>> function vectorization are orthogonal.
>>>>
>>>> `-fclang-declare-variant`
>>>>
>>>> :   The `#pragma clang declare variant` directives are parsed and used
>>>>     to populate the `vector-variant` attribute.
>>>>
>>>> `-fopenmp[-simd]`
>>>>
>>>> :   The `#pragma omp declare variant` directives are parsed and used to
>>>>     populate the `vector-variant` attribute.
>>>>
>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>>>
>>>> :   The directive `#pragma omp declare variant` is used to populate the
>>>>     `vector-variant` attribute in IR. The
>>>>     `#pragma clang declare variant` directives are ignored.
>>>>
>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>
>>>> [^2]: Vector Function ABI for x86:
>>>>       <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>       Vector Function ABI for AArch64:
>>>>       https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
>>>>
>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
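To make the attribute format from the quoted RFC concrete, here is a minimal C++ sketch of how a consumer might split a `vector-variant` string into its mappings and pick an entry by vectorization factor and masking. It is not the proposed SVFS: `VariantMapping`, `parseVectorVariant` and `findVariant` are hypothetical names, the attribute is handled as a plain string rather than read from an `llvm::CallInst`, and the decoding assumes the `_ZGV<isa><mask><vlen>...` layout seen in the RFC's example names (fixed-width variants only).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// One entry of the attribute, "<ABI name>(<real name>)". Hypothetical type.
struct VariantMapping {
  std::string AbiName;   // e.g. "_ZGVcN2v_sin"
  std::string RealName;  // e.g. "__svml_sin2"; equals AbiName when omitted
  unsigned VF = 0;       // lane count decoded from the ABI name
  bool IsMasked = false; // 'M' after the ISA letter means a mask parameter
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string &S) {
  size_t B = S.find_first_not_of(" \t\n");
  if (B == std::string::npos)
    return "";
  size_t E = S.find_last_not_of(" \t\n");
  return S.substr(B, E - B + 1);
}

// Decode the mask letter and lane count; returns false for names this
// sketch does not understand (including scalable variants).
static bool decodeShape(VariantMapping &M) {
  const std::string &N = M.AbiName;
  if (N.size() < 7 || N.compare(0, 4, "_ZGV") != 0)
    return false;
  char Mask = N[5]; // N[4] is the ISA letter
  if (Mask != 'N' && Mask != 'M')
    return false;
  unsigned VF = 0;
  for (size_t I = 6;
       I < N.size() && std::isdigit(static_cast<unsigned char>(N[I])); ++I)
    VF = VF * 10 + static_cast<unsigned>(N[I] - '0');
  if (VF == 0)
    return false;
  M.VF = VF;
  M.IsMasked = (Mask == 'M');
  return true;
}

// Split the comma separated list and resolve the optional "(real name)".
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Out;
  size_t Pos = 0;
  while (Pos <= Attr.size()) {
    size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    VariantMapping M;
    size_t Open = Entry.find('(');
    if (Open != std::string::npos && Entry.back() == ')') {
      M.AbiName = trim(Entry.substr(0, Open));
      M.RealName = trim(Entry.substr(Open + 1, Entry.size() - Open - 2));
    } else {
      M.AbiName = Entry;
      M.RealName = Entry; // real name defaults to the ABI name
    }
    if (decodeShape(M))
      Out.push_back(M);
  }
  return Out;
}

// Return the callable name of the first variant matching VF and masking,
// or an empty string when the attribute offers none.
std::string findVariant(const std::vector<VariantMapping> &Variants,
                        unsigned VF, bool IsMasked) {
  assert(VF > 0 && "vectorization factor must be positive");
  for (const VariantMapping &M : Variants)
    if (M.VF == VF && M.IsMasked == IsMasked)
      return M.RealName;
  return "";
}
```

With the attribute from the RFC's example, a request for four unmasked lanes would resolve to `__svml_sin4`, and a request for eight lanes would come back empty. That empty case is where the RFC allows the SVFS to synthesize a wrapper from narrower variants, which this sketch does not attempt.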
Doerfert, Johannes via llvm-dev
2019-May-30 16:05 UTC
[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
On 05/29, Finkel, Hal J. via cfe-dev wrote:
> On 5/29/19 1:52 PM, Philip Reames wrote:
> > On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> >> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> >>> I generally like the idea of having support in IR for vectorization of
> >>> custom functions. I have several use cases which would benefit from this.
> >>>
> >>> I'd suggest a couple of reframings to the IR representation though.
> >>>
> >>> First, this should probably be specified as metadata/attribute on a
> >>> function declaration. Allowing the callsite variant is fine, but it
> >>> should primarily be a property of the called function, not of the call
> >>> site. Being able to specify it once per declaration is much cleaner.
> >> I agree. We should support this both on the function declaration and on
> >> the call sites.
> >>
> >>
> >>> Second, I really don't like the mangling use here. We need a better way
> >>> to specify the properties of the function than its mangled name. One
> >>> thought to explore is to directly use the Value of the function
> >>> declaration (since this is metadata and we can do that), and then tie
> >>> the properties to the function declaration in some way? Sorry, I don't
> >>> really have a specific suggestion here.
> >> Is the problem the mangling or the fact that the mangling is
> >> ABI/target-specific? One option is to use LLVM's mangling scheme (the
> >> one we use for intrinsics) and then provide some backend infrastructure
> >> to translate later.
> > Well, both honestly. But mangling with a non-target specific scheme is
> > a lot better, so I might be okay with that. Good idea.
> >
>
> I liked your idea of directly encoding the signature in the metadata,
> but I think that we want to continue to use attributes, and not
> metadata, and the options for attributes seem more limited - unless we
> allow attributes to take metadata arguments - maybe that's an
> enhancement worth considering.

I recently talked to people in the OpenMP language committee meeting
about this and, thinking forward to the actual implementation/use of the
OpenMP 5.x declare variant feature, I'd say:

  - We will need a mangling scheme if we want to allow variants on
    declarations that are defined elsewhere.
  - We will need a (OpenMP) standardized mangling scheme if we want
    interoperability between compilers.

I assume we want both so I think we will need both.

That said, I think this should allow us to avoid attributes/metadata
which seems to me like a good thing right now.

Cheers,
  Johannes

> >>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
> >>>> Dear all,
> >>>>
> >>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
> >>>>
> >>>> The proposal is a modification of an RFC that I have sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
> >>>>
> >>>> The original RFC was proposing to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector function in user code that the compiler can pick up for auto-vectorization. This narrower scope limits the impact of changes that are needed in both clang and LLVM.
> >>>>
> >>>> Please let me know what you think.

-- 
Johannes Doerfert
Researcher
Argonne National Laboratory
Lemont, IL 60439, USA
jdoerfert at anl.gov
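One way to picture the option raised in this exchange, keeping a target-independent description and letting backend infrastructure translate it later, is the C++ sketch below. Everything in it is an assumption made for illustration: `VariantShape`, `TargetISA` and `mangleForTarget` are hypothetical names, not LLVM or OpenMP APIs. The ISA letters `c` and `n` are the ones tied to AVX and AArch64 Advanced SIMD in the quoted RFC, and `d` is assumed to denote AVX2 as in the x86 Vector Function ABI.

```cpp
#include <cassert>
#include <string>

// Hypothetical target-neutral description of one vector variant.
struct VariantShape {
  std::string ScalarName; // scalar function being vectorized, e.g. "sin"
  unsigned VF;            // number of lanes
  bool IsMasked;          // true if the variant takes a mask parameter
  std::string ParamKinds; // one letter per parameter, e.g. "v" for a vector
};

// Only the ISAs that appear in the thread's examples are modelled.
enum class TargetISA { X86_AVX, X86_AVX2, AArch64_AdvSIMD };

// ISA letter used in the Vector Function ABI name for each target.
static char isaLetter(TargetISA ISA) {
  switch (ISA) {
  case TargetISA::X86_AVX:
    return 'c';
  case TargetISA::X86_AVX2:
    return 'd'; // assumption: 'd' is AVX2 in the x86 ABI
  case TargetISA::AArch64_AdvSIMD:
    return 'n';
  }
  return '?'; // unreachable for valid enumerators
}

// Translate the neutral shape into "_ZGV<isa><mask><vlen><params>_<name>".
std::string mangleForTarget(const VariantShape &S, TargetISA ISA) {
  assert(S.VF > 0 && !S.ScalarName.empty() && "incomplete variant shape");
  std::string Name = "_ZGV";
  Name += isaLetter(ISA);
  Name += S.IsMasked ? 'M' : 'N';
  Name += std::to_string(S.VF);
  Name += S.ParamKinds;
  Name += '_';
  Name += S.ScalarName;
  return Name;
}
```

Under these assumptions the same neutral shape for a 2-lane unmasked `sin` yields `_ZGVcN2v_sin` for AVX and `_ZGVnN2v_sin` for Advanced SIMD, so the target-specific string only needs to exist at the point where a target is known.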
Philip Reames via llvm-dev
2019-May-30 17:53 UTC
[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.
On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> On 05/29, Finkel, Hal J. via cfe-dev wrote:
>> On 5/29/19 1:52 PM, Philip Reames wrote:
>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>>>>> I generally like the idea of having support in IR for vectorization of
>>>>> custom functions. I have several use cases which would benefit from this.
>>>>>
>>>>> I'd suggest a couple of reframings to the IR representation though.
>>>>>
>>>>> First, this should probably be specified as metadata/attribute on a
>>>>> function declaration. Allowing the callsite variant is fine, but it
>>>>> should primarily be a property of the called function, not of the call
>>>>> site. Being able to specify it once per declaration is much cleaner.
>>>> I agree. We should support this both on the function declaration and on
>>>> the call sites.
>>>>
>>>>
>>>>> Second, I really don't like the mangling use here. We need a better way
>>>>> to specify the properties of the function than its mangled name. One
>>>>> thought to explore is to directly use the Value of the function
>>>>> declaration (since this is metadata and we can do that), and then tie
>>>>> the properties to the function declaration in some way? Sorry, I don't
>>>>> really have a specific suggestion here.
>>>> Is the problem the mangling or the fact that the mangling is
>>>> ABI/target-specific? One option is to use LLVM's mangling scheme (the
>>>> one we use for intrinsics) and then provide some backend infrastructure
>>>> to translate later.
>>> Well, both honestly. But mangling with a non-target specific scheme is
>>> a lot better, so I might be okay with that. Good idea.
>>
>> I liked your idea of directly encoding the signature in the metadata,
>> but I think that we want to continue to use attributes, and not
>> metadata, and the options for attributes seem more limited - unless we
>> allow attributes to take metadata arguments - maybe that's an
>> enhancement worth considering.
> I recently talked to people in the OpenMP language committee meeting
> about this and, thinking forward to the actual implementation/use of the
> OpenMP 5.x declare variant feature, I'd say:
>
> - We will need a mangling scheme if we want to allow variants on
>   declarations that are defined elsewhere.
> - We will need a (OpenMP) standardized mangling scheme if we want
>   interoperability between compilers.
>
> I assume we want both so I think we will need both.

If I'm reading this correctly, this describes a need for the frontend to
have a mangling scheme. Nothing in here would seem to prevent the
frontend from generating a declaration for a mangled external symbol and
then referencing that declaration. Am I missing something?

> That said, I think this should allow us to avoid attributes/metadata
> which seems to me like a good thing right now.
>
> Cheers,
> Johannes
>
>
>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
>>>>>>
>>>>>> The proposal is a modification of an RFC that I have sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
>>>>>>
>>>>>> The original RFC was proposing to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector functions in user code that the compiler can pick up for auto-vectorization.
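For readers following the thread, the `vector-variant` attribute format from the quoted RFC (a comma-separated list of `vfunc_name(real_name)` entries, where the `(real_name)` part is optional and defaults to the Vector Function ABI name) can be sketched in standalone C++. This is only a minimal sketch under stated assumptions, not code from the RFC or from LLVM: `VariantMapping`, `VectorShape`, `parseVectorVariant` and `decodeAbiName` are hypothetical names, and the decoder handles only the simple fixed-width layout of the names that appear in the thread (`_ZGV`, one ISA letter, `N` or `M` for unmasked or masked, a decimal lane count, the parameter kinds, `_`, then the scalar name).

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// One entry of the proposed `vector-variant` attribute (hypothetical type).
struct VariantMapping {
  std::string AbiName;  // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName; // symbol actually called, e.g. "__svml_sin2"
};

// Shape information recovered from an ABI name (hypothetical type).
struct VectorShape {
  char Isa;               // ISA letter, e.g. 'c' in the thread's AVX example
  bool IsMasked;          // 'M' = masked, 'N' = unmasked
  unsigned VF;            // vectorization factor (number of lanes)
  std::string Params;     // parameter kinds, e.g. "v" for one vector parameter
  std::string ScalarName; // name of the scalar function, e.g. "sin"
};

// Remove leading and trailing whitespace.
static std::string trim(const std::string &S) {
  size_t B = S.find_first_not_of(" \t\n");
  if (B == std::string::npos)
    return "";
  size_t E = S.find_last_not_of(" \t\n");
  return S.substr(B, E - B + 1);
}

// Split "a(b), c" into mappings; an entry without "(real)" maps to itself.
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Out;
  size_t Pos = 0;
  while (Pos <= Attr.size()) {
    size_t Comma = Attr.find(',', Pos);
    size_t Len = Comma == std::string::npos ? std::string::npos : Comma - Pos;
    std::string Entry = trim(Attr.substr(Pos, Len));
    if (!Entry.empty()) {
      size_t Open = Entry.find('(');
      size_t Close = Entry.rfind(')');
      if (Open != std::string::npos && Close != std::string::npos &&
          Close > Open)
        Out.push_back({trim(Entry.substr(0, Open)),
                       trim(Entry.substr(Open + 1, Close - Open - 1))});
      else
        Out.push_back({Entry, Entry});
    }
    if (Comma == std::string::npos)
      break;
    Pos = Comma + 1;
  }
  return Out;
}

// Decode "_ZGV<isa><mask><lanes><params>_<scalar>"; scalable lane counts and
// any other extension are not handled and yield std::nullopt.
std::optional<VectorShape> decodeAbiName(const std::string &Name) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  size_t I = Prefix.size();
  if (I + 2 > Name.size())
    return std::nullopt;
  VectorShape S;
  S.Isa = Name[I++];
  char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M')
    return std::nullopt;
  S.IsMasked = (Mask == 'M');
  size_t DigitsBegin = I;
  while (I < Name.size() && std::isdigit(static_cast<unsigned char>(Name[I])))
    ++I;
  if (I == DigitsBegin)
    return std::nullopt;
  S.VF = static_cast<unsigned>(std::stoul(Name.substr(DigitsBegin, I - DigitsBegin)));
  size_t Under = Name.find('_', I);
  if (Under == std::string::npos || Under + 1 >= Name.size())
    return std::nullopt;
  S.Params = Name.substr(I, Under - I);
  S.ScalarName = Name.substr(Under + 1);
  return S;
}

// Usage on the example string from the RFC.
inline void example() {
  auto V = parseVectorVariant("_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)");
  assert(V.size() == 2 && V[0].RealName == "__svml_sin2");
  auto Shape = decodeAbiName(V[1].AbiName);
  assert(Shape && Shape->VF == 4 && Shape->ScalarName == "sin");
}
```

A query layer such as the SVFS described in the RFC would presumably do this kind of lookup before answering "is there a 4-lane unmasked variant of this call?"; how the real implementation stores and matches the mappings is not specified in the thread.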