thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization. [May 2019]

Finkel, Hal J. via llvm-dev

2019-May-29 02:55 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> I generally like the idea of having support in IR for vectorization of
> custom functions.  I have several use cases which would benefit from this.
>
> I'd suggest a couple of reframings to the IR representation though.
>
> First, this should probably be specified as metadata/attribute on a
> function declaration.  Allowing the callsite variant is fine, but it
> should primarily be a property of the called function, not of the call
> site.  Being able to specify it once per declaration is much cleaner.

I agree. We should support this both on the function declaration and on 
the call sites.

>
> Second, I really don't like the mangling use here.  We need a better way
> to specify the properties of the function than its mangled name.  One
> thought to explore is to directly use the Value of the function
> declaration (since this is metadata and we can do that), and then tie
> the properties to the function declaration in some way?  Sorry, I don't
> really have a specific suggestion here.

Is the problem the mangling or the fact that the mangling is 
ABI/target-specific? One option is to use LLVM's mangling scheme (the 
one we use for intrinsics) and then provide some backend infrastructure 
to translate later.


  -Hal

>
> Philip
>
> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>> Dear all,
>>
>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
>>
>> The proposal is a modification of an RFC that I sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
>>
>> The original RFC proposed re-implementing the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector functions in user code that the compiler can pick up for auto-vectorization. This narrower scope limits the impact of the changes that are needed in both clang and LLVM.
>>
>> Please let me know what you think.
>>
>> Kind regards,
>>
>> Francesco
>>
>>
>>
>> ================================================================================
>>
>> Introduction
>> ============
>>
>> This RFC proposes a mechanism for informing the vectorizer about the
>> availability of vector functions provided by the user. The mechanism is
>> based on the `declare variant` directive introduced in OpenMP 5.0 [^1].
>>
>> The mechanism proposed has the following properties:
>>
>> 1.  Decouples the compiler front-end, which knows about the availability
>>      of vectorized routines, from the back-end, which knows how to make
>>      use of them.
>> 2.  Enables support for a developer's own vector libraries without
>>      requiring changes to the compiler.
>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
>>      mappings as relevant for their own runtime libraries, etc.
>>
>> The implementation consists of two separate sets of changes.
>>
>> The first set is a set of changes in `llvm`, and consists of:
>>
>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
>>      availability of user-defined vector functions via metadata
>>      attached to an `llvm::CallInst`.
>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
>>      information about the available vector functions associated with
>>      an `llvm::CallInst`.
>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
>>      metadata.
>>
>> The second set consists of the [changes in clang](#clang) that are
>> needed to recognize the `#pragma clang declare variant` directive.
>>
>> Proposed changes
>> ================
>>
>> We propose an implementation that uses `#pragma clang declare variant`
>> to inform the backend components about the availability of vector
>> versions of scalar functions found in IR. The mechanism relies on
>> storing such information in IR metadata, and therefore makes the
>> auto-vectorization of function calls a mid-end (`opt`) process that is
>> independent of the front-end that generated such IR metadata.
>>
>> This implementation provides a generic mechanism that the users of the
>> LLVM compiler will be able to use for interfacing their own vector
>> routines for generic code.
>>
>> The implementation can also expose vectorization-specific descriptors
>> (for example, the `linear` and `uniform` clauses of the OpenMP
>> `declare simd` directive) that could be used to fine-tune the
>> automatic vectorization of some functions. Consider, for example, the
>> vectorization of `void sincos(double, double *, double *)`, where
>> `linear` can be used to give extra information about the memory layout
>> of the two pointer parameters in the vector version.
>>
>> The directive `#pragma clang declare variant` follows the syntax of the
>> `#pragma omp declare variant` directive of OpenMP.
>>
>> We define the new directive in the `clang` namespace instead of using
>> the `omp` one of OpenMP to allow the compiler to perform
>> auto-vectorization outside of an OpenMP SIMD context.
>>
>> The mechanism is based on OpenMP to provide a uniform user experience
>> across the two mechanisms, and to maximise the number of shared
>> components of the infrastructure needed in the compiler frontend to
>> enable the feature.
>>
>> Changes in LLVM IR {#llvmIR}
>> ------------------
>>
>> The IR is enriched with metadata that details the availability of vector
>> versions of an associated scalar function. This metadata is attached to
>> the call site of the scalar function.
>>
>> The metadata takes the form of an attribute containing a comma-separated
>> list of vector function mappings. Each entry has a unique name that
>> follows the Vector Function ABI[^2] and a real name that is used when
>> generating calls to this vector function.
>>
>>      vfunc_name1(real_name1), vfunc_name2(real_name2)
>>
>> The Vector Function ABI name describes the signature of the vector
>> function so that properties like vectorisation factor can be queried
>> during compilation.
>>
>> The `(real name)` token is optional and assumed to match the Vector
>> Function ABI name when omitted.
>>
>> For example, the availability of a 2-lane double precision `sin`
>> function via SVML when targeting AVX on x86 is provided by the
>> following IR.
>>
>>      // ...
>>      ... = call double @sin(double) #0
>>      // ...
>>
>>      #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>                                _ZGVdN4v_sin(__svml_sin4),
>>                                ..."} }
>>
>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this `vector-variant`
>> attribute provides information on the shape of the vector function via
>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>> for Intel, and remaps the standard Vector Function ABI name to the
>> non-standard name `__svml_sin2`.
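
To make the attribute format concrete, here is a minimal sketch of splitting a `vector-variant` string into its entries and reading the shape out of each Vector Function ABI name. It is not part of the proposed LLVM API: `VariantEntry`, `VectorShape`, `parseVectorVariant` and `decodeVFABIName` are hypothetical names, and the decoder assumes the `_ZGV<isa><mask><vlen><params>_<scalar>` layout with a fixed, numeric vector length (scalable-vector names are not handled).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// One entry of the comma-separated list: "vfabi_name(real_name)".
struct VariantEntry {
  std::string VFABIName; // e.g. "_ZGVcN2v_sin"
  std::string RealName;  // e.g. "__svml_sin2"; falls back to VFABIName
};

// Fields recovered from _ZGV<isa><mask><vlen><params>_<scalar>.
struct VectorShape {
  char ISA;           // target-specific ISA letter, e.g. 'c', 'd' or 'n'
  bool IsMasked;      // 'M' means masked, 'N' means unmasked
  unsigned VF;        // number of lanes
  std::string Params; // parameter tokens, e.g. "v"
  std::string Scalar; // scalar function name
};

static std::string trim(const std::string &S) {
  size_t B = S.find_first_not_of(" \t\n");
  if (B == std::string::npos)
    return "";
  size_t E = S.find_last_not_of(" \t\n");
  return S.substr(B, E - B + 1);
}

// Splits the attribute string on commas; "(real_name)" is optional.
std::vector<VariantEntry> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantEntry> Out;
  size_t Pos = 0;
  while (Pos <= Attr.size()) {
    size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Item = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Item.empty())
      continue;
    size_t Open = Item.find('(');
    if (Open == std::string::npos || Item.back() != ')')
      Out.push_back({Item, Item}); // no remapping given
    else
      Out.push_back({Item.substr(0, Open),
                     Item.substr(Open + 1, Item.size() - Open - 2)});
  }
  return Out;
}

// Returns false if Name does not follow the assumed fixed-length layout.
bool decodeVFABIName(const std::string &Name, VectorShape &S) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  size_t I = Prefix.size();
  if (I + 2 > Name.size())
    return false;
  S.ISA = Name[I++];
  char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M')
    return false;
  S.IsMasked = (Mask == 'M');
  size_t DigitsBegin = I;
  while (I < Name.size() && std::isdigit(static_cast<unsigned char>(Name[I])))
    ++I;
  if (I == DigitsBegin)
    return false;
  S.VF = static_cast<unsigned>(std::stoul(Name.substr(DigitsBegin, I - DigitsBegin)));
  size_t Under = Name.find('_', I);
  if (Under == std::string::npos || Under + 1 >= Name.size())
    return false;
  S.Params = Name.substr(I, Under - I);
  S.Scalar = Name.substr(Under + 1);
  return true;
}
```

Applied to the string `_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)`, the second entry decodes to ISA letter `d`, unmasked, 4 lanes, one vector parameter and scalar name `sin`, remapped to `__svml_sin4`.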
>>
>> This metadata is compatible with the proposal "Proposal for function
>> vectorization and loop vectorization with function calls",[^3] which uses
>> Vector Function ABI mangled names to inform the vectorizer about the
>> availability of vector functions. This proposal extends the original by
>> allowing the explicit mapping of the Vector Function ABI mangled name to
>> a non-standard name, which allows the use of existing vector libraries.
>>
>> The `vector-variant` attribute needs to be attached on a per-call basis
>> to avoid conflicts when merging modules with different vector variants.
>>
>> The query infrastructure: SVFS {#infrastructure}
>> ------------------------------
>>
>> The Search Vector Function System (SVFS) is constructed from an
>> `llvm::Module` instance so it can create function definitions. The SVFS
>> exposes an API with two methods.
>>
>> ### `SVFS::isFunctionVectorizable`
>>
>> This method queries the availability of a vectorized version of a
>> function. The signature of the method is as follows.
>>
>>      bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>
>> The method determines the availability of a vector version of the function
>> invoked by the `Call` parameter by looking at the `vector-variant`
>> metadata.
>>
>> The `Params` argument is a map that associates the position of a
>> parameter in the `CallInst` with its `ParameterType` descriptor. The
>> `ParameterType` descriptor holds information about the shape of the
>> corresponding parameter in the signature of the vector function. This
>> `ParameterType` is used to query the SVFS about the availability of
>> vector versions that have `linear`, `uniform` or `align` parameters (in
>> the sense of OpenMP 4.0 and onwards).
>>
>> The method `isFunctionVectorizable`, when invoked with an empty
>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
>> `isFunctionVectorizable(StringRef Name)`.
>>
>> ### `SVFS::getVectorizedFunction`
>>
>> This method returns the vector function declaration that corresponds to
>> the needs of the vectorization technique that is being run.
>>
>> The signature of the function is as follows.
>>
>>      std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>        llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
>>
>> The `Call` parameter is the call instance that is being vectorized, the
>> `VF` parameter represents the vectorization factor (how many lanes), the
>> `IsMasked` parameter decides whether or not the signature of the vector
>> function is required to have a mask parameter, and the `Params` parameter
>> describes the shape of the vector function as in the
>> `isFunctionVectorizable` method.
>>
>> The method uses the `vector-variant` metadata and returns the function
>> signature and the name of the function based on the input parameters.
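
The two queries can be modelled without any LLVM types. The sketch below is a standalone illustration, not the proposed implementation: `Variant`, `hasAnyVariant` and `findVariant` are hypothetical names, the parameter-shape argument is left out, and returning an empty string for "no match" is an assumption made here for simplicity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified record of one available vector variant.
struct Variant {
  std::string Scalar;   // scalar callee, e.g. "sin"
  unsigned VF;          // number of lanes
  bool IsMasked;        // whether the variant takes a mask parameter
  std::string RealName; // symbol to call
};

// Rough analogue of isFunctionVectorizable with no parameter constraints:
// is there any vector variant at all for this scalar function?
bool hasAnyVariant(const std::vector<Variant> &Available,
                   const std::string &Scalar) {
  for (const Variant &V : Available)
    if (V.Scalar == Scalar)
      return true;
  return false;
}

// Rough analogue of getVectorizedFunction: pick the variant whose lane
// count and masking match the request exactly. Returns "" if none does.
std::string findVariant(const std::vector<Variant> &Available,
                        const std::string &Scalar, unsigned VF,
                        bool IsMasked) {
  for (const Variant &V : Available)
    if (V.Scalar == Scalar && V.VF == VF && V.IsMasked == IsMasked)
      return V.RealName;
  return "";
}
```

A vectorizer would call the first query while deciding whether a loop containing the call is a candidate at all, and the second once it has settled on a vectorization factor.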
>>
>> The SVFS can add new function definitions, in the same module as the
>> `Call`, to provide vector functions that are not present within the
>> vector-variant metadata. For example, if a library provides a vector
>> version of a function with a vectorization factor of 2, but the
>> vectorizer is requesting a vectorization factor of 4, the SVFS is
>> allowed to create a definition that calls the 2-lane version twice. This
>> capability applies similarly for providing masked and unmasked versions
>> when the request does not match what is available in the library.
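
The 2-lane to 4-lane case can be pictured in plain C++. This is only an illustration of what such a synthesized definition would compute, not IR that the SVFS would emit: `lib_sin2` is a hypothetical stand-in for a library routine, and `std::array` stands in for a vector type.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a library-provided 2-lane routine (hypothetical).
Vec2 lib_sin2(Vec2 X) {
  Vec2 R = {std::sin(X[0]), std::sin(X[1])};
  return R;
}

// What a synthesized 4-lane definition would do: split the input into
// two halves, call the 2-lane routine on each, and concatenate.
Vec4 wrapper_sin4(Vec4 X) {
  Vec2 LoIn = {X[0], X[1]};
  Vec2 HiIn = {X[2], X[3]};
  Vec2 Lo = lib_sin2(LoIn);
  Vec2 Hi = lib_sin2(HiIn);
  Vec4 R = {Lo[0], Lo[1], Hi[0], Hi[1]};
  return R;
}
```

The masked/unmasked mismatch mentioned above could be bridged in a similar way, for instance by calling a masked routine with an all-true mask, though the RFC does not spell out the exact strategy.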
>>
>> This method is equivalent to the TLI method
>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>
>> Notice that to fully support OpenMP vectorization we need to think about
>> a fuzzy matching mechanism that is able to select a candidate in the
>> calling context. However, this proposal is intended for scalar-to-vector
>> mappings of math-like functions that are most likely to have a
>> unique vector candidate in most contexts. Therefore, extending this
>> behavior to a generic one is an aspect of the implementation that will
>> be treated in a separate RFC about the vectorization pass.
>>
>> ### Scalable vectorization
>>
>> Both methods of the SVFS API will be extended with a boolean parameter
>> to specify whether scalable signatures are needed by the user of the
>> SVFS.
>>
>> Changes in clang {#clang}
>> ----------------
>>
>> We use clang to generate the metadata described above.
>>
>> In the compilation unit, the vector function definition or declaration
>> must be visible and associated with the scalar version via the
>> `#pragma clang declare variant` according to the rules defined by the
>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
>> the following example.
>>
>>      #pragma clang declare variant(vector_sinf) \
>>      match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>      extern float sinf(float);
>>
>>      float32x4_t vector_sinf(float32x4_t x);
>>
>> The `construct` set in the directive, together with the `device` set, is
>> used to generate the vector mangled name to be used in the
>> `vector-variant` attribute, for example `_ZGVnN4v_sinf`, when targeting
>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
>> the scalar function in the vector name are defined in the Vector
>> Function ABI specification of the target.
>>
>> The part of the vector-variant attribute that redirects the call to
>> `vector_sinf` is derived from the `variant-id` specified in the
>> `variant` clause.
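
As a rough sketch of that front-end step, the mangled name and the attribute entry can be assembled from the pieces of the directive. The helper names `mangleVFABIName` and `makeVariantEntry` are hypothetical, only fixed-length vectors are covered, and the caller is assumed to have already mapped the `device` set to an ISA letter (`n` for AArch64 Advanced SIMD) and the parameters to their tokens (`v` for a plain vector parameter).

```cpp
#include <cassert>
#include <string>

// Builds _ZGV<isa><mask><vlen><params>_<scalar> for a fixed vector length.
// notinbranch corresponds to an unmasked variant ('N'), inbranch to a
// masked one ('M').
std::string mangleVFABIName(char ISA, bool IsMasked, unsigned VF,
                            const std::string &Params,
                            const std::string &Scalar) {
  return "_ZGV" + std::string(1, ISA) + (IsMasked ? "M" : "N") +
         std::to_string(VF) + Params + "_" + Scalar;
}

// Pairs the mangled name with the variant-id from the `variant` clause,
// in the "vfabi_name(real_name)" form used by the attribute.
std::string makeVariantEntry(const std::string &Mangled,
                             const std::string &VariantId) {
  return Mangled + "(" + VariantId + ")";
}
```

For the `sinf` example with `simdlen(4)` and `notinbranch`, this yields the entry `_ZGVnN4v_sinf(vector_sinf)`.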
>>
>> Summary
>> =======
>>
>> New `clang` directive in clang
>> ------------------------------
>>
>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>
>> Option behavior, and interaction with OpenMP
>> --------------------------------------------
>>
>> The behavior described below makes sure that
>> `#pragma clang declare variant` function vectorization and OpenMP
>> function vectorization are orthogonal.
>>
>> `-fclang-declare-variant`
>>
>> :   The `#pragma clang declare variant` directives are parsed and used
>>      to populate the `vector-variant` attribute.
>>
>> `-fopenmp[-simd]`
>>
>> :   The `#pragma omp declare variant` directives are parsed and used to
>>      populate the `vector-variant` attribute.
>>
>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>
>> :   The directive `#pragma omp declare variant` is used to populate the
>>      `vector-variant` attribute in IR. The directive
>>      `#pragma clang declare variant` is ignored.
>>
>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>
>> [^2]: Vector Function ABI for x86:
>>      <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>      Vector Function ABI for AArch64:
>>      <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
>>
>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Philip Reames via llvm-dev

2019-May-29 18:52 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>> I generally like the idea of having support in IR for vectorization of
>> custom functions.  I have several use cases which would benefit from this.
>>
>> I'd suggest a couple of reframings to the IR representation though.
>>
>> First, this should probably be specified as metadata/attribute on a
>> function declaration.  Allowing the callsite variant is fine, but it
>> should primarily be a property of the called function, not of the call
>> site.  Being able to specify it once per declaration is much cleaner.
>
> I agree. We should support this both on the function declaration and on
> the call sites.
>
>
>> Second, I really don't like the mangling use here.  We need a better way
>> to specify the properties of the function than its mangled name.  One
>> thought to explore is to directly use the Value of the function
>> declaration (since this is metadata and we can do that), and then tie
>> the properties to the function declaration in some way?  Sorry, I don't
>> really have a specific suggestion here.
>
> Is the problem the mangling or the fact that the mangling is
> ABI/target-specific? One option is to use LLVM's mangling scheme (the
> one we use for intrinsics) and then provide some backend infrastructure
> to translate later.

Well, both honestly.  But mangling with a non-target specific scheme is
a lot better, so I might be okay with that.  Good idea.

>
>   -Hal
>
>
>> Philip
>>

Finkel, Hal J. via llvm-dev

2019-May-29 19:16 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On 5/29/19 1:52 PM, Philip Reames wrote:
> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>>> I generally like the idea of having support in IR for vectorization of
>>> custom functions.  I have several use cases which would benefit from this.
>>>
>>> I'd suggest a couple of reframings to the IR representation though.
>>>
>>> First, this should probably be specified as metadata/attribute on a
>>> function declaration.  Allowing the callsite variant is fine, but it
>>> should primarily be a property of the called function, not of the call
>>> site.  Being able to specify it once per declaration is much cleaner.
>> I agree. We should support this both on the function declaration and on
>> the call sites.
>>
>>
>>> Second, I really don't like the mangling use here.  We need a better way
>>> to specify the properties of the function than its mangled name.  One
>>> thought to explore is to directly use the Value of the function
>>> declaration (since this is metadata and we can do that), and then tie
>>> the properties to the function declaration in some way?  Sorry, I don't
>>> really have a specific suggestion here.
>> Is the problem the mangling or the fact that the mangling is
>> ABI/target-specific? One option is to use LLVM's mangling scheme (the
>> one we use for intrinsics) and then provide some backend infrastructure
>> to translate later.
> Well, both honestly.  But mangling with a non-target specific scheme is
> a lot better, so I might be okay with that.  Good idea.

I liked your idea of directly encoding the signature in the metadata, 
but I think that we want to continue to use attributes, and not 
metadata, and the options for attributes seem more limited - unless we 
allow attributes to take metadata arguments - maybe that's an 
enhancement worth considering.

  -Hal

>>
>>    -Hal
>>
>>
>>> Philip
>>>
>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>>> Dear all,
>>>>
>>>> This RFC is a proposal to provide auto-vectorization functionality for
>>>> user provided vector functions.
>>>>
>>>> The proposal is a modification of an RFC that I sent out a couple
>>>> of months ago, with the title `[RFC] Re-implementing -fveclib with
>>>> OpenMP` (see
>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The
>>>> previous RFC is to be considered abandoned.
>>>>
>>>> The original RFC was proposing to re-implement the `-fveclib`
>>>> command line option. This proposal avoids that, and limits its scope to the
>>>> mechanics of providing vector functions in user code that the compiler can pick
>>>> up for auto-vectorization. This narrower scope limits the impact of changes that
>>>> are needed in both clang and LLVM.
>>>>
>>>> Please let me know what you think.
>>>>
>>>> Kind regards,
>>>>
>>>> Francesco
>>>>
>>>>
>>>>
>>>> ================================================================================
>>>>
>>>> Introduction
>>>> ============
>>>>
>>>> This RFC proposes a way to inform the vectorizer about the
>>>> availability of vector functions provided by the user. The mechanism is
>>>> based on the use of the directive `declare variant` introduced in OpenMP
>>>> 5.0 [^1].
>>>>
>>>> The mechanism proposed has the following properties:
>>>>
>>>> 1.  Decouples the compiler front-end that knows about the availability
>>>>       of vectorized routines, from the back-end that knows how to make use
>>>>       of them.
>>>> 2.  Enables support for a developer's own vector libraries without
>>>>       requiring changes to the compiler.
>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
>>>>       mappings as relevant for their own runtime libraries, etc.
>>>>
>>>> The implementation consists of two separate sets of changes.
>>>>
>>>> The first set is a set of changes in `llvm`, and consists of:
>>>>
>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
>>>>       availability of user-defined vector functions via metadata attached
>>>>       to an `llvm::CallInst`.
>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
>>>>       information about the available vector functions associated with an
>>>>       `llvm::CallInst`.
>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
>>>>       metadata.
>>>>
>>>> The second set consists of the [changes in clang](#clang) that
>>>> are needed to recognize the `#pragma clang declare variant`
>>>> directive.
>>>>
>>>> Proposed changes
>>>> ================
>>>>
>>>> We propose an implementation that uses `#pragma clang declare variant`
>>>> to inform the backend components about the availability of vector
>>>> versions of scalar functions found in IR. The mechanism relies on storing
>>>> such information in IR metadata, and therefore makes the
>>>> auto-vectorization of function calls a mid-end (`opt`) process that is
>>>> independent of the front-end that generated such IR metadata.
>>>>
>>>> This implementation provides a generic mechanism that the users of the
>>>> LLVM compiler will be able to use for interfacing their own vector
>>>> routines for generic code.
>>>>
>>>> The implementation can also expose vectorization-specific descriptors --
>>>> for example, like the `linear` and `uniform` clauses of the OpenMP
>>>> `declare simd` directive -- that could be used to finely tune the
>>>> automatic vectorization of some functions (think for example the
>>>> vectorization of `void sincos(double, double *, double *)`, where
>>>> `linear` can be used to give extra information about the memory layout
>>>> of the two pointer parameters in the vector version).
>>>>
>>>> The directive `#pragma clang declare variant` follows the syntax of the
>>>> `#pragma omp declare variant` directive of OpenMP.
>>>>
>>>> We define the new directive in the `clang` namespace instead of using
>>>> the `omp` one of OpenMP to allow the compiler to perform
>>>> auto-vectorization outside of an OpenMP SIMD context.
>>>>
>>>> The mechanism is based on OpenMP to provide a uniform user experience
>>>> across the two mechanisms, and to maximise the number of shared
>>>> components of the infrastructure needed in the compiler frontend to
>>>> enable the feature.
>>>>
>>>> Changes in LLVM IR {#llvmIR}
>>>> ------------------
>>>>
>>>> The IR is enriched with metadata that details the availability of vector
>>>> versions of an associated scalar function. This metadata is attached to
>>>> the call site of the scalar function.
>>>>
>>>> The metadata takes the form of an attribute containing a comma separated
>>>> list of vector function mappings. Each entry has a unique name that
>>>> follows the Vector Function ABI[^2] and a real name that is used when
>>>> generating calls to this vector function.
>>>>
>>>>       vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>
>>>> The Vector Function ABI name describes the signature of the vector
>>>> function so that properties like vectorisation factor can be queried
>>>> during compilation.
>>>>
>>>> The `(real name)` token is optional and assumed to match the Vector
>>>> Function ABI name when omitted.
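A minimal sketch of how such an attribute string could be split into its mappings, including the rule that an omitted real name defaults to the Vector Function ABI name. The names `VariantMapping` and `parseVectorVariant` are hypothetical and are not part of LLVM; the sketch also assumes that neither name contains commas or parentheses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one entry of the attribute string (not an LLVM type).
struct VariantMapping {
  std::string AbiName;  // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName; // symbol to call; equals AbiName when omitted
};

// Strip leading and trailing spaces, tabs and newlines.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return std::string();
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Split "abi1(real1), abi2(real2), abi3" into individual mappings.
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue; // tolerate empty fields such as a trailing comma
    VariantMapping M;
    std::size_t Open = Entry.find('(');
    std::size_t Close = Entry.rfind(')');
    if (Open != std::string::npos && Close != std::string::npos &&
        Close > Open) {
      M.AbiName = trim(Entry.substr(0, Open));
      M.RealName = trim(Entry.substr(Open + 1, Close - Open - 1));
    } else {
      // No "(real name)" token: the real name matches the ABI name.
      M.AbiName = Entry;
      M.RealName = Entry;
    }
    Result.push_back(M);
  }
  return Result;
}
```

Calling `parseVectorVariant` on a string with three entries, the last of which has no parenthesized part, yields three mappings whose last `RealName` equals its `AbiName`.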
>>>>
>>>> For example, the availability of a 2-lane double precision `sin`
>>>> function via SVML when targeting AVX on x86 is provided by the following
>>>> IR.
>>>>
>>>>       // ...
>>>>       ... = call double @sin(double) #0
>>>>       // ...
>>>>
>>>>       #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>>>                                 _ZGVdN4v_sin(__svml_sin4),
>>>>                                 ..."} }
>>>>
>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>> attribute provides information on the shape of the vector function via
>>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>>>> for Intel, and remaps the standard Vector Function ABI name to the
>>>> non-standard name `__svml_sin2`.
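To illustrate what "shape" means here, the following sketch decodes the simple case of a Vector Function ABI name laid out as `_ZGV`, one ISA letter, a mask letter (`N` or `M`), a decimal lane count, the parameter tokens, an underscore, and the scalar name. The type `VFShape` and the function `decodeVFABIName` are hypothetical, and the sketch assumes a fixed-width (decimal) lane count and parameter tokens that contain no underscore; the full ABI documents cover more cases.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical decoded form of a Vector Function ABI name (not an LLVM type).
struct VFShape {
  char ISA = 0;           // target ISA letter, e.g. 'c' in "_ZGVcN2v_sin"
  bool IsMasked = false;  // 'M' means masked, 'N' means unmasked
  unsigned VF = 0;        // number of lanes
  std::string Params;     // parameter tokens, e.g. "v" for one vector argument
  std::string ScalarName; // name of the scalar function, e.g. "sin"
};

// Returns true and fills Out when Name follows the simple layout above.
bool decodeVFABIName(const std::string &Name, VFShape &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t I = Prefix.size();
  if (Name.size() < I + 2)
    return false;
  VFShape S;
  S.ISA = Name[I++];
  char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M')
    return false;
  S.IsMasked = (Mask == 'M');
  // Read the decimal lane count.
  std::size_t DigitsStart = I;
  while (I < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[I]))) {
    S.VF = S.VF * 10 + static_cast<unsigned>(Name[I] - '0');
    ++I;
  }
  if (I == DigitsStart)
    return false;
  // Everything up to the next underscore describes the parameters.
  std::size_t Sep = Name.find('_', I);
  if (Sep == std::string::npos || Sep == I || Sep + 1 >= Name.size())
    return false;
  S.Params = Name.substr(I, Sep - I);
  S.ScalarName = Name.substr(Sep + 1);
  Out = S;
  return true;
}
```

Under these assumptions, `_ZGVcN2v_sin` decodes to ISA `c`, unmasked, two lanes, one vector parameter, scalar name `sin`.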
>>>>
>>>> This metadata is compatible with the proposal "Proposal for function
>>>> vectorization and loop vectorization with function calls",[^3] that uses
>>>> Vector Function ABI mangled names to inform the vectorizer about the
>>>> availability of vector functions. This proposal extends the original by
>>>> allowing the explicit mapping of the Vector Function ABI mangled name to
>>>> a non-standard name, which allows the use of existing vector libraries.
>>>>
>>>> The `vector-variant` attribute needs to be attached on a per-call basis
>>>> to avoid conflicts when merging modules with different vector variants.
>>>>
>>>> The query infrastructure: SVFS {#infrastructure}
>>>> ------------------------------
>>>>
>>>> The Search Vector Function System (SVFS) is constructed from an
>>>> `llvm::Module` instance so it can create function definitions. The SVFS
>>>> exposes an API with two methods.
>>>>
>>>> ### `SVFS::isFunctionVectorizable`
>>>>
>>>> This method queries the availability of a vectorized version of a
>>>> function. The signature of the method is as follows.
>>>>
>>>>       bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>>>
>>>> The method determines the availability of a vector version of the function
>>>> invoked by the `Call` parameter by looking at the `vector-variant`
>>>> metadata.
>>>>
>>>> The `Params` argument is a map that associates the position of a
>>>> parameter in the `CallInst` to its `ParameterType` descriptor. The
>>>> `ParameterType` descriptor holds information about the shape of the
>>>> corresponding parameter in the signature of the vector function. This
>>>> `ParameterType` is used to query the SVFS about the availability of
>>>> vector versions that have `linear`, `uniform` or `align` parameters (in
>>>> the sense of OpenMP 4.0 and onwards).
>>>>
>>>> The method `isFunctionVectorizable`, when invoked with an empty
>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
>>>> `isFunctionVectorizable(StringRef Name)`.
>>>>
>>>> ### `SVFS::getVectorizedFunction`
>>>>
>>>> This method returns the vector function declaration that corresponds to
>>>> the needs of the vectorization technique that is being run.
>>>>
>>>> The signature of the function is as follows.
>>>>
>>>>       std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>         llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
>>>>
>>>> The `Call` parameter is the call instance that is being vectorized, the
>>>> `VF` parameter represents the vectorization factor (how many lanes), the
>>>> `IsMasked` parameter decides whether or not the signature of the vector
>>>> function is required to have a mask parameter, and the `Params` parameter
>>>> describes the shape of the vector function as in the
>>>> `isFunctionVectorizable` method.
>>>>
>>>> The method uses the `vector-variant` metadata and returns the function
>>>> signature and the name of the function based on the input parameters.
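The core of such a query can be pictured as a search over the variants recorded for a call, keyed by lane count and masking. The sketch below works on plain data rather than on `llvm::CallInst`; `KnownVariant` and `selectVariant` are hypothetical names, parameter shapes are ignored, and returning an empty string for "no match" is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, already-decoded description of one available vector variant.
struct KnownVariant {
  unsigned VF;          // number of lanes
  bool IsMasked;        // whether the variant takes a mask parameter
  std::string RealName; // symbol to call
};

// Return the symbol matching the requested lane count and masking,
// or an empty string when no variant matches.
std::string selectVariant(const std::vector<KnownVariant> &Variants,
                          unsigned VF, bool IsMasked) {
  for (const KnownVariant &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.RealName;
  return std::string();
}
```

With the two SVML entries from the IR example recorded as unmasked 2-lane and 4-lane variants, a request for `VF = 4` unmasked selects `__svml_sin4`, while a masked request finds nothing.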
>>>>
>>>> The SVFS can add new function definitions, in the same module as the
>>>> `Call`, to provide vector functions that are not present within the
>>>> vector-variant metadata. For example, if a library provides a vector
>>>> version of a function with a vectorization factor of 2, but the
>>>> vectorizer is requesting a vectorization factor of 4, the SVFS is
>>>> allowed to create a definition that calls the 2-lane version twice. This
>>>> capability applies similarly for providing masked and unmasked versions
>>>> when the request does not match what is available in the library.
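The 2-lane to 4-lane case can be pictured in source form as follows. This is only an illustration of the idea, written with `std::array` in place of real vector types; `lib_sin2` stands in for a library-provided 2-lane routine and `wrapper_sin4` for the definition the SVFS would synthesize in IR. Both names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a 2-lane vector routine provided by a library (hypothetical).
Vec2 lib_sin2(Vec2 X) { return {std::sin(X[0]), std::sin(X[1])}; }

// 4-lane wrapper built from two calls to the 2-lane routine:
// split the input into halves, call twice, and concatenate the results.
Vec4 wrapper_sin4(Vec4 X) {
  Vec2 Lo = lib_sin2({X[0], X[1]});
  Vec2 Hi = lib_sin2({X[2], X[3]});
  return {Lo[0], Lo[1], Hi[0], Hi[1]};
}
```

Each output lane of `wrapper_sin4` is computed by exactly one of the two narrower calls, so lane order is preserved.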
>>>>
>>>> This method is equivalent to the TLI method
>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>
>>>> Notice that to fully support OpenMP vectorization we need to think about
>>>> a fuzzy matching mechanism that is able to select a candidate in the
>>>> calling context. However, this proposal is intended for scalar-to-vector
>>>> mappings of math-like functions that are most likely to associate a
>>>> unique vector candidate in most contexts. Therefore, extending this
>>>> behavior to a generic one is an aspect of the implementation that will
>>>> be treated in a separate RFC about the vectorization pass.
>>>>
>>>> ### Scalable vectorization
>>>>
>>>> Both methods of the SVFS API will be extended with a boolean parameter
>>>> to specify whether scalable signatures are needed by the user of the
>>>> SVFS.
>>>>
>>>> Changes in clang {#clang}
>>>> ----------------
>>>>
>>>> We use clang to generate the metadata described above.
>>>>
>>>> In the compilation unit, the vector function definition or declaration
>>>> must be visible and associated to the scalar version via the
>>>> `#pragma clang declare variant` according to the rule defined by the
>>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
>>>> the following example.
>>>>
>>>>       #pragma clang declare variant(vector_sinf) \
>>>>       match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>       extern float sinf(float);
>>>>
>>>>       float32x4_t vector_sinf(float32x4_t x);
>>>>
>>>> The `construct` set in the directive, together with the `device` set, is
>>>> used to generate the vector mangled name to be used in the
>>>> `vector-variant` attribute, for example `_ZGVnN4v_sinf`, when targeting
>>>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
>>>> the scalar function in the vector name are defined in the Vector
>>>> Function ABI specification of the target.
>>>>
>>>> The part of the vector-variant attribute that redirects the call to
>>>> `vector_sinf` is derived from the `variant-id` specified in the
>>>> `variant` clause.
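As a rough picture of that lowering, the sketch below assembles a mangled name and the resulting attribute entry from the pieces of the directive. `mangleVectorName` and `makeVariantEntry` are hypothetical helpers, not clang APIs; the sketch assumes every parameter is a plain vector parameter (token `v`) and that the ISA letter is supplied by the caller (`n` for AArch64 Advanced SIMD).

```cpp
#include <cassert>
#include <string>

// Build "_ZGV<isa><mask><lanes><params>_<scalar name>" for the simple case
// where all parameters are plain vector parameters.
std::string mangleVectorName(char ISA, bool NotInBranch, unsigned SimdLen,
                             unsigned NumVectorParams,
                             const std::string &ScalarName) {
  std::string Name = "_ZGV";
  Name += ISA;
  Name += NotInBranch ? 'N' : 'M'; // notinbranch requests an unmasked variant
  Name += std::to_string(SimdLen);
  Name.append(NumVectorParams, 'v');
  Name += '_';
  Name += ScalarName;
  return Name;
}

// Pair the mangled name with the variant-id from the `variant` clause.
std::string makeVariantEntry(const std::string &Mangled,
                             const std::string &VariantId) {
  return Mangled + "(" + VariantId + ")";
}
```

For `simdlen(4)`, `notinbranch`, one `float` argument and the scalar function `sinf`, these helpers produce the entry `_ZGVnN4v_sinf(vector_sinf)`.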
>>>>
>>>> Summary
>>>> =======
>>>>
>>>> New `clang` directive in clang
>>>> ------------------------------
>>>>
>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>>>
>>>> Option behavior, and interaction with OpenMP
>>>> --------------------------------------------
>>>>
>>>> The behavior described below makes sure that
>>>> `#pragma clang declare variant` function vectorization and OpenMP
>>>> function vectorization are orthogonal.
>>>>
>>>> `-fclang-declare-variant`
>>>>
>>>> :   The `#pragma clang declare variant` directives are parsed and used
>>>>       to populate the `vector-variant` attribute.
>>>>
>>>> `-fopenmp[-simd]`
>>>>
>>>> :   The `#pragma omp declare variant` directives are parsed and used to
>>>>       populate the `vector-variant` attribute.
>>>>
>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>>>
>>>> :   The directive `#pragma omp declare variant` is used to populate the
>>>>       `vector-variant` attribute in IR. The directive
>>>>       `#pragma clang declare variant` is ignored.
>>>>
>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>
>>>> [^2]: Vector Function ABI for x86:
>>>>       <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>       Vector Function ABI for AArch64:
>>>>       <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
>>>>
>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
