thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization. [May 2019]

Philip Reames via llvm-dev

2019-May-30 17:53 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> On 05/29, Finkel, Hal J. via cfe-dev wrote:
>> On 5/29/19 1:52 PM, Philip Reames wrote:
>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>>>>> I generally like the idea of having support in IR for vectorization of
>>>>> custom functions.  I have several use cases which would benefit from this.
>>>>>
>>>>> I'd suggest a couple of reframings to the IR representation though.
>>>>>
>>>>> First, this should probably be specified as metadata/attribute on a
>>>>> function declaration.  Allowing the callsite variant is fine, but it
>>>>> should primarily be a property of the called function, not of the call
>>>>> site.  Being able to specify it once per declaration is much cleaner.
>>>> I agree. We should support this both on the function declaration and on
>>>> the call sites.
>>>>
>>>>
>>>>> Second, I really don't like the mangling use here.  We need a better way
>>>>> to specify the properties of the function than its mangled name.  One
>>>>> thought to explore is to directly use the Value of the function
>>>>> declaration (since this is metadata and we can do that), and then tie
>>>>> the properties to the function declaration in some way?  Sorry, I don't
>>>>> really have a specific suggestion here.
>>>> Is the problem the mangling or the fact that the mangling is
>>>> ABI/target-specific? One option is to use LLVM's mangling scheme (the
>>>> one we use for intrinsics) and then provide some backend infrastructure
>>>> to translate later.
>>> Well, both honestly.  But mangling with a non-target specific scheme is
>>> a lot better, so I might be okay with that.   Good idea.
>>
>> I liked your idea of directly encoding the signature in the metadata, 
>> but I think that we want to continue to use attributes, and not 
>> metadata, and the options for attributes seem more limited - unless we 
>> allow attributes to take metadata arguments - maybe that's an 
>> enhancement worth considering.
> I recently talked to people in the OpenMP language committee meeting
> about this and, thinking forward to the actual implementation/use of the
> OpenMP 5.x declare variant feature, I'd say:
>
>   - We will need a mangling scheme if we want to allow variants on
>     declarations that are defined elsewhere.
>   - We will need a (OpenMP) standardized mangling scheme if we want
>     interoperability between compilers.
>
> I assume we want both so I think we will need both.

If I'm reading this correctly, this describes a need for the frontend to
have a mangling scheme.  Nothing in here would seem to prevent the
frontend from generating a declaration for a mangled external symbol and
then referencing that declaration.  Am I missing something?

>
> That said, I think this should allow us to avoid attributes/metadata
> which seems to me like a good thing right now.
>
> Cheers,
>   Johannes
>
>
>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> This RFC is a proposal to provide auto-vectorization functionality
>>>>>> for user provided vector functions.
>>>>>>
>>>>>> The proposal is a modification of an RFC that I have sent out a
>>>>>> couple of months ago, with the title `[RFC] Re-implementing -fveclib
>>>>>> with OpenMP` (see
>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html).
>>>>>> The previous RFC is to be considered abandoned.
>>>>>>
>>>>>> The original RFC was proposing to re-implement the `-fveclib` command
>>>>>> line option. This proposal avoids that, and limits its scope to the
>>>>>> mechanics of providing vector functions in user code that the compiler
>>>>>> can pick up for auto-vectorization. This narrower scope limits the
>>>>>> impact of changes that are needed in both clang and LLVM.
>>>>>>
>>>>>> Please let me know what you think.
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Francesco
>>>>>>
>>>>>>
>>>>>> ================================================================================
>>>>>>
>>>>>> Introduction
>>>>>> ============
>>>>>>
>>>>>> This RFC encompasses the proposal of informing the vectorizer about the
>>>>>> availability of vector functions provided by the user. The mechanism is
>>>>>> based on the use of the directive `declare variant` introduced in OpenMP
>>>>>> 5.0 [^1].
>>>>>>
>>>>>> The mechanism proposed has the following properties:
>>>>>>
>>>>>> 1.  Decouples the compiler front-end that knows about the availability
>>>>>>       of vectorized routines, from the back-end that knows how to make use
>>>>>>       of them.
>>>>>> 2.  Enables support for a developer's own vector libraries without
>>>>>>       requiring changes to the compiler.
>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
>>>>>>       mappings as relevant for their own runtime libraries, etc.
>>>>>>
>>>>>> The implementation consists of two separate sets of changes.
>>>>>>
>>>>>> The first set is a set of changes in `llvm`, and consists of:
>>>>>>
>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
>>>>>>       availability of user-defined vector functions via metadata attached
>>>>>>       to an `llvm::CallInst`.
>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
>>>>>>       information about the available vector functions associated with an
>>>>>>       `llvm::CallInst`.
>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
>>>>>>       metadata.
>>>>>>
>>>>>> The second set consists of the [changes in clang](#clang) that are
>>>>>> needed to recognize the `#pragma clang declare variant` directive.
>>>>>>
>>>>>> Proposed changes
>>>>>> ================
>>>>>>
>>>>>> We propose an implementation that uses `#pragma clang declare variant`
>>>>>> to inform the backend components about the availability of vector
>>>>>> versions of scalar functions found in IR. The mechanism relies on storing
>>>>>> such information in IR metadata, and therefore makes the
>>>>>> auto-vectorization of function calls a mid-end (`opt`) process that is
>>>>>> independent of the front-end that generated such IR metadata.
>>>>>>
>>>>>> This implementation provides a generic mechanism that the users of the
>>>>>> LLVM compiler will be able to use for interfacing their own vector
>>>>>> routines for generic code.
>>>>>>
>>>>>> The implementation can also expose vectorization-specific descriptors --
>>>>>> for example, like the `linear` and `uniform` clauses of the OpenMP
>>>>>> `declare simd` directive -- that could be used to finely tune the
>>>>>> automatic vectorization of some functions (think for example the
>>>>>> vectorization of `void sincos(double, double *, double *)`, where
>>>>>> `linear` can be used to give extra information about the memory layout
>>>>>> of the two pointer parameters in the vector version).
>>>>>>
>>>>>> The directive `#pragma clang declare variant` follows the syntax of the
>>>>>> `#pragma omp declare variant` directive of OpenMP.
>>>>>>
>>>>>> We define the new directive in the `clang` namespace instead of using
>>>>>> the `omp` one of OpenMP to allow the compiler to perform
>>>>>> auto-vectorization outside of an OpenMP SIMD context.
>>>>>>
>>>>>> The mechanism is based on OpenMP to provide a uniform user experience
>>>>>> across the two mechanisms, and to maximise the number of shared
>>>>>> components of the infrastructure needed in the compiler frontend to
>>>>>> enable the feature.
>>>>>>
>>>>>> Changes in LLVM IR {#llvmIR}
>>>>>> ------------------
>>>>>>
>>>>>> The IR is enriched with metadata that details the availability of vector
>>>>>> versions of an associated scalar function. This metadata is attached to
>>>>>> the call site of the scalar function.
>>>>>>
>>>>>> The metadata takes the form of an attribute containing a comma separated
>>>>>> list of vector function mappings. Each entry has a unique name that
>>>>>> follows the Vector Function ABI[^2] and a real name that is used when
>>>>>> generating calls to this vector function.
>>>>>>
>>>>>>       vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>>>
>>>>>> The Vector Function ABI name describes the signature of the vector
>>>>>> function so that properties like vectorisation factor can be queried
>>>>>> during compilation.
>>>>>>
>>>>>> The `(real name)` token is optional and assumed to match the Vector
>>>>>> Function ABI name when omitted.
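
An illustrative sketch (not part of the RFC or of any LLVM API) of how a consumer might split such an attribute string into its mappings. The names `VariantMapping` and `parseVectorVariant` are hypothetical, and the sketch assumes that neither name in an entry contains a comma or parentheses. When the `(real name)` token is omitted, the ABI name is reused, as the text above describes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: first = Vector Function ABI name, second = real name.
using VariantMapping = std::pair<std::string, std::string>;

// Strip leading and trailing blanks from one token.
static std::string trimBlanks(const std::string &S) {
  const char *WS = " \t\n";
  size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Split "abi1(real1), abi2(real2), abi3" into (ABI name, real name) pairs.
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  size_t Pos = 0;
  while (Pos <= Attr.size()) {
    size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Entry = trimBlanks(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    size_t Open = Entry.find('(');
    if (Open == std::string::npos) {
      // No "(real name)" token: the real name defaults to the ABI name.
      Result.emplace_back(Entry, Entry);
      continue;
    }
    size_t Close = Entry.find(')', Open);
    if (Close == std::string::npos)
      continue; // Malformed entry; this sketch simply skips it.
    Result.emplace_back(trimBlanks(Entry.substr(0, Open)),
                        trimBlanks(Entry.substr(Open + 1, Close - Open - 1)));
  }
  return Result;
}
```

For the string `_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin` this would yield two pairs, the second one mapping `_ZGVdN4v_sin` to itself.
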
>>>>>>
>>>>>> For example, the availability of a 2-lane double precision `sin`
>>>>>> function via SVML when targeting AVX on x86 is provided by the following
>>>>>> IR.
>>>>>>
>>>>>>       // ...
>>>>>>       ... = call double @sin(double) #0
>>>>>>       // ...
>>>>>>
>>>>>>       #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>>>>>                                 _ZGVdN4v_sin(__svml_sin4),
>>>>>>                                 ..."} }
>>>>>>
>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>>>> attribute provides information on the shape of the vector function via
>>>>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>>>>>> for Intel, and remaps the standard Vector Function ABI name to the
>>>>>> non-standard name `__svml_sin2`.
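
A minimal sketch of how the shape might be read back from a name such as `_ZGVcN2v_sin`, assuming the common layout `_ZGV<isa><mask><vlen><parameters>_<scalar name>` with `N` for unmasked and `M` for masked variants. `VariantShape` and `decodeVFABIName` are hypothetical names. The ISA letter is kept as an opaque tag, the parameter tokens are not interpreted, and scalable vector lengths are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical description of one vector variant.
struct VariantShape {
  char ISA;               // ISA tag, e.g. 'c' in "_ZGVcN2v_sin"
  bool IsMasked;          // true for 'M', false for 'N'
  unsigned VF;            // number of lanes
  std::string Params;     // raw parameter tokens, e.g. "v"
  std::string ScalarName; // e.g. "sin"
};

std::optional<VariantShape> decodeVFABIName(const std::string &Name) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  size_t I = Prefix.size();
  if (Name.size() < I + 3)
    return std::nullopt;

  VariantShape S;
  S.ISA = Name[I++];
  char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M')
    return std::nullopt;
  S.IsMasked = (Mask == 'M');

  // Vector length: one or more decimal digits.
  size_t DigitsBegin = I;
  while (I < Name.size() && std::isdigit(static_cast<unsigned char>(Name[I])))
    ++I;
  if (I == DigitsBegin)
    return std::nullopt;
  S.VF = static_cast<unsigned>(
      std::stoul(Name.substr(DigitsBegin, I - DigitsBegin)));

  // Parameter tokens run up to the first '_'; the scalar name follows it.
  size_t Sep = Name.find('_', I);
  if (Sep == std::string::npos || Sep == I || Sep + 1 >= Name.size())
    return std::nullopt;
  S.Params = Name.substr(I, Sep - I);
  S.ScalarName = Name.substr(Sep + 1);
  return S;
}
```

Under these assumptions, `_ZGVcN2v_sin` decodes to an unmasked 2-lane variant of `sin` with a single vector parameter.
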
>>>>>>
>>>>>> This metadata is compatible with the proposal "Proposal for function
>>>>>> vectorization and loop vectorization with function calls",[^3] that uses
>>>>>> Vector Function ABI mangled names to inform the vectorizer about the
>>>>>> availability of vector functions. The proposal extends the original by
>>>>>> allowing the explicit mapping of the Vector Function ABI mangled name to
>>>>>> a non-standard name, which allows the use of existing vector libraries.
>>>>>>
>>>>>> The `vector-variant` attribute needs to be attached on a per-call basis
>>>>>> to avoid conflicts when merging modules with different vector variants.
>>>>>> The query infrastructure: SVFS {#infrastructure}
>>>>>> ------------------------------
>>>>>>
>>>>>> The Search Vector Function System (SVFS) is constructed from an
>>>>>> `llvm::Module` instance so it can create function definitions. The SVFS
>>>>>> exposes an API with two methods.
>>>>>>
>>>>>> ### `SVFS::isFunctionVectorizable`
>>>>>>
>>>>>> This method queries the availability of a vectorized version of a
>>>>>> function. The signature of the method is as follows.
>>>>>>
>>>>>>       bool isFunctionVectorizable(llvm::CallInst *Call, ParTypeMap Params);
>>>>>>
>>>>>> The method determines the availability of a vector version of the
>>>>>> function invoked by the `Call` parameter by looking at the
>>>>>> `vector-variant` metadata.
>>>>>>
>>>>>> The `Params` argument is a map that associates the position of a
>>>>>> parameter in the `CallInst` to its `ParameterType` descriptor. The
>>>>>> `ParameterType` descriptor holds information about the shape of the
>>>>>> corresponding parameter in the signature of the vector function. This
>>>>>> `ParameterType` is used to query the SVFS about the availability of
>>>>>> vector versions that have `linear`, `uniform` or `align` parameters (in
>>>>>> the sense of OpenMP 4.0 and onwards).
>>>>>>
>>>>>> The method `isFunctionVectorizable`, when invoked with an empty
>>>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
>>>>>> `isFunctionVectorizable(StringRef Name)`.
>>>>>>
>>>>>> ### `SVFS::getVectorizedFunction`
>>>>>>
>>>>>> This method returns the vector function declaration that corresponds to
>>>>>> the needs of the vectorization technique that is being run.
>>>>>>
>>>>>> The signature of the function is as follows.
>>>>>>
>>>>>>       std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>>>         llvm::CallInst *Call, unsigned VF, bool IsMasked, ParTypeSet Params);
>>>>>>
>>>>>> The `Call` parameter is the call instance that is being vectorized, the
>>>>>> `VF` parameter represents the vectorization factor (how many lanes), the
>>>>>> `IsMasked` parameter decides whether or not the signature of the vector
>>>>>> function is required to have a mask parameter, the `Params` parameter
>>>>>> describes the shape of the vector function as in the
>>>>>> `isFunctionVectorizable` method.
>>>>>>
>>>>>> The method uses the `vector-variant` metadata and returns the function
>>>>>> signature and the name of the function based on the input parameters.
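
A rough sketch of the selection step only, assuming the `vector-variant` entries have already been decoded into records. `KnownVariant` and `lookupVariant` are hypothetical and much simpler than the proposed `getVectorizedFunction`: there is no `llvm::CallInst`, no `llvm::FunctionType`, and no parameter-shape matching, just a match on lane count and masking that returns the real name to call.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one decoded vector-variant entry.
struct KnownVariant {
  unsigned VF;          // number of lanes
  bool IsMasked;        // whether the variant takes a mask parameter
  std::string RealName; // symbol to call, e.g. "__svml_sin2"
};

// Return the real name of the first variant with the requested shape.
std::optional<std::string>
lookupVariant(const std::vector<KnownVariant> &Variants, unsigned VF,
              bool IsMasked) {
  for (const KnownVariant &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.RealName;
  return std::nullopt; // Nothing suitable is available.
}
```
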
>>>>>>
>>>>>> The SVFS can add new function definitions, in the same module as the
>>>>>> `Call`, to provide vector functions that are not present within the
>>>>>> vector-variant metadata. For example, if a library provides a vector
>>>>>> version of a function with a vectorization factor of 2, but the
>>>>>> vectorizer is requesting a vectorization factor of 4, the SVFS is
>>>>>> allowed to create a definition that calls the 2-lane version twice. This
>>>>>> capability applies similarly for providing masked and unmasked versions
>>>>>> when the request does not match what is available in the library.
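
To make the 2-lane to 4-lane case concrete, here is a source-level sketch of what such a generated definition would compute. It uses `std::array` as a stand-in for vector types, and `lib_sin2` is a hypothetical placeholder for a library routine; the SVFS itself would emit IR rather than C++.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a library-provided 2-lane routine (hypothetical).
Vec2 lib_sin2(Vec2 X) { return Vec2{std::sin(X[0]), std::sin(X[1])}; }

// 4-lane wrapper built from two calls to the 2-lane routine.
Vec4 wrapped_sin4(Vec4 X) {
  Vec2 Lo = lib_sin2(Vec2{X[0], X[1]}); // lanes 0 and 1
  Vec2 Hi = lib_sin2(Vec2{X[2], X[3]}); // lanes 2 and 3
  return Vec4{Lo[0], Lo[1], Hi[0], Hi[1]};
}
```
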
>>>>>>
>>>>>> This method is equivalent to the TLI method
>>>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>>>
>>>>>> Notice that to fully support OpenMP vectorization we need to think about
>>>>>> a fuzzy matching mechanism that is able to select a candidate in the
>>>>>> calling context. However, this proposal is intended for scalar-to-vector
>>>>>> mappings of math-like functions that are most likely to associate a
>>>>>> unique vector candidate in most contexts. Therefore, extending this
>>>>>> behavior to a generic one is an aspect of the implementation that will
>>>>>> be treated in a separate RFC about the vectorization pass.
>>>>>>
>>>>>> ### Scalable vectorization
>>>>>>
>>>>>> Both methods of the SVFS API will be extended with a boolean parameter
>>>>>> to specify whether scalable signatures are needed by the user of the
>>>>>> SVFS.
>>>>>>
>>>>>> Changes in clang {#clang}
>>>>>> ----------------
>>>>>>
>>>>>> We use clang to generate the metadata described above.
>>>>>>
>>>>>> In the compilation unit, the vector function definition or declaration
>>>>>> must be visible and associated to the scalar version via the
>>>>>> `#pragma clang declare variant` according to the rule defined by the
>>>>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
>>>>>> the following example.
>>>>>>
>>>>>>       #pragma clang declare variant(vector_sinf) \
>>>>>>       match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>>>       extern float sinf(float);
>>>>>>
>>>>>>       float32x4_t vector_sinf(float32x4_t x);
>>>>>>
>>>>>> The `construct` set in the directive, together with the `device` set, is
>>>>>> used to generate the vector mangled name to be used in the
>>>>>> `vector-variant` attribute, for example `_ZGVnN2v_sin`, when targeting
>>>>>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
>>>>>> the scalar function in the vector name are defined in the Vector
>>>>>> Function ABI specification of the target.
>>>>>>
>>>>>> The part of the vector-variant attribute that redirects the call to
>>>>>> `vector_sinf` is derived from the `variant-id` specified in the
>>>>>> `variant` clause.
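
A hedged sketch of how a frontend might assemble the mangled name from the pieces of the directive. `mangleVectorVariant` is a hypothetical helper, it handles only plain vector (`v`) parameters, and it assumes the `_ZGV<isa><mask><vlen><parameters>_<scalar name>` layout with `n` as the AArch64 Advanced SIMD tag, as in the `_ZGVnN2v_sin` example above.

```cpp
#include <cassert>
#include <string>

// Build a Vector Function ABI style name from directive components.
// ISA: target tag ('n' is assumed here for AArch64 Advanced SIMD).
// InBranch: true for `inbranch` (masked, 'M'), false for `notinbranch` ('N').
// SimdLen: the value of the `simdlen` clause.
// NumVectorParams: number of parameters, all treated as plain vectors.
std::string mangleVectorVariant(char ISA, bool InBranch, unsigned SimdLen,
                                unsigned NumVectorParams,
                                const std::string &ScalarName) {
  std::string Name = "_ZGV";
  Name += ISA;
  Name += InBranch ? 'M' : 'N';
  Name += std::to_string(SimdLen);
  Name.append(NumVectorParams, 'v');
  Name += '_';
  Name += ScalarName;
  return Name;
}
```

Applied to the `sinf` directive above (`simdlen(4)`, `notinbranch`, one parameter), this scheme would give `_ZGVnN4v_sinf`, so the corresponding attribute entry would read `_ZGVnN4v_sinf(vector_sinf)`.
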
>>>>>>
>>>>>> Summary
>>>>>> =======
>>>>>>
>>>>>> New `clang` directive in clang
>>>>>> ------------------------------
>>>>>>
>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>>>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>>>>>
>>>>>> Option behavior, and interaction with OpenMP
>>>>>> --------------------------------------------
>>>>>>
>>>>>> The behavior described below makes sure that
>>>>>> `#pragma clang declare variant` function vectorization and OpenMP
>>>>>> function vectorization are orthogonal.
>>>>>>
>>>>>> `-fclang-declare-variant`
>>>>>>
>>>>>> :   The `#pragma clang declare variant` directives are parsed and used
>>>>>>       to populate the `vector-variant` attribute.
>>>>>>
>>>>>> `-fopenmp[-simd]`
>>>>>>
>>>>>> :   The `#pragma omp declare variant` directives are parsed and used to
>>>>>>       populate the `vector-variant` attribute.
>>>>>>
>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>>>>>
>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
>>>>>>       `vector-variant` attribute in IR. The directives
>>>>>>       `#pragma clang declare variant` are ignored.
>>>>>>
>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>>>
>>>>>> [^2]: Vector Function ABI for x86:
>>>>>>       <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>>>       Vector Function ABI for AArch64:
>>>>>>       <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
>>>>>>
>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> -- 
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory

Doerfert, Johannes via llvm-dev

2019-May-30 19:20 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

I think that a standardized naming scheme is needed and that it solves the
problem motivating the RFC without the need for attributes or metadata.

If we want to use a vectorized version at a call site we know what the symbol is
supposed to look like and we can check if it's available.

Maybe I misunderstood the problem people want to solve here but the way I see it
the above is all we need.

________________________________
From: Philip Reames <listmail at philipreames.com>
Sent: Thursday, May 30, 2019 12:53:02 PM
To: Doerfert, Johannes; Finkel, Hal J.
Cc: Francesco Petrogalli; LLVM Development List; nd; Hideki Saito; Clang Dev;
scogland1 at llnl.gov
Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for
auto-vectorization.


On 5/30/19 9:05 AM, Doerfert, Johannes wrote:> On 05/29, Finkel, Hal J. via cfe-dev wrote:
>> On 5/29/19 1:52 PM, Philip Reames wrote:
>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>>>>> I generally like the idea of having support in IR for
vectorization of
>>>>> custom functions.  I have several use cases which would
benefit from this.
>>>>>
>>>>> I'd suggest a couple of reframings to the IR
representation though.
>>>>>
>>>>> First, this should probably be specified as
metadata/attribute on a
>>>>> function declaration.  Allowing the callsite variant is
fine, but it
>>>>> should primarily be a property of the called function, not
of the call
>>>>> site.  Being able to specify it once per declaration is
much cleaner.
>>>> I agree. We should support this both on the function
declaration and on
>>>> the call sites.
>>>>
>>>>
>>>>> Second, I really don't like the mangling use here.  We
need a better way
>>>>> to specify the properties of the function then it's
mangled name.  One
>>>>> thought to explore is to directly use the Value of the
function
>>>>> declaration (since this is metadata and we can do that),
and then tie
>>>>> the properties to the function declaration in some way? 
Sorry, I don't
>>>>> really have a specific suggestion here.
>>>> Is the problem the mangling or the fact that the mangling is
>>>> ABI/target-specific? One option is to use LLVM's mangling
scheme (the
>>>> one we use for intrinsics) and then provide some backend
infrastructure
>>>> to translate later.
>>> Well, both honestly.  But mangling with a non-target specific
scheme is
>>> a lot better, so I might be okay with that.   Good idea.
>>
>> I liked your idea of directly encoding the signature in the metadata,
>> but I think that we want to continue to use attributes, and not
>> metadata, and the options for attributes seem more limited - unless we
>> allow attributes to take metadata arguments - maybe that's an
>> enhancement worth considering.
> I recently talked to people in the OpenMP language committee meeting
> about this and, thinking forward to the actual implementation/use of the
> OpenMP 5.x declare variant feature, I'd say:
>
>   - We will need a mangling scheme if we want to allow variants on
>     declarations that are defined elsewhere.
>   - We will need a (OpenMP) standardized mangling scheme if we want
>     interoperability between compilers.
>
> I assume we want both so I think we will need both.If I'm reading this correctly, this describes a need for the frontend to
have a mangling scheme.  Nothing in here would seem to prevent the
frontend for generating a declaration for a mangled external symbol and
then referencing that declaration.  Am I missing
something?>
> That said, I think this should allow us to avoid attributes/metadata
> which seems to me like a good thing right now.
>
> Cheers,
>   Johannes
>
>
>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev
wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> This RFC is a proposal to provide auto-vectorization
functionality for user provided vector functions.
>>>>>>
>>>>>> The proposal is a modification of an RFC that I have
sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib
with OpenMP` (see
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The
previous RFC is to be considered abandoned.
>>>>>>
>>>>>> The original RFC was proposing to re-implement the
`-fveclib` command line option. This proposal avoids that, and limits its scope
to the mechanics of providing vector function in user code that the compiler can
pick up for auto-vectorization. This narrower scope limits the impact of changes
that are needed in both clang and LLVM.
>>>>>>
>>>>>> Please let me know what you think.
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Francesco
>>>>>>
>>>>>>
>>>>>>
================================================================================>>>>>>
>>>>>> Introduction
>>>>>> ===========>>>>>>
>>>>>> This RFC encompasses the proposal of informing the
vectorizer about the
>>>>>> availability of vector functions provided by the user.
The mechanism is
>>>>>> based on the use of the directive `declare variant`
introduced in OpenMP
>>>>>> 5.0 [^1].
>>>>>>
>>>>>> The mechanism proposed has the following properties:
>>>>>>
>>>>>> 1.  Decouples the compiler front-end that knows about
the availability
>>>>>>       of vectorized routines, from the back-end that
knows how to make use
>>>>>>       of them.
>>>>>> 2.  Enable support for a developer's own vector
libraries without
>>>>>>       requiring changes to the compiler.
>>>>>> 3.  Enables other frontends (e.g. f18) to add
scalar-to-vector function
>>>>>>       mappings as relevant for their own runtime
libraries, etc.
>>>>>>
>>>>>> The implemetation consists of two separate sets of
changes.
>>>>>>
>>>>>> The first set is a set o changes in `llvm`, and
consists of:
>>>>>>
>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide
information about the
>>>>>>       availability of user-defined vector functions via
metadata attached
>>>>>>       to an `llvm::CallInst`.
>>>>>> 2.  [An infrastructure](#infrastructure) that can be
queried to retrive
>>>>>>       information about the available vector functions
associated to a
>>>>>>       `llvm::CallInst`.
>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API
to query the
>>>>>>       metadata.
>>>>>>
>>>>>> The second set consists of the changes [changes in
clang](#clang) that
>>>>>> are needed too to recognize the `#pragma clang declare
variant`
>>>>>> directive.
>>>>>>
>>>>>> Proposed changes
>>>>>> ===============>>>>>>
>>>>>> We propose an implementation that uses `#pragma clang
declare variant`
>>>>>> to inform the backend components about the availability
of vector
>>>>>> version of scalar functions found in IR. The mechanism
relies in storing
>>>>>> such information in IR metadata, and therefore makes
the
>>>>>> auto-vectorization of function calls a mid-end (`opt`)
process that is
>>>>>> independent on the front-end that generated such IR
metadata.
>>>>>>
>>>>>> This implementation provides a generic mechanism that
the users of the
>>>>>> LLVM compiler will be able to use for interfacing their
own vector
>>>>>> routines for generic code.
>>>>>>
>>>>>> The implementation can also expose
vectorization-specific descriptors --
>>>>>> for example, like the `linear` and `uniform` clauses of
the OpenMP
>>>>>> `declare simd` directive -- that could be used to
finely tune the
>>>>>> automatic vectorization of some functions (think for
example the
>>>>>> vectorization of `double sincos(double , double *,
double *)`, where
>>>>>> `linear` can be used to give extra information about
the memory layout
>>>>>> of the 2 pointers parameters in the vector version).
>>>>>>
>>>>>> The directive `#pragma clang declare variant` follows
the syntax of the
>>>>>> `#pragma omp declare variant` directive of OpenMP.
>>>>>>
>>>>>> We define the new directive in the `clang` namespace
instead of using
>>>>>> the `omp` one of OpenMP to allow the compiler to
perform
>>>>>> auto-vectorization outside of an OpenMP SIMD context.
>>>>>>
>>>>>> The mechanism is base on OpenMP to provide a uniform
user experience
>>>>>> across the two mechanism, and to maximise the number of
shared
>>>>>> components of the infrastructure needed in the compiler
frontend to
>>>>>> enable the feature.
>>>>>>
>>>>>> Changes in LLVM IR {#llvmIR}
>>>>>> ------------------
>>>>>>
>>>>>> The IR is enriched with metadata that details the
availability of vector
>>>>>> versions of an associated scalar function. This
metadata is attached to
>>>>>> the call site of the scalar function.
>>>>>>
>>>>>> The metadata takes the form of an attribute containing
a comma separated
>>>>>> list of vector function mappings. Each entry has a
unique name that
>>>>>> follows the Vector Function ABI[^2] and real name that
is used when
>>>>>> generating calls to this vector function.
>>>>>>
>>>>>>       vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>>>
>>>>>> The Vector Function ABI name describes the signature of
the vector
>>>>>> function so that properties like vectorisation factor
can be queried
>>>>>> during compilation.
>>>>>>
>>>>>> The `(real name)` token is optional and assumed to
match the Vector
>>>>>> Function ABI name when omitted.
>>>>>>
>>>>>> For example, the availability of a 2-lane double
precision `sin`
>>>>>> function via SVML when targeting AVX on x86 is provided
by the following
>>>>>> IR.
>>>>>>
>>>>>>       // ...
>>>>>>       ... = call double @sin(double) #0
>>>>>>       // ...
>>>>>>
>>>>>>       #0 = { vector-variant =
{"_ZGVcN2v_sin(__svml_sin2),
>>>>>>                                
_ZGVdN4v_sin(__svml_sin4),
>>>>>>                                 ..."} }
>>>>>>
>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in
this vector-variant
>>>>>> attribute provides information on the shape of the
vector function via
>>>>>> the string `_ZGVcN2v_sin`, mangled according to the
Vector Function ABI
>>>>>> for Intel, and remaps the standard Vector Function ABI
name to the
>>>>>> non-standard name `__svml_sin2`.
>>>>>>
>>>>>> This metadata is compatible with the proposal
"Proposal for function
>>>>>> vectorization and loop vectorization with function
calls",[^3] that uses
>>>>>> Vector Function ABI mangled names to inform the
vectorizer about the
>>>>>> availability of vector functions. The proposal extends
the original by
>>>>>> allowing the explicit mapping of the Vector Function
ABI mangled name to
>>>>>> a non-standard name, which allows the use of existing
vector libraries.
>>>>>>
>>>>>> The `vector-variant` attribute needs to be attached on
a per-call basis
>>>>>> to avoid conflicts when merging modules with different
vector variants.
>>>>>>
>>>>>> The query infrastructure: SVFS {#infrastructure}
>>>>>> ------------------------------
>>>>>>
>>>>>> The Search Vector Function System (SVFS) is constructed
from an
>>>>>> `llvm::Module` instance so it can create function
definitions. The SVFS
>>>>>> exposes an API with two methods.
>>>>>>
>>>>>> ### `SVFS::isFunctionVectorizable`
>>>>>>
>>>>>> This method queries the avilability of a vectorized
version of a
>>>>>> function. The signature of the method is as follows.
>>>>>>
>>>>>>       bool isFunctionVectorizable(llvm::CallInst *
Call, ParTypeMap Params);
>>>>>>
>>>>>> The method determine the availability of vector version
of the function
>>>>>> invoked by the `Call` parameter by looking at the
`vector-variant`
>>>>>> metadata.
>>>>>>
>>>>>> The `Params` argument is a map that associates the
position of a
>>>>>> parameter in the `CallInst` to its `ParameterType`
descriptor. The
>>>>>> `ParameterType` descriptor holds information about the
shape of the
>>>>>> correspondend parameter in the signature of the vector
function. This
>>>>>> `ParamaterType` is used to query the SVMS about the
availability of
>>>>>> vector version that have `linear`, `uniform` or `align`
parameters (in
>>>>>> the sense of OpenMP 4.0 and onwards).
>>>>>>
>>>>>> The method `isFunctionVectorizable`, when invoked with
an empty
>>>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo`
method
>>>>>> `isFunctionVectorizable(StrinRef Name)`.
>>>>>>
>>>>>> ### `SVFS::getVectorizedFunction`
>>>>>>
>>>>>> This method returns the vector function declaration
that correspond to
>>>>>> the needs of the vectorization technique that is being
run.
>>>>>>
>>>>>> The signature of the function is as follows.
>>>>>>
>>>>>>       std::pair<llvm::FunctionType *,
std::string> getVectorizedFunction(
>>>>>>         llvm::CallInst * Call, unsigned VF, bool
IsMasked, ParTypeSet Params);
>>>>>>
>>>>>> The `Call` parameter is the call instance that is being
vectorized, the
>>>>>> `VF` parameter represent the vectorization factor (how
many lanes), the
>>>>>> `IsMasked` parameter decides whether or not the
signature of the vector
>>>>>> function is required to have a mask parameter, the
`Params` parameter
>>>>>> describes the shape of the vector function as in the
>>>>>> `isFunctionVectorizable` method.
>>>>>>
>>>>>> The methods uses the `vector-variant` metadata and
returns the function
>>>>>> signature and the name of the function based on the
input parameters.
>>>>>>
>>>>>> The SVFS can add new function definitions, in the same module as
>>>>>> the `Call`, to provide vector functions that are not present within
>>>>>> the vector-variant metadata. For example, if a library provides a
>>>>>> vector version of a function with a vectorization factor of 2, but
>>>>>> the vectorizer is requesting a vectorization factor of 4, the SVFS
>>>>>> is allowed to create a definition that calls the 2-lane version
>>>>>> twice. This capability applies similarly for providing masked and
>>>>>> unmasked versions when the request does not match what is available
>>>>>> in the library.
>>>>>>
>>>>>> This method is equivalent to the TLI method
>>>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>>>
>>>>>> Notice that to fully support OpenMP vectorization we need to think
>>>>>> about a fuzzy matching mechanism that is able to select a candidate
>>>>>> in the calling context. However, this proposal is intended for
>>>>>> scalar-to-vector mappings of math-like functions that are most
>>>>>> likely to associate a unique vector candidate in most contexts.
>>>>>> Therefore, extending this behavior to a generic one is an aspect of
>>>>>> the implementation that will be treated in a separate RFC about the
>>>>>> vectorization pass.
>>>>>>
>>>>>> ### Scalable vectorization
>>>>>>
>>>>>> Both methods of the SVFS API will be extended with a boolean
>>>>>> parameter to specify whether scalable signatures are needed by the
>>>>>> user of the SVFS.
>>>>>>
>>>>>> Changes in clang {#clang}
>>>>>> ----------------
>>>>>>
>>>>>> We use clang to generate the metadata described above.
>>>>>>
>>>>>> In the compilation unit, the vector function definition or
>>>>>> declaration must be visible and associated to the scalar version
>>>>>> via the `#pragma clang declare variant` according to the rule
>>>>>> defined by the corresponding `#pragma omp declare variant` defined
>>>>>> in OpenMP 5.0, as in the following example.
>>>>>>
>>>>>>       #pragma clang declare variant(vector_sinf) \
>>>>>>       match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>>>       extern float sinf(float);
>>>>>>
>>>>>>       float32x4_t vector_sinf(float32x4_t x);
>>>>>>
>>>>>> The `construct` set in the directive, together with the `device`
>>>>>> set, is used to generate the vector mangled name to be used in the
>>>>>> `vector-variant` attribute, for example `_ZGVnN2v_sin`, when
>>>>>> targeting AArch64 Advanced SIMD code generation. The rules for
>>>>>> mangling the name of the scalar function in the vector name are
>>>>>> defined in the Vector Function ABI specification of the target.
>>>>>>
>>>>>> The part of the vector-variant attribute that redirects the call to
>>>>>> `vector_sinf` is derived from the `variant-id` specified in the
>>>>>> `variant` clause.
>>>>>>
>>>>>> Summary
>>>>>> =======
>>>>>>
>>>>>> New `clang` directive in clang
>>>>>> ------------------------------
>>>>>>
>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>>>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>>>>>
>>>>>> Option behavior, and interaction with OpenMP
>>>>>> --------------------------------------------
>>>>>>
>>>>>> The behavior described below makes sure that
>>>>>> `#pragma clang declare variant` function vectorization and OpenMP
>>>>>> function vectorization are orthogonal.
>>>>>>
>>>>>> `-fclang-declare-variant`
>>>>>>
>>>>>> :   The `#pragma clang declare variant` directives are parsed and
>>>>>>       used to populate the `vector-variant` attribute.
>>>>>>
>>>>>> `-fopenmp[-simd]`
>>>>>>
>>>>>> :   The `#pragma omp declare variant` directives are parsed and used
>>>>>>       to populate the `vector-variant` attribute.
>>>>>>
>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>>>>>
>>>>>> :   The `#pragma omp declare variant` directives are used to populate
>>>>>>       the `vector-variant` attribute in IR. The
>>>>>>       `#pragma clang declare variant` directives are ignored.
>>>>>>
>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>>>
>>>>>> [^2]: Vector Function ABI for x86:
>>>>>>       <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>>>       Vector Function ABI for AArch64:
>>>>>>       <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
>>>>>>
>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Francesco Petrogalli via llvm-dev

2019-May-31 16:18 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

Hi All,

Thank you for the feedback so far.

I am replying to all your questions/concerns/suggestions in this single email.
Please let me know if I have missed any.

I will update the RFC according to what we end up deciding here.

Kind regards,

Francesco


# TOPIC 1: concerns about name mangling

I understand that there are concerns in using the mangling scheme I proposed,
and that it would be preferred to have a mangling scheme that is based on (and
standardized by) OpenMP. I hear the argument on having some common ground here.
In fact, there is already common ground between the x86 and aarch64 backends,
which have based their respective Vector Function ABI specifications on OpenMP.

The mangled name grammar can be summarized as follows:

_ZGV<isa><masking><VLEN><parameter type>_<scalar name>

Across vector extensions, the only token that will differ is the <isa> token.
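
To make the grammar concrete, here is a minimal C++ sketch that splits a mangled name into its tokens. The `VectorABIName` struct and the `parseVectorABIName` function are hypothetical names used only for illustration (they are not part of LLVM). The sketch assumes a decimal <VLEN> and keeps the parameter tokens as an uninterpreted string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical holder for the tokens of a Vector Function ABI name.
struct VectorABIName {
  char ISA = 0;            // <isa>, e.g. 'b'/'c'/'d'/'e' on x86, 'n' on AArch64
  bool IsMasked = false;   // <masking>: 'M' is masked, 'N' is unmasked
  unsigned VLEN = 0;       // <VLEN>: number of lanes
  std::string Parameters;  // <parameter type> tokens, kept raw (e.g. "vv")
  std::string ScalarName;  // <scalar name>, e.g. "sin"
};

// Splits Name according to the grammar above; returns false on a mismatch.
bool parseVectorABIName(const std::string &Name, VectorABIName &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t Pos = Prefix.size();
  if (Name.size() < Pos + 2)
    return false;
  VectorABIName Result;
  Result.ISA = Name[Pos++];
  const char Mask = Name[Pos++];
  if (Mask != 'M' && Mask != 'N')
    return false;
  Result.IsMasked = (Mask == 'M');
  // Assumption: <VLEN> is a decimal number (scalable forms are not handled).
  const std::size_t DigitsBegin = Pos;
  while (Pos < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[Pos])))
    Result.VLEN = Result.VLEN * 10 + static_cast<unsigned>(Name[Pos++] - '0');
  if (Pos == DigitsBegin)
    return false;
  // The parameter tokens run up to the first '_'; the rest is the scalar
  // name, which may itself contain underscores (e.g. "vec_sum").
  const std::size_t Sep = Name.find('_', Pos);
  if (Sep == std::string::npos || Sep == Pos || Sep + 1 == Name.size())
    return false;
  Result.Parameters = Name.substr(Pos, Sep - Pos);
  Result.ScalarName = Name.substr(Sep + 1);
  Out = Result;
  return true;
}
```

With this sketch, `_ZGVcN2v_sin` and a hypothetical `_ZGVnN2v_sin` differ only in the `ISA` field, which is the point made above.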

This might lead people to think that we could drop the _ZGV<isa> prefix
and consider the <masking><VLEN><parameter type>_<scalar name> part as a
sort of unofficial OpenMP mangling scheme: in fact, the signature of an
“unmasked 2-lane vector version of `sin`” will always be
`<2 x double>(<2 x double>)`.

The problem with this choice is that the number of vector versions available
for a target is not necessarily one.

In particular, the following declaration generates multiple vector versions,
depending on the target:

#pragma omp declare simd simdlen(2) notinbranch
double foo(double) {…};

On x86, this generates at least 4 symbols (one each for SSE, AVX, AVX2, and
AVX512: https://godbolt.org/z/TLYXPi).

On aarch64, the same declaration generates a single symbol, as specified in the
Vector Function ABI.
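
As a rough illustration of the two cases above, the following C++ sketch lists the unmasked names one would expect for a given scalar function. The `Target` enum and `unmaskedVariantNames` are made-up names for this example. The <isa> letters are an assumption taken from the respective Vector Function ABI documents: `b`, `c`, `d`, `e` for SSE, AVX, AVX2 and AVX512 on x86, and `n` for Advanced SIMD on AArch64.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical target selector, for illustration only.
enum class Target { X86, AArch64 };

// Builds the unmasked ('N') Vector Function ABI names for one scalar function.
// Params holds the raw <parameter type> tokens, e.g. "v" for one vector
// parameter.
std::vector<std::string> unmaskedVariantNames(Target T, unsigned VLEN,
                                              const std::string &Params,
                                              const std::string &Scalar) {
  // Assumed <isa> letters: SSE, AVX, AVX2, AVX512 on x86; Advanced SIMD on
  // AArch64.
  const std::string ISAs = (T == Target::X86) ? "bcde" : "n";
  std::vector<std::string> Names;
  for (char ISA : ISAs)
    Names.push_back(std::string("_ZGV") + ISA + "N" + std::to_string(VLEN) +
                    Params + "_" + Scalar);
  return Names;
}
```

For `foo` with `simdlen(2)`, this yields four names on x86 and one on AArch64, all sharing the same `N2v_foo` suffix.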

This means that the attribute (or metadata) that carries the information on the
available vector versions also needs to deal with things that are not usually
visible at IR level, but that might still be needed to decide which particular
instruction set / vector extension is being targeted.

I used an example based on `declare simd` instead of `declare variant` because
the attribute/metadata needed for `declare variant` is a modification of the one
needed for `declare simd`, which has already been agreed on in a previous RFC
proposed by Intel [1], and for which Intel has already provided an
implementation [2]. The changes proposed in this RFC are fully compatible with
the work that is being done for the VecClone pass in [2].

[1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
[2] VecClone pass: https://reviews.llvm.org/D22792

The good news is that as far as AArch64 and x86 are concerned, the only thing
that will differ in the mangled name is the “<isa>” token. As far as I can
tell, the mangling scheme of the rest of the vector name is the same, therefore
a lot of infrastructure in terms of mangling and demangling can be reused. In
fact, the `mangleVectorParameters` function in
https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
already be shared among x86 and aarch64.

TOPIC 2: metadata vs attribute

From a functionality point of view, I don’t care whether we use metadata or
attributes. The VecClone pass mentioned in TOPIC 1 uses the following:

attributes #0 = { nounwind uwtable
"vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}

This is an attribute (I thought it was metadata), so I am happy to reword the RFC
using the right terminology (sorry for messing this up).
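
Whichever carrier is chosen, a consumer has to split the string into individual mappings. The C++ sketch below shows one possible way to do that, accepting both the plain comma-separated form used by VecClone and the `vfunc_name(real_name)` form from the RFC, where the real name defaults to the ABI name when omitted. `VariantMapping`, `trim` and `parseVectorVariants` are hypothetical helpers, not existing LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// first: Vector Function ABI name, second: symbol to call.
using VariantMapping = std::pair<std::string, std::string>;

// Removes leading and trailing blanks from S.
static std::string trim(const std::string &S) {
  const char *Blanks = " \t\n";
  const std::size_t Begin = S.find_first_not_of(Blanks);
  if (Begin == std::string::npos)
    return "";
  const std::size_t End = S.find_last_not_of(Blanks);
  return S.substr(Begin, End - Begin + 1);
}

// Splits a comma-separated attribute value into (ABI name, real name) pairs.
std::vector<VariantMapping> parseVectorVariants(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    const std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    const std::size_t Open = Entry.find('(');
    if (Open != std::string::npos && Entry.back() == ')') {
      // Explicit redirection: "vfunc_name(real_name)".
      Result.emplace_back(Entry.substr(0, Open),
                          Entry.substr(Open + 1, Entry.size() - Open - 2));
    } else {
      // No redirection: the real name is the ABI name itself.
      Result.emplace_back(Entry, Entry);
    }
  }
  return Result;
}
```

Each resulting ABI name can then be decoded with the grammar from TOPIC 1 to recover the ISA, masking and vector length of the variant.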

Also, @Renato expressed concern that metadata might be dropped by optimization
passes - would using attributes prevent that?

TOPIC 3: "There is no way to notify the backend how conformant the SIMD
versions are."

@Shawn, I am afraid I don’t understand what you mean by “conformant” here. Can
you elaborate with an example?

TOPIC 3: interaction of the `omp declare variant` with `clang declare variant`

I believe this is described in the section `Option behavior, and interaction
with OpenMP`. The option `-fclang-declare-variant` is there to keep the clang
directive orthogonal to the OpenMP-based one. Of course, we might decide to make
-fclang-declare-variant on/off by default, and define a default behavior when
it interacts with -fopenmp-simd. For the sake of compatibility with other
compilers, we might need to require -fno-clang-declare-variant when using
-fopenmp[-simd].

TOPIC 3: "there are no special arguments / flags / status regs that are
used / changed in the vector version that the compiler will have to 'just
know'"

I believe that this concern is raised by the problem of handling FP exceptions?
If that’s the case, the compiler is not allowed to make any assumptions about the
vector function in that respect, and must treat it with the same knowledge it
has of any other function, depending on the visibility it has in the compilation
unit. @Renato, does this answer your question?

TOPIC 4: attribute in function declaration vs attribute function call site

We discussed this in the previous version of the proposal. Having it at the call
sites guarantees that incompatible vector versions are not used when merging
modules compiled for different targets. I don’t have a use case for this; if I
remember correctly, this was requested by @Hideki Saito. Hideki, any comment on
this?

TOPIC 5: overriding system header (the discussion on #pragma omp/clang/system
variants initiated by @Hal Finkel).

I thought that the split between #pragma clang declare variant and #pragma omp
declare variant was already providing the orthogonality between system headers
and user headers, meaning that a user should always prefer the omp version (for
portability to other compilers) over the #pragma clang one, which would be
relegated to system headers and headers provided by the compiler. Am I missing
something? If so, I am happy to add a “system” version of the directive, as it
would be quite easy to do given that most of the parsing infrastructure will be
shared.


Roman Lebedev via llvm-dev

2019-May-31 16:27 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On Fri, May 31, 2019 at 7:19 PM Francesco Petrogalli via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Hi All,
>
> Thank you for the feedback so far.
>
> I am replying to all your questions/concerns/suggestions in this single
> email. Please let me know if I have missed any.
>
> I will update the RFC according to what we end up deciding here.
>
> Kind regards,
>
> Francesco
>
>
> # TOPIC 1: concerns about name mangling
>
> I understand that there are concerns in using the mangling scheme I
> proposed, and that it would be preferred to have a mangling scheme that is
> based on (and standardized by) OpenMP. I hear the argument on having some
> common ground here. In fact, there is already common ground between the x86
> and aarch64 backends, which have based their respective Vector Function ABI
> specifications on OpenMP.
>
> The mangled name grammar can be summarized as follows:
>
> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
>
> Across vector extensions, the only token that will differ is the <isa>
> token.
>
> This might lead people to think that we could drop the _ZGV<isa> prefix
> and consider the <masking><VLEN><parameter type>_<scalar name> part as a
> sort of unofficial OpenMP mangling scheme: in fact, the signature of an
> “unmasked 2-lane vector version of `sin`” will always be
> `<2 x double>(<2 x double>)`.
>
> The problem with this choice is that the number of vector versions
> available for a target is not necessarily one.
>
> In particular, the following declaration generates multiple vector
> versions, depending on the target:
>
> #pragma omp declare simd simdlen(2) notinbranch
> double foo(double) {…};
>
> On x86, this generates at least 4 symbols (one each for SSE, AVX, AVX2, and
> AVX512: https://godbolt.org/z/TLYXPi).
>
> On aarch64, the same declaration generates a single symbol, as specified in
> the Vector Function ABI.
>
> This means that the attribute (or metadata) that carries the information on
> the available vector versions also needs to deal with things that are not
> usually visible at IR level, but that might still be needed to decide which
> particular instruction set / vector extension is being targeted.
>
> I used an example based on `declare simd` instead of `declare variant`
> because the attribute/metadata needed for `declare variant` is a
> modification of the one needed for `declare simd`, which has already been
> agreed on in a previous RFC proposed by Intel [1], and for which Intel has
> already provided an implementation [2]. The changes proposed in this RFC are
> fully compatible with the work that is being done for the VecClone pass in
> [2].
>
> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
> [2] VecClone pass: https://reviews.llvm.org/D22792
>
> The good news is that as far as AArch64 and x86 are concerned, the only
> thing that will differ in the mangled name is the “<isa>” token. As far as
> I can tell, the mangling scheme of the rest of the vector name is the same,
> therefore a lot of infrastructure in terms of mangling and demangling can be
> reused. In fact, the `mangleVectorParameters` function in
> https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
> already be shared among x86 and aarch64.
>
> TOPIC 2: metadata vs attribute
>
> From a functionality point of view, I don’t care whether we use metadata or
attributes. The VecClone pass mentioned in TOPIC 1 uses the following:
>
> attributes #0 = { nounwind uwtable
“vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16”}
>
> This is an attribute (I thought it was metadata?). I am happy to reword the
> RFC using the right terminology (sorry for messing this up).
>
> Also, @Renato expressed concern that metadata might be dropped by
optimization passes - would using attributes prevent that?
>
> TOPIC 3: "There is no way to notify the backend how conformant the
> SIMD versions are."
>
> @Shawn, I am afraid I don’t understand what you mean by “conformant” here.
Can you elaborate with an example?
>
> TOPIC 3: interaction of the `omp declare variant` with `clang declare
variant`
>
> I believe this is described in the `Option behavior, and interaction with
OpenMP`. The option `-fclang-declare-variant` is there to make the OpenMP based
one orthogonal. Of course, we might decide to make -fclang-declare-variant
on/off by default, and have default behavior when interacting with
-fopenmp-simd. For the sake of compatibility with other compilers, we might need
to require -fno-clang-declare-variant when targeting -fopenmp-[simd].
>
> TOPIC 3: "there are no special arguments / flags / status regs that
> are used / changed in the vector version that the compiler will have to
> 'just know'"
>
> I believe that this concern is raised by the problem of handling FP
> exceptions? If that's the case, the compiler is not allowed to make any
> assumptions about the vector function in that respect, and must treat it with
> the same knowledge it has of any other function, depending on the visibility
> it has in the compilation unit. @Renato, does this answer your question?
>
> TOPIC 4: attribute in function declaration vs attribute function call site
>
> We discussed this in the previous version of the proposal. Having it at the
> call sites guarantees that incompatible vector versions are not used when
> merging modules compiled for different targets. I don't have a use case for
> this; if I remember correctly, this was asked for by @Hideki Saito. Hideki,
> any comment on this?
>
> TOPIC 5: overriding system header (the discussion on #pragma
omp/clang/system variants initiated by @Hal Finkel).
>
> I thought that the split between #pragma clang declare variant and #pragma omp
> declare variant was already providing the orthogonality between system headers
> and user headers, meaning that a user should always prefer the omp version (for
> portability to other compilers) over the #pragma clang one, which would be
> relegated to system headers and headers provided by the compiler. Am I missing
> something? If so, I am happy to add a "system" version of the directive, as it
> would be quite easy to do given that most of the parsing infrastructure will be
> shared.
One more point to consider - is there prior art? Does e.g. GCC already
do something like that?
The question in particular: will this work across the DSO boundary?

I.e. if library A contains some function 'c' that has multiple versions,
but only the declaration of the function is exposed in the header file
(with some pragmas), and the definition is in a source file (not a header
file), then when that function is used by some other program, will the
variants be picked up?

Roman.
> > On May 30, 2019, at 12:53 PM, Philip Reames <listmail at
philipreames.com> wrote:
> >
> >
> > On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> >> On 05/29, Finkel, Hal J. via cfe-dev wrote:
> >>> On 5/29/19 1:52 PM, Philip Reames wrote:
> >>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> >>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> >>>>>> I generally like the idea of having support in IR
for vectorization of
> >>>>>> custom functions.  I have several use cases which
would benefit from this.
> >>>>>>
> >>>>>> I'd suggest a couple of reframings to the IR
representation though.
> >>>>>>
> >>>>>> First, this should probably be specified as
metadata/attribute on a
> >>>>>> function declaration.  Allowing the callsite
variant is fine, but it
> >>>>>> should primarily be a property of the called
function, not of the call
> >>>>>> site.  Being able to specify it once per
declaration is much cleaner.
> >>>>> I agree. We should support this both on the function
declaration and on
> >>>>> the call sites.
> >>>>>
> >>>>>
> >>>>>> Second, I really don't like the mangling use
here.  We need a better way
> >>>>>> to specify the properties of the function then
it's mangled name.  One
> >>>>>> thought to explore is to directly use the Value of
the function
> >>>>>> declaration (since this is metadata and we can do
that), and then tie
> >>>>>> the properties to the function declaration in some
way?  Sorry, I don't
> >>>>>> really have a specific suggestion here.
> >>>>> Is the problem the mangling or the fact that the
mangling is
> >>>>> ABI/target-specific? One option is to use LLVM's
mangling scheme (the
> >>>>> one we use for intrinsics) and then provide some
backend infrastructure
> >>>>> to translate later.
> >>>> Well, both honestly.  But mangling with a non-target
specific scheme is
> >>>> a lot better, so I might be okay with that.   Good idea.
> >>>
> >>> I liked your idea of directly encoding the signature in the
metadata,
> >>> but I think that we want to continue to use attributes, and
not
> >>> metadata, and the options for attributes seem more limited -
unless we
> >>> allow attributes to take metadata arguments - maybe that's
an
> >>> enhancement worth considering.
> >> I recently talked to people in the OpenMP language committee
meeting
> >> about this and, thinking forward to the actual implementation/use
of the
> >> OpenMP 5.x declare variant feature, I'd say:
> >>
> >>  - We will need a mangling scheme if we want to allow variants on
> >>    declarations that are defined elsewhere.
> >>  - We will need a (OpenMP) standardized mangling scheme if we want
> >>    interoperability between compilers.
> >>
> >> I assume we want both so I think we will need both.
> > If I'm reading this correctly, this describes a need for the
frontend to
> > have a mangling scheme.  Nothing in here would seem to prevent the
> > frontend for generating a declaration for a mangled external symbol
and
> > then referencing that declaration.  Am I missing something?
> >>
> >> That said, I think this should allow us to avoid
attributes/metadata
> >> which seems to me like a good thing right now.
> >>
> >> Cheers,
> >>  Johannes
> >>
> >>
> >>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via
llvm-dev wrote:
> >>>>>>> Dear all,
> >>>>>>>
> >>>>>>> This RFC is a proposal to provide
auto-vectorization functionality for user provided vector functions.
> >>>>>>>
> >>>>>>> The proposal is a modification of an RFC that
I have sent out a couple of months ago, with the title `[RFC] Re-implementing
-fveclib with OpenMP` (see
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The
previous RFC is to be considered abandoned.
> >>>>>>>
> >>>>>>> The original RFC was proposing to re-implement
the `-fveclib` command line option. This proposal avoids that, and limits its
scope to the mechanics of providing vector function in user code that the
compiler can pick up for auto-vectorization. This narrower scope limits the
impact of changes that are needed in both clang and LLVM.
> >>>>>>>
> >>>>>>> Please let me know what you think.
> >>>>>>>
> >>>>>>> Kind regards,
> >>>>>>>
> >>>>>>> Francesco
> >>>>>>>
> >>>>>>>
> >>>>>>> ================================================================================
> >>>>>>>
> >>>>>>> Introduction
> >>>>>>> ============
> >>>>>>>
> >>>>>>> This RFC encompasses the proposal of informing
the vectorizer about the
> >>>>>>> availability of vector functions provided by
the user. The mechanism is
> >>>>>>> based on the use of the directive `declare
variant` introduced in OpenMP
> >>>>>>> 5.0 [^1].
> >>>>>>>
> >>>>>>> The mechanism proposed has the following
properties:
> >>>>>>>
> >>>>>>> 1.  Decouples the compiler front-end that
knows about the availability
> >>>>>>>      of vectorized routines, from the back-end
that knows how to make use
> >>>>>>>      of them.
> >>>>>>> 2.  Enable support for a developer's own
vector libraries without
> >>>>>>>      requiring changes to the compiler.
> >>>>>>> 3.  Enables other frontends (e.g. f18) to add
scalar-to-vector function
> >>>>>>>      mappings as relevant for their own
runtime libraries, etc.
> >>>>>>>
> >>>>>>> The implementation consists of two separate sets of changes.
> >>>>>>>
> >>>>>>> The first set is a set of changes in `llvm`, and consists of:
> >>>>>>>
> >>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide
information about the
> >>>>>>>      availability of user-defined vector
functions via metadata attached
> >>>>>>>      to an `llvm::CallInst`.
> >>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to
> >>>>>>>      retrieve information about the available vector functions
> >>>>>>>      associated with an `llvm::CallInst`.
> >>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to
use the API to query the
> >>>>>>>      metadata.
> >>>>>>>
> >>>>>>> The second set consists of the [changes in clang](#clang) that
> >>>>>>> are needed to recognize the `#pragma clang declare variant`
> >>>>>>> directive.
> >>>>>>>
> >>>>>>> Proposed changes
> >>>>>>> ================
> >>>>>>>
> >>>>>>> We propose an implementation that uses `#pragma clang declare variant`
> >>>>>>> to inform the backend components about the availability of vector
> >>>>>>> versions of scalar functions found in IR. The mechanism relies on storing
> >>>>>>> such information in IR metadata, and therefore makes the
> >>>>>>> auto-vectorization of function calls a mid-end (`opt`) process that is
> >>>>>>> independent of the front-end that generated such IR metadata.
> >>>>>>>
> >>>>>>> This implementation provides a generic
mechanism that the users of the
> >>>>>>> LLVM compiler will be able to use for
interfacing their own vector
> >>>>>>> routines for generic code.
> >>>>>>>
> >>>>>>> The implementation can also expose
vectorization-specific descriptors --
> >>>>>>> for example, like the `linear` and `uniform`
clauses of the OpenMP
> >>>>>>> `declare simd` directive -- that could be used
to finely tune the
> >>>>>>> automatic vectorization of some functions
(think for example the
> >>>>>>> vectorization of `double sincos(double ,
double *, double *)`, where
> >>>>>>> `linear` can be used to give extra information
about the memory layout
> >>>>>>> of the 2 pointers parameters in the vector
version).
> >>>>>>>
> >>>>>>> The directive `#pragma clang declare variant`
follows the syntax of the
> >>>>>>> `#pragma omp declare variant` directive of
OpenMP.
> >>>>>>>
> >>>>>>> We define the new directive in the `clang`
namespace instead of using
> >>>>>>> the `omp` one of OpenMP to allow the compiler
to perform
> >>>>>>> auto-vectorization outside of an OpenMP SIMD
context.
> >>>>>>>
> >>>>>>> The mechanism is based on OpenMP to provide a uniform user experience
> >>>>>>> across the two mechanisms, and to maximise the number of shared
> >>>>>>> components of the infrastructure needed in the compiler frontend to
> >>>>>>> enable the feature.
> >>>>>>>
> >>>>>>> Changes in LLVM IR {#llvmIR}
> >>>>>>> ------------------
> >>>>>>>
> >>>>>>> The IR is enriched with metadata that details
the availability of vector
> >>>>>>> versions of an associated scalar function.
This metadata is attached to
> >>>>>>> the call site of the scalar function.
> >>>>>>>
> >>>>>>> The metadata takes the form of an attribute containing a comma-separated
> >>>>>>> list of vector function mappings. Each entry has a unique name that
> >>>>>>> follows the Vector Function ABI[^2] and a real name that is used when
> >>>>>>> generating calls to this vector function.
> >>>>>>>
> >>>>>>>      vfunc_name1(real_name1), vfunc_name2(real_name2)
> >>>>>>>
> >>>>>>> The Vector Function ABI name describes the
signature of the vector
> >>>>>>> function so that properties like vectorisation
factor can be queried
> >>>>>>> during compilation.
> >>>>>>>
> >>>>>>> The `(real name)` token is optional and
assumed to match the Vector
> >>>>>>> Function ABI name when omitted.
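The attribute format described above (a comma-separated list of `abi_name(real_name)` entries, with the parenthesized part optional) can be illustrated with a small parser. This is only a sketch under stated assumptions: `VariantMapping` and `parseVectorVariants` are hypothetical names invented for this example, not part of LLVM or of the RFC, and whitespace around entries is assumed to be insignificant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type (not an LLVM API): one entry of the attribute.
struct VariantMapping {
  std::string AbiName;  // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName; // symbol to call; defaults to AbiName when omitted
};

// Strip leading and trailing blanks, tabs and newlines.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Split "a(b), c, d(e)" into mappings; "c" maps to itself.
std::vector<VariantMapping> parseVectorVariants(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    assert(Comma >= Pos && "entry boundaries must not go backwards");
    std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    VariantMapping M;
    std::size_t Open = Entry.find('(');
    if (Open != std::string::npos && Entry.back() == ')') {
      M.AbiName = trim(Entry.substr(0, Open));
      M.RealName = trim(Entry.substr(Open + 1, Entry.size() - Open - 2));
    } else {
      M.AbiName = Entry;
    }
    // The "(real name)" token is optional: fall back to the ABI name.
    if (M.RealName.empty())
      M.RealName = M.AbiName;
    Result.push_back(M);
  }
  return Result;
}
```

Under these assumptions, parsing `"_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin"` would yield two entries, the second of which maps the ABI name to itself.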
> >>>>>>>
> >>>>>>> For example, the availability of a 2-lane
double precision `sin`
> >>>>>>> function via SVML when targeting AVX on x86 is
provided by the following
> >>>>>>> IR.
> >>>>>>>
> >>>>>>>      // ...
> >>>>>>>      ... = call double @sin(double) #0
> >>>>>>>      // ...
> >>>>>>>
> >>>>>>>      #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> >>>>>>>                                _ZGVdN4v_sin(__svml_sin4),
> >>>>>>>                                ..."} }
> >>>>>>>
> >>>>>>> The string
`"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
> >>>>>>> attribute provides information on the shape of
the vector function via
> >>>>>>> the string `_ZGVcN2v_sin`, mangled according
to the Vector Function ABI
> >>>>>>> for Intel, and remaps the standard Vector
Function ABI name to the
> >>>>>>> non-standard name `__svml_sin2`.
> >>>>>>>
> >>>>>>> This metadata is compatible with the proposal
"Proposal for function
> >>>>>>> vectorization and loop vectorization with
function calls",[^3] that uses
> >>>>>>> Vector Function ABI mangled names to inform
the vectorizer about the
> >>>>>>> availability of vector functions. The proposal
extends the original by
> >>>>>>> allowing the explicit mapping of the Vector
Function ABI mangled name to
> >>>>>>> a non-standard name, which allows the use of
existing vector libraries.
> >>>>>>>
> >>>>>>> The `vector-variant` attribute needs to be
attached on a per-call basis
> >>>>>>> to avoid conflicts when merging modules with
different vector variants.
> >>>>>>>
> >>>>>>> The query infrastructure: SVFS
{#infrastructure}
> >>>>>>> ------------------------------
> >>>>>>>
> >>>>>>> The Search Vector Function System (SVFS) is
constructed from an
> >>>>>>> `llvm::Module` instance so it can create
function definitions. The SVFS
> >>>>>>> exposes an API with two methods.
> >>>>>>>
> >>>>>>> ### `SVFS::isFunctionVectorizable`
> >>>>>>>
> >>>>>>> This method queries the availability of a vectorized version of a
> >>>>>>> function. The signature of the method is as follows.
> >>>>>>>
> >>>>>>>      bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
> >>>>>>>
> >>>>>>> The method determines the availability of a vector version of the function
> >>>>>>> invoked by the `Call` parameter by looking at the `vector-variant`
> >>>>>>> metadata.
> >>>>>>>
> >>>>>>> The `Params` argument is a map that associates the position of a
> >>>>>>> parameter in the `CallInst` to its `ParameterType` descriptor. The
> >>>>>>> `ParameterType` descriptor holds information about the shape of the
> >>>>>>> corresponding parameter in the signature of the vector function. This
> >>>>>>> `ParameterType` is used to query the SVFS about the availability of
> >>>>>>> vector versions that have `linear`, `uniform` or `align` parameters (in
> >>>>>>> the sense of OpenMP 4.0 and onwards).
> >>>>>>>
> >>>>>>> The method `isFunctionVectorizable`, when invoked with an empty
> >>>>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
> >>>>>>> `isFunctionVectorizable(StringRef Name)`.
> >>>>>>>
> >>>>>>> ### `SVFS::getVectorizedFunction`
> >>>>>>>
> >>>>>>> This method returns the vector function declaration that corresponds to
> >>>>>>> the needs of the vectorization technique that is being run.
> >>>>>>>
> >>>>>>> The signature of the function is as follows.
> >>>>>>>
> >>>>>>>      std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
> >>>>>>>        llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
> >>>>>>>
> >>>>>>> The `Call` parameter is the call instance that is being vectorized, the
> >>>>>>> `VF` parameter represents the vectorization factor (how many lanes), the
> >>>>>>> `IsMasked` parameter decides whether or not the signature of the vector
> >>>>>>> function is required to have a mask parameter, and the `Params` parameter
> >>>>>>> describes the shape of the vector function as in the
> >>>>>>> `isFunctionVectorizable` method.
> >>>>>>>
> >>>>>>> The method uses the `vector-variant` metadata and returns the function
> >>>>>>> signature and the name of the function based on the input parameters.
> >>>>>>>
> >>>>>>> The SVFS can add new function definitions, in
the same module as the
> >>>>>>> `Call`, to provide vector functions that are
not present within the
> >>>>>>> vector-variant metadata. For example, if a
library provides a vector
> >>>>>>> version of a function with a vectorization
factor of 2, but the
> >>>>>>> vectorizer is requesting a vectorization
factor of 4, the SVFS is
> >>>>>>> allowed to create a definition that calls the
2-lane version twice. This
> >>>>>>> capability applies similarly for providing
masked and unmasked versions
> >>>>>>> when the request does not match what is
available in the library.
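The widening case described above (a library offers only a 2-lane version, the vectorizer asks for more lanes) can be sketched in plain C++. Everything here is an illustrative assumption: `libSin2` stands in for a library-provided 2-lane routine, `widenedSin` for the wrapper the SVFS might synthesize, and `std::array`/`std::vector` stand in for IR vector types. The real mechanism would emit an IR function definition, not C++.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec2 = std::array<double, 2>;

// Stand-in for a 2-lane routine provided by a vector library
// (hypothetical name, implemented lane by lane for illustration).
Vec2 libSin2(Vec2 X) { return {std::sin(X[0]), std::sin(X[1])}; }

// Wrapper of the kind that could be synthesized when the requested
// vectorization factor is a multiple of the available 2-lane width:
// it calls the 2-lane version once per pair of lanes.
std::vector<double> widenedSin(const std::vector<double> &X) {
  assert(X.size() % 2 == 0 && "requested VF must be a multiple of 2");
  std::vector<double> R(X.size());
  for (std::size_t I = 0; I < X.size(); I += 2) {
    Vec2 Part = libSin2({X[I], X[I + 1]});
    R[I] = Part[0];
    R[I + 1] = Part[1];
  }
  return R;
}
```

For a request with VF = 4, the loop body runs twice, which corresponds to "calls the 2-lane version twice" in the text above.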
> >>>>>>>
> >>>>>>> This method is equivalent to the TLI method
> >>>>>>> `StringRef getVectorizedFunction(StringRef F,
unsigned VF) const;`.
> >>>>>>>
> >>>>>>> Notice that to fully support OpenMP
vectorization we need to think about
> >>>>>>> a fuzzy matching mechanism that is able to
select a candidate in the
> >>>>>>> calling context. However, this proposal is
intended for scalar-to-vector
> >>>>>>> mappings of math-like functions that are most
likely to associate a
> >>>>>>> unique vector candidate in most contexts.
Therefore, extending this
> >>>>>>> behavior to a generic one is an aspect of the
implementation that will
> >>>>>>> be treated in a separate RFC about the
vectorization pass.
> >>>>>>>
> >>>>>>> ### Scalable vectorization
> >>>>>>>
> >>>>>>> Both methods of the SVFS API will be extended
with a boolean parameter
> >>>>>>> to specify whether scalable signatures are
needed by the user of the
> >>>>>>> SVFS.
> >>>>>>>
> >>>>>>> Changes in clang {#clang}
> >>>>>>> ----------------
> >>>>>>>
> >>>>>>> We use clang to generate the metadata
described above.
> >>>>>>>
> >>>>>>> In the compilation unit, the vector function
definition or declaration
> >>>>>>> must be visible and associated to the scalar
version via the
> >>>>>>> `#pragma clang declare variant` according to
the rule defined by the
> >>>>>>> correspondent `#pragma omp declare variant`
defined in OpenMP 5.0, as in
> >>>>>>> the following example.
> >>>>>>>
> >>>>>>>      #pragma clang declare variant(vector_sinf) \
> >>>>>>>      match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> >>>>>>>      extern float sinf(float);
> >>>>>>>
> >>>>>>>      float32x4_t vector_sinf(float32x4_t x);
> >>>>>>>
> >>>>>>> The `construct` set in the directive, together with the `device` set, is
> >>>>>>> used to generate the vector mangled name to be used in the
> >>>>>>> `vector-variant` attribute, for example `_ZGVnN2v_sin`, when targeting
> >>>>>>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
> >>>>>>> the scalar function in the vector name are defined in the Vector
> >>>>>>> Function ABI specification of the target.
> >>>>>>>
> >>>>>>> The part of the vector-variant attribute that
redirects the call to
> >>>>>>> `vector_sinf` is derived from the `variant-id`
specified in the
> >>>>>>> `variant` clause.
> >>>>>>>
> >>>>>>> Summary
> >>>>>>> =======
> >>>>>>>
> >>>>>>> New `clang` directive in clang
> >>>>>>> ------------------------------
> >>>>>>>
> >>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
> >>>>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
> >>>>>>>
> >>>>>>> Option behavior, and interaction with OpenMP
> >>>>>>> --------------------------------------------
> >>>>>>>
> >>>>>>> The behavior described below makes sure that
> >>>>>>> `#pragma clang declare variant` function vectorization and OpenMP
> >>>>>>> function vectorization are orthogonal.
> >>>>>>>
> >>>>>>> `-fclang-declare-variant`
> >>>>>>>
> >>>>>>> :   The `#pragma clang declare variant`
directives are parsed and used
> >>>>>>>      to populate the `vector-variant`
attribute.
> >>>>>>>
> >>>>>>> `-fopenmp[-simd]`
> >>>>>>>
> >>>>>>> :   The `#pragma omp declare variant`
directives are parsed and used to
> >>>>>>>      populate the `vector-variant` attribute.
> >>>>>>>
> >>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
> >>>>>>>
> >>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
> >>>>>>>      `vector-variant` attribute in IR. The directive
> >>>>>>>      `#pragma clang declare variant` is ignored.
> >>>>>>>
> >>>>>>> [^1]:
<https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> >>>>>>>
> >>>>>>> [^2]: Vector Function ABI for x86:
> >>>>>>>     
<https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> >>>>>>>      Vector Function ABI for AArch64:
> >>>>>>>     
https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
> >>>>>>>
> >>>>>>> [^3]:
<http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> >>>>>>>
> >>>>>>>
_______________________________________________
> >>>>>>> LLVM Developers mailing list
> >>>>>>> llvm-dev at lists.llvm.org
> >>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>> _______________________________________________
> >>>>>> cfe-dev mailing list
> >>>>>> cfe-dev at lists.llvm.org
> >>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>> --
> >>> Hal Finkel
> >>> Lead, Compiler Technology and Programming Languages
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory
> >>>

Doerfert, Johannes via llvm-dev

2019-May-31 16:47 UTC


[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

I think we should split this discussion:
  TOPIC 1 & 2 & 4: How do we implement all use cases and OpenMP 5.X
                   features, including compatibility with other
                   compilers and cross module support.
  TOPIC 3b & 5: Interoperability with clang declare (system vs. user
                 declares)
  TOPIC 3a & 3c: floating point issues?

I inlined comments for Topic 1 below.

I hope that we do not have to discuss topic 2 if we agree that neither
attributes nor metadata are necessary, or better, will solve the actual
problem at hand. I don't have strong feelings on topic 4, but I have the
feeling this will become less problematic once we figure out topic 1.

Thanks,
  Johannes


On 05/31, Francesco Petrogalli wrote:
> # TOPIC 1: concerns about name mangling
> 
> I understand that there are concerns in using the mangling scheme I
> proposed, and that it would be preferred to have a mangling scheme
> that is based on (and standardized by) OpenMP. 
I still think it will be required to have a standardized one, not
only preferred.

> I hear the argument on having some common ground here. In fact, there
> is already common ground between the x86 and aarch64 backend, who have
> based their respective Vector Function ABI specifications on OpenMP.
> 
> In fact, the mangled name grammar can be summarized as follows:
> 
> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> 
> Across vector extensions the only <token> that will differ is the
> <isa> token.
> 
> This might lead people to think that we could drop the _ZGV<isa>
> prefix and consider the <masking><VLEN><parameter type>_<scalar name>
> part as a sort of unofficial OpenMP mangling scheme: in fact, the
> signature of an "unmasked 2-lane vector version of `sin`" will always
> be `<2 x double>(<2 x double>)`.
> 
> The problem with this choice is that the number of vector versions
> available for a target is not unique.
For me, this simply means this mangling scheme is not sufficient.

> In particular, the following declaration generates multiple vector
> versions, depending on the target:
> 
> #pragma omp declare simd simdlen(2) notinbranch
> double foo(double) {…};
> 
> On x86, this generates at least 4 symbols (one for SSE, one for AVX,
> one for AVX2, and one for AVX512: https://godbolt.org/z/TLYXPi)
> 
> On aarch64, the same declaration generates a unique symbol, as
> specified in the Vector Function ABI.
I fail to see the problem. We generate X symbols for X different
contexts. Once we get to the point where we vectorize, we determine
which context fits best and choose the corresponding symbol version.

Maybe my view is too naive here, please feel free to correct me.
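To make the selection step concrete, here is a minimal sketch of picking one symbol out of the X generated ones. It is an assumption made for illustration, not existing LLVM code: `pickVariant` is a hypothetical helper, and the caller is assumed to supply the `<isa>` letters the current target supports, most preferred first (for the x86 names in the godbolt link these would be `b` for SSE, `c` for AVX, `d` for AVX2 and `e` for AVX512).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: return the first symbol whose <isa> token (the
// character right after the "_ZGV" prefix) matches the most preferred
// ISA letter, or an empty string when no candidate fits the context.
std::string pickVariant(const std::vector<std::string> &Symbols,
                        const std::string &PreferredIsas) {
  for (char Isa : PreferredIsas)
    for (const std::string &S : Symbols)
      if (S.size() > 4 && S.compare(0, 4, "_ZGV") == 0 && S[4] == Isa)
        return S;
  return "";
}
```

With the four symbols `_ZGVbN2v_foo`, `_ZGVcN2v_foo`, `_ZGVdN2v_foo` and `_ZGVeN2v_foo` and a preference string of `"dcb"` (a target with AVX2 but no AVX512), this sketch would select `_ZGVdN2v_foo`. A real implementation would also have to compare masking, vector length and parameter kinds.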

> This means that the attribute (or metadata) that carries the
> information on the available vector version needs to deal also with
> things that are not usually visible at IR level, but that might still
> need to be provided to be able to decide which particular instruction
> set/ vector extension needs to be targeted.
The symbol names should carry all the information we need. If they do
not, we need to improve the mangling scheme such that they do. There are
no attributes/metadata we could use at library boundaries.
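As an illustration of how much a symbol name already carries, the sketch below decodes a name that follows the grammar quoted earlier, `_ZGV<isa><masking><VLEN><parameter type>_<scalar name>`. The names `VFABIName` and `parseVFABIName` are hypothetical and chosen for this example only. The parser assumes a single-character `<isa>` token, `N`/`M` for masking and a decimal `<VLEN>`, keeps the parameter tokens as an opaque string, and does not handle scalable vector lengths.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not an LLVM API).
struct VFABIName {
  char Isa = 0;           // e.g. 'b', 'c', 'd', 'e' on x86, 'n' on AArch64
  bool Masked = false;    // 'M' = masked, 'N' = unmasked
  unsigned VLen = 0;      // number of lanes
  std::string Params;     // parameter tokens, kept opaque (e.g. "vv")
  std::string ScalarName; // name of the scalar function
};

// Returns true and fills Out when Name matches the grammar above.
bool parseVFABIName(const std::string &Name, VFABIName &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t I = Prefix.size();
  if (Name.size() < I + 2) // need at least <isa> and <masking>
    return false;
  Out.Isa = Name[I++];
  const char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M')
    return false;
  Out.Masked = (Mask == 'M');
  unsigned VLen = 0;
  const std::size_t DigitsBegin = I;
  while (I < Name.size() && std::isdigit(static_cast<unsigned char>(Name[I])))
    VLen = VLen * 10 + static_cast<unsigned>(Name[I++] - '0');
  if (I == DigitsBegin) // <VLEN> is mandatory in this sketch
    return false;
  Out.VLen = VLen;
  // The first '_' after <VLEN> separates parameters from the scalar name;
  // the scalar name itself may contain further underscores.
  const std::size_t Sep = Name.find('_', I);
  if (Sep == std::string::npos || Sep == I || Sep + 1 >= Name.size())
    return false;
  Out.Params = Name.substr(I, Sep - I);
  Out.ScalarName = Name.substr(Sep + 1);
  return true;
}
```

For `_ZGVbM4vv_vec_sum` this yields ISA `b`, masked, 4 lanes, parameters `vv` and scalar name `vec_sum`. Anything a variant selector needs beyond these fields would have to be added to the mangling scheme itself, which is the point made above.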

> I used an example based on `declare simd` instead of `declare variant`
> because the attribute/metadata needed for `declare variant` is a
> modification of the one needed for `declare simd`, which has already
> been agreed in a previous RFC proposed by Intel [1], and for which
> Intel has already provided an implementation [2]. The changes proposed
> in this RFC are fully compatible with the work that is being done for
> the VecClone pass in [2].
> 
> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
> [2] VecClone pass: https://reviews.llvm.org/D22792
Having an agreed upon mangling for the older feature is not necessarily
important here. We will need more functionality for variants and keeping
the old scheme around with some metadata is not an extensible long-term
solution. So, I would not try to fit variants into the existing
simd-scheme but instead do it the other way around. We define what we
need for variants and implement simd in that scheme.

> The good news is that as far as AArch64 and x86 are concerned, the only
thing that will differ in the mangled name is the “<isa>” token. As far as
I can tell, the mangling scheme of the rest of the vector name is the same,
therefore a lot of infrastructure in terms of mangling and demangling can be
reused. In fact, the `mangleVectorParameters` function in
https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
already be shared among x86 and aarch64.
> 
> TOPIC 2: metadata vs attribute
> 
> From a functionality point of view, I don’t care whether we use metadata or
attributes. The VecClone pass mentioned in TOPIC 1 uses the following:
> 
> attributes #0 = { nounwind uwtable
> "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
> 
> This is an attribute (I thought it was metadata?). I am happy to reword the
> RFC using the right terminology (sorry for messing this up).
> 
> Also, @Renato expressed concern that metadata might be dropped by
optimization passes - would using attributes prevent that?
> 
> TOPIC 3: "There is no way to notify the backend how conformant the
SIMD versions are.”
> 
> @Shawn, I am afraid I don’t understand what you mean by “conformant” here.
Can you elaborate with an example?
> 
> TOPIC 3: interaction of the `omp declare variant` with `clang declare
variant`
> 
> I believe this is described in the `Option behavior, and interaction with
OpenMP`. The option `-fclang-declare-variant` is there to make the OpenMP based
one orthogonal. Of course, we might decide to make -fclang-declare-variant
on/off by default, and have default behavior when interacting with
-fopenmp-simd. For the sake of compatibility with other compilers, we might need
to require -fno-clang-declare-variant when targeting -fopenmp-[simd].
> 
> TOPIC 3: "there are no special arguments / flags / status regs that
> are used / changed in the vector version that the compiler will have to
> 'just know'"
> 
> I believe that this concern is raised by the problem of handling FP
> exceptions? If that's the case, the compiler is not allowed to make any
> assumptions about the vector function in that respect, and must treat it with
> the same knowledge it has of any other function, depending on the visibility
> it has in the compilation unit. @Renato, does this answer your question?
> 
> TOPIC 4: attribute in function declaration vs attribute function call site
> 
> We discussed this in the previous version of the proposal. Having it at the
> call sites guarantees that incompatible vector versions are not used when
> merging modules compiled for different targets. I don't have a use case for
> this; if I remember correctly, this was asked for by @Hideki Saito. Hideki,
> any comment on this?
> 
> TOPIC 5: overriding system header (the discussion on #pragma
omp/clang/system variants initiated by @Hal Finkel).
> 
> I thought that the split between #pragma clang declare variant and #pragma omp
> declare variant was already providing the orthogonality between system headers
> and user headers, meaning that a user should always prefer the omp version (for
> portability to other compilers) over the #pragma clang one, which would be
> relegated to system headers and headers provided by the compiler. Am I missing
> something? If so, I am happy to add a "system" version of the directive, as it
> would be quite easy to do given that most of the parsing infrastructure will be
> shared.
> 
> 
> > On May 30, 2019, at 12:53 PM, Philip Reames <listmail at
philipreames.com> wrote:
> > 
> > 
> > On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> >> On 05/29, Finkel, Hal J. via cfe-dev wrote:
> >>> On 5/29/19 1:52 PM, Philip Reames wrote:
> >>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> >>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> >>>>>> I generally like the idea of having support in IR
for vectorization of
> >>>>>> custom functions.  I have several use cases which
would benefit from this.
> >>>>>> 
> >>>>>> I'd suggest a couple of reframings to the IR
representation though.
> >>>>>> 
> >>>>>> First, this should probably be specified as
metadata/attribute on a
> >>>>>> function declaration.  Allowing the callsite
variant is fine, but it
> >>>>>> should primarily be a property of the called
function, not of the call
> >>>>>> site.  Being able to specify it once per
declaration is much cleaner.
> >>>>> I agree. We should support this both on the function
declaration and on
> >>>>> the call sites.
> >>>>> 
> >>>>> 
> >>>>>> Second, I really don't like the mangling use
here.  We need a better way
> >>>>>> to specify the properties of the function then
it's mangled name.  One
> >>>>>> thought to explore is to directly use the Value of
the function
> >>>>>> declaration (since this is metadata and we can do
that), and then tie
> >>>>>> the properties to the function declaration in some
way?  Sorry, I don't
> >>>>>> really have a specific suggestion here.
> >>>>> Is the problem the mangling or the fact that the
mangling is
> >>>>> ABI/target-specific? One option is to use LLVM's
mangling scheme (the
> >>>>> one we use for intrinsics) and then provide some
backend infrastructure
> >>>>> to translate later.
> >>>> Well, both honestly.  But mangling with a non-target
specific scheme is
> >>>> a lot better, so I might be okay with that.   Good idea.
> >>> 
> >>> I liked your idea of directly encoding the signature in the
metadata,
> >>> but I think that we want to continue to use attributes, and
not
> >>> metadata, and the options for attributes seem more limited -
unless we
> >>> allow attributes to take metadata arguments - maybe that's
an
> >>> enhancement worth considering.
> >> I recently talked to people in the OpenMP language committee
meeting
> >> about this and, thinking forward to the actual implementation/use
of the
> >> OpenMP 5.x declare variant feature, I'd say:
> >> 
> >>  - We will need a mangling scheme if we want to allow variants on
> >>    declarations that are defined elsewhere.
> >>  - We will need a (OpenMP) standardized mangling scheme if we want
> >>    interoperability between compilers.
> >> 
> >> I assume we want both so I think we will need both.
> > If I'm reading this correctly, this describes a need for the
frontend to
> > have a mangling scheme.  Nothing in here would seem to prevent the
> > frontend from generating a declaration for a mangled external symbol
and
> > then referencing that declaration.  Am I missing something?
> >> 
> >> That said, I think this should allow us to avoid
attributes/metadata
> >> which seems to me like a good thing right now.
> >> 
> >> Cheers,
> >>  Johannes
> >> 
> >> 
> >>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via
llvm-dev wrote:
> >>>>>>> Dear all,
> >>>>>>> 
> >>>>>>> This RFC is a proposal to provide
auto-vectorization functionality for user provided vector functions.
> >>>>>>> 
> >>>>>>> The proposal is a modification of an RFC that
I have sent out a couple of months ago, with the title `[RFC] Re-implementing
-fveclib with OpenMP` (see
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The
previous RFC is to be considered abandoned.
> >>>>>>> 
> >>>>>>> The original RFC was proposing to re-implement
the `-fveclib` command line option. This proposal avoids that, and limits its
scope to the mechanics of providing vector function in user code that the
compiler can pick up for auto-vectorization. This narrower scope limits the
impact of changes that are needed in both clang and LLVM.
> >>>>>>> 
> >>>>>>> Please let me know what you think.
> >>>>>>> 
> >>>>>>> Kind regards,
> >>>>>>> 
> >>>>>>> Francesco
> >>>>>>> 
> >>>>>>> 
> >>>>>>> ================================================================================
> >>>>>>> 
> >>>>>>> Introduction
> >>>>>>> ============
> >>>>>>> 
> >>>>>>> This RFC encompasses the proposal of informing
the vectorizer about the
> >>>>>>> availability of vector functions provided by
the user. The mechanism is
> >>>>>>> based on the use of the directive `declare
variant` introduced in OpenMP
> >>>>>>> 5.0 [^1].
> >>>>>>> 
> >>>>>>> The mechanism proposed has the following
properties:
> >>>>>>> 
> >>>>>>> 1.  Decouples the compiler front-end that
knows about the availability
> >>>>>>>      of vectorized routines, from the back-end
that knows how to make use
> >>>>>>>      of them.
> >>>>>>> 2.  Enable support for a developer's own
vector libraries without
> >>>>>>>      requiring changes to the compiler.
> >>>>>>> 3.  Enables other frontends (e.g. f18) to add
scalar-to-vector function
> >>>>>>>      mappings as relevant for their own
runtime libraries, etc.
> >>>>>>> 
> >>>>>>> The implementation consists of two separate
sets of changes.
> >>>>>>> 
> >>>>>>> The first set is a set of changes in `llvm`,
and consists of:
> >>>>>>> 
> >>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide
information about the
> >>>>>>>      availability of user-defined vector
functions via metadata attached
> >>>>>>>      to an `llvm::CallInst`.
> >>>>>>> 2.  [An infrastructure](#infrastructure) that
can be queried to retrieve
> >>>>>>>      information about the available vector
functions associated to a
> >>>>>>>      `llvm::CallInst`.
> >>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to
use the API to query the
> >>>>>>>      metadata.
> >>>>>>> 
> >>>>>>> The second set consists of the
[changes in clang](#clang) that
> >>>>>>> are needed to recognize the `#pragma clang
declare variant`
> >>>>>>> directive.
> >>>>>>> 
> >>>>>>> Proposed changes
> >>>>>>> ================
> >>>>>>> 
> >>>>>>> We propose an implementation that uses
`#pragma clang declare variant`
> >>>>>>> to inform the backend components about the
availability of vector
> >>>>>>> versions of scalar functions found in IR. The
mechanism relies on storing
> >>>>>>> such information in IR metadata, and therefore
makes the
> >>>>>>> auto-vectorization of function calls a mid-end
(`opt`) process that is
> >>>>>>> independent of the front-end that generated
such IR metadata.
> >>>>>>> 
> >>>>>>> This implementation provides a generic
mechanism that the users of the
> >>>>>>> LLVM compiler will be able to use for
interfacing their own vector
> >>>>>>> routines for generic code.
> >>>>>>> 
> >>>>>>> The implementation can also expose
vectorization-specific descriptors --
> >>>>>>> for example, like the `linear` and `uniform`
clauses of the OpenMP
> >>>>>>> `declare simd` directive -- that could be used
to finely tune the
> >>>>>>> automatic vectorization of some functions
(think for example the
> >>>>>>> vectorization of `double sincos(double ,
double *, double *)`, where
> >>>>>>> `linear` can be used to give extra information
about the memory layout
> >>>>>>> of the two pointer parameters in the vector
version).
> >>>>>>> 
> >>>>>>> The directive `#pragma clang declare variant`
follows the syntax of the
> >>>>>>> `#pragma omp declare variant` directive of
OpenMP.
> >>>>>>> 
> >>>>>>> We define the new directive in the `clang`
namespace instead of using
> >>>>>>> the `omp` one of OpenMP to allow the compiler
to perform
> >>>>>>> auto-vectorization outside of an OpenMP SIMD
context.
> >>>>>>> 
> >>>>>>> The mechanism is based on OpenMP to provide a
uniform user experience
> >>>>>>> across the two mechanisms, and to maximise the
number of shared
> >>>>>>> components of the infrastructure needed in the
compiler frontend to
> >>>>>>> enable the feature.
> >>>>>>> 
> >>>>>>> Changes in LLVM IR {#llvmIR}
> >>>>>>> ------------------
> >>>>>>> 
> >>>>>>> The IR is enriched with metadata that details
the availability of vector
> >>>>>>> versions of an associated scalar function.
This metadata is attached to
> >>>>>>> the call site of the scalar function.
> >>>>>>> 
> >>>>>>> The metadata takes the form of an attribute
containing a comma separated
> >>>>>>> list of vector function mappings. Each entry
has a unique name that
> >>>>>>> follows the Vector Function ABI[^2] and a real
name that is used when
> >>>>>>> generating calls to this vector function.
> >>>>>>> 
> >>>>>>>      vfunc_name1(real_name1),
vfunc_name2(real_name2)
> >>>>>>> 
> >>>>>>> The Vector Function ABI name describes the
signature of the vector
> >>>>>>> function so that properties like vectorisation
factor can be queried
> >>>>>>> during compilation.
> >>>>>>> 
> >>>>>>> The `(real name)` token is optional and
assumed to match the Vector
> >>>>>>> Function ABI name when omitted.
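The list format described above can be illustrated with a small standalone parser. This is only a sketch, assuming the attribute value is the plain comma-separated string shown in the RFC; the names `VariantMapping` and `parseVectorVariant` are hypothetical and are not part of LLVM or of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One entry of the comma-separated vector-variant list (hypothetical type).
struct VariantMapping {
  std::string AbiName;  // Vector Function ABI mangled name
  std::string RealName; // symbol to call; equals AbiName when "(...)" is omitted
};

// Strip leading and trailing spaces, tabs and newlines.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Split "a(b), c" into {a, b} and {c, c}. Empty items are skipped.
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Out;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Item = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Item.empty())
      continue;
    std::size_t Open = Item.find('(');
    if (Open == std::string::npos) {
      // No "(real_name)" token: the real name defaults to the ABI name.
      Out.push_back({Item, Item});
      continue;
    }
    std::size_t Close = Item.rfind(')');
    assert(Close != std::string::npos && Close > Open &&
           "unbalanced parentheses in vector-variant entry");
    Out.push_back({trim(Item.substr(0, Open)),
                   trim(Item.substr(Open + 1, Close - Open - 1))});
  }
  return Out;
}
```

Calling `parseVectorVariant("_ZGVcN2v_sin(__svml_sin2), _ZGVbN2v_cos")` would give two entries, the second with `RealName` equal to `AbiName`, which mirrors the "optional real name" rule.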
> >>>>>>> 
> >>>>>>> For example, the availability of a 2-lane
double precision `sin`
> >>>>>>> function via SVML when targeting AVX on x86 is
provided by the following
> >>>>>>> IR.
> >>>>>>> 
> >>>>>>>      // ...
> >>>>>>>      ... = call double @sin(double) #0
> >>>>>>>      // ...
> >>>>>>> 
> >>>>>>>      #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> >>>>>>>                                _ZGVdN4v_sin(__svml_sin4),
> >>>>>>>                                ..."} }
> >>>>>>> 
> >>>>>>> The string
`"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
> >>>>>>> attribute provides information on the shape of
the vector function via
> >>>>>>> the string `_ZGVcN2v_sin`, mangled according
to the Vector Function ABI
> >>>>>>> for Intel, and remaps the standard Vector
Function ABI name to the
> >>>>>>> non-standard name `__svml_sin2`.
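To make the "shape" encoded in a name such as `_ZGVcN2v_sin` concrete, here is a deliberately simplified decoder. It assumes the general layout `_ZGV<isa><mask><vlen><parameters>_<scalar name>` used by the Vector Function ABIs cited in the RFC, accepts only a decimal lane count, and keeps the parameter tokens as an uninterpreted string. `VectorShape` and `decodeVectorAbiName` are hypothetical names, not an LLVM API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Properties recoverable from a mangled vector function name (sketch only).
struct VectorShape {
  char Isa = 0;           // ISA letter, e.g. 'c' in the AVX example above
  bool Masked = false;    // 'M' = masked, 'N' = unmasked
  unsigned VF = 0;        // number of lanes
  std::string Params;     // raw parameter tokens, e.g. "v"
  std::string ScalarName; // name of the scalar function, e.g. "sin"
};

// Returns false if the name does not follow the simplified layout.
bool decodeVectorAbiName(const std::string &Mangled, VectorShape &Out) {
  const std::string Prefix = "_ZGV";
  if (Mangled.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t I = Prefix.size();
  if (Mangled.size() < I + 2)
    return false;
  Out.Isa = Mangled[I++];
  char Mask = Mangled[I++];
  if (Mask != 'N' && Mask != 'M')
    return false;
  Out.Masked = (Mask == 'M');
  // Parse the decimal vectorization factor.
  std::size_t DigitsStart = I;
  unsigned VF = 0;
  while (I < Mangled.size() &&
         std::isdigit(static_cast<unsigned char>(Mangled[I])))
    VF = VF * 10 + static_cast<unsigned>(Mangled[I++] - '0');
  if (I == DigitsStart)
    return false;
  Out.VF = VF;
  // Parameter tokens run up to the first '_'; the scalar name follows it.
  std::size_t Sep = Mangled.find('_', I);
  if (Sep == std::string::npos || Sep == I || Sep + 1 >= Mangled.size())
    return false;
  Out.Params = Mangled.substr(I, Sep - I);
  Out.ScalarName = Mangled.substr(Sep + 1);
  return true;
}
```

A real implementation would also interpret each parameter token (vector, uniform, linear with stride, alignment) and handle scalable lane counts; the sketch stops at splitting the fields.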
> >>>>>>> 
> >>>>>>> This metadata is compatible with the proposal
"Proposal for function
> >>>>>>> vectorization and loop vectorization with
function calls",[^3] that uses
> >>>>>>> Vector Function ABI mangled names to inform
the vectorizer about the
> >>>>>>> availability of vector functions. The proposal
extends the original by
> >>>>>>> allowing the explicit mapping of the Vector
Function ABI mangled name to
> >>>>>>> a non-standard name, which allows the use of
existing vector libraries.
> >>>>>>> 
> >>>>>>> The `vector-variant` attribute needs to be
attached on a per-call basis
> >>>>>>> to avoid conflicts when merging modules with
different vector variants.
> >>>>>>> 
> >>>>>>> The query infrastructure: SVFS
{#infrastructure}
> >>>>>>> ------------------------------
> >>>>>>> 
> >>>>>>> The Search Vector Function System (SVFS) is
constructed from an
> >>>>>>> `llvm::Module` instance so it can create
function definitions. The SVFS
> >>>>>>> exposes an API with two methods.
> >>>>>>> 
> >>>>>>> ### `SVFS::isFunctionVectorizable`
> >>>>>>> 
> >>>>>>> This method queries the availability of a
vectorized version of a
> >>>>>>> function. The signature of the method is as
follows.
> >>>>>>> 
> >>>>>>>      bool isFunctionVectorizable(llvm::CallInst *Call, ParTypeMap Params);
> >>>>>>> 
> >>>>>>> The method determines the availability of a
vector version of the function
> >>>>>>> invoked by the `Call` parameter by looking at
the `vector-variant`
> >>>>>>> metadata.
> >>>>>>> 
> >>>>>>> The `Params` argument is a map that associates
the position of a
> >>>>>>> parameter in the `CallInst` to its
`ParameterType` descriptor. The
> >>>>>>> `ParameterType` descriptor holds information
about the shape of the
> >>>>>>> corresponding parameter in the signature of
the vector function. This
> >>>>>>> `ParameterType` is used to query the SVFS
about the availability of
> >>>>>>> vector versions that have `linear`, `uniform`
or `align` parameters (in
> >>>>>>> the sense of OpenMP 4.0 and onwards).
> >>>>>>> 
> >>>>>>> The method `isFunctionVectorizable`, when
invoked with an empty
> >>>>>>> `ParTypeMap`, is equivalent to the
`TargetLibraryInfo` method
> >>>>>>> `isFunctionVectorizable(StringRef Name)`.
> >>>>>>> 
> >>>>>>> ### `SVFS::getVectorizedFunction`
> >>>>>>> 
> >>>>>>> This method returns the vector function
declaration that corresponds to
> >>>>>>> the needs of the vectorization technique that
is being run.
> >>>>>>> 
> >>>>>>> The signature of the function is as follows.
> >>>>>>> 
> >>>>>>>      std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
> >>>>>>>        llvm::CallInst *Call, unsigned VF, bool IsMasked, ParTypeSet Params);
> >>>>>>> 
> >>>>>>> The `Call` parameter is the call instance that
is being vectorized, the
> >>>>>>> `VF` parameter represents the vectorization
factor (how many lanes), the
> >>>>>>> `IsMasked` parameter decides whether or not
the signature of the vector
> >>>>>>> function is required to have a mask parameter,
the `Params` parameter
> >>>>>>> describes the shape of the vector function as
in the
> >>>>>>> `isFunctionVectorizable` method.
> >>>>>>> 
> >>>>>>> The method uses the `vector-variant` metadata
and returns the function
> >>>>>>> signature and the name of the function based
on the input parameters.
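The selection step behind the two queries can be modelled without any LLVM types. The sketch below assumes the `vector-variant` entries have already been decoded into a lane count, a mask flag and a callable name, and it performs an exact match only. `AvailableVariant`, `hasAnyVariant` and `findVariant` are hypothetical stand-ins, not the SVFS interface from the RFC, and the `ParTypeMap` shape information is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A decoded vector-variant entry (hypothetical model).
struct AvailableVariant {
  unsigned VF;      // number of lanes provided by this variant
  bool IsMasked;    // whether the variant takes a mask parameter
  std::string Name; // symbol to call
};

// Rough analogue of isFunctionVectorizable with an empty parameter map:
// is there any vector version at all?
bool hasAnyVariant(const std::vector<AvailableVariant> &Variants) {
  return !Variants.empty();
}

// Rough analogue of getVectorizedFunction: exact match on lanes and masking.
// Returns nullptr when the request cannot be satisfied directly.
const AvailableVariant *
findVariant(const std::vector<AvailableVariant> &Variants, unsigned VF,
            bool IsMasked) {
  for (const AvailableVariant &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return &V;
  return nullptr;
}
```

With the two SVML entries from the IR example, a request for `VF == 2`, unmasked, would resolve to `__svml_sin2`, while a request for `VF == 8` would return `nullptr` under this exact-match policy.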
> >>>>>>> 
> >>>>>>> The SVFS can add new function definitions, in
the same module as the
> >>>>>>> `Call`, to provide vector functions that are
not present within the
> >>>>>>> vector-variant metadata. For example, if a
library provides a vector
> >>>>>>> version of a function with a vectorization
factor of 2, but the
> >>>>>>> vectorizer is requesting a vectorization
factor of 4, the SVFS is
> >>>>>>> allowed to create a definition that calls the
2-lane version twice. This
> >>>>>>> capability applies similarly for providing
masked and unmasked versions
> >>>>>>> when the request does not match what is
available in the library.
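The widening described in the paragraph above amounts to a thin wrapper. The following sketch uses `std::array` in place of real vector types and a hypothetical `lib_sin2` in place of a library routine. The masked wrapper additionally assumes that evaluating inactive lanes is harmless, which is an assumption about the library and not something the RFC guarantees.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Hypothetical stand-in for a 2-lane routine shipped by a vector library.
Vec2 lib_sin2(Vec2 X) { return {std::sin(X[0]), std::sin(X[1])}; }

// 4-lane wrapper built by calling the 2-lane version twice.
Vec4 wrapped_sin4(Vec4 X) {
  Vec2 Lo = lib_sin2({X[0], X[1]});
  Vec2 Hi = lib_sin2({X[2], X[3]});
  return {Lo[0], Lo[1], Hi[0], Hi[1]};
}

// Masked wrapper built from the unmasked one: all lanes are computed,
// but only active lanes are written; inactive lanes pass the input through.
// Assumes the unmasked call has no observable side effects on inactive lanes.
Vec4 wrapped_sin4_masked(Vec4 X, std::array<bool, 4> Mask) {
  Vec4 All = wrapped_sin4(X);
  Vec4 Out = X;
  for (std::size_t I = 0; I < 4; ++I)
    if (Mask[I])
      Out[I] = All[I];
  return Out;
}
```

What inactive lanes should hold (pass-through, zero, or undefined) is a design choice; pass-through is used here only to make the behaviour easy to observe.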
> >>>>>>> 
> >>>>>>> This method is equivalent to the TLI method
> >>>>>>> `StringRef getVectorizedFunction(StringRef F,
unsigned VF) const;`.
> >>>>>>> 
> >>>>>>> Notice that to fully support OpenMP
vectorization we need to think about
> >>>>>>> a fuzzy matching mechanism that is able to
select a candidate in the
> >>>>>>> calling context. However, this proposal is
intended for scalar-to-vector
> >>>>>>> mappings of math-like functions that are most
likely to associate a
> >>>>>>> unique vector candidate in most contexts.
Therefore, extending this
> >>>>>>> behavior to a generic one is an aspect of the
implementation that will
> >>>>>>> be treated in a separate RFC about the
vectorization pass.
> >>>>>>> 
> >>>>>>> ### Scalable vectorization
> >>>>>>> 
> >>>>>>> Both methods of the SVFS API will be extended
with a boolean parameter
> >>>>>>> to specify whether scalable signatures are
needed by the user of the
> >>>>>>> SVFS.
> >>>>>>> 
> >>>>>>> Changes in clang {#clang}
> >>>>>>> ----------------
> >>>>>>> 
> >>>>>>> We use clang to generate the metadata
described above.
> >>>>>>> 
> >>>>>>> In the compilation unit, the vector function
definition or declaration
> >>>>>>> must be visible and associated to the scalar
version via the
> >>>>>>> `#pragma clang declare variant` according to
the rule defined by the
> >>>>>>> correspondent `#pragma omp declare variant`
defined in OpenMP 5.0, as in
> >>>>>>> the following example.
> >>>>>>> 
> >>>>>>>      #pragma clang declare variant(vector_sinf) \
> >>>>>>>        match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> >>>>>>>      extern float sinf(float);
> >>>>>>> 
> >>>>>>>      float32x4_t vector_sinf(float32x4_t x);
> >>>>>>> 
> >>>>>>> The `construct` set in the directive, together
with the `device` set, is
> >>>>>>> used to generate the vector mangled name to be
used in the
> >>>>>>> `vector-variant` attribute, for example
`_ZGVnN2v_sin`, when targeting
> >>>>>>> AArch64 Advanced SIMD code generation. The
rules for mangling the name of
> >>>>>>> the scalar function in the vector name are
defined in the Vector
> >>>>>>> Function ABI specification of the target.
> >>>>>>> 
> >>>>>>> The part of the vector-variant attribute that
redirects the call to
> >>>>>>> `vector_sinf` is derived from the `variant-id`
specified in the
> >>>>>>> `variant` clause.
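How the clauses of the directive could feed the attribute string can be sketched as follows. The sketch assumes the simplified layout `_ZGV<isa><mask><vlen><parameters>_<scalar name>`, treats every argument as a plain vector (`v`) parameter, and takes the ISA letter as an input rather than deriving it from `device={isa(...)}`. The functions `mangleVectorName` and `makeVariantEntry` are hypothetical and are not part of clang.

```cpp
#include <cassert>
#include <string>

// Build a Vector Function ABI style name from directive clauses (sketch).
//   IsaLetter       - target ISA letter, e.g. 'n' as in the AArch64 example
//   NotInBranch     - true for `notinbranch` (unmasked, 'N'), else masked ('M')
//   SimdLen         - value of the `simdlen` clause
//   NumVectorParams - number of arguments, all assumed to be vector ('v')
std::string mangleVectorName(char IsaLetter, bool NotInBranch,
                             unsigned SimdLen, unsigned NumVectorParams,
                             const std::string &ScalarName) {
  assert(SimdLen > 0 && "simdlen must be positive");
  std::string Name = "_ZGV";
  Name += IsaLetter;
  Name += NotInBranch ? 'N' : 'M';
  Name += std::to_string(SimdLen);
  Name.append(NumVectorParams, 'v');
  Name += '_';
  Name += ScalarName;
  return Name;
}

// Form one "vfunc_name(real_name)" entry of the vector-variant list,
// where the real name comes from the variant-id of the `variant` clause.
std::string makeVariantEntry(const std::string &MangledName,
                             const std::string &VariantId) {
  return MangledName + "(" + VariantId + ")";
}
```

Under these assumptions, the `sinf` directive shown above (`simdlen(4)`, `notinbranch`, one `float` argument, variant `vector_sinf`) would produce the entry `_ZGVnN4v_sinf(vector_sinf)`.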
> >>>>>>> 
> >>>>>>> Summary
> >>>>>>> =======
> >>>>>>> 
> >>>>>>> New `clang` directive in clang
> >>>>>>> ------------------------------
> >>>>>>> 
> >>>>>>> `#pragma clang declare variant`, same as
`#pragma omp declare variant`
> >>>>>>> restricted to the `simd` context selector,
from OpenMP 5.0+.
> >>>>>>> 
> >>>>>>> Option behavior, and interaction with OpenMP
> >>>>>>> --------------------------------------------
> >>>>>>> 
> >>>>>>> The behavior described below makes sure that
> >>>>>>> `#pragma clang declare variant` function
vectorization and OpenMP
> >>>>>>> function vectorization are orthogonal.
> >>>>>>> 
> >>>>>>> `-fclang-declare-variant`
> >>>>>>> 
> >>>>>>> :   The `#pragma clang declare variant`
directives are parsed and used
> >>>>>>>      to populate the `vector-variant`
attribute.
> >>>>>>> 
> >>>>>>> `-fopenmp[-simd]`
> >>>>>>> 
> >>>>>>> :   The `#pragma omp declare variant`
directives are parsed and used to
> >>>>>>>      populate the `vector-variant` attribute.
> >>>>>>> 
> >>>>>>> `-fopenmp[-simd]` and
`-fno-clang-declare-variant`
> >>>>>>> 
> >>>>>>> :   The directive `#pragma omp declare
variant` is used to populate the
> >>>>>>>      `vector-variant` attribute in IR. The
directives
> >>>>>>>      `#pragma clang declare variant` are
ignored.
> >>>>>>> 
> >>>>>>> [^1]:
<https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> >>>>>>> 
> >>>>>>> [^2]: Vector Function ABI for x86:
> >>>>>>>     
<https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> >>>>>>>      Vector Function ABI for AArch64:
> >>>>>>>     
https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
> >>>>>>> 
> >>>>>>> [^3]:
<http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> >>>>>>> 
> >>>>>>>
_______________________________________________
> >>>>>>> LLVM Developers mailing list
> >>>>>>> llvm-dev at lists.llvm.org
> >>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>> _______________________________________________
> >>>>>> cfe-dev mailing list
> >>>>>> cfe-dev at lists.llvm.org
> >>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>> -- 
> >>> Hal Finkel
> >>> Lead, Compiler Technology and Programming Languages
> >>> Leadership Computing Facility
> >>> Argonne National Laboratory
> >>> 
> 
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Renato Golin via llvm-dev

2019-May-31 16:57 UTC


[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On Fri, 31 May 2019 at 17:19, Francesco Petrogalli via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> TOPIC 2: metadata vs attribute
>
> Also, @Renato expressed concern that metadata might be dropped by
optimization passes - would using attributes prevent that?
I think it would, thanks!

> TOPIC 3: "there are no special arguments / flags / status regs that
are used / changed in the vector version that the compiler will have to
"just know"
>
> I believe that this concern is raised by the problem of handling FP
exceptions? If that’s the case, the compiler is not allowed to make any assumption
about the vector function in that respect, and must treat it with the same knowledge as any
other function, depending on the visibility it has in the compilation unit.
@Renato, does this answer your question?
So, if there are side-effects on the scalar version, there will be
also in the vector version? Unfortunately, this does not work in
practice by default (different units have different rules).

If we want to enforce this, it's up to the library implementation to
provide similar behaviour (either hide or create side-effects) and it
will be "library error" if they do not.

This seems a bit heavy handed, though...

--renato
