Saito, Hideki via llvm-dev
2019-Jun-24 18:32 UTC
[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.
> Thank you everybody for their input, and for your patience. This is
> proving harder than expected! :)

Thank you for doing the hard part of the work.

Hideki

-----Original Message-----
From: Francesco Petrogalli [mailto:Francesco.Petrogalli at arm.com]
Sent: Monday, June 24, 2019 11:26 AM
To: Saito, Hideki <hideki.saito at intel.com>
Cc: Doerfert, Johannes <jdoerfert at anl.gov>; Tian, Xinmin <xinmin.tian at intel.com>; Simon Moll <moll at cs.uni-saarland.de>; LLVM Development List <llvm-dev at lists.llvm.org>; Clang Dev <cfe-dev at lists.llvm.org>; Renato Golin <rengolin at gmail.com>; Finkel, Hal J. <hfinkel at anl.gov>; Andrea Bocci <andrea.bocci at cern.ch>; Elovikov, Andrei <andrei.elovikov at intel.com>; Alexey Bataev <a.bataev at hotmail.com>; nd <nd at arm.com>; Roman Lebedev <lebedev.ri at gmail.com>; Philip Reames <listmail at philipreames.com>; Shawn Landden <slandden at gmail.com>
Subject: Re: RFC: Interface user provided vector functions with the vectorizer.

> On Jun 24, 2019, at 12:20 PM, Saito, Hideki <hideki.saito at intel.com> wrote:
>
> For example, in the Type 2 case, scalar-foo used call by value while vector-foo used call by ref.

Yes, call-by-ref is an important feature that can be used with linear modifiers.

> The question Johannes is asking is whether we can decipher that after the fact, only by looking at the two function signatures, or need some more info (what kind, what's minimal)?

In the draft implementation we have been working on, we came up with a ParamKind enum that holds the following information:

enum class ParamKind {
  Vector,
  OMP_Linear,
  OMP_LinearRef,
  OMP_LinearVal,
  OMP_LinearUVal,
  OMP_LinearPos,
  OMP_LinearValPos,
  OMP_LinearRefPos,
  OMP_LinearUValPos,
  OMP_Uniform
};

The enum is used to classify the `ParameterType`, a class that is attached to each parameter and describes properties such as uniformity and linearity (with and without modifiers).

The list of parameter types is then stored in the VectorFunctionShape:

struct VectorFunctionShape {
  unsigned VF; // Vectorization factor
  bool IsMasked;
  bool IsScalable;
  ISAKind ISA;
  std::vector<ParamType> Parameters;
};

Here OpenMP is used to classify the parameter types (the OMP_* values), but nothing prevents ParamKind and VectorFunctionShape from being extended to handle other vector paradigms. I think we have to handle all nine linear and uniform cases separately in ParamKind, because there is no way to recover that information from the vector function shape alone.

> I think we need to list the cases of interest, and for each vector ABI of interest, we need to work on the requirements and determine whether deciphering after the fact is feasible.

I will provide the cases for AArch64.

> I think we can make further progress on trivial cases (where the FE doesn't "change" type) while we continue working out the details on non-trivial cases.

I think we can agree to proceed this way. The SVFS and the VectorFunctionShape are (I believe) designed with extendibility as a requirement. If we need more metadata to represent some specific cases, we will extend the SVFS internals to handle such metadata.

I will update the RFC by moving the vector function signature generation into the FE.

Thank you everybody for their input, and for your patience. This is proving harder than expected! :)

Francesco

> Thanks,
> Hideki
>
> From: Doerfert, Johannes [mailto:jdoerfert at anl.gov]
> Sent: Monday, June 24, 2019 9:21 AM
> To: Francesco Petrogalli <Francesco.Petrogalli at arm.com>; Tian, Xinmin <xinmin.tian at intel.com>
> Subject: Re: RFC: Interface user provided vector functions with the vectorizer.
>
> I mean, the FE will create only one of the 3 vector versions matching the one we want for a given vector length, wouldn't it? The question now is: can we, with the scalar and one vector version, correctly vectorize the call? If the answer is no, what is the minimal amount of information, in addition to the two versions, we would need?
>
> From: Francesco Petrogalli <Francesco.Petrogalli at arm.com>
> Sent: Monday, June 24, 2019 6:06:12 PM
> To: Tian, Xinmin
> Subject: Re: RFC: Interface user provided vector functions with the vectorizer.
>
>> On Jun 24, 2019, at 10:53 AM, Tian, Xinmin <xinmin.tian at intel.com> wrote:
>>
>> To me, it is also an issue related to SIMD signature matching when the vectorizer kicks in. Losing info from FE to BE is not good in general.
>
> Yes, we cannot lose such information. In particular, the three examples I reported all generate i64 in the scalar function signature:
>
> // Type 1
> typedef _Complex int S;
>
> // Type 2
> typedef struct x {
>   int a;
>   int b;
> } S;
>
> // Type 3
> typedef uint64_t S;
>
> S foo(S a, S b) {
>   return ...;
> }
>
> On AArch64, the corresponding vector function signatures in the three cases would be (for 2-lane unmasked vectorization):
>
> // Type 1:
> <4 x i32> vectorized_foo(<4 x i32>, <4 x i32>)
>
> // Type 2:
> %a = type { i32, i32 }
> <2 x %a*> vectorized_foo(<2 x %a*>, <2 x %a*>)
>
> // Type 3:
> <2 x i64> vectorized_foo(<2 x i64>, <2 x i64>)
>
> To make sure that the vectorizer knows how to map the scalar function parameters to the vector ones, we have to make sure that the original signature information is stored somewhere.
>
> I will work on this, and provide examples.
>
> Suggestions are welcome.
>
> Thank you
>
> Francesco
Tian, Xinmin via llvm-dev
2019-Jun-24 18:34 UTC
[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.
+1.

-----Original Message-----
From: Saito, Hideki
Sent: Monday, June 24, 2019 11:32 AM
To: Francesco Petrogalli <Francesco.Petrogalli at arm.com>
Subject: RE: RFC: Interface user provided vector functions with the vectorizer.
Francesco Petrogalli via llvm-dev
2019-Jun-28 20:16 UTC
[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.
Dear all,

I have updated the proposal with the changes that are required to be able to generate the vector function signature in the front-end instead of the back-end.

I have updated the example, showcasing the use of the `llvm.compiler.used` intrinsic. I have also mentioned that the `SVFS` should be wrapped in an analysis pass. I haven't proposed a brand new pass because I suspect that there is already one that could handle the information of the SVFS. Please point me at such a pass if it exists.

I have also CCed Sumedh, who is working on the implementation of the SVFS described here.

Kind regards,

Francesco

*** DRAFT OF THE PROPOSAL ***

SCOPE OF THE RFC: Interface user provided vector functions with the vectorizer.
===============================================================================

Because users care about portability (across compilers, libraries and systems), I believe we have to base our solution on a standard that describes the mapping from the scalar function to the vector function. Because OpenMP is a standard and widely used, we should base our solution on the mechanisms that the standard provides, via the directives `declare simd` and `declare variant`, the latter when used with the `simd` trait in the `construct` set.

Please notice that:

1. The scope of the proposal is not implementing full support for `pragma omp declare variant`.
2. The scope of the proposal is not enabling the vectorizer to do new kinds of vectorization (e.g. the RV-like vectorization described by Simon).
3. The proposal aims to be extendible w.r.t. 1. and 2.
4. The IR attribute introduced in this proposal is equivalent to the one needed for the VecClone pass under development in https://reviews.llvm.org/D22792

CLANG COMPONENTS
================

A C function attribute, `clang_declare_simd_variant`, to attach to the scalar version. The attribute provides enough information to the compiler about the vector shape of the user defined function.
The vector shapes handled by the attribute are those handled by the OpenMP standard via `declare simd` (and no more than that).

1. The function attribute handling in clang is crafted with the requirement that it will be possible to re-use the same components for the info generated by `declare variant` when used with a `simd` trait in the `construct` set.
2. The attribute allows orthogonality with the vectorization that is done via OpenMP: the user vector function is still exposed for vectorization when not using `-fopenmp[-simd]`, once the `declare simd` and `declare variant` directives of OpenMP become available in the front-end.

C function attribute: `clang_declare_simd_variant`
--------------------------------------------------

The definition of this attribute has been crafted to match the semantics of `declare variant` for a `simd` construct described in OpenMP 5.0. I have added only the traits of the `device` set, `isa` and `arch`, which I believe are enough to cover the use case of this proposal. If that is not the case, please provide an example; extending the attribute will be easy even once the current one is implemented.

clang_declare_simd_variant(<variant-func-id>, <simd clauses>{, <context selector clauses>})

<variant-func-id> := The name of a function variant that is a base language identifier, or, for C++, a template-id.
<simd clauses> := <simdlen>, <mask>{, <optional simd clauses>}
<simdlen> := simdlen(<positive number>) | simdlen("scalable")
<mask> := inbranch | notinbranch
<optional simd clauses> := <linear clause> | <uniform clause> | <align clause> | {,<optional simd clauses>}
<linear clause> := linear_ref(<var>, <step>) | linear_val(<var>, <step>) | linear_uval(<var>, <step>) | linear(<var>, <step>)
<step> := <var> | <non zero number>
<uniform clause> := uniform(<var>)
<align clause> := align(<var>, <positive number>)
<var> := Name of a parameter in the scalar function declaration/definition
<non zero number> := ... | -2 | -1 | 1 | 2 | ...
<positive number> := 1 | 2 | 3 | ...
<context selector clauses> := {<isa>}{,} {<arch>}
<isa> := isa(target-specific-value)
<arch> := arch(target-specific-value)

LLVM COMPONENTS
===============

VectorFunctionShape class
-------------------------

The object `VectorFunctionShape` contains the information about the kind of vectorization available for an `llvm::CallInst`. It must contain the following information:

1. The vectorization factor (the number of concurrent lanes executed by the SIMD version of the function), encoded by an unsigned integer.
2. Whether the vector function is requested for scalable vectorization, encoded by a boolean.
3. Information about masking / no masking, encoded by a boolean.
4. Information about the parameters, encoded in a container that carries objects of type `ParameterType`, to describe features like `linear` and `uniform`. This parameter type can be extended to represent concepts that are not handled by OpenMP.
5. The vector ISA used in the implementation of the vector function.

The `VectorFunctionShape` class can be extended in the future to include new vectorization kinds (for example the RV-like vectorization of the Region Vectorizer), to add more context information that might come from other uses of OpenMP `declare variant`, or to add new Vector Function ABIs not based on OpenMP. Such information can be retrieved via attributes that will be added to describe the `llvm::CallInst` instance.

IR Attribute
------------

We define a `vector-function-abi-variant` attribute that lists the mangled names produced via the mangling function of the Vector Function ABI rules:

vector-function-abi-variant = "abi_mangled_name_01, abi_mangled_name_02(user_redirection), ..."

1. Because we use only OpenMP `declare simd` vectorization, and because we require a Vector Function ABI, we make this explicit in the name of the attribute.
2. Because the Vector Function ABIs encode, in the mangled names, all the information needed to know the vectorization shape of the vector function, we provide the mangled name via the attribute.
3. Function name redirection is specified by enclosing the name of the redirection in parentheses, as in `abi_mangled_name_02(user_redirection)`.

The IR attribute is used in conjunction with the vector function declarations or definitions that are available in the module. Each mangled name in the `vector-function-abi-variant` attribute is associated with a corresponding declaration/definition in the module. Such a definition is provided by the front-end. The vector function declaration or definition is passed as an argument to the `llvm.compiler.used` intrinsic to prevent the compiler from removing it from the module (for example when the OpenMP mapping mechanism is used via a C header file).

We decided to make the vector function signature explicit in IR by creating it in the front-end, because we have found cases for which it is impossible for the backend to reconstruct the vector function signature out of the Vector Function ABI mangled name and the signature of the scalar function. This is due to the fact that the layout of some C types is lost in the C-to-IR lowering. As an example, the following three types cannot be distinguished at IR level, because all cases are mapped to `i64` in the signature of the function `foo`. In fact, according to the rules of the Vector Function ABI for AArch64, the three types, for a 2-lane vectorization factor, map respectively to `<4 x i32>`, a 2-lane vector of pointers to the struct, and `<2 x i64>`.
// Type 1
typedef _Complex int S;

// Type 2
typedef struct x {
  int a;
  int b;
} S;

// Type 3
typedef uint64_t S;

S foo(S a, S b) {
  return ...;
}

One of the problems raised during the discussion around these three types was how we could make sure that the vectorizer is able to determine how to map the values used in the scalar function invocation to the values that can be used in the vector signature. I was thinking of storing the information needed for this in the parameter attributes of the function, but I realised that just the size of the scalar parameter might be enough; therefore I don't think we need to add new attributes to handle this.

I illustrate my reasoning with an example, in which we want to vectorize the "flattened" signature `i64 foo(i64, i64)` to a 2-lane vector function. All examples are done for Advanced SIMD, with no mask parameter. I won't discuss `Type 3`, because the mapping from scalar parameters to vector parameters is trivial.

In the case of `Type 1`, we will see a 2-lane vector function associated to `foo` with signature `<4 x i32>(<4 x i32>, <4 x i32>)` (the knowledge of being a 2-lane vector function comes from the `<vlen>` token in the mangled name, which is always present, even in the case of a user defined custom name). The size of the scalar parameter is 8 bytes and the size of the vector parameter is 16 bytes, therefore the fact that we are doing a 2-lane vectorization is enough to tell the vectorizer that the two instances of `i64` values need to be mapped to the low and high halves of the `<4 x i32>` type.

In the case of `Type 2`, we have a situation in which two objects of type `i64` (the scalar values) need to be mapped to two pointers, which point to instances of the same size as the scalar size. This is enough information for the vectorizer to be able to generate the code that can do this properly.
The case of `Type 2` is distinguishable from `Type 3` because of the use of pointers, and it is distinguishable from the `linear` case that uses references (in which vectors of pointers are needed) because the token in the mangled name is different (`v` is used for vector parameters that the vectorizer must pass by value, while linear references use different tokens).

Query interface: Search Vector Function System (SVFS)
-----------------------------------------------------

An interface that can be queried by the LLVM components to understand whether or not a scalar function can be vectorized, and that retrieves the vector function to be used if such a vector shape is available.

1. This component is going to be unrelated to OpenMP.
2. This component will use internally the IR attribute defined in the previous section, but it will not expose any aspect of the Vector Function ABI via its interface.

The interface provides two methods:

std::vector<VectorFunctionShape> SVFS::isFunctionVectorizable(llvm::CallInst *Call);
llvm::Function *SVFS::getVectorizedFunction(llvm::CallInst *Call, VectorFunctionShape Info);

The first method is used to list all the vector shapes that are available and attached to a scalar function. An empty result means that no vector versions are available. The second method retrieves the function needed to build a call to a vector function with a specific `VectorFunctionShape` info.

The SVFS is wrapped in an analysis pass so that it can be retrieved in other passes.

(SELF) ASSESSMENT ON EXTENDIBILITY
==================================

1. Extending the C function attribute `clang_declare_simd_variant` to new Vector Function ABIs that use OpenMP will be straightforward, because the attribute is tied to such ABIs and to OpenMP.
2. The C attribute `clang_declare_simd_variant` and the `declare variant` directive used for the `simd` trait will share their internals in clang, so adding the OpenMP functionality for the `simd` trait will mostly be a matter of handling the directive in the OpenMP parser. How this should be done is described in https://clang.llvm.org/docs/InternalsManual.html#how-to-add-an-attribute
3. The IR attribute `vector-function-abi-variant` is not to be extended to represent kinds of vectorization other than those handled by `declare simd` and a Vector Function ABI.
4. The IR attribute `vector-function-abi-variant` is not defined to be extended to represent the information of `declare variant` in its totality.
5. The IR attribute will not need to change when we introduce non Vector Function ABI vectorization (RV-like, reductions, ...) or when we decide to fully support `declare variant`. The information it carries will not need to be invalidated, but just extended with new attributes that will be handled by the `VectorFunctionShape` class, in a similar way to how `llvm::FPMathOperator` handles `llvm::FastMathFlags`, which operates on individual attributes to describe an overall functionality.
6. The IR attribute is also to be used to provide vector function information via the `declare simd` directive of OpenMP (see Example 7 below).

Examples
========

Example 1
---------

Exposing an Advanced SIMD vector function when targeting Advanced SIMD on AArch64.

double foo_01(double Input)
  __attribute__((clang_declare_simd_variant("vector_foo_01", simdlen(2), notinbranch, isa("simd"))));

// Advanced SIMD version provided by the user via an external module
float64x2_t vector_foo_01(float64x2_t VectorInput);

// ... loop ...
x[i] = foo_01(y[i]);

The resulting IR is:

@llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<2 x double> (<2 x double>)* @_ZGVnN2v_foo_01 to i8*)], section "llvm.metadata"

declare double @foo_01(double %in) #0
declare <2 x double> @_ZGVnN2v_foo_01(<2 x double>)

;; ... loop ...
%xi = call double @foo_01(double %yi) #0

attributes #0 = { vector-function-abi-variant="_ZGVnN2v_foo_01(vector_foo_01)" }

Example 2
---------

Exposing an Advanced SIMD vector function when targeting Advanced SIMD on AArch64, but with the wrong signature. The user specifies a masked version of the function in the clauses of the attribute; the compiler throws an error suggesting the signature expected for `vector_foo_02`.

double foo_02(double Input)
  __attribute__((clang_declare_simd_variant("vector_foo_02", simdlen(2), inbranch, isa("simd"))));

// Advanced SIMD version
float64x2_t vector_foo_02(float64x2_t VectorInput);
// (suggested) compiler error -> ^ Missing mask parameter of type `uint64x2_t`.

Example 3
---------

Targeting `sincos`-like signatures.

void foo_03(double Input, double *Output)
  __attribute__((clang_declare_simd_variant("vector_foo_03", simdlen(2), notinbranch, linear(Output, 1), isa("simd"))));

// Advanced SIMD version
void vector_foo_03(float64x2_t VectorInput, double *Output);

// ... loop ...
foo_03(x[i], y + i);

The resulting IR is:

@llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (void (<2 x double>, double *)* @_ZGVnN2vl8_foo_03 to i8*)], section "llvm.metadata"

declare void @foo_03(double, double *) #0
declare void @_ZGVnN2vl8_foo_03(<2 x double>, double *)

;; ... loop ...
call void @foo_03(double %xi, double * %yiptr) #0

attributes #0 = { vector-function-abi-variant="_ZGVnN2vl8_foo_03(vector_foo_03)" }

Example 4
---------

Scalable vectorization targeting SVE.

double foo_04(double Input)
  __attribute__((clang_declare_simd_variant("vector_foo_04", simdlen("scalable"), notinbranch, isa("sve"))));

// SVE version
svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask);

// ... loop ...
x[i] = foo_04(y[i]);

The resulting IR is:

@llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<vscale x 2 x double> (<vscale x 2 x double>)* @_ZGVsMxv_foo_04 to i8*)], section "llvm.metadata"

declare double @foo_04(double %in) #0
declare <vscale x 2 x double> @_ZGVsMxv_foo_04(<vscale x 2 x double>)

;; ... loop ...
%xi = call double @foo_04(double %yi) #0

attributes #0 = { vector-function-abi-variant="_ZGVsMxv_foo_04(vector_foo_04)" }

Example 5
---------

Fixed length vectorization targeting SVE.

double foo_05(double Input)
  __attribute__((clang_declare_simd_variant("vector_foo_05", simdlen(4), inbranch, isa("sve"))));

// Fixed-length SVE version
svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask);

The resulting IR is:

@llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<4 x double> (<4 x double>)* @_ZGVsM4v_foo_05 to i8*)], section "llvm.metadata"

declare double @foo_05(double %in) #0
declare <4 x double> @_ZGVsM4v_foo_05(<4 x double>)

;; ... loop ...
%xi = call double @foo_05(double %yi) #0

attributes #0 = { vector-function-abi-variant="_ZGVsM4v_foo_05(vector_foo_05)" }

Example 6
---------

This is an x86 example, equivalent to the one provided by Andrei Elovikov in http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html. Godbolt rendering with ICC at https://godbolt.org/z/Of1NxZ

float MyAdd(float *a, int b)
  __attribute__((clang_declare_simd_variant("MyAddVec", simdlen(8), notinbranch, linear(a), arch("core_2nd_gen_avx"))))
{
  return *a + b;
}

__m256 MyAddVec(float *v_a, __m128i v_b1, __m128i v_b2);

// ... loop ...
x[i] = MyAdd(a + i, b[i]);

The resulting IR is:

@llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<8 x float> (float *, <2 x i64>, <2 x i64>)* @_ZGVbN8l4v_MyAdd to i8*)], section "llvm.metadata"

define float @MyAdd(float * %a, i32 %b) {
  ;; return *a + b
}

declare <8 x float> @_ZGVbN8l4v_MyAdd(float *, <2 x i64>, <2 x i64>)

;; ... loop ...
%xi = call float @MyAdd(float * %aiptr, i32 %bi) #0

attributes #0 = { vector-function-abi-variant="_ZGVbN8l4v_MyAdd(MyAddVec)" }

Note: the signature of `MyAddVec` uses `<2 x i64>` instead of `<4 x i32>`, as shown in https://godbolt.org/z/T4T8s3 (line 11). If we had asked the back-end to generate the signature of `MyAddVec` by looking at the signature of the scalar function and the `<vlen>=8` token of the mangled name in the attribute, we would have ended up using `<8 x i32>` instead of two instances of `<2 x i64>`, which would have been wrong. This is another example that demonstrates that we need to generate the vector function signatures in the front-end and not in the back-end.

Example 7: showing interaction with `declare simd`
--------------------------------------------------

#pragma omp declare simd linear(a) notinbranch
float foo_07(float *a, int x)
  __attribute__((clang_declare_simd_variant("vector_foo_07", simdlen(4), linear(a), notinbranch, arch("armv8.2-a+simd"))))
{
  return *a + x;
}

// Advanced SIMD version
float32x4_t vector_foo_07(float *a, int32x4_t vx) {
  // Custom implementation.
}

// ... loop ...
x[i] = foo_07(a + i, b[i]);

The resulting IR attribute is made of three symbols:

1. `_ZGVnN2l4v_foo_07` and `_ZGVnN4l4v_foo_07`, which represent the ones the compiler builds by auto-vectorizing `foo_07` according to the rules defined in the Vector Function ABI specification for AArch64.
2. `_ZGVnN4l4v_foo_07(vector_foo_07)`, which represents the user-defined redirection of the 4-lane version of `foo_07` to the custom implementation provided by the user when targeting Advanced SIMD for version 8.2 of the A64 instruction set.

@llvm.compiler.used = appending global [2 x i8*] [
  i8* bitcast (<4 x float> (float *, <4 x i32>)* @_ZGVnN4l4v_foo_07 to i8*),
  i8* bitcast (<2 x float> (float *, <2 x i32>)* @_ZGVnN2l4v_foo_07 to i8*)
], section "llvm.metadata"

define <4 x float> @_ZGVnN4l4v_foo_07(float *, <4 x i32>) {
  ;; Compiler auto-vectorized version (via the VecClone pass)
}

define <2 x float> @_ZGVnN2l4v_foo_07(float *, <2 x i32>) {
  ;; Compiler auto-vectorized version (via the VecClone pass)
}

define <4 x float> @vector_foo_07(float *, <4 x i32>) {
  ;; user provided vector version
}

define float @foo_07(float * %a, i32 %b) {
  ;; return *a + b
}

;; ... loop ...
%xi = call float @foo_07(float * %aiptr, i32 %bi) #0

attributes #0 = { vector-function-abi-variant="_ZGVnN2l4v_foo_07,_ZGVnN4l4v_foo_07,_ZGVnN4l4v_foo_07(vector_foo_07)" }

In this case, the bodies of the functions `_ZGVnN4l4v_foo_07` and `_ZGVnN2l4v_foo_07` are auto-generated by the compiler, therefore we might as well avoid adding them to the `@llvm.compiler.used` intrinsic. I have left them there for consistency; let me know if you think that there is no real reason for requiring it, and I will remove it.