thr3ads.net - llvm dev - [llvm-dev] Interface user provided vector functions with the vectorizer. [Jul 2019]

If this information is useful, please help other people find it:
Share via:

Sumedh Arani via llvm-dev

2019-Jul-02 20:51 UTC

[llvm-dev] Interface user provided vector functions with the vectorizer.

Greetings,

I am working on implementing the proposal in this thread. Please find the patch
for review - [https://reviews.llvm.org/D64095].

This first patch implements the SVFS(Search Vector Function System), with the
interface as described in the proposal. This initial patch will be followed by
another one that expose the SVFS via an analysis pass.

I kindly ask for feedback from everyone involved in this discussion. For now, I
have added Simon Moll and Johannes Doerfert as reviewers, as they asked
explicitly to be added.

Thank you.

--
Sumedh Arani
Research Intern | ARM Inc.
Sumedh.arani at arm.com



On 6/28/19, 15:16, "Francesco Petrogalli" <Francesco.Petrogalli at
arm.com> wrote:

    Dear all,

    I have updated the proposal with the changes that are required to be
    able to generate the vector function signature in the front-end instead
    of the back-end.

    I have updated the example, showcasing the use of the
    `llvm.compiler.used` intrinsics.

    I have also mentioned that the `SVFS` should be wrapped in an analysis
    pass. I haven't proposed a brand new pass because I suspect that there
    is already one that could handle the information of the SVFS. Please
    point me at such pass if it exists.

    I have also CCed Sumedh, who is working on the implementation of the
    SVFS described here.

    Kind regards,

    Francesco

    *** DRAFT OF THE PROPOSAL ***

    SCOPE OF THE RFC : Interface user provided vector functions with the
vectorizer.
   
===============================================================================
    Because the users care about portability (across compilers, libraries
    and systems), I believe we have to base sour solution on a standard that
    describes the mapping from the scalar function to the vector function.

    Because OpenMP is standard and widely used, we should base our solution
    on the mechanisms that the standard provides, via the directives
    `declare simd` and `declare variant`, the latter when used in with the
    `simd` trait in the `construct` set.

    Please notice that:

    1.  The scope of the proposal is not implementing full support for
        `pragma omp declare variant`.
    2.  The scope of the proposal is not enabling the vectorizer to do new
        kind of vectorizations (e.g. RV-like vectorization described by
        Simon).
    3.  The proposal aims to be extendible wrt 1. and 2.
    4.  The IR attribute introduced in this proposal is equivalent to the
        one needed for the VecClone pass under development in
        https://reviews.llvm.org/D22792

    CLANG COMPONENTS
    ===============
    A C function attribute, `clang_declare_simd_variant`, to attach to the
    scalar version. The attribute provides enough information to the
    compiler about the vector shape of the user defined function. The vector
    shapes handled by the attribute are those handled by the OpenMP standard
    via `declare simd` (and no more than that).

    1.  The function attribute handling in clang is crafted with the
        requirement that it will be possible to re-use the same components
        for the info generated by `declare variant` when used with a `simd`
        traits in the `construct` set.
    2.  The attribute allows orthogonality with the vectorization that is
        done via OpenMP: the user vector function is still exposed for
        vectorization when not using `-fopenmp-[simd]` once the
        `declare simd` and `declare variant` directive of OpenMP will be
        available in the front-end.

    C function attribute: `clang_declare_simd_variant`
    --------------------------------------------------

    The definition of this attribute has been crafted to match the semantics
    of `declare variant` for a `simd` construct described in OpenMP 5.0. I
    have added only the traits of the `device` set, `isa` and `arch`, which
    I believe are enough to cover for the use case of this proposal. If that
    is not the case, please provide an example, extending the attribute will
    be easy even once the current one is implemented.

        clang_declare_simd_variant(<variant-func-id>, <simd
clauses>{, <context selector clauses>})

        <variant-func-id>:= The name of a function variant that is a base
language identifier, or,
                            for C++, a template-id.

        <simd clauses> := <simdlen>, <mask>{, <optional
simd clauses>}

        <simdlen> := simdlen(<positive number>) |
simdlen("scalable")

        <mask>    := inbranch | notinbranch

        <optional simd clauses> := <linear clause>
                                 | <uniform clause>
                                 | <align clause>  | {,<optional simd
clauses>}

        <linear clause>  := linear_ref(<var>,<step>)
                          | linear_var(<var>, <step>)
                          | linear_uval(<var>, <step>)
                          | linear(<var>, <step>)

        <step> := <var> | <non zero number>

        <uniform clause> := uniform(<var>)

        <align clause>   := align(<var>, <positive number>)

        <var> := Name of a parameter in the scalar function
declaration/definition

        <non zero number> := ... | -2 | -1 | 1 | 2 | ...

        <positive number> := 1 | 2 | 3 | ...

        <context selector clauses> := {<isa>}{,} {<arch>}

        <isa> := isa(target-specific-value)

        <arch> := arch(target-specific-value)

    LLVM COMPONENTS:
    ===============
    VectorFunctionShape class
    -------------------------

    The object `VectorFunctionShape` contains the information about the kind
    of vectorization available for an `llvm::CallInst`.

    The object `VectorFunctionShape` must contain the following information:

    1.  Vectorization Factor (or number or concurrent lanes executed by the
        SIMD version of the function). Encoded by unsigned integer.
    2.  Whether the vector function is requested for scalable vectorization,
        encoded by a boolean.
    3.  Information about masking / no masking, encoded by a boolean.
    4.  Information about the parameters, encoded in a container that
        carries objects of type `ParamaterType`, to describe features like
        `linear` and `uniform`. This parameter type can be extended to
        represent concepts that are not handled by OpenMP.
    5.  Vector ISA, used in the implementation of the vector function.

    The `VectorFunctionShape` class can be extended in the future to include
    new vectorization kinds (for example the RV-like vectorization of the
    Region Vectorizer), or to add more context information that might come
    from other uses of OpenMP `declare variant`, or to add new Vector
    Function ABIs not based on OpenMP. Such information can be retrieved by
    attributes that will be added to describe the `llvm::CallInst` instance.

    IR Attribute
    ------------

    We define a `vector-function-abi-variant` attribute that lists the
    mangled names produced via the mangling function of the Vector Function
    ABI rules.

        vector-function-abi-variant = "abi_mangled_name_01,
abi_mangled_name_02(user_redirection),..."

    1.  Because we use only OpenMP `declare simd` vectorization, and because
        we require a vector Function ABI, we make this explicit in the name
        of the attribute.
    2.  Because the Vector Function ABIs encode all the information needed
        to know the vectorization shape of the vector function in the
        mangled names, we provide the mangled name via the attribute.
    3.  Function names redirection is specified by enclosing the name of the
        redirection in parenthesis, as in
        `abi_mangled_name_02(user_redirection)`.

    The IR attribute is used in conjunction with the vector function
    declarations or definitions that are available in the module. Each
    mangled name in the `vector-function-abi-attribute` is be associated to
    a correspondent declaration/definition in the module. Such definition is
    provided by the front-end. The vector function declaration or definition
    is passed as an argument to the `llvm.compiler.used` intrinsic to
    prevent the compiler from removing it from the module (for example when
    the OpenMP mapping mechanism is used via C header file).

    We decided to make the vector function signature explicit in IR by
    creating it with the front-end, because we have found some cases for
    which it is impossible to use the backend to reconstruct the vector
    function signature out of the Vector Function ABI mangled name and the
    signature of the scalar function. This is due to the fact that the
    layout of some C types is lost in the C-to-IR process.

    As an example, the following three types can not be distinguished at IR
    level, because all cases are mapped to `i64` in the signature of the
    function `foo`. In fact, according to the rules of the Vector Function
    ABI for AArch64, the three types, for a 2-lane vectorization factor,
    will map respectively to `<4 x int>`, `<2 x
(pointer_to_the_struct)>`,
    and `<2 x i64>`.

        // Type 1
        typedef _Complex int S;

        // Type 2
        typedef struct x{
        int a;
        int b;
        } S;

        // Type 3
        typedef uint64_t S;

        S foo(S a, S b) {
        return ...;
        }

    On of the problems that was raised during the discussion around these
    three types was how we could make sure that the vectorizer is able to
    determine how to map the values used in the scalar functions invocation
    to the values that can be used in the vecgtor signature.

    I was thiking to store the information needed for this in the parameter
    attributes of the function, but I realised that just the size of the
    scalar parameter might be enough, therefore I don't think we need to add
    new attributes to handle this.

    I illustrate my reasoning with an example, in which we want to vectorize
    the "flattened" signature `i64 foo(i64, i64)` to a 2-lane vector
    function. All exmaples are done for Advanced SIMD, with no mask
    parameter.

    I won't discuss `Type 3` becasue the mapping from scalar parameters from
    vector parameters is trivial.

    In case of `Type 1`, we will see a 2-lane vector function associated to
    `foo` with signature `<4 x i32>(<4 x i32>, <4 x i32>)`
(the knowledge of
    being a 2-lane vector function comes from the `<vlen>` token in the
    magled name, which is always present even in case of a used defined
    custom name).

    The size of the scalar parameter is 8, the size of the vector parameter
    is 16, therefore the fact that we are doing a 2-lane vectorization is
    enough to tell the vectorizer that the two instances of `i64` values
    needs to be mapped to the high and low half of the `<4 x i32>` type.

    In case of `Type 2`, what we have is a situation in which two objects of
    type `i64` (the scalar values) need to be mapped to two pointers, which
    are pointing to instances of the same size of the scalar size. This is
    enough information for the vectorizer to be able to generate the code
    that can do this properly. The case of `Type 2` is distinguishable from
    `Type 3` because of the use of pointers, and it is distinguishable from
    the `linear` case that use references (in which vectors of pointers are
    needed) because the token in the mangled name is different (`v` is used
    for vector parameters that the vectorizer must to pass by value, while
    the linear references use different tokens for vector parameters.).

    Query interface: Search Vector Function System (SVFS)
    -----------------------------------------------------

    An interface that can be queried by the LLVM components to understand
    whether or not a scalar function can be vectorized, and that retrieves
    the vector function to be used if such vector shape is available.

    1.  This component is going to be unrelated to OpenMP.
    2.  This component will use internally the IR attribute defined in the
        previous section, but it will not expose any aspect of the Vector
        Function ABI via its interface.

    The interface provides two methods.

        std::vector<VectorFunctionShape>
SVFS::isFunctionVectorizable(llvm::CallInst * Call);

        llvm::Function * SVFS::getVectorizedFunction(llvm::CallInst * Call,
VectorFunctionShape Info);

    The first method is used to list all the vector shapes that available
    and attached to a scalar function. An empty results means that no vector
    versions are available.

    The second method retrieves the information needed to build a call to a
    vector function with a specific `VectorFunctionShape` info.

    The SVFS is wrapped in an analysis pass that can be retrieved in other
    passes.

    (SELF) ASSESSMENT ON EXTENDIBILITY
    =================================
    1.  Extending the C function attribute `clang_declare_simd_variant` to
        new Vector Function ABIs that use OpenMP will be straightforward
        because the attribute is tight to such ABIs and OpenMP.
    2.  The C attribute `clang_declare_simd_variant` and the
        `declare variant` directive used for the `simd` trait will be
        sharing the internals in clang, so adding the OpenMP functionality
        for `simd` traits will be mostly handling the directive in the
        OpenMP parser. How this should be done is described in
       
https://clang.llvm.org/docs/InternalsManual.html\#how-to-add-an-attribute
    3.  The IR attribute `vector-function-abi-variant` is not to be extended
        to represent other kind of vectorization other than those handled by
        `declare simd` and that are handled with a Vector Function ABI.
    4.  The IR attribute `vector-function-abi-variant` is not defined to be
        extended to represent the information of `declare variant` in its
        totality.
    5.  The IR attribute will not need to change when we will introduce non
        vector function ABI vectorization (RV-like, reductions...) or when
        we will decide to fully support `declare variant`. The information
        it carries will not need to be invalidated, but just extended with
        new attributes that will need to be handled by the
        `VectorFunctionShape` class, in a similar way the
        `llvm::FPMathOperator` does with the `llvm::FastMathFlags`, which
        operates on individual attributes to describe an overall
        functionality.
    6.  The IR attribute is to be used also to provide vector function
        information via the `declare simd` directive of OpenMP (see Example
        7 below).

    Examples
    =======
    Example 1
    ---------

    Exposing an Advanced SIMD vector function when targeting Advanced SIMD
    in AArch64.

        double foo_01(double Input)
__attribute__(clang_declare_simd_variant(“vector_foo_01", simdlen(2),
notinbranch, isa("simd"));

        // Advanced SIMD version provided by the user via an external module
        float64x2_t vector_foo_01(float64x2_t VectorInput);

        // ... loop ...
           x[i] = foo_01(y[i])

    The resulting IR is:

        @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<2 x
double> (<2 x double>)* @_ZGVnN2v_foo_01 to i8*)], section
"llvm.metadata"

        declare double @foo_01(double %in) #0

        declare <2 x double> @_ZGVnN2v_foo_01(<2 x double>)

        // ... loop ...
           %xi = call double @foo_01(double %yi) #0

        attribute #0 =
{vector-abi-variant="_ZGVnN2v_foo_01(vector_foo_01)"}

    Example 2
    ---------

    Exposing an Advanced SIMD vector function when targeting Advanced SIMD
    in AArch64, but with the wrong signature. The user specifies a masked
    version of the function in the clauses of the attribute, the compiler
    throws an error suggesting the signature expected for `vector_foo_02.`

        double foo_02(double Input)
__attribute__(clang_declare_simd_variant(“vector_foo_02", simdlen(2),
inbranch, isa("simd"));

        // Advanced SIMD version
        float64x2_t vector_foo_02(float64x2_t VectorInput);
        // (suggested) compiler error ->                      ^ Missing mask
parameter of type `uint64x2_t`.

    Example 3
    ---------

    Targeting `sincos`-like signatures.

        void foo_03(double Input, double * Output)
__attribute__(clang_declare_simd_variant(“vector_foo_03", simdlen(2),
notinbranch, linear(Output, 1), isa("simd"));

        // Advanced SIMD version
        void vector_foo_03(float64x2_t VectorInput, double * Output);

        // ... loop ...
           foo_03(x[i], y + i)

    The resulting IR is:

        @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (void
(<2 x double>, double *)* @_ZGVnN2vl8_foo_03 to i8*)], section
"llvm.metadata"

        declare void @foo_03(double, double *) #0

        declare void @_ZGVnN2vl8_foo_03(<2 x double>, double *)

        ;; ... loop ...
        call void @foo_03(double %xi, double * %yiptr) #0

        attribute #0 =
{vector-abi-variant="_ZGVnN2vl8_foo_03(vector_foo_03)"}

    Example 4
    ---------

    Scalable vectorization targeting SVE

        double foo_04(double Input)
__attribute__(clang_declare_simd_variant(“vector_foo_04",
simdlen("scalable"), notinbranch, isa("sve"));

        // SVE version
        svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask);

        // ... loop ...
           x[i] = foo_04(y[i])

    The IR generated is:

        @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast
(<vscale 2 x double> (<vscale 2 x double>)* @_ZGVsMxv_foo_04 to
i8*)], section "llvm.metadata"

        declare double @foo_04(double %in) #0

        declare <vscale 2 x double> @_ZGVnNxv_foo_04(<vscale 2 x
double>)

        // ... loop ...
           %xi = call double @foo_04(double %yi) #0

        attribute #0 =
{vector-abi-variant="_ZGVsMxv_foo_04(vector_foo_04)"}

    Example 5
    ---------

    Fixed length vectorization targeting SVE

        double foo_05(double Input)
__attribute__(clang_declare_simd_variant(“vector_foo_05", simdlen(4),
inbranch, isa("sve"));

        // Fixed-length SVE version
        svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask);

    The resulting IR is:

        @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<4 x
double> (<4 x double>)* @_ZGVsM4v_foo_05 to i8*)], section
"llvm.metadata"

        declare double @foo_05(double %in) #0

        declare <4 x double> @_ZGVnNxv_foo_05(<4 x double>)

        ;; ... loop ...
           %xi = call double @foo_05(double %yi) #0

        attribute #0 =
{vector-abi-variant="_ZGVsM4v_foo_04(vector_foo_04)"}

    Example 6
    ---------

    This is an x86 example, equivalent to the one provided by Andrei
    Elovikow in
    http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html. Godbolt
    rendering with ICC at https://godbolt.org/z/Of1NxZ

        float MyAdd(float* a, int b)
__attribute__(clang_declare_simd_variant(“MyAddVec", simdlen(8),
notinbranch, linear(a), arch("core_2nd_gen_avx"))
        {
          return *a + b;
        }


        __m256 MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2);

        // ... loop ...

          x[i] = MyAdd(a+i, b[i]);

    The resulting IR is:

        @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<8 x
float> (float *, <2 x i64>, <2 x i64>)* @_ZGVbN8l4v_MyAdd to
i8*)], section "llvm.metadata"

        define float @MyAdd(float %a, i32 %b) {
          ;; return *a + b :)
        }

        define <8 x float> @_ZGVbN8l4v_MyAdd(float *, <2 x i64>,
<2 x i64>)

        ;; ... loop ...
           %xi = call float @MyAdd(float * %aiptr, i32 ) #0

        attribute #0 =
{vector-abi-variant="_ZGVbN8l4v_MyAdd(MyAddVec)"}

    Note: the signature of `MyAddVec` uses `<2 x i64>` instead of
    `<4 x i32>`, as shown in https://godbolt.org/z/T4T8s3 (line 11). If we
    would have asked the back end to generate the signature of `MyAddVec` by
    looking at the signature of the scalar function and the `<vlen>=8`
token
    in the mangled name in the attribute, we would have end up using
    `<8 x i32>` instead of two instanced of `<2 x i64>`, which would
have
    been wrong.

    This is another example that demonstrate that we need to generate the
    vector function signatures in the front-end and not in the backend.

    Example 7: showing interaction with `declare simd`
    --------------------------------------------------

        #pragma omp declare simd linear(a) notinbranch
        float foo_07(float *a, int x)
__attribute__(clang_declare_simd_variant(“vector_foo_07", simdlen(4),
linear(a), notinbranch, arch("armv8.2-a+simd")) {
            return *a + x;
        }

        // Advanced SIMD version
        float32x4_t vector_foo_07(float *a, int32x4_t vx) {
        // Custom implementation.
        }

        // ... loop ...

          x[i] = foo_07(a+i, b[i]);

    The resulting IR attribute is made of three symbols:

    1.  `_ZGVnN2l4v_foo_07` and `_ZGVnN4l4v_foo_07`, which represent the
        ones the compiler builds by auto-vectorizing `foo_07` according to
        the rule defined in the Vector Function ABI specifications for
        AArch64.
    2.  `_ZGVnN4l4v_foo_07(vector_foo_07)`, which represents the
        user-defined redirection of the 4-lane version of `foo_07` to the
        custom implementation provided by the user when targeting Advanced
        SIMD for version 8.2 of the A64 instruction set.

    <!-- -->

        @llvm.compiler.used = appending global [2 x i8*] [i8* bitcast (<4 x
float> (float *, <4 x i32>)* @_ZGVnN4l4v_foo_07 to i8*), i8* bitcast
(<2 x float> (float *, <2 x i32>)* @_ZGVnN2l4v_foo_07 to i8*) ],
section "llvm.metadata"

        define <4 x float> @_ZGVnN4l4v_foo_07(float *, <4 x i32>) {
          ;; Compiler auto-vectorized version (via the VecClone pass)
        }

        define <2 x float> @_ZGVnN2l4v_foo_07(float *, <2 x i32>) {
          ;; Compiler auto-vectorized version (via the VecClone pass)
        }

        define <4 x float> @vector_foo_07(float *, <4 x i32>) {
          ;; user provided vector version
        }

        define float @foo_07(float %a, i32 %b) {
          ;; return *a + b :)
        }

        // ... loop ...
           %xi = call float @foo_07(float * %aiptr, i32 %bi) #0

        attribute #0 =
{vector-function-abi-variant="_ZGVnN2l4v_foo_07,_ZGVnN4l4v_foo_07,_ZGVnN4l4v_foo_07(vector_foo_07)"}

    In this case, the body of the functions `_ZGVnN4l4v_foo_07` and
    `_ZGVnN2l4v_foo_07` is auto-generated by the compiler, therefore we
    might as well avoid adding them to the `@llvm.compiler.used` intrinsics.
    I have left it there for consistency, let me know if you think that
    there is no real reasons for requiring it, I will remove it.



IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.

Sumedh Arani via llvm-dev

2019-Aug-09 19:53 UTC

head link

[llvm-dev] Interface user provided vector functions with the vectorizer.

Greetings,

As suggested in the previous review, I have now split my work into two separate
patches.

I kindly ask for feedback from everyone involved in this discussion. I encourage
others as well to be a part of this review process.

Please find the following patches for review -
[https://reviews.llvm.org/D66024] - Name Demangling as specified in the Vector
Function ABI
[https://reviews.llvm.org/D66025] - SVFS implementation according to RFC:
Interface user provided vector functions with the vectorizer. Builds on top of
the previous patch.

Thank you.

--
Sumedh Arani
Research Intern | ARM Inc.
Sumedh.arani at arm.com


On 7/2/19, 15:51, "Sumedh Arani" <Sumedh.Arani at arm.com>
wrote:

    Greetings,

    I am working on implementing the proposal in this thread. Please find the
patch for review - [https://reviews.llvm.org/D64095].

    This first patch implements the SVFS(Search Vector Function System), with
the interface as described in the proposal. This initial patch will be followed
by another one that expose the SVFS via an analysis pass.

    I kindly ask for feedback from everyone involved in this discussion. For
now, I have added Simon Moll and Johannes Doerfert as reviewers, as they asked
explicitly to be added.

    Thank you.

    --
    Sumedh Arani
    Research Intern | ARM Inc.
    Sumedh.arani at arm.com



    On 6/28/19, 15:16, "Francesco Petrogalli" <Francesco.Petrogalli
at arm.com> wrote:

        Dear all,

        I have updated the proposal with the changes that are required to be
        able to generate the vector function signature in the front-end instead
        of the back-end.

        I have updated the example, showcasing the use of the
        `llvm.compiler.used` intrinsics.

        I have also mentioned that the `SVFS` should be wrapped in an analysis
        pass. I haven't proposed a brand new pass because I suspect that
there
        is already one that could handle the information of the SVFS. Please
        point me at such pass if it exists.

        I have also CCed Sumedh, who is working on the implementation of the
        SVFS described here.

        Kind regards,

        Francesco

        *** DRAFT OF THE PROPOSAL ***

        SCOPE OF THE RFC : Interface user provided vector functions with the
vectorizer.
       
===============================================================================
        Because the users care about portability (across compilers, libraries
        and systems), I believe we have to base sour solution on a standard that
        describes the mapping from the scalar function to the vector function.

        Because OpenMP is standard and widely used, we should base our solution
        on the mechanisms that the standard provides, via the directives
        `declare simd` and `declare variant`, the latter when used in with the
        `simd` trait in the `construct` set.

        Please notice that:

        1.  The scope of the proposal is not implementing full support for
            `pragma omp declare variant`.
        2.  The scope of the proposal is not enabling the vectorizer to do new
            kind of vectorizations (e.g. RV-like vectorization described by
            Simon).
        3.  The proposal aims to be extendible wrt 1. and 2.
        4.  The IR attribute introduced in this proposal is equivalent to the
            one needed for the VecClone pass under development in
            https://reviews.llvm.org/D22792

        CLANG COMPONENTS
        ===============
        A C function attribute, `clang_declare_simd_variant`, to attach to the
        scalar version. The attribute provides enough information to the
        compiler about the vector shape of the user defined function. The vector
        shapes handled by the attribute are those handled by the OpenMP standard
        via `declare simd` (and no more than that).

        1.  The function attribute handling in clang is crafted with the
            requirement that it will be possible to re-use the same components
            for the info generated by `declare variant` when used with a `simd`
            traits in the `construct` set.
        2.  The attribute allows orthogonality with the vectorization that is
            done via OpenMP: the user vector function is still exposed for
            vectorization when not using `-fopenmp-[simd]` once the
            `declare simd` and `declare variant` directive of OpenMP will be
            available in the front-end.

        C function attribute: `clang_declare_simd_variant`
        --------------------------------------------------

        The definition of this attribute has been crafted to match the semantics
        of `declare variant` for a `simd` construct described in OpenMP 5.0. I
        have added only the traits of the `device` set, `isa` and `arch`, which
        I believe are enough to cover for the use case of this proposal. If that
        is not the case, please provide an example, extending the attribute will
        be easy even once the current one is implemented.

            clang_declare_simd_variant(<variant-func-id>, <simd
clauses>{, <context selector clauses>})

            <variant-func-id>:= The name of a function variant that is a
base language identifier, or,
                                for C++, a template-id.

            <simd clauses> := <simdlen>, <mask>{, <optional
simd clauses>}

            <simdlen> := simdlen(<positive number>) |
simdlen("scalable")

            <mask>    := inbranch | notinbranch

            <optional simd clauses> := <linear clause>
                                     | <uniform clause>
                                     | <align clause>  | {,<optional
simd clauses>}

            <linear clause>  := linear_ref(<var>,<step>)
                              | linear_var(<var>, <step>)
                              | linear_uval(<var>, <step>)
                              | linear(<var>, <step>)

            <step> := <var> | <non zero number>

            <uniform clause> := uniform(<var>)

            <align clause>   := align(<var>, <positive
number>)

            <var> := Name of a parameter in the scalar function
declaration/definition

            <non zero number> := ... | -2 | -1 | 1 | 2 | ...

            <positive number> := 1 | 2 | 3 | ...

            <context selector clauses> := {<isa>}{,} {<arch>}

            <isa> := isa(target-specific-value)

            <arch> := arch(target-specific-value)

        LLVM COMPONENTS:
        ===============
        VectorFunctionShape class
        -------------------------

        The object `VectorFunctionShape` contains the information about the kind
        of vectorization available for an `llvm::CallInst`.

        The object `VectorFunctionShape` must contain the following information:

        1.  Vectorization Factor (or number or concurrent lanes executed by the
            SIMD version of the function). Encoded by unsigned integer.
        2.  Whether the vector function is requested for scalable vectorization,
            encoded by a boolean.
        3.  Information about masking / no masking, encoded by a boolean.
        4.  Information about the parameters, encoded in a container that
            carries objects of type `ParamaterType`, to describe features like
            `linear` and `uniform`. This parameter type can be extended to
            represent concepts that are not handled by OpenMP.
        5.  Vector ISA, used in the implementation of the vector function.

        The `VectorFunctionShape` class can be extended in the future to include
        new vectorization kinds (for example the RV-like vectorization of the
        Region Vectorizer), or to add more context information that might come
        from other uses of OpenMP `declare variant`, or to add new Vector
        Function ABIs not based on OpenMP. Such information can be retrieved by
        attributes that will be added to describe the `llvm::CallInst` instance.

        IR Attribute
        ------------

        We define a `vector-function-abi-variant` attribute that lists the
        mangled names produced via the mangling function of the Vector Function
        ABI rules.

            vector-function-abi-variant = "abi_mangled_name_01,
abi_mangled_name_02(user_redirection),..."

        1.  Because we use only OpenMP `declare simd` vectorization, and because
            we require a vector Function ABI, we make this explicit in the name
            of the attribute.
        2.  Because the Vector Function ABIs encode all the information needed
            to know the vectorization shape of the vector function in the
            mangled names, we provide the mangled name via the attribute.
        3.  Function names redirection is specified by enclosing the name of the
            redirection in parenthesis, as in
            `abi_mangled_name_02(user_redirection)`.

        The IR attribute is used in conjunction with the vector function
        declarations or definitions that are available in the module. Each
        mangled name in the `vector-function-abi-attribute` is be associated to
        a correspondent declaration/definition in the module. Such definition is
        provided by the front-end. The vector function declaration or definition
        is passed as an argument to the `llvm.compiler.used` intrinsic to
        prevent the compiler from removing it from the module (for example when
        the OpenMP mapping mechanism is used via C header file).

        We decided to make the vector function signature explicit in IR by
        creating it with the front-end, because we have found some cases for
        which it is impossible to use the backend to reconstruct the vector
        function signature out of the Vector Function ABI mangled name and the
        signature of the scalar function. This is due to the fact that the
        layout of some C types is lost in the C-to-IR process.

        As an example, the following three types can not be distinguished at IR
        level, because all cases are mapped to `i64` in the signature of the
        function `foo`. In fact, according to the rules of the Vector Function
        ABI for AArch64, the three types, for a 2-lane vectorization factor,
        will map respectively to `<4 x int>`, `<2 x
(pointer_to_the_struct)>`,
        and `<2 x i64>`.

            // Type 1
            typedef _Complex int S;

            // Type 2
            typedef struct x{
            int a;
            int b;
            } S;

            // Type 3
            typedef uint64_t S;

            S foo(S a, S b) {
            return ...;
            }

        On of the problems that was raised during the discussion around these
        three types was how we could make sure that the vectorizer is able to
        determine how to map the values used in the scalar functions invocation
        to the values that can be used in the vecgtor signature.

        I was thiking to store the information needed for this in the parameter
        attributes of the function, but I realised that just the size of the
        scalar parameter might be enough, therefore I don't think we need to
add
        new attributes to handle this.

        I illustrate my reasoning with an example, in which we want to vectorize
        the "flattened" signature `i64 foo(i64, i64)` to a 2-lane
vector
        function. All exmaples are done for Advanced SIMD, with no mask
        parameter.

        I won't discuss `Type 3` becasue the mapping from scalar parameters
from
        vector parameters is trivial.

        In case of `Type 1`, we will see a 2-lane vector function associated to
        `foo` with signature `<4 x i32>(<4 x i32>, <4 x i32>)`
(the knowledge of
        being a 2-lane vector function comes from the `<vlen>` token in
the
        magled name, which is always present even in case of a used defined
        custom name).

        The size of the scalar parameter is 8, the size of the vector parameter
        is 16, therefore the fact that we are doing a 2-lane vectorization is
        enough to tell the vectorizer that the two instances of `i64` values
        needs to be mapped to the high and low half of the `<4 x i32>`
type.

        In case of `Type 2`, what we have is a situation in which two objects of
        type `i64` (the scalar values) need to be mapped to two pointers, which
        are pointing to instances of the same size of the scalar size. This is
        enough information for the vectorizer to be able to generate the code
        that can do this properly. The case of `Type 2` is distinguishable from
        `Type 3` because of the use of pointers, and it is distinguishable from
        the `linear` case that use references (in which vectors of pointers are
        needed) because the token in the mangled name is different (`v` is used
        for vector parameters that the vectorizer must to pass by value, while
        the linear references use different tokens for vector parameters.).

        Query interface: Search Vector Function System (SVFS)
        -----------------------------------------------------

        An interface that can be queried by the LLVM components to understand
        whether or not a scalar function can be vectorized, and that retrieves
        the vector function to be used if such vector shape is available.

        1.  This component is going to be unrelated to OpenMP.
        2.  This component will use internally the IR attribute defined in the
            previous section, but it will not expose any aspect of the Vector
            Function ABI via its interface.

        The interface provides two methods.

            std::vector<VectorFunctionShape>
SVFS::isFunctionVectorizable(llvm::CallInst * Call);

            llvm::Function * SVFS::getVectorizedFunction(llvm::CallInst * Call,
VectorFunctionShape Info);

        The first method is used to list all the vector shapes that available
        and attached to a scalar function. An empty results means that no vector
        versions are available.

        The second method retrieves the information needed to build a call to a
        vector function with a specific `VectorFunctionShape` info.

        The SVFS is wrapped in an analysis pass that can be retrieved in other
        passes.

        (SELF) ASSESSMENT ON EXTENDIBILITY
        =================================
        1.  Extending the C function attribute `clang_declare_simd_variant` to
            new Vector Function ABIs that use OpenMP will be straightforward
            because the attribute is tight to such ABIs and OpenMP.
        2.  The C attribute `clang_declare_simd_variant` and the
            `declare variant` directive used for the `simd` trait will be
            sharing the internals in clang, so adding the OpenMP functionality
            for `simd` traits will be mostly handling the directive in the
            OpenMP parser. How this should be done is described in
           
https://clang.llvm.org/docs/InternalsManual.html\#how-to-add-an-attribute
        3.  The IR attribute `vector-function-abi-variant` is not to be extended
            to represent other kind of vectorization other than those handled by
            `declare simd` and that are handled with a Vector Function ABI.
        4.  The IR attribute `vector-function-abi-variant` is not defined to be
            extended to represent the information of `declare variant` in its
            totality.
        5.  The IR attribute will not need to change when we will introduce non
            vector function ABI vectorization (RV-like, reductions...) or when
            we will decide to fully support `declare variant`. The information
            it carries will not need to be invalidated, but just extended with
            new attributes that will need to be handled by the
            `VectorFunctionShape` class, in a similar way the
            `llvm::FPMathOperator` does with the `llvm::FastMathFlags`, which
            operates on individual attributes to describe an overall
            functionality.
        6.  The IR attribute is to be used also to provide vector function
            information via the `declare simd` directive of OpenMP (see Example
            7 below).

        Examples
        =======
        Example 1
        ---------

        Exposing an Advanced SIMD vector function when targeting Advanced SIMD
        in AArch64.

            double foo_01(double Input)
__attribute__(clang_declare_simd_variant(“vector_foo_01", simdlen(2),
notinbranch, isa("simd"));

            // Advanced SIMD version provided by the user via an external module
            float64x2_t vector_foo_01(float64x2_t VectorInput);

            // ... loop ...
               x[i] = foo_01(y[i])

        The resulting IR is:

            @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<2
x double> (<2 x double>)* @_ZGVnN2v_foo_01 to i8*)], section
"llvm.metadata"

            declare double @foo_01(double %in) #0

            declare <2 x double> @_ZGVnN2v_foo_01(<2 x double>)

            // ... loop ...
               %xi = call double @foo_01(double %yi) #0

            attribute #0 =
{vector-abi-variant="_ZGVnN2v_foo_01(vector_foo_01)"}

        Example 2
        ---------

        Exposing an Advanced SIMD vector function when targeting Advanced SIMD
        in AArch64, but with the wrong signature. The user specifies a masked
        version of the function in the clauses of the attribute, the compiler
        throws an error suggesting the signature expected for `vector_foo_02.`

            double foo_02(double Input)
__attribute__(clang_declare_simd_variant(“vector_foo_02", simdlen(2),
inbranch, isa("simd"));

            // Advanced SIMD version
            float64x2_t vector_foo_02(float64x2_t VectorInput);
            // (suggested) compiler error ->                      ^ Missing
mask parameter of type `uint64x2_t`.

        Example 3
        ---------

        Targeting `sincos`-like signatures.

            void foo_03(double Input, double * Output)
__attribute__(clang_declare_simd_variant(“vector_foo_03", simdlen(2),
notinbranch, linear(Output, 1), isa("simd"));

            // Advanced SIMD version
            void vector_foo_03(float64x2_t VectorInput, double * Output);

            // ... loop ...
               foo_03(x[i], y + i)

        The resulting IR is:

            @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (void
(<2 x double>, double *)* @_ZGVnN2vl8_foo_03 to i8*)], section
"llvm.metadata"

            declare void @foo_03(double, double *) #0

            declare void @_ZGVnN2vl8_foo_03(<2 x double>, double *)

            ;; ... loop ...
            call void @foo_03(double %xi, double * %yiptr) #0

            attribute #0 =
{vector-abi-variant="_ZGVnN2vl8_foo_03(vector_foo_03)"}

        Example 4
        ---------

        Scalable vectorization targeting SVE

            double foo_04(double Input)
__attribute__(clang_declare_simd_variant(“vector_foo_04",
simdlen("scalable"), notinbranch, isa("sve"));

            // SVE version
            svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask);

            // ... loop ...
               x[i] = foo_04(y[i])

        The IR generated is:

            @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast
(<vscale 2 x double> (<vscale 2 x double>)* @_ZGVsMxv_foo_04 to
i8*)], section "llvm.metadata"

            declare double @foo_04(double %in) #0

            declare <vscale 2 x double> @_ZGVnNxv_foo_04(<vscale 2 x
double>)

            // ... loop ...
               %xi = call double @foo_04(double %yi) #0

            attribute #0 =
{vector-abi-variant="_ZGVsMxv_foo_04(vector_foo_04)"}

        Example 5
        ---------

        Fixed length vectorization targeting SVE

            double foo_05(double Input)
__attribute__(clang_declare_simd_variant(“vector_foo_05", simdlen(4),
inbranch, isa("sve"));

            // Fixed-length SVE version
            svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask);

        The resulting IR is:

            @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<4
x double> (<4 x double>)* @_ZGVsM4v_foo_05 to i8*)], section
"llvm.metadata"

            declare double @foo_05(double %in) #0

            declare <4 x double> @_ZGVnNxv_foo_05(<4 x double>)

            ;; ... loop ...
               %xi = call double @foo_05(double %yi) #0

            attribute #0 =
{vector-abi-variant="_ZGVsM4v_foo_04(vector_foo_04)"}

        Example 6
        ---------

        This is an x86 example, equivalent to the one provided by Andrei
        Elovikow in
        http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html. Godbolt
        rendering with ICC at https://godbolt.org/z/Of1NxZ

            float MyAdd(float* a, int b)
__attribute__(clang_declare_simd_variant(“MyAddVec", simdlen(8),
notinbranch, linear(a), arch("core_2nd_gen_avx"))
            {
              return *a + b;
            }


            __m256 MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2);

            // ... loop ...

              x[i] = MyAdd(a+i, b[i]);

        The resulting IR is:

            @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<8
x float> (float *, <2 x i64>, <2 x i64>)* @_ZGVbN8l4v_MyAdd to
i8*)], section "llvm.metadata"

            define float @MyAdd(float %a, i32 %b) {
              ;; return *a + b :)
            }

            define <8 x float> @_ZGVbN8l4v_MyAdd(float *, <2 x i64>,
<2 x i64>)

            ;; ... loop ...
               %xi = call float @MyAdd(float * %aiptr, i32 ) #0

            attribute #0 =
{vector-abi-variant="_ZGVbN8l4v_MyAdd(MyAddVec)"}

        Note: the signature of `MyAddVec` uses `<2 x i64>` instead of
        `<4 x i32>`, as shown in https://godbolt.org/z/T4T8s3 (line 11).
If we
        would have asked the back end to generate the signature of `MyAddVec` by
        looking at the signature of the scalar function and the `<vlen>=8`
token
        in the mangled name in the attribute, we would have end up using
        `<8 x i32>` instead of two instanced of `<2 x i64>`, which
would have
        been wrong.

        This is another example that demonstrate that we need to generate the
        vector function signatures in the front-end and not in the backend.

        Example 7: showing interaction with `declare simd`
        --------------------------------------------------

            #pragma omp declare simd linear(a) notinbranch
            float foo_07(float *a, int x)
__attribute__(clang_declare_simd_variant(“vector_foo_07", simdlen(4),
linear(a), notinbranch, arch("armv8.2-a+simd")) {
                return *a + x;
            }

            // Advanced SIMD version
            float32x4_t vector_foo_07(float *a, int32x4_t vx) {
            // Custom implementation.
            }

            // ... loop ...

              x[i] = foo_07(a+i, b[i]);

        The resulting IR attribute is made of three symbols:

        1.  `_ZGVnN2l4v_foo_07` and `_ZGVnN4l4v_foo_07`, which represent the
            ones the compiler builds by auto-vectorizing `foo_07` according to
            the rule defined in the Vector Function ABI specifications for
            AArch64.
        2.  `_ZGVnN4l4v_foo_07(vector_foo_07)`, which represents the
            user-defined redirection of the 4-lane version of `foo_07` to the
            custom implementation provided by the user when targeting Advanced
            SIMD for version 8.2 of the A64 instruction set.

        <!-- -->

            @llvm.compiler.used = appending global [2 x i8*] [i8* bitcast (<4
x float> (float *, <4 x i32>)* @_ZGVnN4l4v_foo_07 to i8*), i8* bitcast
(<2 x float> (float *, <2 x i32>)* @_ZGVnN2l4v_foo_07 to i8*) ],
section "llvm.metadata"

            define <4 x float> @_ZGVnN4l4v_foo_07(float *, <4 x
i32>) {
              ;; Compiler auto-vectorized version (via the VecClone pass)
            }

            define <2 x float> @_ZGVnN2l4v_foo_07(float *, <2 x
i32>) {
              ;; Compiler auto-vectorized version (via the VecClone pass)
            }

            define <4 x float> @vector_foo_07(float *, <4 x i32>) {
              ;; user provided vector version
            }

            define float @foo_07(float %a, i32 %b) {
              ;; return *a + b :)
            }

            // ... loop ...
               %xi = call float @foo_07(float * %aiptr, i32 %bi) #0

            attribute #0 =
{vector-function-abi-variant="_ZGVnN2l4v_foo_07,_ZGVnN4l4v_foo_07,_ZGVnN4l4v_foo_07(vector_foo_07)"}

        In this case, the body of the functions `_ZGVnN4l4v_foo_07` and
        `_ZGVnN2l4v_foo_07` is auto-generated by the compiler, therefore we
        might as well avoid adding them to the `@llvm.compiler.used` intrinsics.
        I have left it there for consistency, let me know if you think that
        there is no real reasons for requiring it, I will remove it.





IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended recipient,
please notify the sender immediately and do not disclose the contents to any
other person, use it for any purpose, or store or copy the information in any
medium. Thank you.

llvm dev - Jul 2019 - Interface user provided vector functions with the vectorizer.

[llvm-dev] Interface user provided vector functions with the vectorizer.

[llvm-dev] Interface user provided vector functions with the vectorizer.