David Greene via llvm-dev
2019-Jun-24 16:39 UTC
[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.
I have an RFC for first-class complex types in LLVM IR pending some
internal review. I hope to post it soon. That should help address this
problem. Then the vector function signature generation could stay in
LLVM, if I'm understanding the issue correctly.

-David

Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org> writes:

> Hi all - I am working with a colleague to provide an initial implementation of this.
>
> We encountered a problem when generating the vector signatures of functions that use complex data.
>
> In this proposal, we expect the SVFS component in the backend to
> demangle the name of the function in the attribute to be able to
> reconstruct the signature of the vector function from the scalar
> function signature.
>
> In the case of complex data, this doesn’t seem to be possible, because the
> information of “being a vector of 2 lanes” that is supposed to be
> carried by the complex scalar is lost when the data type is
> transformed into a “coerced” type.
>
> Consider these three types and the function `foo`:
>
> // Type 1
> typedef _Complex int S;
>
> // Type 2
> typedef struct x{
>   int a;
>   int b;
> } S;
>
> // Type 3
> typedef uint64_t S;
>
> S foo(S a, S b) {
>   return ...;
> }
>
> In all cases, the IR type of the parameters in `foo` is i64, therefore
> it is not possible to distinguish what C type generated the signature of
> `foo`.
>
> I don’t know if this is going to be a problem for other architectures,
> but this is definitely a problem on AArch64 where we need to be able
> to generate the correct vector function signature for a specific
> simdlen(N) attached on `foo`. When simdlen(2), for type 1 the vector
> type is <4 x i32>, for type 2 is <2 x i64*>, for type 3 is <2 x i64>.
>
> Therefore, I would like to propose a change to the RFC, which would
> move the responsibility of generating the vector function signature
> from LLVM to clang.
>
> In particular (and this I believe has already been mentioned by
> Johannes), we could use the @llvm.compiler.used global variable to mark
> those declarations that need to stay in the IR and not be optimized away
> by opt before reaching the vectorizer.
>
> In summary, the change would consist of:
>
> 1. Generate symbol declarations/definitions of the vector function
>    with the mangled name in the IR, and mark them with
>    @llvm.compiler.used. This could be done in CGOpenMPRuntime.cpp.
> 2. Use the attribute vector-function-abi-variant defined in this RFC to map
>    scalar names to vector ABI mangled names, and use the same redirection
>    mechanism for the user provided vector name.
> 3. Move the “vector function signature generation” from the SVFS in LLVM
>    to the OpenMP code generator of the clang frontend.
>
> The SVFS query system would still work as in the current proposal. The
> only difference is that the vector function signature would be given
> by the frontend and would not need to be recomputed.
>
> Here is an example of how the IR would look with this change:
>
> ```
> @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<2 x i32> (<2 x i32>)* @_ZGVnN2v_foo to i8*)], section "llvm.metadata"
>
> declare dso_local <2 x i32> @_ZGVnN2v_foo(<2 x i32> returned)
>
> declare i32 @foo(i32) #0
>
> ; other function definitions, including the one provided by the user
> ; `my_vector_foo` if the user provided a definition and not just the
> ; declaration
>
> attributes #0 = { "vector-function-abi-variant"="_ZGVnN2v_foo(my_vector_foo)" }
> ```
>
> If @llvm.compiler.used is not suitable for this (I am
> not aware of all the implications of using it on a global symbol), maybe we
> could come up with an intrinsic that does what we need (avoid deleting
> declarations that are not used) and name it
> @llvm.vector.function.used?
>
> Please let me know what you think, I will submit an updated proposal next week.
>
> Kind regards,
>
> Francesco
>
>> On Jun 17, 2019, at 7:05 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
>>
>> I agree with Simon. This looks good conceptually. I have minor implementation comments but that can wait till the code reviews.
>>
>> Sorry for the delay and thanks for working on this.
>>
>> From: Simon Moll <moll at cs.uni-saarland.de>
>> Sent: Monday, June 17, 2019 10:02:58 AM
>> To: Francesco Petrogalli; LLVM Development List; Clang Dev
>> Cc: Renato Golin; Finkel, Hal J.; Andrea Bocci; Elovikov, Andrei;
>> Alexey Bataev; Doerfert, Johannes; Saito, Hideki; Tian, Xinmin; nd;
>> Roman Lebedev; Philip Reames; Shawn Landden
>> Subject: Re: RFC: Interface user provided vector functions with the vectorizer.
>>
>> Hi Francesco,
>>
>> On 6/11/19 10:55 PM, Francesco Petrogalli wrote:
>> > Dear all,
>> >
>> > I have re-written the proposal for interfacing user provided vector
>> > functions, originally posted in both the llvm-dev and cfe-dev mailing
>> > lists:
>> >
>> > "[RFC] Expose user provided vector function for auto-vectorization."
>> >
>> > The proposal looks quite different from the original submission,
>> > therefore I took the liberty to start a new thread.
>> >
>> > The original thread generated some good discussion. In particular,
>> > Simon Moll and Johannes Doerfert (CCed) have managed to provide good
>> > arguments for the following claims:
>> >
>> > 1. The Vector Function ABI name mangling scheme of a target is not
>> >    enough to describe all use cases of function vectorization that
>> >    the compiler might end up needing to support in the future.
>> I think the new name of the attribute makes this point clear.
>> > 2. `declare variant` needs to be handled properly at IR level, to be
>> >    able to give the compiler the full OpenMP context of the directive.
>> >
>> > This proposal addresses those two concerns and other (I believe) minor
>> > concerns that have been raised in the previous thread.
>> >
>> > This proposal is provided with examples and a self assessment around
>> > extendibility.
>> >
>> > I have CCed all the people that have participated in the discussion so
>> > far, please let me know if you think I have missed anything of what
>> > has been raised.
>> >
>> > Kind regards,
>> >
>> > Francesco
>>
>> LGTM. Please add me as a reviewer for this when you post patches.
>>
>> Thanks!
>>
>> Simon
>>
>> >
>> > *** DRAFT OF THE PROPOSAL ***
>> >
>> > # SCOPE OF THE RFC : Interface user provided vector functions with the vectorizer.
>> >
>> > Because the users care about portability (across compilers, libraries
>> > and systems), I believe we have to base our solution on a standard
>> > that describes the mapping from the scalar function to the vector
>> > function.
>> >
>> > Because OpenMP is standard and widely used, we should base our
>> > solution on the mechanisms that the standard provides, via the
>> > directives `declare simd` and `declare variant`, the latter when used
>> > with the `simd` trait in the `construct` set.
>> >
>> > Please notice that:
>> >
>> > 1. The scope of the proposal is not implementing full support for
>> >    `pragma omp declare variant`.
>> > 2. The scope of the proposal is not enabling the vectorizer to do new
>> >    kinds of vectorization (e.g. RV-like vectorization described by
>> >    Simon).
>> > 3. The proposal aims to be extendible wrt 1. and 2.
>> > 4. The IR attribute introduced in this proposal is equivalent to the
>> >    one needed for the VecClone pass under development in
>> >    https://reviews.llvm.org/D22792
>> >
>> > # CLANG COMPONENTS
>> >
>> > A C function attribute, `clang_declare_simd_variant`, to attach to the
>> > scalar version. The attribute provides enough information to the
>> > compiler about the vector shape of the user defined function. The
>> > vector shapes handled by the attribute are those handled by the OpenMP
>> > standard via `declare simd` (and no more than that).
>> >
>> > 1. The function attribute handling in clang is crafted with the
>> >    requirement that it will be possible to re-use the same components
>> >    for the info generated by `declare variant` when used with a `simd`
>> >    trait in the `construct` set.
>> > 2. The attribute allows orthogonality with the vectorization that is
>> >    done via OpenMP: the user vector function is still exposed for
>> >    vectorization when not using `-fopenmp-[simd]` once the `declare
>> >    simd` and `declare variant` directives of OpenMP are available
>> >    in the front-end.
>> >
>> > ## C function attribute: `clang_declare_simd_variant`
>> >
>> > The definition of this attribute has been crafted to match the
>> > semantics of `declare variant` for a `simd` construct described in
>> > OpenMP 5.0. I have added only the traits of the `device` set, `isa`
>> > and `arch`, which I believe are enough to cover the use case of
>> > this proposal. If that is not the case, please provide an example;
>> > extending the attribute will be easy even once the current one is
>> > implemented.
>> >
>> > ```
>> > clang_declare_simd_variant(<variant-func-id>, <simd clauses>{, <context selector clauses>})
>> >
>> > <variant-func-id>:= The name of a function variant that is a base language identifier, or,
>> >                     for C++, a template-id.
>> >
>> > <simd clauses> := <simdlen>, <mask>{, <optional simd clauses>}
>> >
>> > <simdlen> := simdlen(<positive number>) | simdlen("scalable")
>> >
>> > <mask> := inbranch | notinbranch
>> >
>> > <optional simd clauses> := <linear clause>
>> >                          | <uniform clause>
>> >                          | <align clause> | {,<optional simd clauses>}
>> >
>> > <linear clause> := linear_ref(<var>,<step>)
>> >                  | linear_var(<var>, <step>)
>> >                  | linear_uval(<var>, <step>)
>> >                  | linear(<var>, <step>)
>> >
>> > <step> := <var> | <non zero number>
>> >
>> > <uniform clause> := uniform(<var>)
>> >
>> > <align clause> := align(<var>, <positive number>)
>> >
>> > <var> := Name of a parameter in the scalar function declaration/definition
>> >
>> > <non zero number> := ... | -2 | -1 | 1 | 2 | ...
>> >
>> > <positive number> := 1 | 2 | 3 | ...
>> >
>> > <context selector clauses> := {<isa>}{,} {<arch>}
>> >
>> > <isa> := isa(target-specific-value)
>> >
>> > <arch> := arch(target-specific-value)
>> >
>> > ```
>> >
>> > # LLVM COMPONENTS:
>> >
>> > ## VectorFunctionShape class
>> >
>> > The object `VectorFunctionShape` contains the information about the
>> > kind of vectorization available for an `llvm::Call`.
>> >
>> > The object `VectorFunctionShape` must contain the following information:
>> >
>> > 1. Vectorization Factor (or number of concurrent lanes executed by the
>> >    SIMD version of the function). Encoded by an unsigned integer.
>> > 2. Whether the vector function is requested for scalable
>> >    vectorization, encoded by a boolean.
>> > 3. Information about masking / no masking, encoded by a boolean.
>> > 4. Information about the parameters, encoded in a container that
>> >    carries objects of type `ParameterType`, to describe features like
>> >    `linear` and `uniform`.
>> > 5. Function name redirection, if a user has specified to use a custom
>> >    name instead of the Vector Function ABI ones.
>> >
>> > Items 1. to 5. represent the information stored in the
>> > `vector-function-abi-variant` attribute (see next section).
>> >
>> > The object can be extended in the future to include new vectorization
>> > kinds (for example the RV-like vectorization of the Region
>> > Vectorizer), or to add more context information that might come from
>> > other uses of OpenMP `declare variant`, or to add new Vector Function
>> > ABIs not based on OpenMP. Such information can be retrieved from
>> > attributes that will be added to describe the `Call` instance.
>> >
>> > ## IR Attribute
>> >
>> > We define a `vector-function-abi-variant` attribute that lists the
>> > mangled names produced via the mangling function of the Vector
>> > Function ABI rules.
>> >
>> > ```
>> > vector-function-abi-variant = "abi_mangled_name_01, abi_mangled_name_02(user_redirection),..."
>> > ```
>> >
>> > 1. Because we use only OpenMP `declare simd` vectorization, and
>> >    because we require a Vector Function ABI, we make this explicit
>> >    in the name of the attribute.
>> > 2. Because the Vector Function ABIs encode all the information
>> >    needed to know the vectorization shape of the vector function in
>> >    the mangled names, we provide the mangled name via the
>> >    attribute.
>> > 3. Function name redirection is specified by enclosing the name of
>> >    the redirection in parentheses, as in
>> >    `abi_mangled_name_02(user_redirection)`.
>> >
>> > ## Vector ABI Demangler
>> >
>> > The “Vector ABI demangler” is the component that demangles the data
>> > in the `vector-function-abi-variant` attribute and that provides the
>> > instances of the class `VectorFunctionShape` that can be derived from
>> > the mangled names listed in the attribute.
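[Editor's sketch, not part of the original thread.] To make the demangling step concrete, here is a minimal, standalone C++17 sketch of how one entry of the `vector-function-abi-variant` string might be turned into a shape record. Everything in it is an assumption made for illustration: the `VectorFunctionShape` and `ParamInfo` layouts only mirror the five items listed in the proposal, `demangleVariant` is a hypothetical name, and only a small subset of the mangling grammar is handled (`v`, `u`, `l<step>` with an optional `n` for a negative step, and `x` as a scalable length marker). It does not use any LLVM API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical mirror of the fields the proposal lists for VectorFunctionShape.
enum class ParamKind { Vector, Uniform, Linear };

struct ParamInfo {
  ParamKind Kind;
  int LinearStep; // only meaningful when Kind == ParamKind::Linear
};

struct VectorFunctionShape {
  char ISA = 0;                  // target ISA letter, e.g. 'n' or 's'
  bool IsMasked = false;         // 'M' = masked, 'N' = unmasked
  bool IsScalable = false;       // assumed to be spelled 'x' in place of a number
  unsigned VF = 0;               // vectorization factor when not scalable
  std::vector<ParamInfo> Params; // one entry per scalar parameter
  std::string ScalarName;        // name of the scalar function
  std::string VectorName;        // redirection if given, else the mangled name
};

// Reads a run of decimal digits starting at Pos; returns false if none found.
static bool parseNumber(const std::string &S, std::size_t &Pos, unsigned &Out) {
  const std::size_t Start = Pos;
  Out = 0;
  while (Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos])))
    Out = Out * 10 + static_cast<unsigned>(S[Pos++] - '0');
  return Pos != Start;
}

// Parses "_ZGV<isa><mask><vlen><params>_<scalar>" with optional "(redirect)".
std::optional<VectorFunctionShape> demangleVariant(const std::string &Entry) {
  VectorFunctionShape Shape;
  std::string Mangled = Entry;
  std::string Redirect;

  // Split off the optional "(user_redirection)" suffix.
  const std::size_t Open = Entry.find('(');
  if (Open != std::string::npos) {
    if (Entry.back() != ')' || Entry.size() < Open + 3)
      return std::nullopt;
    Mangled = Entry.substr(0, Open);
    Redirect = Entry.substr(Open + 1, Entry.size() - Open - 2);
  }

  if (Mangled.size() < 6 || Mangled.compare(0, 4, "_ZGV") != 0)
    return std::nullopt;

  std::size_t Pos = 4;
  Shape.ISA = Mangled[Pos++];
  const char Mask = Mangled[Pos++];
  if (Mask != 'N' && Mask != 'M')
    return std::nullopt;
  Shape.IsMasked = (Mask == 'M');

  if (Pos < Mangled.size() && Mangled[Pos] == 'x') {
    Shape.IsScalable = true;
    ++Pos;
  } else if (!parseNumber(Mangled, Pos, Shape.VF) || Shape.VF == 0) {
    return std::nullopt;
  }

  // Parameter tokens run until the '_' that separates them from the name.
  while (Pos < Mangled.size() && Mangled[Pos] != '_') {
    const char C = Mangled[Pos++];
    if (C == 'v') {
      Shape.Params.push_back({ParamKind::Vector, 0});
    } else if (C == 'u') {
      Shape.Params.push_back({ParamKind::Uniform, 0});
    } else if (C == 'l') {
      bool Negative = false;
      if (Pos < Mangled.size() && Mangled[Pos] == 'n') {
        Negative = true;
        ++Pos;
      }
      unsigned Magnitude = 0;
      int Step = 1; // assumed default when no step is spelled out
      if (parseNumber(Mangled, Pos, Magnitude))
        Step = Negative ? -static_cast<int>(Magnitude)
                        : static_cast<int>(Magnitude);
      else if (Negative)
        return std::nullopt;
      Shape.Params.push_back({ParamKind::Linear, Step});
    } else {
      return std::nullopt; // token outside the subset handled here
    }
  }

  if (Shape.Params.empty() || Pos >= Mangled.size())
    return std::nullopt;
  ++Pos; // skip the '_' separator
  assert(Pos <= Mangled.size());
  Shape.ScalarName = Mangled.substr(Pos);
  if (Shape.ScalarName.empty())
    return std::nullopt;
  Shape.VectorName = Redirect.empty() ? Mangled : Redirect;
  return Shape;
}
```

Under these assumptions, calling `demangleVariant("_ZGVnN2vl8_foo_03(vector_foo_03)")` would yield an unmasked 2-lane shape with one vector parameter and one linear parameter of step 8, redirected to `vector_foo_03`. Returning `std::optional` keeps malformed entries from being silently treated as valid shapes.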
>> >
>> > ## Query interface: Search Vector Function System (SVFS)
>> >
>> > An interface that can be queried by the LLVM components to understand
>> > whether or not a scalar function can be vectorized, and that retrieves
>> > the vector function to be used if such a vector shape is available.
>> >
>> > 1. This component is going to be unrelated to OpenMP.
>> > 2. This component will internally use the demangler defined in the
>> >    previous section, but it will not expose any aspect of the Vector
>> >    Function ABI via its interface.
>> >
>> > The interface provides two methods.
>> >
>> > ```
>> > std::vector<VectorFunctionShape> SVFS::isFunctionVectorizable(llvm::CallInst * Call);
>> >
>> > llvm::Function * SVFS::getVectorizedFunction(llvm::CallInst * Call, VectorFunctionShape Info);
>> > ```
>> >
>> > The first method is used to list all the vector shapes that are available
>> > and attached to a scalar function. An empty result means that no
>> > vector versions are available.
>> >
>> > The second method retrieves the information needed to build a call to
>> > a vector function with a specific `VectorFunctionShape` info.
>> >
>> > # (SELF) ASSESSMENT ON EXTENDIBILITY
>> >
>> >
>> > 1. Extending the C function attribute `clang_declare_simd_variant` to
>> >    new Vector Function ABIs that use OpenMP will be straightforward
>> >    because the attribute is tied to such ABIs and OpenMP.
>> > 2. The C attribute `clang_declare_simd_variant` and the `declare
>> >    variant` directive used for the `simd` trait will be sharing the
>> >    internals in clang, so adding the OpenMP functionality for `simd`
>> >    traits will be mostly handling the directive in the OpenMP
>> >    parser. How this should be done is described in
>> >    https://clang.llvm.org/docs/InternalsManual.html#how-to-add-an-attribute
>> > 3. The IR attribute `vector-function-abi-variant` is not meant to be
>> >    extended to represent kinds of vectorization other than those
>> >    handled by `declare simd` and by a Vector
>> >    Function ABI.
>> > 4. The IR attribute `vector-function-abi-variant` is not defined to be
>> >    extended to represent the information of `declare variant` in its
>> >    totality.
>> > 5. The IR attribute will not need to change when we introduce non
>> >    vector function ABI vectorization (RV-like, reductions...) or when
>> >    we decide to fully support `declare variant`. The information
>> >    it carries will not need to be invalidated, but just extended with
>> >    new attributes that will need to be handled by the
>> >    `VectorFunctionShape` class, in a similar way the
>> >    `llvm::FPMathOperator` does with the `llvm::FastMathFlags`, which
>> >    operates on individual attributes to describe an overall
>> >    functionality.
>> >
>> > # Examples
>> >
>> > ## Example 1
>> >
>> > Exposing an Advanced SIMD vector function when targeting Advanced SIMD
>> > in AArch64.
>> >
>> > ```
>> > double foo_01(double Input) __attribute__((clang_declare_simd_variant("vector_foo_01", simdlen(2), notinbranch, isa("simd"))));
>> >
>> > // Advanced SIMD version
>> > float64x2_t vector_foo_01(float64x2_t VectorInput);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attributes #0 = { "vector-function-abi-variant"="_ZGVnN2v_foo_01(vector_foo_01)" }
>> > ```
>> >
>> > ## Example 2
>> >
>> > Exposing an Advanced SIMD vector function when targeting Advanced SIMD
>> > in AArch64, but with the wrong signature. The user specifies a masked
>> > version of the function in the clauses of the attribute, and the compiler
>> > throws an error suggesting the signature expected for
>> > `vector_foo_02`.
>> >
>> > ```
>> > double foo_02(double Input) __attribute__((clang_declare_simd_variant("vector_foo_02", simdlen(2), inbranch, isa("simd"))));
>> >
>> > // Advanced SIMD version
>> > float64x2_t vector_foo_02(float64x2_t VectorInput);
>> > // (suggested) compiler error ->                 ^ Missing mask parameter of type `uint64x2_t`.
>> > ```
>> >
>> > ## Example 3
>> >
>> > Targeting `sincos`-like signatures.
>> >
>> > ```
>> > void foo_03(double Input, double * Output) __attribute__((clang_declare_simd_variant("vector_foo_03", simdlen(2), notinbranch, linear(Output, 1), isa("simd"))));
>> >
>> > // Advanced SIMD version
>> > void vector_foo_03(float64x2_t VectorInput, double * Output);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attributes #0 = { "vector-function-abi-variant"="_ZGVnN2vl8_foo_03(vector_foo_03)" }
>> > ```
>> >
>> > ## Example 4
>> >
>> > Scalable vectorization targeting SVE
>> >
>> > ```
>> > double foo_04(double Input) __attribute__((clang_declare_simd_variant("vector_foo_04", simdlen("scalable"), notinbranch, isa("sve"))));
>> >
>> > // SVE version
>> > svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attributes #0 = { "vector-function-abi-variant"="_ZGVsMxv_foo_04(vector_foo_04)" }
>> > ```
>> >
>> > ## Example 5
>> >
>> > Fixed length vectorization targeting SVE
>> >
>> > ```
>> > double foo_05(double Input) __attribute__((clang_declare_simd_variant("vector_foo_05", simdlen(4), inbranch, isa("sve"))));
>> >
>> > // Fixed-length SVE version
>> > svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attributes #0 = { "vector-function-abi-variant"="_ZGVsM4v_foo_05(vector_foo_05)" }
>> > ```
>> >
>> > ## Example 6
>> >
>> > This is an x86 example, equivalent to the one provided by Andrei
>> > Elovikov in
>> > http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html. Godbolt
>> > rendering with ICC at https://godbolt.org/z/Of1NxZ
>> >
>> > ```
>> > float MyAdd(float* a, int b) __attribute__((clang_declare_simd_variant("MyAddVec", simdlen(8), notinbranch, arch("core_2nd_gen_avx"))))
>> > {
>> >   return *a + b;
>> > }
>> >
>> >
>> > __m256 MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attributes #0 = { "vector-function-abi-variant"="_ZGVbN8l4v_MyAdd(MyAddVec)" }
>> > ```
>> >
>> > ## Example showing interaction with `declare simd`
>> >
>> > ```
>> > #pragma omp declare simd linear(a) notinbranch
>> > float foo_06(float *a, int x) __attribute__((clang_declare_simd_variant("vector_foo_06", simdlen(4), linear(a), notinbranch, arch("armv8.2-a+simd")))) {
>> >   return *a + x;
>> > }
>> >
>> > // Advanced SIMD version
>> > float32x4_t vector_foo_06(float *a, int32x4_t vx) {
>> >   // Custom implementation.
>> > }
>> > ```
>> >
>> > The resulting IR attribute is made of three symbols:
>> >
>> > 1. `_ZGVnN2l4v_foo_06` and `_ZGVnN4l4v_foo_06`, which represent the
>> >    ones the compiler builds by auto-vectorizing `foo_06` according to
>> >    the rules defined in the Vector Function ABI specifications for
>> >    AArch64.
>> > 2. `_ZGVnN4l4v_foo_06(vector_foo_06)`, which represents the
>> >    user-defined redirection of the 4-lane version of `foo_06` to the
>> >    custom implementation provided by the user when targeting Advanced
>> >    SIMD for version 8.2 of the A64 instruction set.
>> >
>> > ```
>> > attributes #0 = { "vector-function-abi-variant"="_ZGVnN2l4v_foo_06,_ZGVnN4l4v_foo_06,_ZGVnN4l4v_foo_06(vector_foo_06)" }
>> > ```
>> >
>> --
>>
>> Simon Moll
>> Researcher / PhD Student
>>
>> Compiler Design Lab (Prof. Hack)
>> Saarland University, Computer Science
>> Building E1.3, Room 4.31
>>
>> Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
>> Fax. +49 (0)681 302-3065 : http://compilers.cs.uni-saarland.de/people/moll
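[Editor's sketch, not part of the original thread.] The last example in the quoted proposal lists the same mangled name twice, once plain and once with a user redirection. The standalone C++17 sketch below shows one way a lookup table could be built from the comma-separated attribute string so that a query by mangled name returns the symbol to call. The function name `parseVariantAttribute` is hypothetical, and the rule that a redirection takes precedence over a plain entry with the same mangled name is an assumption of this sketch; the thread does not specify it. No LLVM API is used.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Strips leading and trailing spaces/tabs from one list entry.
static std::string trimSpaces(const std::string &S) {
  const std::size_t First = S.find_first_not_of(" \t");
  if (First == std::string::npos)
    return "";
  const std::size_t Last = S.find_last_not_of(" \t");
  return S.substr(First, Last - First + 1);
}

// Hypothetical helper: maps each Vector Function ABI mangled name in the
// attribute string to the symbol that should actually be called.
// Assumption: "name(redirect)" overrides a plain "name" entry.
std::map<std::string, std::string>
parseVariantAttribute(const std::string &Attr) {
  std::map<std::string, std::string> Table;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    const std::string Entry = trimSpaces(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;

    const std::size_t Open = Entry.find('(');
    if (Open == std::string::npos) {
      // Plain entry: the callee is the mangled name itself. emplace() does
      // not overwrite, so an earlier redirection for this name is kept.
      Table.emplace(Entry, Entry);
      continue;
    }
    if (Entry.back() != ')')
      continue; // malformed entry, skipped in this sketch
    const std::string Mangled = Entry.substr(0, Open);
    const std::string Redirect =
        Entry.substr(Open + 1, Entry.size() - Open - 2);
    if (Mangled.empty() || Redirect.empty())
      continue;
    Table[Mangled] = Redirect; // a redirection always wins
  }
  return Table;
}
```

With the attribute string from the `declare simd` interaction example, the table would hold two keys: `_ZGVnN2l4v_foo_06` mapped to itself and `_ZGVnN4l4v_foo_06` mapped to `vector_foo_06`. Using `emplace` for plain entries and assignment for redirections makes the result independent of the order in which the entries appear.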
Saito, Hideki via llvm-dev
2019-Jun-24 17:22 UTC
[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.
That helps complex but not other structures that can be call-by-value. Hideki -----Original Message----- From: David Greene [mailto:dag at cray.com] Sent: Monday, June 24, 2019 9:40 AM To: Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org> Cc: Doerfert, Johannes <jdoerfert at anl.gov>; Francesco Petrogalli <Francesco.Petrogalli at arm.com>; nd <nd at arm.com>; Andrea Bocci <andrea.bocci at cern.ch>; Clang Dev <cfe-dev at lists.llvm.org>; Saito, Hideki <hideki.saito at intel.com>; Alexey Bataev <a.bataev at hotmail.com> Subject: Re: [llvm-dev] RFC: Interface user provided vector functions with the vectorizer. I have an RFC for first-class complex types in LLVM IR pending for some internal review. I hope to post it soon. That should help address this problem. Then the vector function signature generation could stay in LLVM, if I'm understanding the issue correctly. -David Francesco Petrogalli via llvm-dev <llvm-dev at lists.llvm.org> writes:> Hi all - I am working with a colleague to provide an initial implementation of this. > > We encountered a problem when dealing with generating the vector signatures of functions that use complex data. > > In this proposal, we expect the SVFS component in the backed to > demangle the name of the function in the attribute to be able to > reconstruct the signature of the vector function from the scalar > function signature. > > In case of Complex data, this doesn’t seem to be possible, because the > information of “being a vector of 2 lanes” that is supposed to be > carried by the complex scalar is lost in the transformation the data > type in a “coerced” type. 
> > Consider these three types and the function `foo`: > > // Type 1 > typedef _Complex int S; > > // Type 2 > typedef struct x{ > int a; > int b; > } S; > > // Type 3 > typedef uint64_t S; > > S foo(S a, S b) { > return ...; > } > > In all cases, the IR type of the parameters in `foo` is i64, therefore > is not possible to distinguish what C type generated the signature of > `foo`. > > I don’t know if this is going to be a problem for other architectures, > but this is definitely a problem on AArch64 where we need to be able > to generate the correct vector function signature for a specific > simdlen(N) attached on `foo`. When simdlen(2), for type 1 the vector > type is <4 x i32>, for type 2 is <2 x i64*>, for type 3 is <2 x i64>. > > Therefore, I would like to propose a change to the RFC, which would > move the responsibility off generating the vector function signature > from LLVM to clang. > > In particular, (and this I believe has already been mentioned by > Johannes), we could use the @llvm.compiler.used intrinsic to mark > those declaration that needs to stay in the IR and not optimized away > OPT before reaching the vectorizer. > > In summary, the change would consist of: > > 1. Generate symbols declaration/definitions of the vector function > with the mangled name in the IR, and mark it with @llvm-compiler.used. > This could be done in CGOpenMPRuntime.cpp 2. Use the attribute > vector-abs-variant defined in this RFC to map scalar names to vector > ABI mangled name, and used the same redirection mechanism for the user > provided vector name. > 3. Move the “vector function signature generation” from the SVFS in > LLVM to the openmp code generator of the clang frontend > > The SVFS query system would still work as in the current proposal. The > only difference is that the vector function signature would be given > by the frontend and not need to be recomputed. 
> > Here is an example of ho the IR would look like with this change: > > ``` > @llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (<2 x i32> (<2 x i32>)* @f to i8*)], section "llvm.metadata" > > declare dso_local <2 x i32> @_ZGVnN2v_foo(<2 x i32> returned) > > declare i32 @foo(i32) #0 > > ; other function definition, including the one provided by the user > `my_vector_foo` if the user provided a definition and not just the > declaration > > attribute #0 = > {vector-function-abi-variant=“_ZGVnN2v_foo(my_vector_foo)"} > > ``` > > If the attribute @llvm.compiler.used is not suitable for this (I am > not aware of all implication of using it on a global symbol), maybe we > could come up with a intrinsics that does what we need (avoid deleting > declarations that are not used) and name it > @llvm.vector.function.used? > > Please let me know what you think, I will submit an updated proposal next week. > > Kind regards, > > Francesco > >> On Jun 17, 2019, at 7:05 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote: >> >> I agree with Simon. This looks good conceptually. I have minor implementation comments but that can wait till the code reviews. >> >> Sorry for the delay and thanks for working on this. >> >> Get Outlook for Android >> >> From: Simon Moll <moll at cs.uni-saarland.de> >> Sent: Monday, June 17, 2019 10:02:58 AM >> To: Francesco Petrogalli; LLVM Development List; Clang Dev >> Cc: Renato Golin; Finkel, Hal J.; Andrea Bocci; Elovikov, Andrei; > Alexey Bataev; Doerfert, Johannes; Saito, Hideki; Tian, Xinmin; nd; > Roman Lebedev; Philip Reames; Shawn Landden >> Subject: Re: RFC: Interface user provided vector functions with the vectorizer. 
>> >> Hi Francesco, >> >> On 6/11/19 10:55 PM, Francesco Petrogalli wrote: >> > Dear all, >> > >> > I have re-written the proposal for interfacing user provided vector >> > functions, originally posted in both llvm-dev and cfe-dev mailing >> > list: >> > >> > "[RFC] Expose user provided vector function for auto-vectorization." >> > >> > The proposal looks quite different from the original submission, >> > therefore I took the liberty to start a new thread. >> > >> > The original thread generated some good discussion. In particular, >> > Simon Moll and Johannes Doerfert (CCed) have managed to provide >> > good arguments for the following claims: >> > >> > 1. The Vector Function ABI name mangling scheme of a target is not >> > enough to describe all uses cases of function vectorization that >> > the compiler might end up needing to support in the future. >> I think the new name of the attribute makes this point clear. >> > 2. `declare variant` needs to be handled properly at IR level, to be >> > able to give the compiler the full OpenMP context of the directive. >> > >> > This proposal addresses those two concerns and other (I believe) >> > minor concerns that have been raised in the previous thread. >> > >> > This proposal is provided with examples and a self assessment >> > around extendibility. >> > >> > I have CCed all the people that have participated in the discussion >> > so far, please let me know if you think I have missed anything of >> > what have been raised. >> > >> > Kind regards, >> > >> > Francesco >> >> LGTM. Please add me as a reviewer for this when you post patches. >> >> Thanks! >> >> Simon >> >> > >> > *** DRAFT OF THE PROPOSAL *** >> > >> > # SCOPE OF THE RFC : Interface user provided vector functions with the vectorizer. 
>> > >> > Because the users care about portability (across compilers, >> > libraries and systems), I believe we have to base sour solution on >> > a standard that describes the mapping from the scalar function to >> > the vector function. >> > >> > Because OpenMP is standard and widely used, we should base our >> > solution on the mechanisms that the standard provides, via the >> > directives `declare simd` and `declare variant`, the latter when >> > used in with the `simd` trait in the `construct` set. >> > >> > Please notice that: >> > >> > 1. The scope of the proposal is not implementing full support for >> > `pragma omp declare variant`. >> > 2. The scope of the proposal is not enabling the vectorizer to do new >> > kind of vectorizations (e.g. RV-like vectorization described by >> > Simon). >> > 3. The proposal aims to be extendible wrt 1. and 2. >> > 4. The IR attribute introduced in this proposal is equivalent to the >> > one needed for the VecClone pass under development in >> > https://reviews.llvm.org/D22792> > # CLANG COMPONENTS >> > >> > A C function attribute, `clang_declare_simd_variant`, to attach to >> > the scalar version. The attribute provides enough information to >> > the compiler about the vector shape of the user defined function. >> > The vector shapes handled by the attribute are those handled by the >> > OpenMP standard via `declare simd` (and no more than that). >> > >> > 1. The function attribute handling in clang is crafted with the >> > requirement that it will be possible to re-use the same components >> > for the info generated by `declare variant` when used with a `simd` >> > traits in the `construct` set. >> > 2. The attribute allows orthogonality with the vectorization that is >> > done via OpenMP: the user vector function is still exposed for >> > vectorization when not using `-fopenmp-[simd]` once the `declare >> > simd` and `declare variant` directive of OpenMP will be available >> > in the front-end. 
>> > >> > ## C function attribute: `clang_declare_simd_variant` >> > >> > The definition of this attribute has been crafted to match the >> > semantics of `declare variant` for a `simd` construct described in >> > OpenMP 5.0. I have added only the traits of the `device` set, `isa` >> > and `arch`, which I believe are enough to cover for the use case of >> > this proposal. If that is not the case, please provide an example, >> > extending the attribute will be easy even once the current one is >> > implemented. >> > >> > ``` >> > clang_declare_simd_variant(<variant-func-id>, <simd clauses>{, >> > <context selector clauses>}) >> > >> > <variant-func-id>:= The name of a function variant that is a base language identifier, or, >> > for C++, a template-id. >> > >> > <simd clauses> := <simdlen>, <mask>{, <optional simd clauses>} >> > >> > <simdlen> := simdlen(<positive number>) | simdlen("scalable") >> > >> > <mask> := inbranch | notinbranch >> > >> > <optional simd clauses> := <linear clause> >> > | <uniform clause> >> > | <align clause> | {,<optional simd >> > clauses>} >> > >> > <linear clause> := linear_ref(<var>,<step>) >> > | linear_var(<var>, <step>) >> > | linear_uval(<var>, <step>) >> > | linear(<var>, <step>) >> > >> > <step> := <var> | <non zero number> >> > >> > <uniform clause> := uniform(<var>) >> > >> > <align clause> := align(<var>, <positive number>) >> > >> > <var> := Name of a parameter in the scalar function >> > declaration/definition >> > >> > <non zero number> := ... | -2 | -1 | 1 | 2 | ... >> > >> > <positive number> := 1 | 2 | 3 | ... >> > >> > <context selector clauses> := {<isa>}{,} {<arch>} >> > >> > <isa> := isa(target-specific-value) >> > >> > <arch> := arch(target-specific-value) >> > >> > ``` >> > >> > # LLVM COMPONENTS: >> > >> > ## VectorFunctionShape class >> > >> > The object `VectorFunctionShape` contains the information about the >> > kind of vectorization available for an `llvm::Call`. 
>> >
>> > The object `VectorFunctionShape` must contain the following information:
>> >
>> > 1. Vectorization Factor (or number of concurrent lanes executed by the
>> >    SIMD version of the function). Encoded by an unsigned integer.
>> > 2. Whether the vector function is requested for scalable
>> >    vectorization, encoded by a boolean.
>> > 3. Information about masking / no masking, encoded by a boolean.
>> > 4. Information about the parameters, encoded in a container that
>> >    carries objects of type `ParameterType`, to describe features like
>> >    `linear` and `uniform`.
>> > 5. Function name redirection, if a user has specified to use a custom
>> >    name instead of the Vector Function ABI ones.
>> >
>> > Items 1 to 5 represent the information stored in the
>> > `vector-function-abi-variant` attribute (see next section).
>> >
>> > The object can be extended in the future to include new
>> > vectorization kinds (for example the RV-like vectorization of the
>> > Region Vectorizer), or to add more context information that might
>> > come from other uses of OpenMP `declare variant`, or to add new
>> > Vector Function ABIs not based on OpenMP. Such information can be
>> > retrieved by attributes that will be added to describe the `Call` instance.
>> >
>> > ## IR Attribute
>> >
>> > We define a `vector-function-abi-variant` attribute that lists the
>> > mangled names produced via the mangling function of the Vector
>> > Function ABI rules.
>> >
>> > ```
>> > vector-function-abi-variant = "abi_mangled_name_01, abi_mangled_name_02(user_redirection),..."
>> > ```
>> >
>> > 1. Because we use only OpenMP `declare simd` vectorization, and
>> >    because we require a Vector Function ABI, we make this explicit
>> >    in the name of the attribute.
>> > 2. Because the Vector Function ABIs encode all the information
>> >    needed to know the vectorization shape of the vector function in
>> >    the mangled names, we provide the mangled name via the
>> >    attribute.
>> > 3. Function name redirection is specified by enclosing the name of
>> >    the redirection in parentheses, as in
>> >    `abi_mangled_name_02(user_redirection)`.
>> >
>> > ## Vector ABI Demangler
>> >
>> > The "Vector ABI demangler" is the component that demangles the
>> > data in the `vector-function-abi-variant` attribute and that
>> > provides the instances of the class `VectorFunctionShape` that can
>> > be derived from the mangled names listed in the attribute.
>> >
>> > ## Query interface: Search Vector Function System (SVFS)
>> >
>> > An interface that can be queried by the LLVM components to
>> > understand whether or not a scalar function can be vectorized, and
>> > that retrieves the vector function to be used if such vector shape is available.
>> >
>> > 1. This component is going to be unrelated to OpenMP.
>> > 2. This component will use internally the demangler defined in the
>> >    previous section, but it will not expose any aspect of the Vector
>> >    Function ABI via its interface.
>> >
>> > The interface provides two methods.
>> >
>> > ```
>> > std::vector<VectorFunctionShape>
>> > SVFS::isFunctionVectorizable(llvm::CallInst * Call);
>> >
>> > llvm::Function * SVFS::getVectorizedFunction(llvm::CallInst * Call,
>> >                                              VectorFunctionShape Info);
>> > ```
>> >
>> > The first method is used to list all the vector shapes that are
>> > available and attached to a scalar function. An empty result means
>> > that no vector versions are available.
>> >
>> > The second method retrieves the information needed to build a call
>> > to a vector function with a specific `VectorFunctionShape` info.
>> >
>> > # (SELF) ASSESSMENT ON EXTENDIBILITY
>> >
>> >
>> > 1. Extending the C function attribute `clang_declare_simd_variant` to
>> >    new Vector Function ABIs that use OpenMP will be straightforward
>> >    because the attribute is tied to such ABIs and OpenMP.
>> > 2. The C attribute `clang_declare_simd_variant` and the `declare
>> >    variant` directive used for the `simd` trait will be sharing the
>> >    internals in clang, so adding the OpenMP functionality for `simd`
>> >    traits will be mostly handling the directive in the OpenMP
>> >    parser. How this should be done is described in
>> >    https://clang.llvm.org/docs/InternalsManual.html#how-to-add-an-attribute
>> > 3. The IR attribute `vector-function-abi-variant` is not to be
>> >    extended to represent kinds of vectorization other than those
>> >    handled by `declare simd` and that are handled with a Vector
>> >    Function ABI.
>> > 4. The IR attribute `vector-function-abi-variant` is not defined to be
>> >    extended to represent the information of `declare variant` in its
>> >    totality.
>> > 5. The IR attribute will not need to change when we introduce non
>> >    vector function ABI vectorization (RV-like, reductions...) or when
>> >    we decide to fully support `declare variant`. The information
>> >    it carries will not need to be invalidated, but just extended with
>> >    new attributes that will need to be handled by the
>> >    `VectorFunctionShape` class, in a similar way to what
>> >    `llvm::FPMathOperator` does with `llvm::FastMathFlags`, which
>> >    operates on individual attributes to describe an overall
>> >    functionality.
>> >
>> > # Examples
>> >
>> > ## Example 1
>> >
>> > Exposing an Advanced SIMD vector function when targeting Advanced
>> > SIMD in AArch64.
>> >
>> > ```
>> > double foo_01(double Input)
>> > __attribute__((clang_declare_simd_variant("vector_foo_01",
>> > simdlen(2), notinbranch, isa("simd"))));
>> >
>> > // Advanced SIMD version
>> > float64x2_t vector_foo_01(float64x2_t VectorInput);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 =
>> > {vector-function-abi-variant="_ZGVnN2v_foo_01(vector_foo_01)"}
>> > ```
>> >
>> > ## Example 2
>> >
>> > Exposing an Advanced SIMD vector function when targeting Advanced
>> > SIMD in AArch64, but with the wrong signature. The user specifies a
>> > masked version of the function in the clauses of the attribute but
>> > declares an unmasked vector function, so the compiler reports an
>> > error suggesting the signature expected for `vector_foo_02`.
>> >
>> > ```
>> > double foo_02(double Input)
>> > __attribute__((clang_declare_simd_variant("vector_foo_02",
>> > simdlen(2), inbranch, isa("simd"))));
>> >
>> > // Advanced SIMD version
>> > float64x2_t vector_foo_02(float64x2_t VectorInput);
>> > // (suggested) compiler error ->                  ^ Missing mask parameter of type `uint64x2_t`.
>> > ```
>> >
>> > ## Example 3
>> >
>> > Targeting `sincos`-like signatures.
>> >
>> > ```
>> > void foo_03(double Input, double * Output)
>> > __attribute__((clang_declare_simd_variant("vector_foo_03", simdlen(2),
>> > notinbranch, linear(Output, 1), isa("simd"))));
>> >
>> > // Advanced SIMD version
>> > void vector_foo_03(float64x2_t VectorInput, double * Output);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 =
>> > {vector-function-abi-variant="_ZGVnN2vl8_foo_03(vector_foo_03)"}
>> > ```
>> >
>> > ## Example 4
>> >
>> > Scalable vectorization targeting SVE
>> >
>> > ```
>> > double foo_04(double Input)
>> > __attribute__((clang_declare_simd_variant("vector_foo_04",
>> > simdlen("scalable"), notinbranch, isa("sve"))));
>> >
>> > // SVE version
>> > svfloat64_t vector_foo_04(svfloat64_t VectorInput, svbool_t Mask);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 =
>> > {vector-function-abi-variant="_ZGVsMxv_foo_04(vector_foo_04)"}
>> > ```
>> >
>> > ## Example 5
>> >
>> > Fixed length vectorization targeting SVE
>> >
>> > ```
>> > double foo_05(double Input)
>> > __attribute__((clang_declare_simd_variant("vector_foo_05",
>> > simdlen(4), inbranch, isa("sve"))));
>> >
>> > // Fixed-length SVE version
>> > svfloat64_t vector_foo_05(svfloat64_t VectorInput, svbool_t Mask);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 =
>> > {vector-function-abi-variant="_ZGVsM4v_foo_05(vector_foo_05)"}
>> > ```
>> >
>> > ## Example 06
>> >
>> > This is an x86 example, equivalent to the one provided by Andrei
>> > Elovikov in
>> > http://lists.llvm.org/pipermail/llvm-dev/2019-June/132885.html.
>> > Godbolt rendering with ICC at https://godbolt.org/z/Of1NxZ
>> >
>> > ```
>> > float MyAdd(float* a, int b)
>> > __attribute__((clang_declare_simd_variant("MyAddVec", simdlen(8),
>> > notinbranch, arch("core_2nd_gen_avx"))))
>> > {
>> >   return *a + b;
>> > }
>> >
>> >
>> > __m256 MyAddVec(float* v_a, __m128i v_b1, __m128i v_b2);
>> > ```
>> >
>> > The resulting IR attribute is:
>> >
>> > ```
>> > attribute #0 = {vector-function-abi-variant="_ZGVbN8l4v_MyAdd(MyAddVec)"}
>> > ```
>> >
>> > ## Example showing interaction with `declare simd`
>> >
>> > ```
>> > #pragma omp declare simd linear(a) notinbranch
>> > float foo_06(float *a, int x)
>> > __attribute__((clang_declare_simd_variant("vector_foo_06", simdlen(4),
>> > linear(a), notinbranch, arch("armv8.2-a+simd"))))
>> > {
>> >   return *a + x;
>> > }
>> >
>> > // Advanced SIMD version
>> > float32x4_t vector_foo_06(float *a, int32x4_t vx) {
>> >   // Custom implementation.
>> > }
>> > ```
>> >
>> > The resulting IR attribute is made of three symbols:
>> >
>> > 1. `_ZGVnN2l4v_foo_06` and `_ZGVnN4l4v_foo_06`, which represent the
>> >    ones the compiler builds by auto-vectorizing `foo_06` according to
>> >    the rule defined in the Vector Function ABI specifications for
>> >    AArch64.
>> > 2. `_ZGVnN4l4v_foo_06(vector_foo_06)`, which represents the
>> >    user-defined redirection of the 4-lane version of `foo_06` to the
>> >    custom implementation provided by the user when targeting Advanced
>> >    SIMD for version 8.2 of the A64 instruction set.
>> >
>> > ```
>> > attribute #0 =
>> > {vector-function-abi-variant="_ZGVnN2l4v_foo_06,_ZGVnN4l4v_foo_06,_ZGVnN4l4v_foo_06(vector_foo_06)"}
>> > ```
>> >
>> --
>>
>> Simon Moll
>> Researcher / PhD Student
>>
>> Compiler Design Lab (Prof. Hack)
>> Saarland University, Computer Science Building E1.3, Room 4.31
>>
>> Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
>> Fax. +49 (0)681 302-3065 : http://compilers.cs.uni-saarland.de/people/moll
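To make the `vector-function-abi-variant` encoding in the quoted RFC more concrete, here is a minimal C++17 sketch of what a demangler for that attribute could do. It is not LLVM's implementation. `ShapeSketch`, `decodeEntry` and `parseVariantAttribute` are hypothetical names, the layout `_ZGV<isa><mask><vlen><params>_<scalar name>` is inferred from the examples in the RFC, and treating `x` as the scalable-length marker is an assumption taken from the AArch64 Vector Function ABI. Parameter tokens are kept as a raw string rather than decoded.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the RFC's `VectorFunctionShape`.
struct ShapeSketch {
  char ISA = 0;            // ISA letter from the mangled name, e.g. 'n' or 's'
  bool Masked = false;     // 'M' (inbranch) vs 'N' (notinbranch)
  bool Scalable = false;   // assumed: 'x' in place of a lane count
  unsigned VF = 0;         // lane count; left at 0 when Scalable is set
  std::string Params;      // raw parameter tokens, e.g. "l4v" (not decoded)
  std::string ScalarName;  // e.g. "foo_06"
  std::string VectorName;  // user redirection if present, else the mangled name
};

// Decode one entry of the form
//   _ZGV<isa><mask><vlen><params>_<scalar name>[(<user redirection>)]
std::optional<ShapeSketch> decodeEntry(const std::string &Entry) {
  std::string Mangled = Entry;
  std::string Redirect;
  // Split off "(user_redirection)" when present.
  const std::size_t Open = Entry.find('(');
  if (Open != std::string::npos) {
    if (Entry.back() != ')')
      return std::nullopt;
    Mangled = Entry.substr(0, Open);
    Redirect = Entry.substr(Open + 1, Entry.size() - Open - 2);
    if (Redirect.empty())
      return std::nullopt;
  }
  if (Mangled.size() < 7 || Mangled.compare(0, 4, "_ZGV") != 0)
    return std::nullopt;

  ShapeSketch S;
  std::size_t I = 4;
  S.ISA = Mangled[I++];
  if (Mangled[I] != 'M' && Mangled[I] != 'N')
    return std::nullopt;
  S.Masked = (Mangled[I++] == 'M');

  if (Mangled[I] == 'x') {
    S.Scalable = true;
    ++I;
  } else {
    if (!std::isdigit(static_cast<unsigned char>(Mangled[I])))
      return std::nullopt;
    while (I < Mangled.size() &&
           std::isdigit(static_cast<unsigned char>(Mangled[I])))
      S.VF = S.VF * 10 + static_cast<unsigned>(Mangled[I++] - '0');
  }

  // Parameter tokens run up to the first '_' after the vector length.
  const std::size_t Sep = Mangled.find('_', I);
  if (Sep == std::string::npos || Sep == I || Sep + 1 == Mangled.size())
    return std::nullopt;
  S.Params = Mangled.substr(I, Sep - I);
  S.ScalarName = Mangled.substr(Sep + 1);
  S.VectorName = Redirect.empty() ? Mangled : Redirect;
  return S;
}

// Split the comma-separated attribute value and decode every entry,
// silently skipping entries that do not match the assumed layout.
std::vector<ShapeSketch> parseVariantAttribute(const std::string &Attr) {
  std::vector<ShapeSketch> Shapes;
  std::size_t Begin = 0;
  while (Begin <= Attr.size()) {
    std::size_t End = Attr.find(',', Begin);
    if (End == std::string::npos)
      End = Attr.size();
    const std::string Entry = Attr.substr(Begin, End - Begin);
    const std::size_t First = Entry.find_first_not_of(" \t");
    if (First != std::string::npos) {
      const std::size_t Last = Entry.find_last_not_of(" \t");
      if (auto S = decodeEntry(Entry.substr(First, Last - First + 1)))
        Shapes.push_back(*S);
    }
    Begin = End + 1;
  }
  return Shapes;
}
```

Applied to the attribute from the `declare simd` interaction example, `parseVariantAttribute` yields three entries for `foo_06`, and only the last one carries `vector_foo_06` as its vector name. A query interface such as the proposed SVFS could build on a step like this without exposing the mangling to its callers.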
Francesco Petrogalli via llvm-dev
2019-Jun-24 18:27 UTC
[llvm-dev] RFC: Interface user provided vector functions with the vectorizer.
> On Jun 24, 2019, at 12:22 PM, Saito, Hideki <hideki.saito at intel.com> wrote:
>
> That helps complex but not other structures that can be call-by-value.

Still, having complex types in IR will be very helpful also for other reasons! :)

I am looking forward to seeing your proposal, David. Thank you.

Francesco