thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization. [May 2019]

Alexey Bataev via llvm-dev

2019-May-31 18:04 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

You can define a clang-specific attribute and later add a GCC alias for it.

Best regards,
Alexey Bataev
> On May 31, 2019, at 13:46, Francesco Petrogalli <Francesco.Petrogalli at
arm.com> wrote:
> 
> 
> 
>> On May 31, 2019, at 12:38 PM, Alexey Bataev <a.bataev at
hotmail.com> wrote:
>> 
>> Francesco, there won't be any duplication. Most of the declarative
OpenMP directives are represented as attributes internally, so, I think, it will
be natural to use an attribute here rather than pragma.
>> 
> 
> Very nice. I am open to getting rid of the `clang` based directive in favor of
the attribute one.
> 
> At the moment there is no “declare variant” attribute in the list of common
function attributes at
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes.
What should we do? Define our own, or wait for GCC to publish the attribute?
> 
> If we decide to do what GCC does, shall we actively participate in the
discussion with the GCC community to shape the attribute itself? Am I right
in thinking that the GCC solution is preferred given that system header files are
more likely to follow whatever GCC comes up with?
> 
> 
>> Best regards,
>> Alexey Bataev
>> 
>>> On May 31, 2019, at 13:32, Francesco Petrogalli
<Francesco.Petrogalli at arm.com> wrote:
>>> 
>>> 
>>> 
>>>> On May 31, 2019, at 12:00 PM, Alexey Bataev via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>>>> 
>>>> Hi Francesco, did you think about adding the attribute instead
of the pragma? It is a common way to express such constructs as function
attributes in clang/GCC rather than as pragma.
>>>> 
>>> 
>>> Yes, I thought about it, I believe that GCC plans to use
attributes.
>>> 
>>> In
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
there is a description of the “simd” attribute, but I couldn’t find an example
of how it is used.
>>> 
>>> Also, I cannot find an attribute that is equivalent to the “declare
variant” directive. Maybe it is planned for future releases.
>>> 
>>> The idea of using a `clang` equivalent of the `omp` directive was
to avoid duplication in terms of handling both the attribute and the omp
directive, as both directives will share much of the infrastructure.
>>> 
>>>> Best regards,
>>>> Alexey Bataev
>>>> 
>>>>> On May 31, 2019, at 12:18, Francesco Petrogalli via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Thank you for the feedback so far.
>>>>> 
>>>>> I am replying to all your questions/concerns/suggestions in
this single email. Please let me know if I have missed any.
>>>>> 
>>>>> I will update the RFC according to what we end up
deciding here.
>>>>> 
>>>>> Kind regards,
>>>>> 
>>>>> Francesco
>>>>> 
>>>>> 
>>>>> # TOPIC 1: concerns about name mangling
>>>>> 
>>>>> I understand that there are concerns in using the mangling
scheme I proposed, and that it would be preferred to have a mangling scheme that
is based on (and standardized by) OpenMP. I hear the argument on having some
common ground here. In fact, there is already common ground between the x86 and
aarch64 backends, which have based their respective Vector Function ABI
specifications on OpenMP.
>>>>> 
>>>>> In fact, the mangled name grammar can be summarized as
follows:
>>>>> 
>>>>> _ZGV<isa><masking><VLEN><parameter
type>_<scalar name>
>>>>> 
>>>>> Across vector extensions the only <token> that will
differ is the <isa> token.
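
To make the grammar concrete, here is a small Python sketch (an illustration only, not code from the RFC or from LLVM) that splits a mangled name into its tokens. The regular expression, the `VectorName` tuple and the helper names are assumptions made for this example, and the parameter string is treated as opaque text up to the first underscore, which is simpler than what the Vector Function ABI documents allow.

```python
import re
from typing import NamedTuple

# _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
# Assumption: <parameter type> contains no underscore.
_VFABI_NAME = re.compile(
    r"^_ZGV(?P<isa>[a-z])(?P<mask>[MN])(?P<vlen>\d+)(?P<params>[^_]+)_(?P<scalar>.+)$"
)


class VectorName(NamedTuple):
    isa: str       # single letter naming the vector extension
    masked: bool   # True for 'M', False for 'N'
    vlen: int      # number of lanes
    params: str    # parameter tokens, kept as raw text
    scalar: str    # name of the scalar function


def parse_vector_name(name: str) -> VectorName:
    """Split a mangled vector function name into its tokens."""
    match = _VFABI_NAME.match(name)
    if match is None:
        raise ValueError(f"not a _ZGV-style vector name: {name!r}")
    return VectorName(
        isa=match["isa"],
        masked=match["mask"] == "M",
        vlen=int(match["vlen"]),
        params=match["params"],
        scalar=match["scalar"],
    )


def without_isa(name: str) -> str:
    """Return the part of the name that follows the _ZGV<isa> prefix."""
    parse_vector_name(name)  # validate first
    return name[len("_ZGV") + 1:]
```

With this sketch, `_ZGVbN2v_sin` and `_ZGVcN2v_sin` parse to tuples that differ only in the `isa` field, and `without_isa` returns `N2v_sin` for both.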
>>>>> 
>>>>> This might lead people to think that we could drop the
_ZGV<isa> prefix and consider the <masking><VLEN><parameter
type>_<scalar name> part as a sort of unofficial OpenMP mangling
scheme: in fact, the signature of an “unmasked 2-lane vector version of `sin`”
will always be `<2 x double>(<2 x double>)`.
>>>>> 
>>>>> The problem with this choice is that the number of vector
versions available for a target is not unique.
>>>>> 
>>>>> In particular, the following declaration generates multiple
vector versions, depending on the target:
>>>>> 
>>>>> #pragma omp declare simd simdlen(2) notinbranch
>>>>> double foo(double) {…};
>>>>> 
>>>>> On x86, this generates at least 4 symbols (one for SSE, one
for AVX, one for AVX2, and one for AVX512: https://godbolt.org/z/TLYXPi)
>>>>> 
>>>>> On aarch64, the same declaration generates a unique symbol,
as specified in the Vector Function ABI.
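
The difference between the two targets can be sketched in Python as follows. This is an illustration, not compiler output: the `mangle` helper and the two tables are assumptions for this example, with the ISA letters `b`, `c`, `d`, `e` and `n` taken from the mangled names that appear elsewhere in this thread and matched to SSE, AVX, AVX2, AVX512 and Advanced SIMD respectively.

```python
# Assumed ISA letters: b/c/d/e for the four x86 extensions, n for AArch64
# Advanced SIMD. Both tables are hypothetical helpers for this sketch.
X86_ISAS = {"SSE": "b", "AVX": "c", "AVX2": "d", "AVX512": "e"}
AARCH64_ISAS = {"AdvSIMD": "n"}


def mangle(isa_letter: str, masked: bool, vlen: int, params: str, scalar: str) -> str:
    """Build _ZGV<isa><masking><VLEN><parameter type>_<scalar name>."""
    mask = "M" if masked else "N"
    return f"_ZGV{isa_letter}{mask}{vlen}{params}_{scalar}"


def symbols_for(isas: dict, vlen: int, scalar: str) -> list:
    """One unmasked symbol per vector extension, one vector parameter each."""
    return [mangle(letter, False, vlen, "v", scalar) for letter in isas.values()]


# `declare simd simdlen(2) notinbranch` on `double foo(double)`
x86_symbols = symbols_for(X86_ISAS, 2, "foo")
aarch64_symbols = symbols_for(AARCH64_ISAS, 2, "foo")
```

Here `x86_symbols` holds four names that differ only in the ISA letter, while `aarch64_symbols` holds the single name `_ZGVnN2v_foo`.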
>>>>> 
>>>>> This means that the attribute (or metadata) that carries
the information on the available vector version needs to deal also with things
that are not usually visible at IR level, but that might still need to be
provided to be able to decide which particular instruction set/ vector extension
needs to be targeted.
>>>>> 
>>>>> I used an example based on `declare simd` instead of
`declare variant` because the attribute/metadata needed for `declare variant` is
a modification of the one needed for `declare simd`, which has already been
agreed in a previous RFC proposed by Intel [1], and for which Intel has already
provided an implementation [2]. The changes proposed in this RFC are fully
compatible with the work that is being done for the VecClone pass in [2].
>>>>> 
>>>>> [1]
http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
>>>>> [2] VecClone pass: https://reviews.llvm.org/D22792
>>>>> 
>>>>> The good news is that as far as AArch64 and x86 are
concerned, the only thing that will differ in the mangled name is the
“<isa>” token. As far as I can tell, the mangling scheme of the rest of
the vector name is the same, therefore a lot of infrastructure in terms of
mangling and demangling can be reused. In fact, the `mangleVectorParameters`
function in
https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
already be shared among x86 and aarch64.
>>>>> 
>>>>> TOPIC 2: metadata vs attribute
>>>>> 
>>>>> From a functionality point of view, I don’t care whether we
use metadata or attributes. The VecClone pass mentioned in TOPIC 1 uses the
following:
>>>>> 
>>>>> attributes #0 = { nounwind uwtable
"vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
>>>>> 
>>>>> This is an attribute (I thought it was metadata?). I am
happy to reword the RFC using the right terminology (sorry for messing this up).
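
For readers who want to see what a consumer of such a string attribute has to do, here is a hedged Python sketch that splits the comma-separated value and groups the names by their ISA letter. It is not how VecClone reads the attribute; `group_by_isa` is a hypothetical helper, and the final entry of the quoted attribute is left out because it appears truncated in the archived message.

```python
from collections import defaultdict

# Value of "vector-variants" as quoted above, without the last (truncated) entry.
VECTOR_VARIANTS = (
    "_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,"
    "_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,"
    "_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,"
    "_ZGVeM16vv_vec_sum"
)


def group_by_isa(value: str) -> dict:
    """Map each ISA letter to the mangled names listed for it."""
    groups = defaultdict(list)
    for raw in value.split(","):
        name = raw.strip()
        # The ISA letter is the character right after the "_ZGV" prefix.
        if not name.startswith("_ZGV") or len(name) < 6:
            raise ValueError(f"unexpected entry: {name!r}")
        groups[name[4]].append(name)
    return dict(groups)


variants_by_isa = group_by_isa(VECTOR_VARIANTS)
```

The grouping shows that the attribute carries target-specific information (one masked and one unmasked entry per x86 extension) as plain text.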
>>>>> 
>>>>> Also, @Renato expressed concern that metadata might be
dropped by optimization passes - would using attributes prevent that?
>>>>> 
>>>>> TOPIC 3: "There is no way to notify the backend how
conformant the SIMD versions are."
>>>>> 
>>>>> @Shawn, I am afraid I don’t understand what you mean by
“conformant” here. Can you elaborate with an example?
>>>>> 
>>>>> TOPIC 3: interaction of the `omp declare variant` with
`clang declare variant`
>>>>> 
>>>>> I believe this is described in the `Option behavior, and
interaction with OpenMP`. The option `-fclang-declare-variant` is there to make
the OpenMP based one orthogonal. Of course, we might decide to make
-fclang-declare-variant on/off by default, and have default behavior when
interacting with -fopenmp-simd. For the sake of compatibility with other
compilers, we might need to require -fno-clang-declare-variant when targeting
-fopenmp-[simd].
>>>>> 
>>>>> TOPIC 3: "there are no special arguments / flags /
status regs that are used / changed in the vector version that the compiler will
have to 'just know'"
>>>>> 
>>>>> I believe that this concern is raised by the problem of
handling FP exceptions? If that’s the case, the compiler is not allowed to make
any assumptions about the vector function in that respect, and must treat it with the same
knowledge as any other function, depending on the visibility it has in the
compilation unit. @Renato, does this answer your question?
>>>>> 
>>>>> TOPIC 4: attribute in function declaration vs attribute
function call site
>>>>> 
>>>>> We discussed this in the previous version of the proposal.
Having it in the call sites guarantees that incompatible vector versions are not used
when merging modules compiled for different targets. I don’t have a use case for
this; if I remember correctly, this was asked for by @Hideki Saito. Hideki, any
comment on this?
>>>>> 
>>>>> TOPIC 5: overriding system header (the discussion on
#pragma omp/clang/system variants initiated by @Hal Finkel).
>>>>> 
>>>>> I thought that the split between #pragma clang declare variant
and #pragma omp declare variant was already providing the orthogonality between
system header and user header. Meaning that a user should always prefer the omp
version (for portability to other compilers) instead of the #pragma clang one,
which would be relegated to system headers and headers provided by the compiler.
Am I missing something? If so, I am happy to add a “system” version of the
directive, as it would be quite easy to do given most of the parsing
infrastructure will be shared.
>>>>> 
>>>>> 
>>>>>> On May 30, 2019, at 12:53 PM, Philip Reames
<listmail at philipreames.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>>>> On 5/30/19 9:05 AM, Doerfert, Johannes
wrote:
>>>>>>>>>> On 05/29, Finkel, Hal J. via cfe-dev
wrote:
>>>>>>>>>>> On 5/29/19 1:52 PM, Philip Reames
wrote:
>>>>>>>>>>> On 5/28/19 7:55 PM, Finkel, Hal J.
wrote:
>>>>>>>>>>> On 5/28/19 3:31 PM, Philip Reames
via cfe-dev wrote:
>>>>>>>>>>> I generally like the idea of having
support in IR for vectorization of
>>>>>>>>>>> custom functions.  I have several
use cases which would benefit from this.
>>>>>>>>>>> 
>>>>>>>>>>> I'd suggest a couple of
reframings to the IR representation though.
>>>>>>>>>>> 
>>>>>>>>>>> First, this should probably be
specified as metadata/attribute on a
>>>>>>>>>>> function declaration.  Allowing the
callsite variant is fine, but it
>>>>>>>>>>> should primarily be a property of
the called function, not of the call
>>>>>>>>>>> site.  Being able to specify it
once per declaration is much cleaner.
>>>>>>>>>> I agree. We should support this both on
the function declaration and on
>>>>>>>>>> the call sites.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Second, I really don't like the
mangling use here.  We need a better way
>>>>>>>>>>> to specify the properties of the
function than its mangled name.  One
>>>>>>>>>>> thought to explore is to directly
use the Value of the function
>>>>>>>>>>> declaration (since this is metadata
and we can do that), and then tie
>>>>>>>>>>> the properties to the function
declaration in some way?  Sorry, I don't
>>>>>>>>>>> really have a specific suggestion
here.
>>>>>>>>>> Is the problem the mangling or the fact
that the mangling is
>>>>>>>>>> ABI/target-specific? One option is to
use LLVM's mangling scheme (the
>>>>>>>>>> one we use for intrinsics) and then
provide some backend infrastructure
>>>>>>>>>> to translate later.
>>>>>>>>> Well, both honestly.  But mangling with a
non-target specific scheme is
>>>>>>>>> a lot better, so I might be okay with that.
Good idea.
>>>>>>>> 
>>>>>>>> I liked your idea of directly encoding the
signature in the metadata,
>>>>>>>> but I think that we want to continue to use
attributes, and not
>>>>>>>> metadata, and the options for attributes seem
more limited - unless we
>>>>>>>> allow attributes to take metadata arguments -
maybe that's an
>>>>>>>> enhancement worth considering.
>>>>>>> I recently talked to people in the OpenMP language
committee meeting
>>>>>>> about this and, thinking forward to the actual
implementation/use of the
>>>>>>> OpenMP 5.x declare variant feature, I'd say:
>>>>>>> 
>>>>>>> - We will need a mangling scheme if we want to
allow variants on
>>>>>>> declarations that are defined elsewhere.
>>>>>>> - We will need a (OpenMP) standardized mangling
scheme if we want
>>>>>>> interoperability between compilers.
>>>>>>> 
>>>>>>> I assume we want both so I think we will need both.
>>>>>> If I'm reading this correctly, this describes a
need for the frontend to
>>>>>> have a mangling scheme.  Nothing in here would seem to
prevent the
>>>>>> frontend from generating a declaration for a mangled
external symbol and
>>>>>> then referencing that declaration.  Am I missing
something?
>>>>>>> 
>>>>>>> That said, I think this should allow us to avoid
attributes/metadata
>>>>>>> which seems to me like a good thing right now.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Johannes
>>>>>>> 
>>>>>>> 
>>>>>>>>>>>> On 5/28/19 12:44 PM, Francesco
Petrogalli via llvm-dev wrote:
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>> 
>>>>>>>>>>>> This RFC is a proposal to
provide auto-vectorization functionality for user provided vector functions.
>>>>>>>>>>>> 
>>>>>>>>>>>> The proposal is a modification
of an RFC that I have sent out a couple of months ago, with the title `[RFC]
Re-implementing -fveclib with OpenMP` (see
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The
previous RFC is to be considered abandoned.
>>>>>>>>>>>> 
>>>>>>>>>>>> The original RFC was proposing
to re-implement the `-fveclib` command line option. This proposal avoids that,
and limits its scope to the mechanics of providing vector function in user code
that the compiler can pick up for auto-vectorization. This narrower scope limits
the impact of changes that are needed in both clang and LLVM.
>>>>>>>>>>>> 
>>>>>>>>>>>> Please let me know what you
think.
>>>>>>>>>>>> 
>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>> 
>>>>>>>>>>>> Francesco
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> ================================================================================
>>>>>>>>>>>>
>>>>>>>>>>>> Introduction
>>>>>>>>>>>> ============
>>>>>>>>>>>>
>>>>>>>>>>>> This RFC encompasses the
proposal of informing the vectorizer about the
>>>>>>>>>>>> availability of vector
functions provided by the user. The mechanism is
>>>>>>>>>>>> based on the use of the
directive `declare variant` introduced in OpenMP
>>>>>>>>>>>> 5.0 [^1].
>>>>>>>>>>>> 
>>>>>>>>>>>> The mechanism proposed has the
following properties:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1.  Decouples the compiler
front-end that knows about the availability
>>>>>>>>>>>> of vectorized routines, from
the back-end that knows how to make use
>>>>>>>>>>>> of them.
>>>>>>>>>>>> 2.  Enable support for a
developer's own vector libraries without
>>>>>>>>>>>> requiring changes to the
compiler.
>>>>>>>>>>>> 3.  Enables other frontends
(e.g. f18) to add scalar-to-vector function
>>>>>>>>>>>> mappings as relevant for their
own runtime libraries, etc.
>>>>>>>>>>>> 
>>>>>>>>>>>> The implementation consists of
two separate sets of changes.
>>>>>>>>>>>> 
>>>>>>>>>>>> The first set is a set of
changes in `llvm`, and consists of:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1.  [Changes in LLVM
IR](#llvmIR) to provide information about the
>>>>>>>>>>>> availability of user-defined
vector functions via metadata attached
>>>>>>>>>>>> to an `llvm::CallInst`.
>>>>>>>>>>>> 2.  [An
infrastructure](#infrastructure) that can be queried to retrieve
>>>>>>>>>>>> information about the available
vector functions associated to a
>>>>>>>>>>>> `llvm::CallInst`.
>>>>>>>>>>>> 3.  [Changes in the
LoopVectorizer](#LV) to use the API to query the
>>>>>>>>>>>> metadata.
>>>>>>>>>>>> 
>>>>>>>>>>>> The second set consists of the
[changes in clang](#clang) that
>>>>>>>>>>>> are needed to recognize the
`#pragma clang declare variant`
>>>>>>>>>>>> directive.
>>>>>>>>>>>> 
>>>>>>>>>>>> Proposed changes
>>>>>>>>>>>> ================
>>>>>>>>>>>>
>>>>>>>>>>>> We propose an implementation
that uses `#pragma clang declare variant`
>>>>>>>>>>>> to inform the backend
components about the availability of vector
>>>>>>>>>>>> versions of scalar functions
found in IR. The mechanism relies on storing
>>>>>>>>>>>> such information in IR
metadata, and therefore makes the
>>>>>>>>>>>> auto-vectorization of function
calls a mid-end (`opt`) process that is
>>>>>>>>>>>> independent of the front-end
that generated such IR metadata.
>>>>>>>>>>>> 
>>>>>>>>>>>> This implementation provides a
generic mechanism that the users of the
>>>>>>>>>>>> LLVM compiler will be able to
use for interfacing their own vector
>>>>>>>>>>>> routines for generic code.
>>>>>>>>>>>> 
>>>>>>>>>>>> The implementation can also
expose vectorization-specific descriptors --
>>>>>>>>>>>> for example, like the `linear`
and `uniform` clauses of the OpenMP
>>>>>>>>>>>> `declare simd` directive --
that could be used to finely tune the
>>>>>>>>>>>> automatic vectorization of some
functions (think for example the
>>>>>>>>>>>> vectorization of `double
sincos(double , double *, double *)`, where
>>>>>>>>>>>> `linear` can be used to give
extra information about the memory layout
>>>>>>>>>>>> of the 2 pointers parameters in
the vector version).
>>>>>>>>>>>> 
>>>>>>>>>>>> The directive `#pragma clang
declare variant` follows the syntax of the
>>>>>>>>>>>> `#pragma omp declare variant`
directive of OpenMP.
>>>>>>>>>>>> 
>>>>>>>>>>>> We define the new directive in
the `clang` namespace instead of using
>>>>>>>>>>>> the `omp` one of OpenMP to
allow the compiler to perform
>>>>>>>>>>>> auto-vectorization outside of
an OpenMP SIMD context.
>>>>>>>>>>>> 
>>>>>>>>>>>> The mechanism is based on OpenMP
to provide a uniform user experience
>>>>>>>>>>>> across the two mechanisms, and
to maximise the number of shared
>>>>>>>>>>>> components of the
infrastructure needed in the compiler frontend to
>>>>>>>>>>>> enable the feature.
>>>>>>>>>>>> 
>>>>>>>>>>>> Changes in LLVM IR {#llvmIR}
>>>>>>>>>>>> ------------------
>>>>>>>>>>>> 
>>>>>>>>>>>> The IR is enriched with
metadata that details the availability of vector
>>>>>>>>>>>> versions of an associated
scalar function. This metadata is attached to
>>>>>>>>>>>> the call site of the scalar
function.
>>>>>>>>>>>> 
>>>>>>>>>>>> The metadata takes the form of
an attribute containing a comma separated
>>>>>>>>>>>> list of vector function
mappings. Each entry has a unique name that
>>>>>>>>>>>> follows the Vector Function
ABI[^2] and real name that is used when
>>>>>>>>>>>> generating calls to this vector
function.
>>>>>>>>>>>> 
>>>>>>>>>>>> vfunc_name1(real_name1),
vfunc_name2(real_name2)
>>>>>>>>>>>> 
>>>>>>>>>>>> The Vector Function ABI name
describes the signature of the vector
>>>>>>>>>>>> function so that properties
like vectorisation factor can be queried
>>>>>>>>>>>> during compilation.
>>>>>>>>>>>> 
>>>>>>>>>>>> The `(real name)` token is
optional and assumed to match the Vector
>>>>>>>>>>>> Function ABI name when omitted.
>>>>>>>>>>>> 
>>>>>>>>>>>> For example, the availability
of a 2-lane double precision `sin`
>>>>>>>>>>>> function via SVML when
targeting AVX on x86 is provided by the following
>>>>>>>>>>>> IR.
>>>>>>>>>>>> 
>>>>>>>>>>>> // ...
>>>>>>>>>>>> ... = call double @sin(double)
#0
>>>>>>>>>>>> // ...
>>>>>>>>>>>> 
>>>>>>>>>>>> #0 = { vector-variant =
{"_ZGVcN2v_sin(__svml_sin2),
>>>>>>>>>>>>                          
_ZGVdN4v_sin(__svml_sin4),
>>>>>>>>>>>>                          
..."} }
>>>>>>>>>>>> 
>>>>>>>>>>>> The string
`"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>>>>>>>>>> attribute provides information
on the shape of the vector function via
>>>>>>>>>>>> the string `_ZGVcN2v_sin`,
mangled according to the Vector Function ABI
>>>>>>>>>>>> for Intel, and remaps the
standard Vector Function ABI name to the
>>>>>>>>>>>> non-standard name
`__svml_sin2`.
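
As a rough illustration of the proposed `vfunc_name(real_name)` syntax, the following Python sketch turns an attribute string into a mapping from Vector Function ABI name to the name that would be called. The parser and its name `parse_vector_variant` are assumptions for this example; the RFC does not specify a parsing routine. An entry without a parenthesised real name maps to itself, matching the rule that the `(real name)` token is optional.

```python
import re
from typing import Dict

# One entry: an ABI name, optionally followed by "(real_name)".
_ENTRY = re.compile(r"^(?P<abi>[^()\s]+)(?:\((?P<real>[^()\s]+)\))?$")


def parse_vector_variant(value: str) -> Dict[str, str]:
    """Map each Vector Function ABI name to the symbol to call."""
    mapping = {}
    for raw in value.split(","):
        entry = raw.strip()
        if not entry:
            continue  # tolerate stray commas and whitespace
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"malformed vector-variant entry: {entry!r}")
        # When no real name is given, the ABI name is the symbol itself.
        mapping[match["abi"]] = match["real"] or match["abi"]
    return mapping


svml = parse_vector_variant("_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)")
```

In this sketch `svml["_ZGVcN2v_sin"]` is `__svml_sin2`, which is the redirection the attribute is meant to express.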
>>>>>>>>>>>> 
>>>>>>>>>>>> This metadata is compatible
with the proposal "Proposal for function
>>>>>>>>>>>> vectorization and loop
vectorization with function calls",[^3] that uses
>>>>>>>>>>>> Vector Function ABI mangled
names to inform the vectorizer about the
>>>>>>>>>>>> availability of vector
functions. The proposal extends the original by
>>>>>>>>>>>> allowing the explicit mapping
of the Vector Function ABI mangled name to
>>>>>>>>>>>> a non-standard name, which
allows the use of existing vector libraries.
>>>>>>>>>>>> 
>>>>>>>>>>>> The `vector-variant` attribute
needs to be attached on a per-call basis
>>>>>>>>>>>> to avoid conflicts when merging
modules with different vector variants.
>>>>>>>>>>>> 
>>>>>>>>>>>> The query infrastructure: SVFS
{#infrastructure}
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> 
>>>>>>>>>>>> The Search Vector Function
System (SVFS) is constructed from an
>>>>>>>>>>>> `llvm::Module` instance so it
can create function definitions. The SVFS
>>>>>>>>>>>> exposes an API with two
methods.
>>>>>>>>>>>> 
>>>>>>>>>>>> ###
`SVFS::isFunctionVectorizable`
>>>>>>>>>>>> 
>>>>>>>>>>>> This method queries the
availability of a vectorized version of a
>>>>>>>>>>>> function. The signature of the
method is as follows.
>>>>>>>>>>>> 
>>>>>>>>>>>> bool
isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>>>>>>>>>>> 
>>>>>>>>>>>> The method determines the
availability of a vector version of the function
>>>>>>>>>>>> invoked by the `Call` parameter
by looking at the `vector-variant`
>>>>>>>>>>>> metadata.
>>>>>>>>>>>> 
>>>>>>>>>>>> The `Params` argument is a map
that associates the position of a
>>>>>>>>>>>> parameter in the `CallInst` to
its `ParameterType` descriptor. The
>>>>>>>>>>>> `ParameterType` descriptor
holds information about the shape of the
>>>>>>>>>>>> corresponding parameter in the
signature of the vector function. This
>>>>>>>>>>>> `ParameterType` is used to
query the SVFS about the availability of
>>>>>>>>>>>> vector versions that have
`linear`, `uniform` or `align` parameters (in
>>>>>>>>>>>> the sense of OpenMP 4.0 and
onwards).
>>>>>>>>>>>> 
>>>>>>>>>>>> The method
`isFunctionVectorizable`, when invoked with an empty
>>>>>>>>>>>> `ParTypeMap`, is equivalent to
the `TargetLibraryInfo` method
>>>>>>>>>>>>
`isFunctionVectorizable(StringRef Name)`.
>>>>>>>>>>>> 
>>>>>>>>>>>> ###
`SVFS::getVectorizedFunction`
>>>>>>>>>>>> 
>>>>>>>>>>>> This method returns the vector
function declaration that corresponds to
>>>>>>>>>>>> the needs of the vectorization
technique that is being run.
>>>>>>>>>>>> 
>>>>>>>>>>>> The signature of the function
is as follows.
>>>>>>>>>>>> 
>>>>>>>>>>>> std::pair<llvm::FunctionType
*, std::string> getVectorizedFunction(
>>>>>>>>>>>>   llvm::CallInst * Call,
unsigned VF, bool IsMasked, ParTypeMap Params);
>>>>>>>>>>>> 
>>>>>>>>>>>> The `Call` parameter is the
call instance that is being vectorized, the
>>>>>>>>>>>> `VF` parameter represents the
vectorization factor (how many lanes), the
>>>>>>>>>>>> `IsMasked` parameter decides
whether or not the signature of the vector
>>>>>>>>>>>> function is required to have a
mask parameter, the `Params` parameter
>>>>>>>>>>>> describes the shape of the
vector function as in the
>>>>>>>>>>>> `isFunctionVectorizable`
method.
>>>>>>>>>>>> 
>>>>>>>>>>>> The method uses the
`vector-variant` metadata and returns the function
>>>>>>>>>>>> signature and the name of the
function based on the input parameters.
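
The lookup described here can be modelled with a few lines of Python. This is a toy model under stated assumptions, not the proposed C++ API: it takes an already parsed mapping from ABI name to real name, ignores the `Params` argument, and returns only the name of the function rather than a `(FunctionType *, name)` pair.

```python
import re
from typing import Dict, Optional

# Only the masking letter and the lane count are needed for this lookup.
_NAME = re.compile(r"^_ZGV[a-z](?P<mask>[MN])(?P<vlen>\d+)[^_]+_.+$")


def get_vectorized_function(
    variants: Dict[str, str], vf: int, is_masked: bool
) -> Optional[str]:
    """Return the real name of a variant with the requested VF and masking."""
    wanted_mask = "M" if is_masked else "N"
    for abi_name, real_name in variants.items():
        match = _NAME.match(abi_name)
        if match is None:
            continue  # skip entries this toy model cannot interpret
        if match["mask"] == wanted_mask and int(match["vlen"]) == vf:
            return real_name
    return None


variants = {"_ZGVcN2v_sin": "__svml_sin2", "_ZGVdN4v_sin": "__svml_sin4"}
four_lane = get_vectorized_function(variants, vf=4, is_masked=False)
```

A request that no entry satisfies, such as `vf=8`, returns `None` in this model.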
>>>>>>>>>>>> 
>>>>>>>>>>>> The SVFS can add new function
definitions, in the same module as the
>>>>>>>>>>>> `Call`, to provide vector
functions that are not present within the
>>>>>>>>>>>> vector-variant metadata. For
example, if a library provides a vector
>>>>>>>>>>>> version of a function with a
vectorization factor of 2, but the
>>>>>>>>>>>> vectorizer is requesting a
vectorization factor of 4, the SVFS is
>>>>>>>>>>>> allowed to create a definition
that calls the 2-lane version twice. This
>>>>>>>>>>>> capability applies similarly
for providing masked and unmasked versions
>>>>>>>>>>>> when the request does not match
what is available in the library.
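
The 2-lane to 4-lane case can be pictured with a short Python sketch. Vectors are modelled as plain lists, and `widen` and `sin2` are hypothetical names for this example; the real SVFS would emit an IR function definition instead.

```python
import math
from typing import Callable, List, Sequence


def widen(
    narrow: Callable[[Sequence[float]], List[float]], narrow_vf: int, wide_vf: int
) -> Callable[[Sequence[float]], List[float]]:
    """Build a wide_vf-lane function from repeated calls to a narrow_vf-lane one."""
    if wide_vf % narrow_vf != 0:
        raise ValueError("wide_vf must be a multiple of narrow_vf")

    def wide(x: Sequence[float]) -> List[float]:
        if len(x) != wide_vf:
            raise ValueError(f"expected {wide_vf} lanes, got {len(x)}")
        result: List[float] = []
        # Call the narrow version once per chunk and concatenate the results.
        for start in range(0, wide_vf, narrow_vf):
            result.extend(narrow(x[start:start + narrow_vf]))
        return result

    return wide


def sin2(x: Sequence[float]) -> List[float]:
    """Stand-in for a library routine that only exists in a 2-lane version."""
    if len(x) != 2:
        raise ValueError("sin2 takes exactly 2 lanes")
    return [math.sin(v) for v in x]


sin4 = widen(sin2, narrow_vf=2, wide_vf=4)
```

Calling `sin4` on four values invokes `sin2` twice, once per pair of lanes.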
>>>>>>>>>>>> 
>>>>>>>>>>>> This method is equivalent to
the TLI method
>>>>>>>>>>>> `StringRef
getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>>>>>>>>> 
>>>>>>>>>>>> Notice that to fully support
OpenMP vectorization we need to think about
>>>>>>>>>>>> a fuzzy matching mechanism that
is able to select a candidate in the
>>>>>>>>>>>> calling context. However, this
proposal is intended for scalar-to-vector
>>>>>>>>>>>> mappings of math-like functions
that are most likely to associate a
>>>>>>>>>>>> unique vector candidate in most
contexts. Therefore, extending this
>>>>>>>>>>>> behavior to a generic one is an
aspect of the implementation that will
>>>>>>>>>>>> be treated in a separate RFC
about the vectorization pass.
>>>>>>>>>>>> 
>>>>>>>>>>>> ### Scalable vectorization
>>>>>>>>>>>> 
>>>>>>>>>>>> Both methods of the SVFS API
will be extended with a boolean parameter
>>>>>>>>>>>> to specify whether scalable
signatures are needed by the user of the
>>>>>>>>>>>> SVFS.
>>>>>>>>>>>> 
>>>>>>>>>>>> Changes in clang {#clang}
>>>>>>>>>>>> ----------------
>>>>>>>>>>>> 
>>>>>>>>>>>> We use clang to generate the
metadata described above.
>>>>>>>>>>>> 
>>>>>>>>>>>> In the compilation unit, the
vector function definition or declaration
>>>>>>>>>>>> must be visible and associated
to the scalar version via the
>>>>>>>>>>>> `#pragma clang declare variant`
according to the rule defined by the
>>>>>>>>>>>> correspondent `#pragma omp
declare variant` defined in OpenMP 5.0, as in
>>>>>>>>>>>> the following example.
>>>>>>>>>>>> 
>>>>>>>>>>>> #pragma clang declare
variant(vector_sinf) \
>>>>>>>>>>>>
match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>>>>>>>>> extern float sinf(float);
>>>>>>>>>>>> 
>>>>>>>>>>>> float32x4_t
vector_sinf(float32x4_t x);
>>>>>>>>>>>> 
>>>>>>>>>>>> The `construct` set in the
directive, together with the `device` set, is
>>>>>>>>>>>> used to generate the vector
mangled name to be used in the
>>>>>>>>>>>> `vector-variant` attribute, for
example `_ZGVnN4v_sinf`, when targeting
>>>>>>>>>>>> AArch64 Advanced SIMD code
generation. The rules for mangling the name of
>>>>>>>>>>>> the scalar function in the
vector name are defined in the Vector
>>>>>>>>>>>> Function ABI specification of
the target.
>>>>>>>>>>>> 
>>>>>>>>>>>> The part of the vector-variant
attribute that redirects the call to
>>>>>>>>>>>> `vector_sinf` is derived from
the `variant-id` specified in the
>>>>>>>>>>>> `variant` clause.
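
As a sketch of how the pieces of the directive could be combined into one attribute entry, consider the Python function below. The function and its parameters are hypothetical, and the ISA letter `n` for AArch64 Advanced SIMD is taken from the mangled name quoted in this thread; how clang would actually derive these values from the `construct` and `device` sets is not specified here.

```python
def vector_variant_entry(
    isa_letter: str,
    simdlen: int,
    notinbranch: bool,
    params: str,
    scalar_name: str,
    variant_id: str,
) -> str:
    """Build one 'vfunc_name(real_name)' entry for the vector-variant attribute."""
    mask = "N" if notinbranch else "M"  # notinbranch selects the unmasked form
    abi_name = f"_ZGV{isa_letter}{mask}{simdlen}{params}_{scalar_name}"
    return f"{abi_name}({variant_id})"


# declare variant(vector_sinf) match(construct=simd(simdlen(4),notinbranch), ...)
# applied to `float sinf(float)`, assuming an AArch64 Advanced SIMD target.
entry = vector_variant_entry("n", 4, True, "v", "sinf", "vector_sinf")
```

For the `sinf` declaration with `simdlen(4)`, this yields `_ZGVnN4v_sinf(vector_sinf)`.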
>>>>>>>>>>>> 
>>>>>>>>>>>> Summary
>>>>>>>>>>>> =======
>>>>>>>>>>>>
>>>>>>>>>>>> New `clang` directive in clang
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> 
>>>>>>>>>>>> `#pragma clang declare variant`,
same as `#pragma omp declare variant`
>>>>>>>>>>>> restricted to the `simd`
context selector, from OpenMP 5.0+.
>>>>>>>>>>>> 
>>>>>>>>>>>> Option behavior, and interaction with OpenMP
>>>>>>>>>>>> --------------------------------------------
>>>>>>>>>>>> 
>>>>>>>>>>>> The behavior described below
makes sure that
>>>>>>>>>>>> `#pragma clang declare variant`
function vectorization and OpenMP
>>>>>>>>>>>> function vectorization are
orthogonal.
>>>>>>>>>>>> 
>>>>>>>>>>>> `-fclang-declare-variant`
>>>>>>>>>>>> 
>>>>>>>>>>>> :   The `#pragma clang declare
variant` directives are parsed and used
>>>>>>>>>>>> to populate the
`vector-variant` attribute.
>>>>>>>>>>>> 
>>>>>>>>>>>> `-fopenmp[-simd]`
>>>>>>>>>>>> 
>>>>>>>>>>>> :   The `#pragma omp declare
variant` directives are parsed and used to
>>>>>>>>>>>> populate the `vector-variant`
attribute.
>>>>>>>>>>>> 
>>>>>>>>>>>> `-fopenmp[-simd]` and
`-fno-clang-declare-variant`
>>>>>>>>>>>> 
>>>>>>>>>>>> :   The directive `#pragma omp
declare variant` is used to populate the
>>>>>>>>>>>> `vector-variant` attribute in
IR. The
>>>>>>>>>>>> `#pragma clang declare
variant` directives are ignored.
>>>>>>>>>>>> 
>>>>>>>>>>>> [^1]:
<https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>>>>>>>>> 
>>>>>>>>>>>> [^2]: Vector Function ABI for
x86:
>>>>>>>>>>>>
<https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>>>>>>>>> Vector Function ABI for
AArch64:
>>>>>>>>>>>>
https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
>>>>>>>>>>>> 
>>>>>>>>>>>> [^3]:
<http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>>>>>>>>> 
>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>> -- 
>>>>>>>> Hal Finkel
>>>>>>>> Lead, Compiler Technology and Programming
Languages
>>>>>>>> Leadership Computing Facility
>>>>>>>> Argonne National Laboratory
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> cfe-dev mailing list
>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>> 
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>> 
>

Francesco Petrogalli via llvm-dev

2019-May-31 18:43 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

> On May 31, 2019, at 1:04 PM, Alexey Bataev <a.bataev at hotmail.com>
wrote:
> 
> You can define a clang-specific attribute and later add a GCC alias for it.
> 
This sounds very much equivalent to defining a clang-specific directive and
later adding the GCC attributes? It should be less risky, because the GCC attributes
will very likely represent what is in OpenMP. So doing the clang directive would
guarantee more compatibility?

Or am I missing something about the two possible implementations?

Francesco

> Best regards,
> Alexey Bataev
> 
>> On May 31, 2019, at 13:46, Francesco Petrogalli <Francesco.Petrogalli
at arm.com> wrote:
>> 
>> 
>> 
>>> On May 31, 2019, at 12:38 PM, Alexey Bataev <a.bataev at
hotmail.com> wrote:
>>> 
>>> Francesco, there won't be any duplication. Most of the
declarative OpenMP directives are represented as attributes internally, so, I
think, it will be natural to use an attribute here rather than pragma.
>>> 
>> 
>> Very nice. I am open to getting rid of the `clang` based directive in favor
of the attribute one.
>> 
>> At the moment there is no “declare variant” attribute in the list of
common function attributes at
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes.
What should we do? Define our own, or wait for GCC to publish the attribute?
>> 
>> If we decide to do what GCC does, shall we actively participate in the
discussion with the GCC community to shape the attribute itself? Am I right
in thinking that the GCC solution is preferred given that system header files are
more likely to follow whatever GCC comes up with?
>> 
>> 
>>> Best regards,
>>> Alexey Bataev
>>> 
>>>> On May 31, 2019, at 13:32, Francesco Petrogalli
<Francesco.Petrogalli at arm.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On May 31, 2019, at 12:00 PM, Alexey Bataev via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>>>>> 
>>>>> Hi Francesco, did you think about adding the attribute
instead of the pragma? It is a common way to express such constructs as function
attributes in clang/GCC rather than as pragma.
>>>>> 
>>>> 
>>>> Yes, I thought about it, I believe that GCC plans to use
attributes.
>>>> 
>>>> In
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
there is a description of the “simd” attribute, but I couldn’t find an example
of how it is used.
>>>> 
>>>> Also, I cannot find an attribute that is equivalent to the
“declare variant” directive. Maybe it is planned for future releases.
>>>> 
>>>> The idea of using a `clang` equivalent of the `omp` directive
was to avoid duplication in terms of handling both the attribute and the omp
directive, as both directives will share much of the infrastructure.
>>>> 
>>>>> Best regards,
>>>>> Alexey Bataev
>>>>> 
>>>>>> On May 31, 2019, at 12:18, Francesco Petrogalli via
cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> Thank you for the feedback so far.
>>>>>> 
>>>>>> I am replying to all your
questions/concerns/suggestions in this single email. Please let me know if I
have missed any.
>>>>>> 
>>>>>> I will update the RFC according to what we end up
deciding here.
>>>>>> 
>>>>>> Kind regards,
>>>>>> 
>>>>>> Francesco
>>>>>> 
>>>>>> 
>>>>>> # TOPIC 1: concerns about name mangling
>>>>>> 
>>>>>> I understand that there are concerns in using the
mangling scheme I proposed, and that it would be preferred to have a mangling
scheme that is based on (and standardized by) OpenMP. I hear the argument on
having some common ground here. In fact, there is already common ground between
the x86 and aarch64 backend, who have based their respective Vector Function ABI
specifications on OpenMP.
>>>>>> 
>>>>>> In fact, the mangled name grammar can be summarized as
follows:
>>>>>> 
>>>>>> _ZGV<isa><masking><VLEN><parameter
type>_<scalar name>
>>>>>> 
>>>>>> Across vector extensions the only <token> that
will differ is the <isa> token.
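
The grammar above can be sketched as a small parser. This is only an illustration under stated assumptions, not code from clang or LLVM: the struct `VectorABIName` and the function `parseVectorABIName` are hypothetical names, only a numeric `<VLEN>` is handled, and the parameter tokens are assumed to contain no underscore.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper type holding the fields of a mangled vector name.
struct VectorABIName {
    char isa;               // the <isa> token, e.g. 'c' in _ZGVcN2v_sin
    bool masked;            // <masking>: 'M' is masked, 'N' is unmasked
    unsigned vlen;          // <VLEN>: number of lanes
    std::string parameters; // raw <parameter type> tokens, e.g. "v" or "vv"
    std::string scalarName; // <scalar name>, e.g. "sin"
};

// Parses _ZGV<isa><masking><VLEN><parameter type>_<scalar name>.
// Returns false on malformed input; `out` is only meaningful on success.
bool parseVectorABIName(const std::string &name, VectorABIName &out) {
    const std::string prefix = "_ZGV";
    // The prefix must be followed by at least <isa> and <masking>.
    if (name.size() < prefix.size() + 2 ||
        name.compare(0, prefix.size(), prefix) != 0)
        return false;
    std::size_t pos = prefix.size();
    out.isa = name[pos++];
    const char mask = name[pos++];
    if (mask != 'M' && mask != 'N')
        return false;
    out.masked = (mask == 'M');

    // <VLEN>: one or more decimal digits.
    out.vlen = 0;
    const std::size_t digitsStart = pos;
    while (pos < name.size() &&
           std::isdigit(static_cast<unsigned char>(name[pos])))
        out.vlen = out.vlen * 10 + static_cast<unsigned>(name[pos++] - '0');
    if (pos == digitsStart)
        return false;
    assert(pos <= name.size());

    // Parameter tokens run up to the first underscore; the rest is the
    // scalar name, which may itself contain underscores.
    const std::size_t underscore = name.find('_', pos);
    if (underscore == std::string::npos || underscore == pos ||
        underscore + 1 == name.size())
        return false;
    out.parameters = name.substr(pos, underscore - pos);
    out.scalarName = name.substr(underscore + 1);
    return true;
}
```

With this sketch, `_ZGVcN2v_sin` splits into isa `c`, unmasked, 2 lanes, parameters `v`, and scalar name `sin`; only the first of those fields changes from one vector extension to another.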
>>>>>> 
>>>>>> This might lead people to think that we could drop the
_ZGV<isa> prefix and consider the <masking><VLEN><parameter
type>_<scalar name> part as a sort of unofficial OpenMP mangling
scheme: in fact, the signature of an “unmasked 2-lane vector version of `sin`”
will always be `<2 x double>(<2 x double>)`.
>>>>>> 
>>>>>> The problem with this choice is that the number of vector
versions available for a target is not unique.
>>>>>> 
>>>>>> In particular, the following declaration generates
multiple vector versions, depending on the target:
>>>>>> 
>>>>>> #pragma omp declare simd simdlen(2) notinbranch
>>>>>> double foo(double) {…};
>>>>>> 
>>>>>> On x86, this generates at least 4 symbols (one for SSE,
one for AVX, one for AVX2, and one for AVX512: https://godbolt.org/z/TLYXPi)
>>>>>> 
>>>>>> On aarch64, the same declaration generates a unique
symbol, as specified in the Vector Function ABI.
>>>>>> 
>>>>>> This means that the attribute (or metadata) that
carries the information on the available vector version needs to deal also with
things that are not usually visible at IR level, but that might still need to be
provided to be able to decide which particular instruction set/ vector extension
needs to be targeted.
>>>>>> 
>>>>>> I used an example based on `declare simd` instead of
`declare variant` because the attribute/metadata needed for `declare variant` is
a modification of the one needed for `declare simd`, which has already been
agreed in a previous RFC proposed by Intel [1], and for which Intel has already
provided an implementation [2]. The changes proposed in this RFC are fully
compatible with the work that is being done for the VecClone pass in [2].
>>>>>> 
>>>>>> [1]
http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
>>>>>> [2] VecClone pass: https://reviews.llvm.org/D22792
>>>>>> 
>>>>>> The good news is that as far as AArch64 and x86 are
concerned, the only thing that will differ in the mangled name is the
“<isa>” token. As far as I can tell, the mangling scheme of the rest of
the vector name is the same, therefore a lot of infrastructure in terms of
mangling and demangling can be reused. In fact, the `mangleVectorParameters`
function in
https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
already be shared among x86 and aarch64.
>>>>>> 
>>>>>> TOPIC 2: metadata vs attribute
>>>>>> 
>>>>>> From a functionality point of view, I don’t care
whether we use metadata or attributes. The VecClone pass mentioned in TOPIC 1
uses the following:
>>>>>> 
>>>>>> attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16" }
>>>>>> 
>>>>>> This is an attribute (I thought it was metadata?), I am
happy to reword the RFC using the right terminology (sorry for messing this up).
>>>>>> 
>>>>>> Also, @Renato expressed concern that metadata might be
dropped by optimization passes - would using attributes prevent that?
>>>>>> 
>>>>>> TOPIC 3: "There is no way to notify the backend
how conformant the SIMD versions are.”
>>>>>> 
>>>>>> @Shawn, I am afraid I don’t understand what you mean by
“conformant” here. Can you elaborate with an example?
>>>>>> 
>>>>>> TOPIC 3: interaction of the `omp declare variant` with
`clang declare variant`
>>>>>> 
>>>>>> I believe this is described in the `Option behavior,
and interaction with OpenMP`. The option `-fclang-declare-variant` is there to
make the OpenMP based one orthogonal. Of course, we might decide to make
-fclang-declare-variant on/off by default, and have default behavior when
interacting with -fopenmp-simd. For the sake of compatibility with other
compilers, we might need to require -fno-clang-declare-variant when targeting
-fopenmp-[simd].
>>>>>> 
>>>>>> TOPIC 3: "there are no special arguments / flags /
status regs that are used / changed in the vector version that the compiler will
have to "just know”
>>>>>> 
>>>>>> I believe that this concern is raised by the problem of
handling FP exceptions? If that’s the case, the compiler is not allowed to make
any assumption about the vector function in that respect, and must treat it like
any other function, depending on the visibility it has in the
compilation unit. @Renato, does this answer your question?
>>>>>> 
>>>>>> TOPIC 4: attribute in function declaration vs attribute
function call site
>>>>>> 
>>>>>> We discussed this in the previous version of the
proposal. Having it in the call sites guarantees that incompatible vector
versions are not used when merging modules compiled for different targets. I don’t
have a use case for this, if I remember correctly this was asked by @Hideki
Saito. Hideki, any comment on this?
>>>>>> 
>>>>>> TOPIC 5: overriding system header (the discussion on
#pragma omp/clang/system variants initiated by @Hal Finkel).
>>>>>> 
>>>>>> I thought that the split between #pragma clang declare
variant and #pragma omp declare variant was already providing the orthogonality
between system header and user header. Meaning that a user should always prefer
the omp version (for portability to other compilers) instead of the #pragma
clang one, which would be relegated to system headers and headers provided by
the compiler. Am I missing something? If so, I am happy to add a “system”
version of the directive, as it would be quite easy to do given most of the
parsing infrastructure will be shared.
>>>>>> 
>>>>>> 
>>>>>>> On May 30, 2019, at 12:53 PM, Philip Reames
<listmail at philipreames.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>>>> On 5/30/19 9:05 AM, Doerfert, Johannes
wrote:
>>>>>>>>>>> On 05/29, Finkel, Hal J. via
cfe-dev wrote:
>>>>>>>>>>>> On 5/29/19 1:52 PM, Philip
Reames wrote:
>>>>>>>>>>>> On 5/28/19 7:55 PM, Finkel, Hal
J. wrote:
>>>>>>>>>>>> On 5/28/19 3:31 PM, Philip
Reames via cfe-dev wrote:
>>>>>>>>>>>> I generally like the idea of
having support in IR for vectorization of
>>>>>>>>>>>> custom functions.  I have
several use cases which would benefit from this.
>>>>>>>>>>>> 
>>>>>>>>>>>> I'd suggest a couple of
reframings to the IR representation though.
>>>>>>>>>>>> 
>>>>>>>>>>>> First, this should probably be
specified as metadata/attribute on a
>>>>>>>>>>>> function declaration.  Allowing
the callsite variant is fine, but it
>>>>>>>>>>>> should primarily be a property
of the called function, not of the call
>>>>>>>>>>>> site.  Being able to specify it
once per declaration is much cleaner.
>>>>>>>>>>> I agree. We should support this
both on the function declaration and on
>>>>>>>>>>> the call sites.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Second, I really don't like
the mangling use here.  We need a better way
>>>>>>>>>>>> to specify the properties of
the function than its mangled name.  One
>>>>>>>>>>>> thought to explore is to
directly use the Value of the function
>>>>>>>>>>>> declaration (since this is
metadata and we can do that), and then tie
>>>>>>>>>>>> the properties to the function
declaration in some way?  Sorry, I don't
>>>>>>>>>>>> really have a specific
suggestion here.
>>>>>>>>>>> Is the problem the mangling or the
fact that the mangling is
>>>>>>>>>>> ABI/target-specific? One option is
to use LLVM's mangling scheme (the
>>>>>>>>>>> one we use for intrinsics) and then
provide some backend infrastructure
>>>>>>>>>>> to translate later.
>>>>>>>>>> Well, both honestly.  But mangling with
a non-target specific scheme is
>>>>>>>>>> a lot better, so I might be okay with
that.   Good idea.
>>>>>>>>> 
>>>>>>>>> I liked your idea of directly encoding the
signature in the metadata,
>>>>>>>>> but I think that we want to continue to use
attributes, and not
>>>>>>>>> metadata, and the options for attributes
seem more limited - unless we
>>>>>>>>> allow attributes to take metadata arguments
- maybe that's an
>>>>>>>>> enhancement worth considering.
>>>>>>>> I recently talked to people in the OpenMP
language committee meeting
>>>>>>>> about this and, thinking forward to the actual
implementation/use of the
>>>>>>>> OpenMP 5.x declare variant feature, I'd
say:
>>>>>>>> 
>>>>>>>> - We will need a mangling scheme if we want to
allow variants on
>>>>>>>> declarations that are defined elsewhere.
>>>>>>>> - We will need a (OpenMP) standardized mangling
scheme if we want
>>>>>>>> interoperability between compilers.
>>>>>>>> 
>>>>>>>> I assume we want both so I think we will need
both.
>>>>>>> If I'm reading this correctly, this describes a
need for the frontend to
>>>>>>> have a mangling scheme.  Nothing in here would seem
to prevent the
>>>>>>> frontend from generating a declaration for a mangled
external symbol and
>>>>>>> then referencing that declaration.  Am I missing
something?
>>>>>>>> 
>>>>>>>> That said, I think this should allow us to
avoid attributes/metadata
>>>>>>>> which seems to me like a good thing right now.
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Johannes
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>>>>> On 5/28/19 12:44 PM,
Francesco Petrogalli via llvm-dev wrote:
>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This RFC is a proposal to
provide auto-vectorization functionality for user provided vector functions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The proposal is a
modification of an RFC that I have sent out a couple of months ago, with the
title `[RFC] Re-implementing -fveclib with OpenMP` (see
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The
previous RFC is to be considered abandoned.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The original RFC was
proposing to re-implement the `-fveclib` command line option. This proposal
avoids that, and limits its scope to the mechanics of providing vector function
in user code that the compiler can pick up for auto-vectorization. This narrower
scope limits the impact of changes that are needed in both clang and LLVM.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Please let me know what you
think.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Francesco
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ================================================================================
>>>>>>>>>>>>>
>>>>>>>>>>>>> Introduction
>>>>>>>>>>>>> ============
>>>>>>>>>>>>>
>>>>>>>>>>>>> This RFC encompasses the
proposal of informing the vectorizer about the
>>>>>>>>>>>>> availability of vector
functions provided by the user. The mechanism is
>>>>>>>>>>>>> based on the use of the
directive `declare variant` introduced in OpenMP
>>>>>>>>>>>>> 5.0 [^1].
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The mechanism proposed has
the following properties:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1.  Decouples the compiler
front-end that knows about the availability
>>>>>>>>>>>>> of vectorized routines,
from the back-end that knows how to make use
>>>>>>>>>>>>> of them.
>>>>>>>>>>>>> 2.  Enable support for a
developer's own vector libraries without
>>>>>>>>>>>>> requiring changes to the
compiler.
>>>>>>>>>>>>> 3.  Enables other frontends
(e.g. f18) to add scalar-to-vector function
>>>>>>>>>>>>> mappings as relevant for
their own runtime libraries, etc.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The implementation consists
of two separate sets of changes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The first set is a set of
changes in `llvm`, and consists of:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1.  [Changes in LLVM
IR](#llvmIR) to provide information about the
>>>>>>>>>>>>> availability of
user-defined vector functions via metadata attached
>>>>>>>>>>>>> to an `llvm::CallInst`.
>>>>>>>>>>>>> 2.  [An
infrastructure](#infrastructure) that can be queried to retrieve
>>>>>>>>>>>>> information about the
available vector functions associated to a
>>>>>>>>>>>>> `llvm::CallInst`.
>>>>>>>>>>>>> 3.  [Changes in the
LoopVectorizer](#LV) to use the API to query the
>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The second set consists of
the [changes in clang](#clang) that
>>>>>>>>>>>>> are needed to recognize
the `#pragma clang declare variant`
>>>>>>>>>>>>> directive.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Proposed changes
>>>>>>>>>>>>> ================
>>>>>>>>>>>>>
>>>>>>>>>>>>> We propose an
implementation that uses `#pragma clang declare variant`
>>>>>>>>>>>>> to inform the backend
components about the availability of vector
>>>>>>>>>>>>> versions of scalar functions
found in IR. The mechanism relies on storing
>>>>>>>>>>>>> such information in IR
metadata, and therefore makes the
>>>>>>>>>>>>> auto-vectorization of
function calls a mid-end (`opt`) process that is
>>>>>>>>>>>>> independent of the
front-end that generated such IR metadata.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This implementation
provides a generic mechanism that the users of the
>>>>>>>>>>>>> LLVM compiler will be able
to use for interfacing their own vector
>>>>>>>>>>>>> routines for generic code.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The implementation can also
expose vectorization-specific descriptors --
>>>>>>>>>>>>> for example, like the
`linear` and `uniform` clauses of the OpenMP
>>>>>>>>>>>>> `declare simd` directive --
that could be used to finely tune the
>>>>>>>>>>>>> automatic vectorization of
some functions (think for example the
>>>>>>>>>>>>> vectorization of `void
sincos(double, double *, double *)`, where
>>>>>>>>>>>>> `linear` can be used to
give extra information about the memory layout
>>>>>>>>>>>>> of the two pointer
parameters in the vector version).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The directive `#pragma
clang declare variant` follows the syntax of the
>>>>>>>>>>>>> `#pragma omp declare
variant` directive of OpenMP.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We define the new directive
in the `clang` namespace instead of using
>>>>>>>>>>>>> the `omp` one of OpenMP to
allow the compiler to perform
>>>>>>>>>>>>> auto-vectorization outside
of an OpenMP SIMD context.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The mechanism is based on
OpenMP to provide a uniform user experience
>>>>>>>>>>>>> across the two mechanisms,
and to maximise the number of shared
>>>>>>>>>>>>> components of the
infrastructure needed in the compiler frontend to
>>>>>>>>>>>>> enable the feature.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Changes in LLVM IR {#llvmIR}
>>>>>>>>>>>>> ------------------
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The IR is enriched with
metadata that details the availability of vector
>>>>>>>>>>>>> versions of an associated
scalar function. This metadata is attached to
>>>>>>>>>>>>> the call site of the scalar
function.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The metadata takes the form
of an attribute containing a comma separated
>>>>>>>>>>>>> list of vector function
mappings. Each entry has a unique name that
>>>>>>>>>>>>> follows the Vector Function
ABI[^2] and a real name that is used when
>>>>>>>>>>>>> generating calls to this
vector function.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> vfunc_name1(real_name1),
vfunc_name2(real_name2)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The Vector Function ABI
name describes the signature of the vector
>>>>>>>>>>>>> function so that properties
like vectorisation factor can be queried
>>>>>>>>>>>>> during compilation.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The `(real name)` token is
optional and assumed to match the Vector
>>>>>>>>>>>>> Function ABI name when
omitted.
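
The mapping list described above can be sketched as a small parser. This is a minimal illustration, not the proposed LLVM implementation: the helper names `trim` and `parseVectorVariants` are hypothetical, and an entry without a `(real name)` part is mapped to its own Vector Function ABI name, as the text specifies.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Strips leading and trailing whitespace from one list entry.
static std::string trim(const std::string &s) {
    const char *ws = " \t\n";
    const std::size_t begin = s.find_first_not_of(ws);
    if (begin == std::string::npos)
        return "";
    const std::size_t end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Splits "abi1(real1), abi2, ..." into (ABI name, real name) pairs.
std::vector<std::pair<std::string, std::string>>
parseVectorVariants(const std::string &attr) {
    std::vector<std::pair<std::string, std::string>> mappings;
    std::size_t start = 0;
    while (start <= attr.size()) {
        std::size_t comma = attr.find(',', start);
        if (comma == std::string::npos)
            comma = attr.size();
        assert(comma >= start);
        const std::string entry = trim(attr.substr(start, comma - start));
        if (!entry.empty()) {
            const std::size_t open = entry.find('(');
            if (open != std::string::npos && open > 0 && entry.back() == ')') {
                // Explicit redirection: abi_name(real_name).
                mappings.emplace_back(
                    entry.substr(0, open),
                    entry.substr(open + 1, entry.size() - open - 2));
            } else {
                // No (real name): the real name is the ABI name itself.
                mappings.emplace_back(entry, entry);
            }
        }
        start = comma + 1;
    }
    return mappings;
}
```

For the string `_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)` the sketch yields two pairs, each carrying the ABI name used to describe the shape and the library symbol that would actually be called.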
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For example, the
availability of a 2-lane double precision `sin`
>>>>>>>>>>>>> function via SVML when
targeting AVX on x86 is provided by the following
>>>>>>>>>>>>> IR.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> // ...
>>>>>>>>>>>>> ... = call double @sin(double) #0
>>>>>>>>>>>>> // ...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>>>>>>>>>>>>                          _ZGVdN4v_sin(__svml_sin4),
>>>>>>>>>>>>>                          ..."} }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The string
`"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>>>>>>>>>>> attribute provides
information on the shape of the vector function via
>>>>>>>>>>>>> the string `_ZGVcN2v_sin`,
mangled according to the Vector Function ABI
>>>>>>>>>>>>> for Intel, and remaps the
standard Vector Function ABI name to the
>>>>>>>>>>>>> non-standard name
`__svml_sin2`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This metadata is compatible
with the proposal "Proposal for function
>>>>>>>>>>>>> vectorization and loop
vectorization with function calls",[^3] that uses
>>>>>>>>>>>>> Vector Function ABI mangled
names to inform the vectorizer about the
>>>>>>>>>>>>> availability of vector
functions. The proposal extends the original by
>>>>>>>>>>>>> allowing the explicit
mapping of the Vector Function ABI mangled name to
>>>>>>>>>>>>> a non-standard name, which
allows the use of existing vector libraries.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The `vector-variant`
attribute needs to be attached on a per-call basis
>>>>>>>>>>>>> to avoid conflicts when
merging modules with different vector variants.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The query infrastructure: SVFS {#infrastructure}
>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The Search Vector Function
System (SVFS) is constructed from an
>>>>>>>>>>>>> `llvm::Module` instance so
it can create function definitions. The SVFS
>>>>>>>>>>>>> exposes an API with two
methods.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ### `SVFS::isFunctionVectorizable`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This method queries the
availability of a vectorized version of a
>>>>>>>>>>>>> function. The signature of
the method is as follows.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The method determines the
availability of a vector version of the function
>>>>>>>>>>>>> invoked by the `Call`
parameter by looking at the `vector-variant`
>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The `Params` argument is a
map that associates the position of a
>>>>>>>>>>>>> parameter in the `CallInst`
to its `ParameterType` descriptor. The
>>>>>>>>>>>>> `ParameterType` descriptor
holds information about the shape of the
>>>>>>>>>>>>> corresponding parameter in
the signature of the vector function. This
>>>>>>>>>>>>> `ParameterType` is used to
query the SVFS about the availability of
>>>>>>>>>>>>> vector versions that have
`linear`, `uniform` or `align` parameters (in
>>>>>>>>>>>>> the sense of OpenMP 4.0 and
onwards).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The method
`isFunctionVectorizable`, when invoked with an empty
>>>>>>>>>>>>> `ParTypeMap`, is equivalent
to the `TargetLibraryInfo` method
>>>>>>>>>>>>>
`isFunctionVectorizable(StringRef Name)`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ### `SVFS::getVectorizedFunction`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This method returns the
vector function declaration that corresponds to
>>>>>>>>>>>>> the needs of the
vectorization technique that is being run.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The signature of the
function is as follows.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>>>>>>>>>>     llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The `Call` parameter is the
call instance that is being vectorized, the
>>>>>>>>>>>>> `VF` parameter represents
the vectorization factor (how many lanes), the
>>>>>>>>>>>>> `IsMasked` parameter
decides whether or not the signature of the vector
>>>>>>>>>>>>> function is required to
have a mask parameter, the `Params` parameter
>>>>>>>>>>>>> describes the shape of the
vector function as in the
>>>>>>>>>>>>> `isFunctionVectorizable`
method.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The method uses the
`vector-variant` metadata and returns the function
>>>>>>>>>>>>> signature and the name of
the function based on the input parameters.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The SVFS can add new
function definitions, in the same module as the
>>>>>>>>>>>>> `Call`, to provide vector
functions that are not present within the
>>>>>>>>>>>>> vector-variant metadata.
For example, if a library provides a vector
>>>>>>>>>>>>> version of a function with
a vectorization factor of 2, but the
>>>>>>>>>>>>> vectorizer is requesting a
vectorization factor of 4, the SVFS is
>>>>>>>>>>>>> allowed to create a
definition that calls the 2-lane version twice. This
>>>>>>>>>>>>> capability applies
similarly for providing masked and unmasked versions
>>>>>>>>>>>>> when the request does not
match what is available in the library.
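
The widening case described above can be sketched in plain C++. This only illustrates the idea and is not how the SVFS would emit IR: `vector_sin2` is a hypothetical stand-in for a library-provided 2-lane function, `widened_sin` is a hypothetical wrapper, and lanes are modelled as consecutive array elements rather than vector registers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a 2-lane vector `sin` provided by a library.
void vector_sin2(const double *in, double *out) {
    out[0] = std::sin(in[0]);
    out[1] = std::sin(in[1]);
}

// Serves a request for `vf` lanes by calling the 2-lane version vf/2 times.
// For vf == 4 this is exactly two calls, one per half of the input.
void widened_sin(const double *in, double *out, unsigned vf) {
    assert(vf % 2 == 0 && "requested VF must be a multiple of the available VF");
    for (unsigned lane = 0; lane < vf; lane += 2)
        vector_sin2(in + lane, out + lane);
}
```

A masked request served by an unmasked library function would need a similar wrapper that discards the results of the inactive lanes, which is only safe when the function has no side effects on those lanes.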
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This method is equivalent
to the TLI method
>>>>>>>>>>>>> `StringRef
getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Notice that to fully
support OpenMP vectorization we need to think about
>>>>>>>>>>>>> a fuzzy matching mechanism
that is able to select a candidate in the
>>>>>>>>>>>>> calling context. However,
this proposal is intended for scalar-to-vector
>>>>>>>>>>>>> mappings of math-like
functions that are most likely to associate a
>>>>>>>>>>>>> unique vector candidate in
most contexts. Therefore, extending this
>>>>>>>>>>>>> behavior to a generic one
is an aspect of the implementation that will
>>>>>>>>>>>>> be treated in a separate
RFC about the vectorization pass.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ### Scalable vectorization
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Both methods of the SVFS
API will be extended with a boolean parameter
>>>>>>>>>>>>> to specify whether scalable
signatures are needed by the user of the
>>>>>>>>>>>>> SVFS.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Changes in clang {#clang}
>>>>>>>>>>>>> ----------------
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We use clang to generate
the metadata described above.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In the compilation unit,
the vector function definition or declaration
>>>>>>>>>>>>> must be visible and
associated to the scalar version via the
>>>>>>>>>>>>> `#pragma clang declare
variant` according to the rules defined by the
>>>>>>>>>>>>> corresponding `#pragma omp
declare variant` defined in OpenMP 5.0, as in
>>>>>>>>>>>>> the following example.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> #pragma clang declare variant(vector_sinf) \
>>>>>>>>>>>>>     match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>>>>>>>>>> extern float sinf(float);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> float32x4_t vector_sinf(float32x4_t x);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The `construct` set in the
directive, together with the `device` set, is
>>>>>>>>>>>>> used to generate the vector
mangled name to be used in the
>>>>>>>>>>>>> `vector-variant` attribute,
for example `_ZGVnN4v_sinf`, when targeting
>>>>>>>>>>>>> AArch64 Advanced SIMD code
generation. The rules for mangling the name of
>>>>>>>>>>>>> the scalar function in the
vector name are defined in the Vector
>>>>>>>>>>>>> Function ABI specification
of the target.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The part of the
vector-variant attribute that redirects the call to
>>>>>>>>>>>>> `vector_sinf` is derived
from the `variant-id` specified in the
>>>>>>>>>>>>> `variant` clause.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Summary
>>>>>>>>>>>>> =======
>>>>>>>>>>>>>
>>>>>>>>>>>>> New `clang` directive in clang
>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>> 
>>>>>>>>>>>>> `#pragma clang declare
variant`, same as `#pragma omp declare variant`
>>>>>>>>>>>>> restricted to the `simd`
context selector, from OpenMP 5.0+.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Option behavior, and interaction with OpenMP
>>>>>>>>>>>>> --------------------------------------------
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The behavior described
below makes sure that
>>>>>>>>>>>>> `#pragma clang declare
variant` function vectorization and OpenMP
>>>>>>>>>>>>> function vectorization are
orthogonal.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> `-fclang-declare-variant`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> :   The `#pragma clang
declare variant` directives are parsed and used
>>>>>>>>>>>>> to populate the
`vector-variant` attribute.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> `-fopenmp[-simd]`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> :   The `#pragma omp
declare variant` directives are parsed and used to
>>>>>>>>>>>>> populate the
`vector-variant` attribute.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> `-fopenmp[-simd]` and
`-fno-clang-declare-variant`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> :   The directive `#pragma
omp declare variant` is used to populate the
>>>>>>>>>>>>> `vector-variant` attribute
in IR. The directive
>>>>>>>>>>>>> `#pragma clang declare
variant` is ignored.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [^1]:
<https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [^2]: Vector Function ABI
for x86:
>>>>>>>>>>>>>
<https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>>>>>>>>>> Vector Function ABI for
AArch64:
>>>>>>>>>>>>>
https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [^3]:
<http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>>>>>>>>>> 
>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>> LLVM Developers mailing
list
>>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>> -- 
>>>>>>>>> Hal Finkel
>>>>>>>>> Lead, Compiler Technology and Programming
Languages
>>>>>>>>> Leadership Computing Facility
>>>>>>>>> Argonne National Laboratory
>>>>>>>>> 
>>>>>> 
>>>> 
>>

Alexey Bataev via llvm-dev

2019-May-31 18:48 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

Yes, this is very similar, but only expressed in terms of clang attributes,
which may have different spellings for clang, GCC,  c++11 etc. I don't think
GCC will implement this as pragma. They added simd attribute instead of pragma.

Best regards,
Alexey Bataev
> On May 31, 2019, at 14:43, Francesco Petrogalli <Francesco.Petrogalli at
arm.com> wrote:
> 
> 
> 
>> On May 31, 2019, at 1:04 PM, Alexey Bataev <a.bataev at
hotmail.com> wrote:
>> 
>> You can define clang specific attribute and later add GCC alias for it.
>> 
> 
> This sounds very much equivalent to defining a clang specific directive and
later add the GCC attributes? Should be less risky, because the GCC attributes
will very likely represent what is in OpenMP? So doing the clang directive would
guarantee more compatibility?
> 
> Or am I missing something about the two possible implementations?
> 
> Francesco
> 
> 
>> Best regards,
>> Alexey Bataev
>> 
>>> On May 31, 2019, at 13:46, Francesco Petrogalli
<Francesco.Petrogalli at arm.com> wrote:
>>> 
>>> 
>>> 
>>>> On May 31, 2019, at 12:38 PM, Alexey Bataev <a.bataev at
hotmail.com> wrote:
>>>> 
>>>> Francesco, there won't be any duplication. Most of the
declarative OpenMP directives are represented as attributes internally, so, I
think, it will be natural to use an attribute here rather than pragma.
>>>> 
>>> 
>>> Very nice. I am open to get rid of the `clang` based directive in
favor of the attribute one.
>>> 
>>> At the moment there is no “declare variant” attribute in the list
of common function attributes at
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes.
What should we do? Define our own, or wait for GCC to publish the attribute?
>>> 
>>> If we decide to do what GCC does, shall we actively participate in
the discussion with the GCC community to shape the attribute itself? Am I right
thinking that the GCC solution is preferred given that system header files are
more likely to follow whatever GCC comes up with?
>>> 
>>> 
>>>> Best regards,
>>>> Alexey Bataev
>>>> 
>>>>> On May 31, 2019, at 13:32, Francesco Petrogalli
<Francesco.Petrogalli at arm.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On May 31, 2019, at 12:00 PM, Alexey Bataev via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>>>>>> 
>>>>>> Hi Francesco, did you think about adding the attribute
instead of the pragma? It is a common way to express such constructs as function
attributes in clang/GCC rather than as pragma.
>>>>>> 
>>>>> 
>>>>> Yes, I thought about it, I believe that GCC plans to use
attributes.
>>>>> 
>>>>> In
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
there is a description of the “simd” attribute, but I couldn’t find an example
of how it is used.
>>>>> 
>>>>> Also, I cannot find an attribute that is equivalent to the
“declare variant” directive. Maybe it is planned for future releases.
>>>>> 
>>>>> The idea of using a `clang` equivalent of the `omp`
directive was to avoid duplication in terms of handling both the attribute and
the omp directive, as both directive will share much of the infrastructure.
>>>>> 
>>>>>> Best regards,
>>>>>> Alexey Bataev
>>>>>> 
>>>>>>> On May 31, 2019, at 12:18, Francesco Petrogalli via
cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>>>>> 
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> Thank you for the feedback so far.
>>>>>>> 
>>>>>>> I am replying to all your
questions/concerns/suggestions in this single email. Please let me know if I
have missed any.
>>>>>>> 
>>>>>>> I will update the RFC according to what we end up
deciding here.
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> 
>>>>>>> Francesco
>>>>>>> 
>>>>>>> 
>>>>>>> # TOPIC 1: concerns about name mangling
>>>>>>> 
>>>>>>> I understand that there are concerns in using the
mangling scheme I proposed, and that it would be preferred to have a mangling
scheme that is based on (and standardized by) OpenMP. I hear the argument on
having some common ground here. In fact, there is already common ground between
the x86 and aarch64 backend, who have based their respective Vector Function ABI
specifications on OpenMP.
>>>>>>> 
>>>>>>> In fact, the mangled name grammar can be summarized
as follows:
>>>>>>> 
>>>>>>>
_ZGV<isa><masking><VLEN><parameter type>_<scalar
name>
>>>>>>> 
>>>>>>> Across vector extensions the only <token>
that will differ is the <isa> token.
>>>>>>> 
>>>>>>> This might lead people to think that we could drop
the _ZGV<isa> prefix and consider the
<masking><VLEN><parameter type>_<scalar name> part as a
sort of unofficial OpenMP mangling scheme: in fact, the signature of an
“unmasked 2-lane vector version of `sin`” will always be
`<2 x double>(<2 x double>)`.
>>>>>>> 
>>>>>>> The problem with this choice is that the number of
vector versions available for a target is not unique.
>>>>>>> 
>>>>>>> In particular, the following declaration generates
multiple vector versions, depending on the target:
>>>>>>> 
>>>>>>> #pragma omp declare simd simdlen(2) notinbranch
>>>>>>> double foo(double) {…};
>>>>>>> 
>>>>>>> On x86, this generates at least 4 symbols (one for
SSE, one for AVX, one for AVX2, and one for AVX512:
https://godbolt.org/z/TLYXPi)
>>>>>>> 
>>>>>>> On aarch64, the same declaration generates a unique
symbol, as specified in the Vector Function ABI.
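
As a rough illustration of why the `<isa>` token cannot simply be dropped, the sketch below lists the names one scalar declaration would map to on each target. `mangledNamesFor` is a hypothetical helper, not a clang or LLVM function. The ISA letters are the ones I understand the two Vector Function ABIs to use (`b` SSE, `c` AVX, `d` AVX2, `e` AVX512 on x86; `n` Advanced SIMD on AArch64), and the single `v` parameter token is an assumption matching `double foo(double)`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds one mangled name per ISA letter for the same scalar function.
// Everything except the <isa> token is shared between the names.
std::vector<std::string> mangledNamesFor(const std::string &IsaLetters,
                                         bool Masked, unsigned VLen,
                                         const std::string &Params,
                                         const std::string &Scalar) {
  std::vector<std::string> Names;
  for (char Isa : IsaLetters) {
    std::string Name = "_ZGV";
    Name += Isa;
    Name += Masked ? 'M' : 'N';
    Name += std::to_string(VLen);
    Name += Params;
    Name += '_';
    Name += Scalar;
    Names.push_back(Name);
  }
  return Names;
}
```

For `simdlen(2) notinbranch`, calling it with `"bcde"` yields four x86 names that differ only in the fourth character, while `"n"` yields the single AArch64 name.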
>>>>>>> 
>>>>>>> This means that the attribute (or metadata) that
carries the information on the available vector version needs to deal also with
things that are not usually visible at IR level, but that might still need to be
provided to be able to decide which particular instruction set/ vector extension
needs to be targeted.
>>>>>>> 
>>>>>>> I used an example based on `declare simd` instead
of `declare variant` because the attribute/metadata needed for `declare variant`
is a modification of the one needed for `declare simd`, which has already been
agreed in a previous RFC proposed by Intel [1], and for which Intel has already
provided an implementation [2]. The changes proposed in this RFC are fully
compatible with the work that is being done for the VecClone pass in [2].
>>>>>>> 
>>>>>>> [1]
http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
>>>>>>> [2] VecClone pass: https://reviews.llvm.org/D22792
>>>>>>> 
>>>>>>> The good news is that as far as AArch64 and x86 are
concerned, the only thing that will differ in the mangled name is the
“<isa>” token. As far as I can tell, the mangling scheme of the rest of
the vector name is the same, therefore a lot of infrastructure in terms of
mangling and demangling can be reused. In fact, the `mangleVectorParameters`
function in
https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
already be shared among x86 and aarch64.
>>>>>>> 
>>>>>>> TOPIC 2: metadata vs attribute
>>>>>>> 
>>>>>>> From a functionality point of view, I don’t care
whether we use metadata or attributes. The VecClone pass mentioned in TOPIC 1
uses the following:
>>>>>>> 
>>>>>>> attributes #0 = { nounwind uwtable
"vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
>>>>>>> 
>>>>>>> This is an attribute (I thought it was metadata?); I
am happy to reword the RFC using the right terminology (sorry for messing this
up).
>>>>>>> 
>>>>>>> Also, @Renato expressed concern that metadata might
be dropped by optimization passes - would using attributes prevent that?
>>>>>>> 
>>>>>>> TOPIC 3: "There is no way to notify the
backend how conformant the SIMD versions are.”
>>>>>>> 
>>>>>>> @Shawn, I am afraid I don’t understand what you
mean by “conformant” here. Can you elaborate with an example?
>>>>>>> 
>>>>>>> TOPIC 3: interaction of the `omp declare variant`
with `clang declare variant`
>>>>>>> 
>>>>>>> I believe this is described in the `Option
behavior, and interaction with OpenMP`. The option `-fclang-declare-variant` is
there to make the OpenMP based one orthogonal. Of course, we might decide to
make -fclang-declare-variant on/off by default, and have default behavior when
interacting with -fopenmp-simd. For the sake of compatibility with other
compilers, we might need to require -fno-clang-declare-variant when targeting
-fopenmp-[simd].
>>>>>>> 
>>>>>>> TOPIC 3: "there are no special arguments /
flags / status regs that are used / changed in the vector version that the
compiler will have to "just know”
>>>>>>> 
>>>>>>> I believe that this concern is raised by the
problem of handling FP exceptions? If that’s the case, the compiler is not
allowed to make any assumption about the vector function in that respect, and
must treat it with the same knowledge it has of any other function, depending
on the visibility it has in the compilation unit. @Renato, does this answer your question?
>>>>>>> 
>>>>>>> TOPIC 4: attribute in function declaration vs
attribute function call site
>>>>>>> 
>>>>>>> We discussed this in the previous version of the
proposal. Having it at the call sites guarantees that incompatible vector
versions are not used when merging modules compiled for different targets. I don’t
have a use case for this, if I remember correctly this was asked by @Hideki
Saito. Hideki, any comment on this?
>>>>>>> 
>>>>>>> TOPIC 5: overriding system header (the discussion
on #pragma omp/clang/system variants initiated by @Hal Finkel).
>>>>>>> 
>>>>>>> I thought that the split between #pragma clang declare
variant and #pragma omp declare variant was already providing the orthogonality
between system header and user header. Meaning that a user should always prefer
the omp version (for portability to other compilers) instead of the #pragma
clang one, which would be relegated to system headers and headers provided by
the compiler. Am I missing something? If so, I am happy to add a “system”
version of the directive, as it would be quite easy to do given most of the
parsing infrastructure will be shared.
>>>>>>> 
>>>>>>> 
>>>>>>>> On May 30, 2019, at 12:53 PM, Philip Reames
<listmail at philipreames.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>>>> On 5/30/19 9:05 AM, Doerfert,
Johannes wrote:
>>>>>>>>>>>>> On 05/29, Finkel, Hal J.
via cfe-dev wrote:
>>>>>>>>>>>>> On 5/29/19 1:52 PM, Philip
Reames wrote:
>>>>>>>>>>>>> On 5/28/19 7:55 PM, Finkel,
Hal J. wrote:
>>>>>>>>>>>>> On 5/28/19 3:31 PM, Philip
Reames via cfe-dev wrote:
>>>>>>>>>>>>> I generally like the idea
of having support in IR for vectorization of
>>>>>>>>>>>>> custom functions.  I have
several use cases which would benefit from this.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'd suggest a couple of
reframings to the IR representation though.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> First, this should probably
be specified as metadata/attribute on a
>>>>>>>>>>>>> function declaration. 
Allowing the callsite variant is fine, but it
>>>>>>>>>>>>> should primarily be a
property of the called function, not of the call
>>>>>>>>>>>>> site.  Being able to
specify it once per declaration is much cleaner.
>>>>>>>>>>>> I agree. We should support this
both on the function declaration and on
>>>>>>>>>>>> the call sites.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> Second, I really don't
like the mangling use here.  We need a better way
>>>>>>>>>>>>> to specify the properties
of the function than its mangled name.  One
>>>>>>>>>>>>> thought to explore is to
directly use the Value of the function
>>>>>>>>>>>>> declaration (since this is
metadata and we can do that), and then tie
>>>>>>>>>>>>> the properties to the
function declaration in some way?  Sorry, I don't
>>>>>>>>>>>>> really have a specific
suggestion here.
>>>>>>>>>>>> Is the problem the mangling or
the fact that the mangling is
>>>>>>>>>>>> ABI/target-specific? One option
is to use LLVM's mangling scheme (the
>>>>>>>>>>>> one we use for intrinsics) and
then provide some backend infrastructure
>>>>>>>>>>>> to translate later.
>>>>>>>>>>> Well, both honestly.  But mangling
with a non-target specific scheme is
>>>>>>>>>>> a lot better, so I might be okay
with that.   Good idea.
>>>>>>>>>> 
>>>>>>>>>> I liked your idea of directly encoding
the signature in the metadata,
>>>>>>>>>> but I think that we want to continue to
use attributes, and not
>>>>>>>>>> metadata, and the options for
attributes seem more limited - unless we
>>>>>>>>>> allow attributes to take metadata
arguments - maybe that's an
>>>>>>>>>> enhancement worth considering.
>>>>>>>>> I recently talked to people in the OpenMP
language committee meeting
>>>>>>>>> about this and, thinking forward to the
actual implementation/use of the
>>>>>>>>> OpenMP 5.x declare variant feature, I'd
say:
>>>>>>>>> 
>>>>>>>>> - We will need a mangling scheme if we want
to allow variants on
>>>>>>>>> declarations that are defined elsewhere.
>>>>>>>>> - We will need a (OpenMP) standardized
mangling scheme if we want
>>>>>>>>> interoperability between compilers.
>>>>>>>>> 
>>>>>>>>> I assume we want both so I think we will
need both.
>>>>>>>> If I'm reading this correctly, this
describes a need for the frontend to
>>>>>>>> have a mangling scheme.  Nothing in here would
seem to prevent the
>>>>>>>> frontend from generating a declaration for a
mangled external symbol and
>>>>>>>> then referencing that declaration.  Am I
missing something?
>>>>>>>>> 
>>>>>>>>> That said, I think this should allow us to
avoid attributes/metadata
>>>>>>>>> which seems to me like a good thing right
now.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Johannes
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>>>>> On 5/28/19 12:44 PM,
Francesco Petrogalli via llvm-dev wrote:
>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This RFC is a proposal
to provide auto-vectorization functionality for user provided vector functions.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The proposal is a
modification of an RFC that I have sent out a couple of months ago, with the
title `[RFC] Re-implementing -fveclib with OpenMP` (see
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The
previous RFC is to be considered abandoned.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The original RFC was
proposing to re-implement the `-fveclib` command line option. This proposal
avoids that, and limits its scope to the mechanics of providing vector function
in user code that the compiler can pick up for auto-vectorization. This narrower
scope limits the impact of changes that are needed in both clang and LLVM.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Please let me know what
you think.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Francesco
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>
================================================================================
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Introduction
>>>>>>>>>>>>>>
============
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This RFC encompasses
the proposal of informing the vectorizer about the
>>>>>>>>>>>>>> availability of vector
functions provided by the user. The mechanism is
>>>>>>>>>>>>>> based on the use of the
directive `declare variant` introduced in OpenMP
>>>>>>>>>>>>>> 5.0 [^1].
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The mechanism proposed
has the following properties:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1.  Decouples the
compiler front-end that knows about the availability
>>>>>>>>>>>>>> of vectorized routines,
from the back-end that knows how to make use
>>>>>>>>>>>>>> of them.
>>>>>>>>>>>>>> 2.  Enable support for
a developer's own vector libraries without
>>>>>>>>>>>>>> requiring changes to
the compiler.
>>>>>>>>>>>>>> 3.  Enables other
frontends (e.g. f18) to add scalar-to-vector function
>>>>>>>>>>>>>> mappings as relevant
for their own runtime libraries, etc.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The implementation
consists of two separate sets of changes.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The first set is a set
of changes in `llvm`, and consists of:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1.  [Changes in LLVM
IR](#llvmIR) to provide information about the
>>>>>>>>>>>>>> availability of
user-defined vector functions via metadata attached
>>>>>>>>>>>>>> to an `llvm::CallInst`.
>>>>>>>>>>>>>> 2.  [An
infrastructure](#infrastructure) that can be queried to retrieve
>>>>>>>>>>>>>> information about the
available vector functions associated with a
>>>>>>>>>>>>>> `llvm::CallInst`.
>>>>>>>>>>>>>> 3.  [Changes in the
LoopVectorizer](#LV) to use the API to query the
>>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The second set consists
of the [changes in clang](#clang) that
>>>>>>>>>>>>>> are needed to
recognize the `#pragma clang declare variant`
>>>>>>>>>>>>>> directive.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Proposed changes
>>>>>>>>>>>>>>
================
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We propose an
implementation that uses `#pragma clang declare variant`
>>>>>>>>>>>>>> to inform the backend
components about the availability of vector
>>>>>>>>>>>>>> versions of scalar
functions found in IR. The mechanism relies on storing
>>>>>>>>>>>>>> such information in IR
metadata, and therefore makes the
>>>>>>>>>>>>>> auto-vectorization of
function calls a mid-end (`opt`) process that is
>>>>>>>>>>>>>> independent of the
front-end that generated such IR metadata.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This implementation
provides a generic mechanism that the users of the
>>>>>>>>>>>>>> LLVM compiler will be
able to use for interfacing their own vector
>>>>>>>>>>>>>> routines for generic
code.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The implementation can
also expose vectorization-specific descriptors --
>>>>>>>>>>>>>> for example, like the
`linear` and `uniform` clauses of the OpenMP
>>>>>>>>>>>>>> `declare simd`
directive -- that could be used to finely tune the
>>>>>>>>>>>>>> automatic vectorization
of some functions (think for example the
>>>>>>>>>>>>>> vectorization of
`void sincos(double, double *, double *)`, where
>>>>>>>>>>>>>> `linear` can be used to
give extra information about the memory layout
>>>>>>>>>>>>>> of the 2 pointers
parameters in the vector version).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The directive `#pragma
clang declare variant` follows the syntax of the
>>>>>>>>>>>>>> `#pragma omp declare
variant` directive of OpenMP.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We define the new
directive in the `clang` namespace instead of using
>>>>>>>>>>>>>> the `omp` one of OpenMP
to allow the compiler to perform
>>>>>>>>>>>>>> auto-vectorization
outside of an OpenMP SIMD context.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The mechanism is based
on OpenMP to provide a uniform user experience
>>>>>>>>>>>>>> across the two
mechanisms, and to maximise the number of shared
>>>>>>>>>>>>>> components of the
infrastructure needed in the compiler frontend to
>>>>>>>>>>>>>> enable the feature.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Changes in LLVM IR
{#llvmIR}
>>>>>>>>>>>>>> ------------------
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The IR is enriched with
metadata that details the availability of vector
>>>>>>>>>>>>>> versions of an
associated scalar function. This metadata is attached to
>>>>>>>>>>>>>> the call site of the
scalar function.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The metadata takes the
form of an attribute containing a comma separated
>>>>>>>>>>>>>> list of vector function
mappings. Each entry has a unique name that
>>>>>>>>>>>>>> follows the Vector
Function ABI[^2] and a real name that is used when
>>>>>>>>>>>>>> generating calls to
this vector function.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>
vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The Vector Function ABI
name describes the signature of the vector
>>>>>>>>>>>>>> function so that
properties like vectorisation factor can be queried
>>>>>>>>>>>>>> during compilation.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The `(real name)` token
is optional and assumed to match the Vector
>>>>>>>>>>>>>> Function ABI name when
omitted.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For example, the
availability of a 2-lane double precision `sin`
>>>>>>>>>>>>>> function via SVML when
targeting AVX on x86 is provided by the following
>>>>>>>>>>>>>> IR.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> // ...
>>>>>>>>>>>>>> ... = call double
@sin(double) #0
>>>>>>>>>>>>>> // ...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #0 = { vector-variant =
{"_ZGVcN2v_sin(__svml_sin2),
>>>>>>>>>>>>>>                        
_ZGVdN4v_sin(__svml_sin4),
>>>>>>>>>>>>>>                        
..."} }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The string
`"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>>>>>>>>>>>> attribute provides
information on the shape of the vector function via
>>>>>>>>>>>>>> the string
`_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>>>>>>>>>>>>>> for Intel, and remaps
the standard Vector Function ABI name to the
>>>>>>>>>>>>>> non-standard name
`__svml_sin2`.
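
A minimal sketch of how a consumer might read this list follows. `parseVectorVariants` and `trim` are hypothetical helpers, not an existing LLVM API; the sketch assumes entries are separated by commas, that whitespace around entries is not significant, and that the optional `(real name)` token is the last thing in an entry.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Removes leading and trailing spaces, tabs and newlines.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n";
  const std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  const std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Parses "abi_name(real_name), abi_name, ..." into pairs of
// (Vector Function ABI name, name to call). When the "(real_name)"
// token is omitted, the ABI name is also the name to call.
std::vector<std::pair<std::string, std::string>>
parseVectorVariants(const std::string &Attr) {
  std::vector<std::pair<std::string, std::string>> Out;
  std::size_t Start = 0;
  while (Start <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Start);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    const std::string Entry = trim(Attr.substr(Start, Comma - Start));
    Start = Comma + 1;
    if (Entry.empty())
      continue;
    const std::size_t Open = Entry.find('(');
    if (Open != std::string::npos && Entry.back() == ')') {
      const std::string Abi = trim(Entry.substr(0, Open));
      const std::string Real =
          trim(Entry.substr(Open + 1, Entry.size() - Open - 2));
      Out.emplace_back(Abi, Real);
    } else {
      Out.emplace_back(Entry, Entry);
    }
  }
  return Out;
}
```

Keeping the ABI name and the callee name as a pair reflects the split described above: the first describes the shape of the vector function, the second is only the symbol to emit in the call.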
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This metadata is
compatible with the proposal "Proposal for function
>>>>>>>>>>>>>> vectorization and loop
vectorization with function calls",[^3] that uses
>>>>>>>>>>>>>> Vector Function ABI
mangled names to inform the vectorizer about the
>>>>>>>>>>>>>> availability of vector
functions. The proposal extends the original by
>>>>>>>>>>>>>> allowing the explicit
mapping of the Vector Function ABI mangled name to
>>>>>>>>>>>>>> a non-standard name,
which allows the use of existing vector libraries.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The `vector-variant`
attribute needs to be attached on a per-call basis
>>>>>>>>>>>>>> to avoid conflicts when
merging modules with different vector variants.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The query
infrastructure: SVFS {#infrastructure}
>>>>>>>>>>>>>>
------------------------------
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The Search Vector
Function System (SVFS) is constructed from an
>>>>>>>>>>>>>> `llvm::Module` instance
so it can create function definitions. The SVFS
>>>>>>>>>>>>>> exposes an API with two
methods.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ###
`SVFS::isFunctionVectorizable`
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This method queries the
availability of a vectorized version of a
>>>>>>>>>>>>>> function. The signature
of the method is as follows.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> bool
isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The method determines
the availability of a vector version of the function
>>>>>>>>>>>>>> invoked by the `Call`
parameter by looking at the `vector-variant`
>>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The `Params` argument
is a map that associates the position of a
>>>>>>>>>>>>>> parameter in the
`CallInst` to its `ParameterType` descriptor. The
>>>>>>>>>>>>>> `ParameterType`
descriptor holds information about the shape of the
>>>>>>>>>>>>>> corresponding parameter
in the signature of the vector function. This
>>>>>>>>>>>>>> `ParameterType` is used
to query the SVFS about the availability of
>>>>>>>>>>>>>> vector versions that
have `linear`, `uniform` or `align` parameters (in
>>>>>>>>>>>>>> the sense of OpenMP 4.0
and onwards).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The method
`isFunctionVectorizable`, when invoked with an empty
>>>>>>>>>>>>>> `ParTypeMap`, is
equivalent to the `TargetLibraryInfo` method
>>>>>>>>>>>>>>
`isFunctionVectorizable(StringRef Name)`.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ###
`SVFS::getVectorizedFunction`
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This method returns the
vector function declaration that corresponds to
>>>>>>>>>>>>>> the needs of the
vectorization technique that is being run.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The signature of the
function is as follows.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>
std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>>>>>>>>>>> llvm::CallInst * Call,
unsigned VF, bool IsMasked, ParTypeSet Params);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The `Call` parameter is
the call instance that is being vectorized, the
>>>>>>>>>>>>>> `VF` parameter
represents the vectorization factor (how many lanes), the
>>>>>>>>>>>>>> `IsMasked` parameter
decides whether or not the signature of the vector
>>>>>>>>>>>>>> function is required to
have a mask parameter, the `Params` parameter
>>>>>>>>>>>>>> describes the shape of
the vector function as in the
>>>>>>>>>>>>>>
`isFunctionVectorizable` method.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The method uses the
`vector-variant` metadata and returns the function
>>>>>>>>>>>>>> signature and the name
of the function based on the input parameters.
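
The two queries can be pictured with a toy model that works on an already decoded list of variants instead of an `llvm::CallInst`. This is only a sketch under that assumption: `VariantInfo` is a hypothetical type, the function names merely echo the ones in this RFC, and the `Params` argument and the returned `llvm::FunctionType` are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical decoded form of one vector-variant entry.
struct VariantInfo {
  std::string RealName;  // symbol to call
  unsigned VF;           // number of lanes
  bool IsMasked;         // whether the signature carries a mask parameter
};

// Toy counterpart of the first query: is any vector version listed?
bool isFunctionVectorizable(const std::vector<VariantInfo> &Variants) {
  return !Variants.empty();
}

// Toy counterpart of the second query: returns the name of the first
// variant matching the requested VF and masking, or an empty string.
std::string getVectorizedFunction(const std::vector<VariantInfo> &Variants,
                                  unsigned VF, bool IsMasked) {
  for (const VariantInfo &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.RealName;
  return std::string();
}
```

An empty result is where the real SVFS would have the option of synthesizing a definition rather than giving up.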
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The SVFS can add new
function definitions, in the same module as the
>>>>>>>>>>>>>> `Call`, to provide
vector functions that are not present within the
>>>>>>>>>>>>>> vector-variant
metadata. For example, if a library provides a vector
>>>>>>>>>>>>>> version of a function
with a vectorization factor of 2, but the
>>>>>>>>>>>>>> vectorizer is
requesting a vectorization factor of 4, the SVFS is
>>>>>>>>>>>>>> allowed to create a
definition that calls the 2-lane version twice. This
>>>>>>>>>>>>>> capability applies
similarly for providing masked and unmasked versions
>>>>>>>>>>>>>> when the request does
not match what is available in the library.
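
The widening case can be sketched in plain C++ with arrays standing in for vector registers. `vector_sin2` is a made-up placeholder for a 2-lane library routine and `vector_sin4_from2` is the kind of definition the SVFS might create; neither name comes from a real library.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a 2-lane routine provided by a vector library.
Vec2 vector_sin2(const Vec2 &X) {
  Vec2 R = {{std::sin(X[0]), std::sin(X[1])}};
  return R;
}

// 4-lane version built by calling the 2-lane routine on each half
// of the input and concatenating the two results.
Vec4 vector_sin4_from2(const Vec4 &X) {
  const Vec2 LoIn = {{X[0], X[1]}};
  const Vec2 HiIn = {{X[2], X[3]}};
  const Vec2 Lo = vector_sin2(LoIn);
  const Vec2 Hi = vector_sin2(HiIn);
  Vec4 R = {{Lo[0], Lo[1], Hi[0], Hi[1]}};
  return R;
}
```

The same split-and-concatenate shape would apply to any request whose vectorization factor is a multiple of an available one.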
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This method is
equivalent to the TLI method
>>>>>>>>>>>>>> `StringRef
getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Notice that to fully
support OpenMP vectorization we need to think about
>>>>>>>>>>>>>> a fuzzy matching
mechanism that is able to select a candidate in the
>>>>>>>>>>>>>> calling context.
However, this proposal is intended for scalar-to-vector
>>>>>>>>>>>>>> mappings of math-like
functions that are most likely to associate a
>>>>>>>>>>>>>> unique vector candidate
in most contexts. Therefore, extending this
>>>>>>>>>>>>>> behavior to a generic
one is an aspect of the implementation that will
>>>>>>>>>>>>>> be treated in a
separate RFC about the vectorization pass.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ### Scalable
vectorization
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Both methods of the
SVFS API will be extended with a boolean parameter
>>>>>>>>>>>>>> to specify whether
scalable signatures are needed by the user of the
>>>>>>>>>>>>>> SVFS.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Changes in clang
{#clang}
>>>>>>>>>>>>>> ----------------
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We use clang to
generate the metadata described above.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In the compilation
unit, the vector function definition or declaration
>>>>>>>>>>>>>> must be visible and
associated to the scalar version via the
>>>>>>>>>>>>>> `#pragma clang declare
variant` according to the rule defined by the
>>>>>>>>>>>>>> corresponding `#pragma
omp declare variant` defined in OpenMP 5.0, as in
>>>>>>>>>>>>>> the following example.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #pragma clang declare
variant(vector_sinf) \
>>>>>>>>>>>>>>
match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>>>>>>>>>>> extern float
sinf(float);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> float32x4_t
vector_sinf(float32x4_t x);
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The `construct` set in
the directive, together with the `device` set, is
>>>>>>>>>>>>>> used to generate the
vector mangled name to be used in the
>>>>>>>>>>>>>> `vector-variant`
attribute, for example `_ZGVnN4v_sinf`, when targeting
>>>>>>>>>>>>>> AArch64 Advanced SIMD
code generation. The rules for mangling the name of
>>>>>>>>>>>>>> the scalar function in
the vector name are defined in the Vector
>>>>>>>>>>>>>> Function ABI
specification of the target.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The part of the
vector-variant attribute that redirects the call to
>>>>>>>>>>>>>> `vector_sinf` is
derived from the `variant-id` specified in the
>>>>>>>>>>>>>> `variant` clause.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Summary
>>>>>>>>>>>>>>
=======
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> New `clang` directive
in clang
>>>>>>>>>>>>>>
------------------------------
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> `#pragma clang declare
variant`, same as `#pragma omp declare variant`
>>>>>>>>>>>>>> restricted to the
`simd` context selector, from OpenMP 5.0+.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Option behavior, and
interaction with OpenMP
>>>>>>>>>>>>>>
--------------------------------------------
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The behavior described
below makes sure that
>>>>>>>>>>>>>> `#pragma clang declare
variant` function vectorization and OpenMP
>>>>>>>>>>>>>> function vectorization
are orthogonal.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>
`-fclang-declare-variant`
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> :   The `#pragma clang
declare variant` directives are parsed and used
>>>>>>>>>>>>>> to populate the
`vector-variant` attribute.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> `-fopenmp[-simd]`
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> :   The `#pragma omp
declare variant` directives are parsed and used to
>>>>>>>>>>>>>> populate the
`vector-variant` attribute.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> `-fopenmp[-simd]` and
`-fno-clang-declare-variant`
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> :   The directive
`#pragma omp declare variant` is used to populate the
>>>>>>>>>>>>>> `vector-variant`
attribute in IR. The directive
>>>>>>>>>>>>>> `#pragma clang
declare variant` is ignored.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [^1]:
<https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [^2]: Vector Function
ABI for x86:
>>>>>>>>>>>>>>
<https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>>>>>>>>>>> Vector Function ABI for
AArch64:
>>>>>>>>>>>>>>
https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [^3]:
<http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>> LLVM Developers mailing
list
>>>>>>>>>>>>>> llvm-dev at
lists.llvm.org
>>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>> -- 
>>>>>>>>>> Hal Finkel
>>>>>>>>>> Lead, Compiler Technology and
Programming Languages
>>>>>>>>>> Leadership Computing Facility
>>>>>>>>>> Argonne National Laboratory
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>
