thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization. [Jun 2019]

Saito, Hideki via llvm-dev

2019-Jun-01 00:54 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

Page 22 of OpenMP 5.0 specification (Lines 13/14):

	When any thread encounters a simd construct, the iterations of the loop
	associated with the construct may be executed concurrently using the SIMD
	lanes that are available to the thread.

This is the Execution Model. The word here is "may", i.e., not "must".
Declare simd is not explicitly mentioned here, but requiring vectorization in
declare simd when the caller simd construct may not be vectorized is odd.
Having said that, the ICC implementation is "work extremely hard to
vectorize", and we are proud of it. Our recommendation is to do the same as ICC
(i.e., this requires a lot of beefing up in LoopVectorize), and I think that's
what you are asking. If so, we are aligned.
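
A minimal sketch of the case under discussion, assuming a hypothetical scalar
function `scale` and caller `apply` (neither comes from the thread). Because
the execution model says "may", the loop has to produce the same result
whether or not the compiler vectorizes the call; a compiler without OpenMP
support simply ignores both pragmas.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar function. `declare simd` asks the compiler to also
// provide vector variants; the scalar definition stays valid on its own.
#pragma omp declare simd notinbranch
double scale(double x) { return 2.0 * x + 1.0; }

// Hypothetical caller. The simd construct allows, but does not force,
// concurrent execution of the iterations on SIMD lanes.
std::vector<double> apply(const std::vector<double> &in) {
  std::vector<double> out(in.size());
  const std::size_t n = in.size();
  #pragma omp simd
  for (std::size_t i = 0; i < n; ++i)
    out[i] = scale(in[i]);
  return out;
}
```

Code that is only correct when the lanes really run in lock-step falls outside
this sketch, which is the case the "assert" question below is about.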

If anyone strongly objects to that idea (i.e., anyone wanting to keep OpenMP
simd as just an optimization hint for the auto-vectorizer), please speak up.

>if you think there is a problem to reuse that for OpenCL/SYCL, let us know.

Sure.

-----Original Message-----
From: Doerfert, Johannes [mailto:jdoerfert at anl.gov] 
Sent: Friday, May 31, 2019 4:58 PM
To: Saito, Hideki <hideki.saito at intel.com>
Cc: Francesco Petrogalli <Francesco.Petrogalli at arm.com>; Philip Reames
<listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>;
LLVM Development List <llvm-dev at lists.llvm.org>; nd <nd at arm.com>;
Clang Dev <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for
auto-vectorization.

On 05/31, Saito, Hideki wrote:
> 
> >Is this also the case if the user did require lock-step semantic for
> >the code to be correct?
> 
> Certainly not, but that part is actually beyond OpenMP specification.
> I suggest looking up ICC's "#pragma simd assert" description and see
> if the assert feature is something you may be interested in seeing as 
> an extended part of LLVM implementation of OpenMP (declare) simd.
> Else, vectorization report would tell you whether it was vectorized or 
> not.

Wait, why do you think it is beyond the OpenMP specification? If an OpenMP
variant is picked based on the context the user should be able to assume the
context requirements are fulfilled. If we agree on that, I don't see how we
can do anything else than emitting an error if we optimistically picked a vector
variant but failed to vectorize. I don't think that precludes the approach
you've taken though.

> >How does OpenCL/SYCL play in this now?
> 
> Not right now, when we are working to get OpenMP stuff going --- 
> except that I don't think we need to change the design (e.g., on 
> function attribute, VecClone direction, etc.) in the future for those 
> or similar languages.
OK. Whatever is considered here, if you think there is a problem to reuse that
for OpenCL/SYCL, let us know.

> -----Original Message-----
> From: Doerfert, Johannes [mailto:jdoerfert at anl.gov]
> Sent: Friday, May 31, 2019 4:16 PM
> To: Saito, Hideki <hideki.saito at intel.com>
> Cc: Francesco Petrogalli <Francesco.Petrogalli at arm.com>; Philip Reames
> <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>; LLVM
> Development List <llvm-dev at lists.llvm.org>; nd <nd at arm.com>; Clang Dev
> <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
> Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector
> function for auto-vectorization.
> 
> On 05/31, Saito, Hideki wrote:
> > 
> > >VectorClone does more than just mapping a scalar version to a vector
> > >one. It builds also the vector version definition by auto-vectorizing
> > >the body of the scalar function.
> > [...]
> > The code is still fully functional w/o LoopVectorize vectorizing that
> > loop.
> 
> Is this also the case if the user did require lock-step semantic for the
> code to be correct?
> 
> > >I don’t know if the patches related to VecClone also are intended to
> > >use the `vector-variant` attribute for function declaration with a
> > >#pragma omp declare simd.
> > 
> > VecClone predated #pragma omp declare variant, so those patches
> > don’t know about declare variant. VecClone was written for handling
> > #pragma omp declare simd, as described above. OpenCL/SYCL kernel is
> > similar enough to OpenMP declare simd. Most code can be reused.
> 
> How does OpenCL/SYCL play in this now?
> 
> 
> > -----Original Message-----
> > From: Francesco Petrogalli [mailto:Francesco.Petrogalli at arm.com]
> > Sent: Friday, May 31, 2019 3:06 PM
> > To: Doerfert, Johannes <jdoerfert at anl.gov>
> > Cc: Philip Reames <listmail at philipreames.com>; Finkel, Hal J.
> > <hfinkel at anl.gov>; LLVM Development List <llvm-dev at lists.llvm.org>;
> > nd <nd at arm.com>; Saito, Hideki <hideki.saito at intel.com>; Clang Dev
> > <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
> > Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector
> > function for auto-vectorization.
> > 
> > 
> > 
> > > On May 31, 2019, at 2:56 PM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> > > 
> > > I think I did misunderstand what you want to do with attributes. 
> > > This is my bad. Let me try to explain:
> > > 
> > > It seems you want the "vector-variants" attributes (which I could
> > > not find with this name in trunk, correct?) to "remember" what
> > > vector versions can be created (wrt. validity), assuming a 
> > > definition is available? Correct?
> > 
> > Yes.
> > 
> > > What I was concerned with is the example I sketched somewhere 
> > > below which motivates the need for a generalized/standardized name
> > > mangling for OpenMP. I thought you wanted to avoid that somehow but
> > > if you don't I misunderstood you. I basically removed the part
> > > where the vector versions have to be created first but I assumed 
> > > them to be existent (in the module or somewhere else). That is, I
> > > assumed a call to foo and various symbols available that are 
> > > specializations of foo. When we then vectorize foo (or otherwise 
> > > specialize at some point in the future), you would scan the module
> > > and pick the best match based on the context of the call.
> > > 
> > 
> > Yes, although the syntax you use below is wrong. Declare variant is
> > attached to the scalar definition, and points to a vector definition (the
> > variant) that is declared/defined in the same compilation unit where the
> > scalar version is visible.
> > 
> > 
> > > Now I don't know if I understood your proposal by now but let me
> > > ask a question anyway:
> > > 
> > > VecClone.cpp:276-278 mentions that the vectorizer is supposed to 
> > > look at the vector-variants functions. This works for variants 
> > > that are created from definitions in the module but what about 
> > > #omp declare simd declarations?
> > > 
> > 
> > VectorClone does more than just mapping a scalar version to a vector
> > one. It builds also the vector version definition by auto-vectorizing the
> > body of the scalar function.
> > 
> > I don’t know if the patches related to VecClone also are intended to
> > use the `vector-variant` attribute for function declaration with a #pragma
> > omp declare simd. On aarch64, in Arm compiler for HPC, we do that to support
> > vector math libraries. It works in principle, but `vector variant` allows
> > more context selection (and custom names instead of vector ABI names, which
> > are easier for users).
> > 
> > 
> > > 
> > > On 05/31, Francesco Petrogalli wrote:
> > >>> On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> > >>> 
> > >>> I think we should split this discussion:
> > >>> TOPIC 1 & 2 & 4: How do we implement all use cases and OpenMP 5.X
> > >>>                  features, including compatibility with other
> > >>>                  compilers and cross module support.
> > >> 
> > >> Yes, and we have to carefully make this as standard and compatible as
> > >> possible.
> > > 
> > > Agreed.
> > > 
> > > 
> > >>> TOPIC 3b & 5: Interoperability with clang declare (system vs. user
> > >>>                declares)
> > >> 
> > >> 
> > >> I think that Alexey's explanation of how the directives are handled
> > >> internally in the frontend makes us lean towards the attribute.
> > > 
> > > How things are handled right now, especially given that declare 
> > > variant is not handled at all, should not limit our design space.
> > > If the argument is that we cannot reasonably implement a solution,
> > > that is a different story.
> > > 
> > > 
> > >>> TOPIC 3a & 3c: floating point issues?
> > >>> 
> > >> 
> > >> I believe there is no issue there. I have quoted the OpenMP standard
> > >> in reply to Renato:
> > >> 
> > >> See
> > >> https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf,
> > >> page 118, lines 23-24:
> > >> 
> > >> "The execution of the function or subroutine cannot have any side
> > >> effects that would alter its execution for concurrent iterations
> > >> of a SIMD chunk."
> > > 
> > > Great.
> > > 
> > > 
> > >>> I inlined comments for Topic 1 below.
> > >>> 
> > >>> I hope that we do not have to discuss topic 2 if we agree
> > >>> neither attributes nor metadata is necessary, or better, will
> > >>> solve the actual problem at hand. I don't have strong feeling on
> > >>> topic 4 but I have the feeling this will become less problematic
> > >>> once we figure out topic 1.
> > >>> 
> > >>> Thanks,
> > >>> Johannes
> > >>> 
> > >>> 
> > >>> On 05/31, Francesco Petrogalli wrote:
> > >>>> # TOPIC 1: concerns about name mangling
> > >>>> 
> > >>>> I understand that there are concerns in using the mangling
> > >>>> scheme I proposed, and that it would be preferred to have a
> > >>>> mangling scheme that is based on (and standardized by) OpenMP.
> > >>> 
> > >>> I still think it will be required to have a standardized one,
> > >>> not only preferred.
> > >>> 
> > >>> 
> > >> 
> > >> I am all with you in standardizing. x86 and aarch64 have their own
> > >> vector function ABI, which, although “private”, are to be 
> > >> considered standard. Open source and commercial compilers are 
> > >> using them, therefore we have to deal with this mangling scheme,
> > >> whether or not OpenMP comes up with a standard mangling scheme.
> > > 
> > > I don't get the point you are trying to make here. What do you
> > > mean by "we have to deal with"? (I do not suggest to get rid of
> > > them.)
> > > 
> > 
> > That we cannot ignore the fact that the name scheme is already
> > standardized by the vendors, so let’s first deal with what we have, and
> > think about the OpenMP mangling scheme only once there is one available.
> > 
> > > 
> > >>>> I hear the argument on having some common ground here. In fact,
> > >>>> there is already common ground between the x86 and aarch64
> > >>>> backends, which have based their respective Vector Function ABI
> > >>>> specifications on OpenMP.
> > >>>> 
> > >>>> In fact, the mangled name grammar can be summarized as follows:
> > >>>> 
> > >>>> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> > >>>> 
> > >>>> Across vector extensions the only <token> that will differ is
> > >>>> the <isa> token.
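
A rough sketch of how a name of that shape could be split into its tokens.
The struct and the function `parseVectorVariantName` are hypothetical helpers
for illustration, not an LLVM or clang API. The sketch assumes a one-letter
<isa> token, `M`/`N` for <masking>, a decimal <VLEN>, and parameter tokens that
contain no `_`, so that the first `_` after <VLEN> starts the scalar name.

```cpp
#include <cctype>
#include <string>

// Tokens of a name shaped like
//   _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
struct VectorVariantName {
  char isa = '\0';         // target-specific letter, e.g. 'b' in "_ZGVbN4vv_vec_sum"
  bool masked = false;     // 'M' means masked, 'N' means unmasked
  std::string vlen;        // number of lanes, kept as text
  std::string parameters;  // one token per parameter, e.g. "vv"
  std::string scalarName;  // name of the scalar function, e.g. "vec_sum"
};

// Hypothetical helper: returns false if `s` does not follow the grammar
// under the assumptions stated above; `out` is only written on success.
bool parseVectorVariantName(const std::string &s, VectorVariantName &out) {
  const std::string prefix = "_ZGV";
  if (s.compare(0, prefix.size(), prefix) != 0) return false;
  std::size_t pos = prefix.size();
  if (s.size() < pos + 2) return false;

  VectorVariantName r;
  r.isa = s[pos++];
  const char mask = s[pos++];
  if (mask != 'M' && mask != 'N') return false;
  r.masked = (mask == 'M');

  // <VLEN>: one or more decimal digits.
  const std::size_t vlenBegin = pos;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) ++pos;
  if (pos == vlenBegin) return false;
  r.vlen = s.substr(vlenBegin, pos - vlenBegin);

  // <parameter type> tokens run up to the first '_'; the rest is the name.
  const std::size_t sep = s.find('_', pos);
  if (sep == std::string::npos || sep == pos || sep + 1 == s.size()) return false;
  r.parameters = s.substr(pos, sep - pos);
  r.scalarName = s.substr(sep + 1);

  out = r;
  return true;
}
```

With this sketch, `_ZGVbN4vv_vec_sum` splits into isa `b`, unmasked, VLEN `4`,
parameters `vv` and scalar name `vec_sum`, while a name with no parameter
tokens or scalar name is rejected.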
> > >>>> 
> > >>>> This might lead people to think that we could drop the
> > >>>> _ZGV<isa> prefix and consider the <masking><VLEN><parameter
> > >>>> type>_<scalar name> part as a sort of unofficial OpenMP mangling
> > >>>> scheme: in fact, the signature of an “unmasked 2-lane vector
> > >>>> version of `sin`” will always be `<2 x double>(<2 x double>)`.
> > >>>> 
> > >>>> The problem with this choice is that the number of vector versions
> > >>>> available for a target is not unique.
> > >>> 
> > >>> For me, this simply means this mangling scheme is not sufficient.
> > >>> 
> > >> 
> > >> Can you explain more why you think the mangling scheme is not
> > >> sufficient? The mangling scheme is shaped to provide all the 
> > >> information that the OpenMP directive describes.
> > > 
> > > I don't know if it is insufficient but I thought you hinted towards
> > > that.
> > 
> > I didn’t mean that, the tokens in the vector function ABI mangled
> > schemes are sufficient.
> > 
> > > If we can handle/decode everything we need for declare variants 
> > > then I do not object at all. If not, we require respective 
> > > extension such that we can. The result should be a superset of the
> > > current SIMD encoding and compatible with the current one.
> > > 
> > > 
> > 
> > We can handle/decode everything for a SIMD context. :)
> > 
> > 
> > > 
> > >> The fact that x86 and aarch64 realize such information in 
> > >> different ways (multiple signature/vector extensions) is something
> > >> that cannot be avoided, because it is related to architectural
> > >> aspects that are specific to the vector extension and transparent
> > >> to the OpenMP standard.
> > > 
> > > I don't think that is a problem (that's why I "failed to see the
> > > problem" in the comment below). I look at it this way: If #declare
> > > simd, or similar, results in N variants, it should at the end of 
> > > the day not be different from declaring these N variants 
> > > explicitly with the respective declare variant match clause.
> > > 
> > 
> > That’s not the case. #declare simd should create all the versions that
> > are optimal for the target. We carefully thought about that when writing
> > the vector function ABI. Most of the constraints derive from the fact that
> > each target has a specific register size.
> > 
> > Example:
> > 
> > #pragma omp declare simd
> > float foo(float);
> > 
> > X86 -> 8 versions {2, 4, 8, 16 lanes} x {masking, no masking}, see
> > https://godbolt.org/z/m1BUVt
> > Arm NEON -> 4 versions {2, 4 lanes} x {masking, no masking}
> > Arm SVE -> 1 version
> > 
> > Therefore, the outcome of declare simd is not target independent. Your
> > expectations are met only inside one target.
> > 
> > 
> > > 
> > >>>> In particular, the following declaration generates multiple
> > >>>> vector versions, depending on the target:
> > >>>> 
> > >>>> #pragma omp declare simd simdlen(2) notinbranch
> > >>>> double foo(double) {…};
> > >>>> 
> > >>>> On x86, this generates at least 4 symbols (one for SSE, one for
> > >>>> AVX, one for AVX2, and one for AVX512:
> > >>>> https://godbolt.org/z/TLYXPi)
> > >>>> 
> > >>>> On aarch64, the same declaration generates a unique symbol, as
> > >>>> specified in the Vector Function ABI.
> > >>> 
> > >>> I fail to see the problem. We generate X symbols for X different
> > >>> contexts. Once we get to the point where we vectorize, we
> > >>> determine which context fits best and choose the corresponding
> > >>> symbol version.
> > >>> 
> > >> 
> > >> Yes, this is exactly what we need to do, under the constraint
> > >> that the rules for generating "X symbols for X different
> > >> contexts" are decided by the Vector Function ABI of the target.
> > > 
> > > Sounds good. The vector ABI is used to determine what contexts 
> > > exists and what symbols should be created. I would assume the 
> > > encoding should be the same as if we specified the versions
> > > (/contexts) ourselves via #declare variant.
> > > 
> > 
> > Oh yes, vector functions listed in a declare variant should obey the
> > vector function ABI rules (other than the function name).
> > 
> > > 
> > >>> Maybe my view is too naive here, please feel free to correct me.
> > >>> 
> > >>> 
> > >>>> This means that the attribute (or metadata) that carries the
> > >>>> information on the available vector version needs to deal also
> > >>>> with things that are not usually visible at IR level, but that
> > >>>> might still need to be provided to be able to decide which
> > >>>> particular instruction set/ vector extension needs to be targeted.
> > >>> 
> > >>> The symbol names should carry all the information we need. If
> > >>> they do not, we need to improve the mangling scheme such that they do.
> > >>> There is no attributes/metadata we could use at library boundaries.
> > >>> 
> > >> Hum, I am not sure what you mean by "There is no 
> > >> attributes/metadata we could use at library boundaries."
> > > 
> > > (This seems to be part of the misunderstanding, I leave my comment
> > > here anyway:)
> > > 
> > > The simd-related stuff works because it is a uniform mangling 
> > > scheme used by all compilers. Take the situation below in which I
> > > think we want to call foo_CTX in the library. If so, we need a
> > > name for it.
> > > 
> > 
> > In the situation below, the mangled name is going to be the same for
> > both compilers, as long as they adhere to the vector function ABI.
> > 
> > > 
> > > a.c:  // Compiled by gcc into a library
> > > #omp declare variant (foo) match(CTX)
> > > void foo_CTX(...) {...}
> > > 
> > > b.c:  // Compiled by clang linked against the library above.
> > > #omp declare variant (foo) match(CTX)
> > > void foo_CTX(...);
> > > 
> > > void bar(...) {
> > >  #pragma omp CTX
> > >  foo();   // <- What function (symbol) do we call if a.c was compiled
> > >           //    by gcc and b.c with clang?
> > > }
> > > 
> > 
> > Please notice that `declare variant` needs to be attached to the
> > scalar function, not the vector one.
> > 
> > ```
> > #pragma omp declare variant(foo_CTX) match (context=simd…
> > double foo(double) {…}
> > 
> > vector_double_ty foo_CTX(vector_double_ty) {…}
> > ```
> > 
> > In vectorizing foo in bar, the compiler will not care where foo_CTX
> > would come from (of course, as long as the scalar+declare variant
> > declarations are visible).
> > 
> > >> In our downstream compiler (Arm compiler for HPC, based on LLVM),
> > >> we use `declare simd` to provide vector math functions via custom
> > >> header file. It works brilliantly, if not for specific aspects
> > >> that would be perfectly covered by the `declare variant`, which
> > >> might be one of the reasons why the OpenMP committee decided to
> > >> introduce `declare variant`.
> > > 
> > > But you (assume that you) control the mangling scheme across the 
> > > entire infrastructure. Given that the simd mangling is de-facto 
> > > standardized, that works.
> > > 
> > > Side note:
> > > Declare variant, as of 5.0, is not flexible enough for a sensible
> > > inclusion of target specific headers. That will change in 5.1.
> > > 
> > 
> > Could you point me at the discussion in 5.1 on this specific aspect?
> > 
> > 
> > > 
> > >> If your concern is that adding an attribute that somehow
> > >> represents something that is available in an external library is
> > >> not enough to guarantee that the symbol is available in the
> > >> library… not even C code can guarantee that? If the linker is not
> > >> pointing to the right library, there is nothing that can prevent
> > >> it from failing if the symbol is not present?
> > > 
> > > I don't follow the example you describe. I don't want to change
> > > anything in how symbols are looked up or what happens if they are
> > > missing.
> > > 
> > > 
> > 
> > I don’t want to change that either :). I think we are misunderstanding
> > each other here...
> > 
> > >>>> I used an example based on `declare simd` instead of `declare
> > >>>> variant` because the attribute/metadata needed for `declare
> > >>>> variant` is a modification of the one needed for `declare
> > >>>> simd`, which has already been agreed in a previous RFC proposed
> > >>>> by Intel [1], and for which Intel has already provided an
> > >>>> implementation [2]. The changes proposed in this RFC are fully
> > >>>> compatible with the work that is being done for the VecClone
> > >>>> pass in [2].
> > >>>> 
> > >>>> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
> > >>>> [2] VecClone pass: https://reviews.llvm.org/D22792
> > >>> 
> > >>> Having an agreed upon mangling for the older feature is not
> > >>> necessarily important here. We will need more functionality for
> > >>> variants and keeping the old scheme around with some metadata is
> > >>> not an extensible long-term solution. So, I would not try to fit
> > >>> variants into the existing simd-scheme but instead do it the
> > >>> other way around. We define what we need for variants and
> > >>> implement simd in that scheme.
> > >>> 
> > >> 
> > >> I kinda think that having agreed on something is important. It
> > >> allows to build other things on top of what has been agreed 
> > >> without breaking compatibility.
> > >> 
> > >> On the specific, which are the new functionalities needed for the
> > >> variants that would make the current metadata (attributes) for
> > >> declare simd non extensible?
> > > 
> > > See first comment.
> > > 
> > >>>> The good news is that as far as AArch64 and x86 are concerned,
> > >>>> the only thing that will differ in the mangled name is the
> > >>>> “<isa>” token. As far as I can tell, the mangling scheme of the
> > >>>> rest of the vector name is the same, therefore a lot of
> > >>>> infrastructure in terms of mangling and demangling can be reused.
> > >>>> In fact, the `mangleVectorParameters` function in
> > >>>> https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918
> > >>>> could already be shared among x86 and aarch64.
> > >>>> 
> > >>>> TOPIC 2: metadata vs attribute
> > >>>> 
> > >>>> From a functionality point of view, I don’t care whether we use
> > >>>> metadata or attributes. The VecClone pass mentioned in TOPIC 1
> > >>>> uses the following:
> > >>>> 
> > >>>> attributes #0 = { nounwind uwtable
> > >>>> "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
> > >>>> 
> > >>>> This is an attribute (I thought it was metadata?), I am happy to
> > >>>> reword the RFC using the right terminology (sorry for messing
> > >>>> this up).
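
A small sketch of how a consumer might use such a comma-separated attribute
value: split it into names, then pick the entry for a wanted <isa>, masking
and lane count. `splitVariants` and `pickVariant` are hypothetical helpers
for illustration and are not taken from the VecClone patch or from LLVM; the
matching rule (prefix comparison on `_ZGV<isa><masking><VLEN>`) is likewise
an assumption.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split the comma-separated value of a "vector-variants" style attribute.
// Empty entries are skipped.
std::vector<std::string> splitVariants(const std::string &value) {
  std::vector<std::string> out;
  std::size_t begin = 0;
  while (begin <= value.size()) {
    std::size_t end = value.find(',', begin);
    if (end == std::string::npos) end = value.size();
    if (end > begin) out.push_back(value.substr(begin, end - begin));
    begin = end + 1;
  }
  return out;
}

// Return the first name starting with "_ZGV<isa><M|N><vlen>", or an empty
// string if no variant matches.
std::string pickVariant(const std::vector<std::string> &variants,
                        char isa, bool masked, unsigned vlen) {
  const std::string wanted =
      std::string("_ZGV") + isa + (masked ? 'M' : 'N') + std::to_string(vlen);
  for (const std::string &name : variants) {
    if (name.compare(0, wanted.size(), wanted) != 0) continue;
    // Reject a longer lane count, e.g. "16" when 1 was requested.
    if (name.size() > wanted.size() &&
        std::isdigit(static_cast<unsigned char>(name[wanted.size()])))
      continue;
    return name;
  }
  return std::string();
}
```

For the first four names of the attribute above, asking for isa `c`,
unmasked, 8 lanes yields `_ZGVcN8vv_vec_sum`; a request with no matching
entry yields an empty string, and the caller would then keep the scalar call.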
> > >>>> 
> > >>>> Also, @Renato expressed concern that metadata might be dropped
> > >>>> by optimization passes - would using attributes prevent that?
> > >>>> 
> > >>>> TOPIC 3: "There is no way to notify the backend how conformant
> > >>>> the SIMD versions are."
> > >>>> 
> > >>>> @Shawn, I am afraid I don’t understand what you mean by
> > >>>> “conformant” here. Can you elaborate with an example?
> > >>>> 
> > >>>> TOPIC 3: interaction of the `omp declare variant` with `clang
> > >>>> declare variant`
> > >>>> 
> > >>>> I believe this is described in the `Option behavior, and
> > >>>> interaction with OpenMP`. The option `-fclang-declare-variant` is
> > >>>> there to make the OpenMP based one orthogonal. Of course, we might
> > >>>> decide to make -fclang-declare-variant on/off by default, and have
> > >>>> default behavior when interacting with -fopenmp-simd. For the sake
> > >>>> of compatibility with other compilers, we might need to require
> > >>>> -fno-clang-declare-variant when targeting -fopenmp-[simd].
> > >>>> 
> > >>>> TOPIC 3: "there are no special arguments / flags / status regs
> > >>>> that are used / changed in the vector version that the compiler
> > >>>> will have to "just know""
> > >>>> 
> > >>>> I believe that this concern is raised by the problem of handling
> > >>>> FP exceptions? If that’s the case, the compiler is not allowed to
> > >>>> make any assumption on the vector function about that, and must
> > >>>> treat it with the same knowledge of any other function, depending
> > >>>> on the visibility it has in the compilation unit. @Renato, does
> > >>>> this answer your question?
> > >>>> 
> > >>>> TOPIC 4: attribute in function declaration vs attribute
> > >>>> function call site
> > >>>> 
> > >>>> We discussed this in the previous version of the proposal. Having
> > >>>> it in the call sites guarantees that incompatible vector versions
> > >>>> are not used when merging modules compiled for different targets.
> > >>>> I don’t have a use case for this, if I remember correctly this was
> > >>>> asked by @Hideki Saito. Hideki, any comment on this?
> > >>>> 
> > >>>> TOPIC 5: overriding system header (the discussion on #pragma
> > >>>> omp/clang/system variants initiated by @Hal Finkel).
> > >>>> 
> > >>>> I thought that the split among #pragma clang declare variant and
> > >>>> #pragma omp declare variant was already providing the
> > >>>> orthogonality between system header and user header. Meaning that
> > >>>> a user should always prefer the omp version (for portability to
> > >>>> other compilers) instead of the #pragma clang one, which would be
> > >>>> relegated to system headers and headers provided by the compiler.
> > >>>> Am I missing something? If so, I am happy to add a “system”
> > >>>> version of the directive, as it would be quite easy to do given
> > >>>> most of the parsing infrastructure will be shared.
> > >>>> 
> > >>>> 
> > >>>>> On May 30, 2019, at 12:53 PM, Philip Reames <listmail at philipreames.com> wrote:
> > >>>>> 
> > >>>>> 
> > >>>>> On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> > >>>>>> On 05/29, Finkel, Hal J. via cfe-dev wrote:
> > >>>>>>> On 5/29/19 1:52 PM, Philip Reames wrote:
> > >>>>>>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> > >>>>>>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> > >>>>>>>>>> I generally like the idea of having support in IR for
> > >>>>>>>>>> vectorization of custom functions.  I have several use
> > >>>>>>>>>> cases which would benefit from this.
> > >>>>>>>>>> 
> > >>>>>>>>>> I'd suggest a couple of reframings to the IR
> > >>>>>>>>>> representation though.
> > >>>>>>>>>> 
> > >>>>>>>>>> First, this should probably be specified as
> > >>>>>>>>>> metadata/attribute on a function declaration.  Allowing
> > >>>>>>>>>> the callsite variant is fine, but it should primarily be
> > >>>>>>>>>> a property of the called function, not of the call site.
> > >>>>>>>>>> Being able to specify it once per declaration is much
> > >>>>>>>>>> cleaner.
> > >>>>>>>>> I agree. We should support this both on the function
> > >>>>>>>>> declaration and on the call sites.
> > >>>>>>>>> 
> > >>>>>>>>> 
> > >>>>>>>>>> Second, I really don't like the mangling use here.  We
> > >>>>>>>>>> need a better way to specify the properties of the
> > >>>>>>>>>> function than its mangled name.  One thought to explore
> > >>>>>>>>>> is to directly use the Value of the function declaration
> > >>>>>>>>>> (since this is metadata and we can do that), and then tie
> > >>>>>>>>>> the properties to the function declaration in some way?
> > >>>>>>>>>> Sorry, I don't really have a specific suggestion here.
> > >>>>>>>>> Is the problem the mangling or the fact that the mangling
> > >>>>>>>>> is ABI/target-specific? One option is to use LLVM's
> > >>>>>>>>> mangling scheme (the one we use for intrinsics) and then
> > >>>>>>>>> provide some backend infrastructure to translate later.
> > >>>>>>>> Well, both honestly.  But mangling with a non-target
> > >>>>>>>> specific scheme is a lot better, so I might be okay with
> > >>>>>>>> that.   Good idea.
> > >>>>>>> 
> > >>>>>>> I liked your idea of directly encoding the signature in the
> > >>>>>>> metadata, but I think that we want to continue to use
> > >>>>>>> attributes, and not metadata, and the options for attributes
> > >>>>>>> seem more limited - unless we allow attributes to take
> > >>>>>>> metadata arguments - maybe that's an enhancement worth
> > >>>>>>> considering.
> > >>>>>> I recently talked to people in the OpenMP language committee
> > >>>>>> meeting about this and, thinking forward to the actual
> > >>>>>> implementation/use of the OpenMP 5.x declare variant feature,
> > >>>>>> I'd say:
> > >>>>>> 
> > >>>>>> - We will need a mangling scheme if we want to allow variants
> > >>>>>> on declarations that are defined elsewhere.
> > >>>>>> - We will need a (OpenMP) standardized mangling scheme if we
> > >>>>>> want interoperability between compilers.
> > >>>>>> 
> > >>>>>> I assume we want both so I think we will need both.
> > >>>>> If I'm reading this correctly, this describes a need for the
> > >>>>> frontend to have a mangling scheme.  Nothing in here would
> > >>>>> seem to prevent the frontend from generating a declaration for
> > >>>>> a mangled external symbol and then referencing that
> > >>>>> declaration.  Am I missing something?
> > >>>>>> 
> > >>>>>> That said, I think this should allow us to avoid
> > >>>>>> attributes/metadata which seems to me like a good thing right
> > >>>>>> now.
> > >>>>>> 
> > >>>>>> Cheers,
> > >>>>>> Johannes
> > >>>>>> 
> > >>>>>> 
> > >>>>>>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
> > >>>>>>>>>>> Dear all,
> > >>>>>>>>>>> 
> > >>>>>>>>>>> This RFC is a proposal to provide auto-vectorization
> > >>>>>>>>>>> functionality for user provided vector functions.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The proposal is a modification of an RFC that I sent
> > >>>>>>>>>>> out a couple of months ago, with the title `[RFC]
> > >>>>>>>>>>> Re-implementing -fveclib with OpenMP` (see
> > >>>>>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html).
> > >>>>>>>>>>> The previous RFC is to be considered abandoned.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The original RFC was proposing to re-implement the
> > >>>>>>>>>>> `-fveclib` command line option. This proposal avoids
> > >>>>>>>>>>> that, and limits its scope to the mechanics of providing
> > >>>>>>>>>>> vector functions in user code that the compiler can pick
> > >>>>>>>>>>> up for auto-vectorization. This narrower scope limits
> > >>>>>>>>>>> the impact of changes that are needed in both clang and
> > >>>>>>>>>>> LLVM.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Please let me know what you think.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Kind regards,
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Francesco
> > >>>>>>>>>>> 
> > >>>>>>>>>>> 
> > >>>>>>>>>>> =====================================================================
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Introduction
> > >>>>>>>>>>> ============
> > >>>>>>>>>>> 
> > >>>>>>>>>>> This RFC encompasses the proposal of informing the
> > >>>>>>>>>>> vectorizer about the availability of vector functions
> > >>>>>>>>>>> provided by the user. The mechanism is based on the use
> > >>>>>>>>>>> of the directive `declare variant` introduced in OpenMP
> > >>>>>>>>>>> 5.0 [^1].
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The mechanism proposed has the following properties:
> > >>>>>>>>>>> 
> > >>>>>>>>>>> 1.  Decouples the compiler front-end that knows about the
> > >>>>>>>>>>>    availability of vectorized routines, from the back-end
> > >>>>>>>>>>>    that knows how to make use of them.
> > >>>>>>>>>>> 2.  Enables support for a developer's own vector libraries
> > >>>>>>>>>>>    without requiring changes to the compiler.
> > >>>>>>>>>>> 3.  Enables other frontends (e.g. f18) to add
> > >>>>>>>>>>>    scalar-to-vector function mappings as relevant for
> > >>>>>>>>>>>    their own runtime libraries, etc.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The implementation consists of two separate sets of
> > >>>>>>>>>>> changes.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The first set is a set of changes in `llvm`, and consists
> > >>>>>>>>>>> of:
> > >>>>>>>>>>> 
> > >>>>>>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information
> > >>>>>>>>>>>    about the availability of user-defined vector functions
> > >>>>>>>>>>>    via metadata attached to an `llvm::CallInst`.
> > >>>>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be
> > >>>>>>>>>>>    queried to retrieve information about the available
> > >>>>>>>>>>>    vector functions associated to a `llvm::CallInst`.
> > >>>>>>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to
> > >>>>>>>>>>>    query the metadata.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The second set consists of the [changes in
> > >>>>>>>>>>> clang](#clang) that are needed to recognize the
> > >>>>>>>>>>> `#pragma clang declare variant` directive.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Proposed changes
> > >>>>>>>>>>> ================
> > >>>>>>>>>>> 
> > >>>>>>>>>>> We propose an
implementation that uses `#pragma clang
> > >>>>>>>>>>> declare variant` to
inform the backend components about
> > >>>>>>>>>>> the availability of
vector version of scalar functions
> > >>>>>>>>>>> found in IR. The
mechanism relies in storing such
> > >>>>>>>>>>> information in IR
metadata, and therefore makes the
> > >>>>>>>>>>> auto-vectorization of
function calls a mid-end (`opt`) process that is independent on the front-end
that generated such IR metadata.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> This implementation
provides a generic mechanism that
> > >>>>>>>>>>> the users of the LLVM
compiler will be able to use for
> > >>>>>>>>>>> interfacing their own
vector routines for generic code.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The implementation can
also expose
> > >>>>>>>>>>> vectorization-specific
descriptors -- for example, like
> > >>>>>>>>>>> the `linear` and
`uniform` clauses of the OpenMP
> > >>>>>>>>>>> `declare simd` directive
> > >>>>>>>>>>> -- that could be used to
finely tune the automatic
> > >>>>>>>>>>> vectorization of some
functions (think for example the
> > >>>>>>>>>>> vectorization of `double
sincos(double , double *,
> > >>>>>>>>>>> double *)`, where
`linear` can be used to give extra information about the memory layout of the 2
pointers parameters in the vector version).
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The directive `#pragma
clang declare variant` follows
> > >>>>>>>>>>> the syntax of the
`#pragma omp declare variant` directive of OpenMP.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> We define the new
directive in the `clang` namespace
> > >>>>>>>>>>> instead of using the
`omp` one of OpenMP to allow the
> > >>>>>>>>>>> compiler to perform
auto-vectorization outside of an OpenMP SIMD context.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The mechanism is based on OpenMP to provide a uniform
> > >>>>>>>>>>> user experience across the two mechanisms, and to
> > >>>>>>>>>>> maximise the number of shared components of the
> > >>>>>>>>>>> infrastructure needed in the compiler frontend to enable
> > >>>>>>>>>>> the feature.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Changes in LLVM IR
{#llvmIR}
> > >>>>>>>>>>> ------------------
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The IR is enriched with
metadata that details the
> > >>>>>>>>>>> availability of vector
versions of an associated scalar
> > >>>>>>>>>>> function. This metadata
is attached to the call site of the scalar function.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The metadata takes the
form of an attribute containing a
> > >>>>>>>>>>> comma separated list of
vector function mappings. Each
> > >>>>>>>>>>> entry has a unique name
that follows the Vector Function
> > >>>>>>>>>>> ABI[^2] and real name
that is used when generating calls to this vector function.
> > >>>>>>>>>>> 
> > >>>>>>>>>>>   
vfunc_name1(real_name1), vfunc_name2(real_name2)
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The Vector Function ABI
name describes the signature of
> > >>>>>>>>>>> the vector function so
that properties like
> > >>>>>>>>>>> vectorisation factor can
be queried during compilation.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The `(real name)` token
is optional and assumed to match
> > >>>>>>>>>>> the Vector Function ABI
name when omitted.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> For example, the
availability of a 2-lane double
> > >>>>>>>>>>> precision `sin` function
via SVML when targeting AVX on
> > >>>>>>>>>>> x86 is provided by the
following IR.
> > >>>>>>>>>>> 
> > >>>>>>>>>>>    // ...
> > >>>>>>>>>>>    ... = call double
@sin(double) #0
> > >>>>>>>>>>>    // ...
> > >>>>>>>>>>> 
> > >>>>>>>>>>>    #0 = { vector-variant
= {"_ZGVcN2v_sin(__svml_sin2),
> > >>>>>>>>>>>                          
_ZGVdN4v_sin(__svml_sin4),
> > >>>>>>>>>>>                          
..."} }
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The string
`"_ZGVcN2v_sin(__svml_sin2)"` in this
> > >>>>>>>>>>> vector-variant attribute
provides information on the
> > >>>>>>>>>>> shape of the vector
function via the string
> > >>>>>>>>>>> `_ZGVcN2v_sin`, mangled
according to the Vector Function
> > >>>>>>>>>>> ABI for Intel, and remaps
the standard Vector Function ABI name to the non-standard name `__svml_sin2`.
> > >>>>>>>>>>> 
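The attribute format described above (a comma-separated list of `vfunc_name(real_name)` entries, with the parenthesised real name optional) can be illustrated with a small parser. This is only a sketch under stated assumptions: `VariantMapping` and `parseVectorVariants` are hypothetical names that do not appear in the RFC or in LLVM, and the code works on plain `std::string` rather than on LLVM attribute classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One entry of the attribute (hypothetical type, for illustration only).
struct VariantMapping {
  std::string AbiName;   // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName;  // symbol to call; equals AbiName when omitted
};

// Strip leading and trailing ASCII whitespace.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n\r";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Split the attribute string on commas and decode each
// "abi_name(real_name)" or "abi_name" entry.
std::vector<VariantMapping> parseVectorVariants(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    std::size_t Open = Entry.find('(');
    if (Open == std::string::npos) {
      // No "(real name)" token: the real name matches the ABI name.
      Result.push_back({Entry, Entry});
      continue;
    }
    std::size_t Close = Entry.rfind(')');
    assert(Close != std::string::npos && Close > Open &&
           "unbalanced parentheses in vector-variant entry");
    Result.push_back({trim(Entry.substr(0, Open)),
                      trim(Entry.substr(Open + 1, Close - Open - 1))});
  }
  return Result;
}
```

Under these assumptions, parsing `"_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)"` yields two entries whose real names are `__svml_sin2` and `__svml_sin4`, while an entry written without parentheses maps to itself.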
> > >>>>>>>>>>> This metadata is
compatible with the proposal "Proposal
> > >>>>>>>>>>> for function
vectorization and loop vectorization with
> > >>>>>>>>>>> function calls",[^3]
that uses Vector Function ABI
> > >>>>>>>>>>> mangled names to inform
the vectorizer about the
> > >>>>>>>>>>> availability of vector
functions. The proposal extends
> > >>>>>>>>>>> the original by allowing
the explicit mapping of the Vector Function ABI mangled name to a non-standard
name, which allows the use of existing vector libraries.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The `vector-variant`
attribute needs to be attached on a
> > >>>>>>>>>>> per-call basis to avoid
conflicts when merging modules with different vector variants.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The query infrastructure:
SVFS {#infrastructure}
> > >>>>>>>>>>>
------------------------------
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The Search Vector
Function System (SVFS) is constructed
> > >>>>>>>>>>> from an `llvm::Module`
instance so it can create
> > >>>>>>>>>>> function definitions. The
SVFS exposes an API with two methods.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> ###
`SVFS::isFunctionVectorizable`
> > >>>>>>>>>>> 
> > >>>>>>>>>>> This method queries the availability of a vectorized
> > >>>>>>>>>>> version of a function. The signature of the method is as
> > >>>>>>>>>>> follows.
> > >>>>>>>>>>> 
> > >>>>>>>>>>>    bool
isFunctionVectorizable(llvm::CallInst * Call,
> > >>>>>>>>>>> ParTypeMap Params);
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The method determines the availability of a vector version
> > >>>>>>>>>>> of the function invoked by the `Call` parameter by
> > >>>>>>>>>>> looking at the `vector-variant` metadata.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The `Params` argument is a map that associates the
> > >>>>>>>>>>> position of a parameter in the `CallInst` to its
> > >>>>>>>>>>> `ParameterType` descriptor. The `ParameterType`
> > >>>>>>>>>>> descriptor holds information about the shape of the
> > >>>>>>>>>>> corresponding parameter in the signature of the vector
> > >>>>>>>>>>> function. This `ParameterType` is used to query the SVFS
> > >>>>>>>>>>> about the availability of vector versions that have
> > >>>>>>>>>>> `linear`, `uniform` or `align` parameters (in the sense
> > >>>>>>>>>>> of OpenMP 4.0 and onwards).
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The method `isFunctionVectorizable`, when invoked with
> > >>>>>>>>>>> an empty `ParTypeMap`, is equivalent to the
> > >>>>>>>>>>> `TargetLibraryInfo` method
> > >>>>>>>>>>> `isFunctionVectorizable(StringRef Name)`.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> ###
`SVFS::getVectorizedFunction`
> > >>>>>>>>>>> 
> > >>>>>>>>>>> This method returns the vector function declaration that
> > >>>>>>>>>>> corresponds to the needs of the vectorization technique
> > >>>>>>>>>>> that is being run.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The signature of the
function is as follows.
> > >>>>>>>>>>> 
> > >>>>>>>>>>>   
std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
> > >>>>>>>>>>>      llvm::CallInst *
Call, unsigned VF, bool IsMasked,
> > >>>>>>>>>>> ParTypeSet Params);
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The `Call` parameter is the call instance that is being
> > >>>>>>>>>>> vectorized, the `VF` parameter represents the
> > >>>>>>>>>>> vectorization factor (how many lanes), the `IsMasked`
> > >>>>>>>>>>> parameter decides whether or not the signature of the
> > >>>>>>>>>>> vector function is required to have a mask parameter,
> > >>>>>>>>>>> and the `Params` parameter describes the shape of the
> > >>>>>>>>>>> vector function as in the `isFunctionVectorizable`
> > >>>>>>>>>>> method.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The method uses the `vector-variant` metadata and
> > >>>>>>>>>>> returns the function signature and the name of the
> > >>>>>>>>>>> function based on the input parameters.
> > >>>>>>>>>>> 
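The selection step described above (pick a variant by vectorization factor and masking) might look roughly like the following. This is a simplified sketch, not the proposed SVFS API: `VariantInfo` and `selectVariant` are hypothetical names, the `ParTypeSet` shape matching is left out, and an empty string stands in for "no match".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, pre-decoded view of one `vector-variant` entry.
struct VariantInfo {
  unsigned VF;           // number of lanes
  bool IsMasked;         // true if the variant takes a mask parameter
  std::string RealName;  // symbol to call for this variant
};

// Return the symbol of the first variant that matches the requested
// vectorization factor and masking, or an empty string if none matches.
std::string selectVariant(const std::vector<VariantInfo> &Variants,
                          unsigned VF, bool IsMasked) {
  assert(VF > 0 && "vectorization factor must be positive");
  for (const VariantInfo &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.RealName;
  return std::string();
}
```

With the two SVML entries from the IR example, a request for `VF = 4`, unmasked, would return `__svml_sin4`, and a masked request would find nothing. The RFC lets the SVFS synthesize a definition when the request does not match what the library provides.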
> > >>>>>>>>>>> The SVFS can add new
function definitions, in the same
> > >>>>>>>>>>> module as the `Call`, to
provide vector functions that
> > >>>>>>>>>>> are not present within
the vector-variant metadata. For
> > >>>>>>>>>>> example, if a library
provides a vector version of a
> > >>>>>>>>>>> function with a
vectorization factor of 2, but the
> > >>>>>>>>>>> vectorizer is requesting
a vectorization factor of 4,
> > >>>>>>>>>>> the SVFS is allowed to
create a definition that calls
> > >>>>>>>>>>> the 2-lane version twice.
This capability applies similarly for providing masked and unmasked versions
when the request does not match what is available in the library.
> > >>>>>>>>>>> 
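The "call the 2-lane version twice" idea can be sketched in plain C++ with `std::array` standing in for vector registers. All names here (`Vec2`, `Vec4`, `lib_sin2`, `wrapped_sin4`) are hypothetical, and `lib_sin2` is just a scalar loop playing the role of a library routine. The real SVFS would emit an IR function definition rather than C++ source.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a library-provided 2-lane routine (hypothetical name).
Vec2 lib_sin2(Vec2 X) { return {std::sin(X[0]), std::sin(X[1])}; }

// Wrapper of the kind the SVFS could synthesize when the vectorizer asks
// for 4 lanes but only a 2-lane version exists: split, call twice, merge.
Vec4 wrapped_sin4(Vec4 X) {
  Vec2 Lo = lib_sin2({X[0], X[1]});
  Vec2 Hi = lib_sin2({X[2], X[3]});
  return {Lo[0], Lo[1], Hi[0], Hi[1]};
}

// Usage check: the wrapper agrees lane by lane with four scalar calls.
void checkWrappedSin4() {
  Vec4 R = wrapped_sin4({0.0, 0.5, 1.0, 1.5});
  for (std::size_t I = 0; I < 4; ++I)
    assert(std::fabs(R[I] - std::sin(0.5 * static_cast<double>(I))) < 1e-12);
}
```

The same split-and-merge shape would apply to wrapping an unmasked routine to serve a masked request, with the mask applied when merging results.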
> > >>>>>>>>>>> This method is equivalent
to the TLI method `StringRef
> > >>>>>>>>>>>
getVectorizedFunction(StringRef F, unsigned VF) const;`.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Notice that to fully
support OpenMP vectorization we
> > >>>>>>>>>>> need to think about a
fuzzy matching mechanism that is
> > >>>>>>>>>>> able to select a
candidate in the calling context.
> > >>>>>>>>>>> However, this proposal is
intended for scalar-to-vector
> > >>>>>>>>>>> mappings of math-like
functions that are most likely to
> > >>>>>>>>>>> associate a unique vector
candidate in most contexts.
> > >>>>>>>>>>> Therefore, extending this
behavior to a generic one is an aspect of the implementation that will be
treated in a separate RFC about the vectorization pass.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> ### Scalable
vectorization
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Both methods of the SVFS
API will be extended with a
> > >>>>>>>>>>> boolean parameter to
specify whether scalable signatures
> > >>>>>>>>>>> are needed by the user of
the SVFS.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Changes in clang {#clang}
> > >>>>>>>>>>> ----------------
> > >>>>>>>>>>> 
> > >>>>>>>>>>> We use clang to generate
the metadata described above.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> In the compilation unit,
the vector function definition
> > >>>>>>>>>>> or declaration must be
visible and associated to the
> > >>>>>>>>>>> scalar version via the
`#pragma clang declare variant`
> > >>>>>>>>>>> according to the rule
defined by the correspondent
> > >>>>>>>>>>> `#pragma omp declare
variant` defined in OpenMP 5.0, as in the following example.
> > >>>>>>>>>>> 
> > >>>>>>>>>>>    #pragma clang declare
variant(vector_sinf) \
> > >>>>>>>>>>>   
match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> > >>>>>>>>>>>    extern float
sinf(float);
> > >>>>>>>>>>> 
> > >>>>>>>>>>>    float32x4_t
vector_sinf(float32x4_t x);
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The `construct` set in the directive, together with the
> > >>>>>>>>>>> `device` set, is used to generate the vector mangled
> > >>>>>>>>>>> name to be used in the `vector-variant` attribute, for
> > >>>>>>>>>>> example `_ZGVnN4v_sinf`, when targeting AArch64 Advanced
> > >>>>>>>>>>> SIMD code generation. The rules for mangling the name of
> > >>>>>>>>>>> the scalar function in the vector name are defined in
> > >>>>>>>>>>> the Vector Function ABI specification of the target.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The part of the
vector-variant attribute that redirects
> > >>>>>>>>>>> the call to `vector_sinf`
is derived from the
> > >>>>>>>>>>> `variant-id` specified in
the `variant` clause.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Summary
> > >>>>>>>>>>> =======
> > >>>>>>>>>>>
> > >>>>>>>>>>> New `clang` directive in
clang
> > >>>>>>>>>>>
------------------------------
> > >>>>>>>>>>> 
> > >>>>>>>>>>> `#pragma clang declare variant`, same as `#pragma omp
> > >>>>>>>>>>> declare variant` restricted to the `simd` context
> > >>>>>>>>>>> selector, from OpenMP 5.0+.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Option behavior, and
interaction with OpenMP
> > >>>>>>>>>>>
--------------------------------------------
> > >>>>>>>>>>> 
> > >>>>>>>>>>> The behavior described below makes sure that `#pragma
> > >>>>>>>>>>> clang declare variant` function vectorization and OpenMP
> > >>>>>>>>>>> function vectorization are orthogonal.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> `-fclang-declare-variant`
> > >>>>>>>>>>> 
> > >>>>>>>>>>> :   The `#pragma clang
declare variant` directives are parsed and used
> > >>>>>>>>>>>    to populate the
`vector-variant` attribute.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> `-fopenmp[-simd]`
> > >>>>>>>>>>> 
> > >>>>>>>>>>> :   The `#pragma omp
declare variant` directives are parsed and used to
> > >>>>>>>>>>>    populate the
`vector-variant` attribute.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
> > >>>>>>>>>>> 
> > >>>>>>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
> > >>>>>>>>>>>    `vector-variant` attribute in IR. The directives
> > >>>>>>>>>>>    `#pragma clang declare variant` are ignored.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> > >>>>>>>>>>> 
> > >>>>>>>>>>> [^2]: Vector Function ABI for x86:
> > >>>>>>>>>>>    <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> > >>>>>>>>>>>    Vector Function ABI for AArch64:
> > >>>>>>>>>>>    <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
> > >>>>>>>>>>> 
> > >>>>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> > >>>>>>>>>>> 
> > >>>>>>>>>>>
_______________________________________________
> > >>>>>>>>>>> LLVM Developers mailing
list llvm-dev at lists.llvm.org
> > >>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>>>>>>>>>
_______________________________________________
> > >>>>>>>>>> cfe-dev mailing list
> > >>>>>>>>>> cfe-dev at lists.llvm.org
> > >>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> > >>>>>>> --
> > >>>>>>> Hal Finkel
> > >>>>>>> Lead, Compiler Technology and Programming
Languages
> > >>>>>>> Leadership Computing Facility Argonne
National Laboratory
> > >>>>>>> 
> > >>>> 
> > >>> 
> > >>> --
> > >>> 
> > >>> Johannes Doerfert
> > >>> Researcher
> > >>> 
> > >>> Argonne National Laboratory
> > >>> Lemont, IL 60439, USA
> > >>> 
> > >>> jdoerfert at anl.gov
> > >> 
> > > 
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Doerfert, Johannes via llvm-dev

2019-Jun-01 01:17 UTC


[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On 06/01, Saito, Hideki wrote:
> 
> Page 22 of OpenMP 5.0 specification (Lines 13/14):
> 
> 	When any thread encounters a simd construct, the iterations of the loop associated with the
> 	construct may be executed concurrently using the SIMD lanes that are available to the thread
> 
> This is the Execution Model. The word here is "may" i.e., not "must".

The question I'm asking is:
Can the user observe concurrent execution of loop iterations for part of
the loop while other parts are executed sequentially? I'm aware that
this is not necessarily practical but it seems good to be aware of
nevertheless.

> Declare simd is not explicitly mentioned here, but requiring
> vectorization in declare simd when the caller simd construct may not
> be vectorized is odd. 

My problem is using a declare simd version inside non-vectorized code
called from an otherwise vectorized simd context. Thinking about it, that
is actually less theoretical than I thought.

> Having said that, ICC implementation is "work
> extremely hard to vectorize", and we are proud of it. Our
> recommendation is do the same as ICC (i.e., requires a lot of beefing
> up in LoopVectorize) and I think that's what you are asking. If so, we
> are aligned.
I agree that we should do a better job but I would argue something like
the region vectorizer (RV) [0] would be a much better starting point.
Though, that is a different discussion I guess.

[0] https://github.com/cdl-saarland/rv

> If anyone strongly goes against that idea (i.e., anyone wanting to keep
OpenMP simd as just an optimization hint for auto-vectorizer), please speak up.
> 
> >if you think there is a problem to reuse that for OpenCL/SYCL, let us
know.
> 
> Sure.
> 
> -----Original Message-----
> From: Doerfert, Johannes [mailto:jdoerfert at anl.gov] 
> Sent: Friday, May 31, 2019 4:58 PM
> To: Saito, Hideki <hideki.saito at intel.com>
> Cc: Francesco Petrogalli <Francesco.Petrogalli at arm.com>; Philip
Reames <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at
anl.gov>; LLVM Development List <llvm-dev at lists.llvm.org>; nd <nd
at arm.com>; Clang Dev <cfe-dev at lists.llvm.org>; scogland1 at
llnl.gov
> Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector
function for auto-vectorization.
> 
> On 05/31, Saito, Hideki wrote:
> > 
> > >Is this also the case if the user did require lock-step semantic
for the code to be correct?
> > 
> > Certainly not, but that part is actually beyond OpenMP specification.
> > I suggest looking up ICC's "#pragma simd assert"
description and see
> > if the assert feature is something you may be interested in seeing as 
> > an extended part of LLVM implementation of OpenMP (declare) simd.
> > Else, vectorization report would tell you whether it was vectorized or
> > not.
> 
> Wait, why do you think it is beyond the OpenMP specification? If an OpenMP
variant is picked based on the context the user should be able to assume the
context requirements are fulfilled. If we agree on that, I don't see how we
can do anything else than emitting an error if we optimistically picked a vector
variant but failed to vectorize. I don't think that precludes the approach
you've taken though.
>  
> > >How does OpenCL/SYCL play in this now?
> > 
> > Not right now, when we are working to get OpenMP stuff going --- 
> > except that I don't think we need to change the design (e.g., on 
> > function attribute, VecClone direction, etc.) in the future for those 
> > or similar languages.
> 
> OK. Whatever is considered here, if you think there is a problem to reuse
that for OpenCL/SYCL, let us know.
> 
> 
> > -----Original Message-----
> > From: Doerfert, Johannes [mailto:jdoerfert at anl.gov]
> > Sent: Friday, May 31, 2019 4:16 PM
> > To: Saito, Hideki <hideki.saito at intel.com>
> > Cc: Francesco Petrogalli <Francesco.Petrogalli at arm.com>;
Philip Reames
> > <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at
anl.gov>; LLVM
> > Development List <llvm-dev at lists.llvm.org>; nd <nd at
arm.com>; Clang Dev
> > <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
> > Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector
function for auto-vectorization.
> > 
> > On 05/31, Saito, Hideki wrote:
> > > 
> > > >VectorClone does more than just mapping a scalar version to a
vector one. It builds also the vector version definition by auto-vectorizing the
body of the scalar function.
> > > [...]
> > > The code is still fully functional w/o LoopVectorize vectorizing
that loop.
> > 
> > Is this also the case if the user did require lock-step semantic for
the code to be correct?
> > 
> > > >I don’t know if the patches related to VecClone also are
intended to use the `vector-variant` attribute for function declaration with a
#pragma omp declare simd.
> > > 
> > > > VecClone predated #pragma omp declare variant, so those patches
> > > > don't know about declare variant. VecClone was written for
> > > > handling #pragma omp declare simd, as described above. An OpenCL/SYCL
> > > > kernel is similar enough to OpenMP declare simd. Most code can be reused.
> > 
> > How does OpenCL/SYCL play in this now?
> > 
> > 
> > > -----Original Message-----
> > > From: Francesco Petrogalli [mailto:Francesco.Petrogalli at
arm.com]
> > > Sent: Friday, May 31, 2019 3:06 PM
> > > To: Doerfert, Johannes <jdoerfert at anl.gov>
> > > Cc: Philip Reames <listmail at philipreames.com>; Finkel,
Hal J.
> > > <hfinkel at anl.gov>; LLVM Development List <llvm-dev at
lists.llvm.org>;
> > > nd <nd at arm.com>; Saito, Hideki <hideki.saito at
intel.com>; Clang Dev
> > > <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
> > > Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided
vector function for auto-vectorization.
> > > 
> > > 
> > > 
> > > > On May 31, 2019, at 2:56 PM, Doerfert, Johannes
<jdoerfert at anl.gov> wrote:
> > > > 
> > > > I think I did misunderstand what you want to do with
attributes.
> > > > This is my bad. Let me try to explain:
> > > > 
> > > > It seems you want the "vector-variants" attributes
(which I could
> > > > not find with this name in trunk, correct?) to
"remember" what
> > > > vector versions can be created (wrt. validity), assuming a 
> > > > definition is available? Correct?
> > > 
> > > Yes.
> > > 
> > > > What I was concerned with is the example I sketched
somewhere
> > > > below which motivates the need for a
generalized/standardized name
> > > > mangling for OpenMP. I thought you wanted to avoid that somehow but
> > > > if you don't I misunderstood you. I basically removed
the part
> > > > where the vector versions have to be created first but I
assumed
> > > > them to be existent (in the module or somewhere else). That
is, I
> > > > assumed a call to foo and various symbols available that are
> > > > specializations of foo. When we then vectorize foo (or
otherwise
> > > > specialize at some point in the future), you would scan the
module
> > > > and pick the best match based on the context of the call.
> > > > 
> > > 
> > > Yes, although the syntax you use below is wrong. Declare variant
> > > is attached to the scalar definition, and points to a vector definition
> > > (the variant) that is declared/defined in the same compilation unit where
> > > the scalar version is visible.
> > > 
> > > 
> > > > Now I don't know if I understood your proposal by now
but let me
> > > > ask a question anyway:
> > > > 
> > > > VecClone.cpp:276-278 mentions that the vectorizer is
supposed to
> > > > look at the vector-variants functions. This works for
variants
> > > > that are created from definitions in the module but what
about
> > > > #omp declare simd declarations?
> > > > 
> > > 
> > > VectorClone does more than just mapping a scalar version to a
vector one. It builds also the vector version definition by auto-vectorizing the
body of the scalar function.
> > > 
> > > I don’t know if the patches related to VecClone also are intended
to use the `vector-variant` attribute for function declaration with a #pragma
omp declare simd. On aarch64, in Arm compiler for HPC, we do that to support
vector math libraries. It works in principle, but `vector variant` allows more
context selection (and custom names instead of vector ABI names, which are
easier for users).
> > > 
> > > 
> > > > 
> > > > On 05/31, Francesco Petrogalli wrote:
> > > >>> On May 31, 2019, at 11:47 AM, Doerfert, Johannes
<jdoerfert at anl.gov> wrote:
> > > >>> 
> > > >>> I think we should split this discussion:
> > > >>> TOPIC 1 & 2 & 4: How do we implement all use cases and OpenMP 5.X
> > > >>>                  features, including compatibility with other
> > > >>>                  compilers and cross module support.
> > > >> 
> > > >> Yes, and we have to carefully make this as standard and
compatible as possible.
> > > > 
> > > > Agreed.
> > > > 
> > > > 
> > > >>> TOPIC 3b & 5: Interoperability with clang
declare (system vs. user
> > > >>>                declares)
> > > >> 
> > > >> 
> > > >> I think that Alexey's explanation of how the directives are handled
> > > >> internally in the frontend makes us lean towards the attribute.
> > > > 
> > > > How things are handled right now, especially given that
declare
> > > > variant is not handled at all, should not limit our design
space.
> > > > If the argument is that we cannot reasonably implement a
solution,
> > > > that is a different story.
> > > > 
> > > > 
> > > >>> TOPIC 3a & 3c: floating point issues?
> > > >>> 
> > > >> 
> > > >> I believe there is no issue there. I have quoted the
openMP standard in reply to Renato:
> > > >> 
> > > >> See
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page
118, lines 23-24:
> > > >> 
> > > >> “The execution of the function or subroutine cannot have
any side
> > > >> effects that would alter its execution for concurrent
iterations
> > > >> of a SIMD chunk."
> > > > 
> > > > Great.
> > > > 
> > > > 
> > > >>> I inlined comments for Topic 1 below.
> > > >>> 
> > > >>> I hope that we do not have to discuss topic 2 if we
agree
> > > >>> neither attributes nor metadata is necessary, or
better, will
> > > >>> solve the actual problem at hand. I don't have
strong feeling on
> > > >>> topic 4 but I have the feeling this will become less
problematic once we figure out topic 1.
> > > >>> 
> > > >>> Thanks,
> > > >>> Johannes
> > > >>> 
> > > >>> 
> > > >>> On 05/31, Francesco Petrogalli wrote:
> > > >>>> # TOPIC 1: concerns about name mangling
> > > >>>> 
> > > >>>> I understand that there are concerns in using
the mangling
> > > >>>> scheme I proposed, and that it would be
preferred to have a
> > > >>>> mangling scheme that is based on (and
standardized by) OpenMP.
> > > >>> 
> > > >>> I still think it will be required to have a
standardized one,
> > > >>> not only preferred.
> > > >>> 
> > > >>> 
> > > >> 
> > > >> I am all with you in standardizing. x86 and aarch64 have their own
> > > >> vector function ABI, which, although “private”, are to be
> > > >> considered standard. Open-source and commercial compilers are
> > > >> using them, therefore we have to deal with this mangling scheme,
> > > >> whether or not OpenMP comes up with a standard mangling scheme.
> > > > 
> > > > I don't get the point you are trying to make here. What
do you
> > > > mean by "we have to deal with"? (I do not suggest
to get rid of
> > > > them.)
> > > > 
> > > 
> > > That we cannot ignore the fact that the name scheme is already
standardized by the vendors, so let’s first deal with what we have, and think
about the OpenMP mangling scheme only once there is one available.
> > > 
> > > > 
> > > >>>> I hear the argument on having some common ground
here. In fact,
> > > >>>> there is already common ground between the x86
and aarch64
> > > >>>> backend, who have based their respective Vector
Function ABI specifications on OpenMP.
> > > >>>> 
> > > >>>> In fact, the mangled name grammar can be
summarized as follows:
> > > >>>> 
> > > >>>>   _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> > > >>>> 
> > > >>>> Across vector extensions the only <token>
that will differ is
> > > >>>> the <isa> token.
> > > >>>> 
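To make the grammar above concrete, here is a rough decoder for names of the form `_ZGV<isa><masking><VLEN><parameter type>_<scalar name>`. It is a sketch under assumptions, not a full Vector Function ABI demangler: `DecodedName` and `decodeVectorABIName` are hypothetical names, only numeric `<VLEN>` values are handled, the parameter tokens are kept as a raw string, and the first underscore after `<VLEN>` is assumed to be the separator before the scalar name.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Fields recovered from a mangled name (hypothetical type).
struct DecodedName {
  bool Valid = false;
  char ISA = 0;            // e.g. 'b', 'c', 'd', 'e' on x86, 'n' on AArch64
  bool IsMasked = false;   // 'M' means masked, 'N' means unmasked
  unsigned VLEN = 0;       // number of lanes
  std::string Params;      // raw parameter tokens, e.g. "v"
  std::string ScalarName;  // e.g. "sin"
};

DecodedName decodeVectorABIName(const std::string &Name) {
  DecodedName D;
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return D;
  std::size_t I = Prefix.size();
  if (I + 2 > Name.size())
    return D;
  D.ISA = Name[I++];
  char Mask = Name[I++];
  if (Mask != 'M' && Mask != 'N')
    return D;
  D.IsMasked = (Mask == 'M');
  // Read the decimal <VLEN>.
  std::size_t DigitsBegin = I;
  while (I < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[I]))) {
    D.VLEN = D.VLEN * 10 + static_cast<unsigned>(Name[I] - '0');
    ++I;
  }
  if (I == DigitsBegin)
    return D;  // non-numeric VLEN is not handled in this sketch
  // Assumption: the first '_' after <VLEN> ends the parameter tokens.
  std::size_t Sep = Name.find('_', I);
  if (Sep == std::string::npos || Sep + 1 >= Name.size())
    return D;
  D.Params = Name.substr(I, Sep - I);
  D.ScalarName = Name.substr(Sep + 1);
  D.Valid = true;
  return D;
}

// Usage check on the example name used in the RFC text.
void checkDecode() {
  DecodedName D = decodeVectorABIName("_ZGVcN2v_sin");
  assert(D.Valid && D.ISA == 'c' && !D.IsMasked && D.VLEN == 2);
  assert(D.Params == "v" && D.ScalarName == "sin");
}
```

Across targets only the `ISA` field would change for the same request, which is the point made in the sentence just above.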
> > > >>>> This might lead people to think that we could drop the
> > > >>>> _ZGV<isa> prefix and consider the
> > > >>>> <masking><VLEN><parameter type>_<scalar name> part as a sort of
> > > >>>> unofficial OpenMP mangling scheme: in fact, the signature of an
> > > >>>> “unmasked 2-lane vector version of `sin`” will always be
> > > >>>> `<2 x double>(<2 x double>)`.
> > > >>>> 
> > > >>>> The problem with this choice is that the number of vector
> > > >>>> versions available for a target is not unique.
> > > >>> 
> > > >>> For me, this simply means this mangling scheme is
not sufficient.
> > > >>> 
> > > >> 
> > > >> Can you explain more why you think the mangling scheme
is not
> > > >> sufficient? The mangling scheme is shaped to provide all
the
> > > >> information that the OpenMP directive describes.
> > > > 
> > > > I don't know if it is insufficient but I thought you hinted towards that.
> > > 
> > > I didn’t mean that, the tokens in the vector function ABI mangled
schemes are sufficient.
> > > 
> > > > If we can handle/decode everything we need for declare
variants
> > > > then I do not object at all. If not, we require respective 
> > > > extension such that we can. The result should be a superset
of the
> > > > current SIMD encoding and compatible with the current one.
> > > > 
> > > > 
> > > 
> > > We can handle/decode everything for a SIMD context. :)
> > > 
> > > 
> > > > 
> > > >> The fact that x86 and aarch64 realize such information
in
> > > >> different way (multiple signature/vector extensions) is
something
> > > >> that cannot be avoided, because it is related to
architectural
> > > >> aspects that are specific to the vector extension and
transparent
> > > >> to the OpenMP standard.
> > > > 
> > > > I don't think that is a problem (that's why I
"failed to see the
> > > > problem" in the comment below). I look at it this way:
If #declare
> > > > simd, or similar, results in N variants, it should at the
end of
> > > > the day not be different from declaring these N variants 
> > > > explicitly with the respective declare variant match clause.
> > > > 
> > > 
> > > That’s not the case. #declare simd should create all the versions
> > > that are optimal for the target. We carefully thought about that when
> > > writing the vector function ABI. Most of the constraints derive from the
> > > fact that each target has a specific register size.
> > > 
> > > Example:
> > > 
> > > #pragma omp declare simd
> > > float foo(float);
> > > 
> > > x86: -> 8 versions {2, 4, 8, 16 lanes} x {masking, no masking}, see https://godbolt.org/z/m1BUVt
> > > Arm NEON: -> 4 versions {2, 4 lanes} x {masking, no masking}
> > > Arm SVE: -> 1 version
> > > 
> > > Therefore, the outcome of declare simd is not target independent.
> > > Your expectations are met only inside one target.
> > > 
> > > 
> > > > 
> > > >>>> In particular, the following declaration
generates multiple
> > > >>>> vector versions, depending on the target:
> > > >>>> 
> > > >>>> #pragma omp declare simd simdlen(2) notinbranch
> > > >>>> double foo(double) {…};
> > > >>>> 
> > > >>>> On x86, this generates at least 4 symbols (one
for SSE, one for
> > > >>>> AVX, one for AVX2, and one for AVX512:
> > > >>>> https://godbolt.org/z/TLYXPi)
> > > >>>> 
> > > >>>> On aarch64, the same declaration generates a
unique symbol, as
> > > >>>> specified in the Vector Function ABI.
> > > >>> 
> > > >>> I fail to see the problem. We generate X symbols for
X different
> > > >>> contexts. Once we get to the point where we
vectorize, we
> > > >>> determine which context fits best and choose the
corresponding symbol version.
> > > >>> 
> > > >> 
> > > >> Yes, this is exactly what we need to do, under the constraint
> > > >> that the rules for generating "X symbols for X different
> > > >> contexts" are decided by the Vector Function ABI of the target.
> > > > 
> > > > Sounds good. The vector ABI is used to determine what
contexts
> > > > exists and what symbols should be created. I would assume
the
> > > > encoding should be the same as if we specified the versions
> > > > (/contexts) ourselves via #declare variant.
> > > > 
> > > 
> > > Oh yes, vector functions listed in a declare variant should obey
the vector function ABI rules (other than the function name).
> > > 
> > > > 
> > > >>> Maybe my view is too naive here, please feel free to correct me.
> > > >>> 
> > > >>> 
> > > >>>> This means that the attribute (or metadata) that
carries the
> > > >>>> information on the available vector version
needs to deal also
> > > >>>> with things that are not usually visible at IR
level, but that
> > > >>>> might still need to be provided to be able to
decide which
> > > >>>> particular instruction set/ vector extension
needs to be targeted.
> > > >>> 
> > > >>> The symbol names should carry all the information we
need. If
> > > >>> they do not, we need to improve the mangling scheme
such that they do.
> > > >>> There is no attributes/metadata we could use at
library boundaries.
> > > >>> 
> > > >> Hum, I am not sure what you mean by "There is no 
> > > >> attributes/metadata we could use at library
boundaries."
> > > > 
> > > > (This seems to be part of the misunderstanding, I leave my
comment
> > > > here
> > > > anyway:)
> > > > 
> > > > The simd-related stuff works because it is a uniform
mangling
> > > > scheme used by all compilers. Take the situation below in
which I
> > > > think we want to call foo_CTX in the library. If so, we need
a name for it.
> > > > 
> > > 
> > > In the situation below, the mangled name is going to be the same
for both compilers, as long as they adhere to the vector function ABI.
> > > 
> > > > 
> > > > a.c:  // Compiled by gcc into a library
> > > > #omp declare variant (foo) match(CTX)
> > > > void foo_CTX(...) {...}
> > > > 
> > > > b.c:  // Compiled by clang linked against the library above.
> > > > #omp declare variant (foo) match(CTX)
> > > > void foo_CTX(...);
> > > > 
> > > > void bar(...) {
> > > >  #pragma omp CTX
> > > >  foo();   // <- What function (symbol) do we call if a.c was compiled
> > > >           //    by gcc and b.c with clang?
> > > > }
> > > > 
> > > 
> > > Please notice that `declare variant` needs to be attached to the
scalar function, not the vector one.
> > > 
> > > ```
> > > #pragma omp declare variant(foo_CTX) match (context=simd…
> > > double foo(double) {…}
> > > 
> > > vector_double_ty foo_CTX(vector_double_ty) {…}
> > > ```
> > > 
> > > In vectorizing foo in bar, the compiler will not care where
foo_CTX would come from (of course, as long as the scalar+declare variant
declarations are visible).
> > > 
> > > >> In our downstream compiler (Arm compiler for HPC, based
on LLVM),
> > > >> we use `declare simd` to provide vector math functions
via custom
> > > >> header file. It works brilliantly, if not for specific
aspects
> > > >> that would be perfectly covered by the `declare
variant`, which
> > > >> might be one of the reason why the OpenMP committee
decided to
> > > >> introduce `declare variant`.
> > > > 
> > > > But you (assume that you) control the mangling scheme across
the
> > > > entire infrastructure. Given that the simd mangling is
de-facto
> > > > standardized, that works.
> > > > 
> > > > Side note:
> > > > Declare variant, as of 5.0, is not flexible enough for a
sensible
> > > > inclusion of target specific headers. That will change in
5.1.
> > > > 
> > > 
> > > Could you point me at the discussion in 5.1 on this specific
aspect?
> > > 
> > > 
> > > > 
> > > >> If your concern is that adding an attribute that somehow
> > > >> represents something that is available in an external library is
> > > >> not enough to guarantee that the symbol is available in the
> > > >> library… not even C code can guarantee that? If the linker is not
> > > >> pointing to the right library, there is nothing that can prevent
> > > >> it from failing if the symbol is not present?
> > > > 
> > > > I don't follow the example you describe. I don't
want to change
> > > > anything in how symbols are looked up or what happens if
they are missing.
> > > > 
> > > > 
> > > 
> > > I don’t want to change that either :). I think we are
> > > misunderstanding each other here...
> > > 
> > > >>>> I used an example based on `declare simd` instead of `declare
> > > >>>> variant` because the attribute/metadata needed for `declare
> > > >>>> variant` is a modification of the one needed for `declare
> > > >>>> simd`, which has already been agreed in a previous RFC proposed
> > > >>>> by Intel [1], and for which Intel has already provided an
> > > >>>> implementation [2]. The changes proposed in this RFC are fully
> > > >>>> compatible with the work that is being done for the VecClone pass in [2].
> > > >>>> 
> > > >>>> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
> > > >>>> [2] VecClone pass: https://reviews.llvm.org/D22792
> > > >>> 
> > > >>> Having an agreed upon mangling for the older feature
is not
> > > >>> necessarily important here. We will need more
functionality for
> > > >>> variants and keeping the old scheme around with some
metadata is
> > > >>> not an extensible long-term solution. So, I would
not try to fit
> > > >>> variants into the existing simd-scheme but instead
do it the
> > > >>> other way around. We define what we need for
variants and implement simd in that scheme.
> > > >>> 
> > > >> 
> > > >> I kinda think that having agreed on something is important. It
> > > >> allows building other things on top of what has been agreed
> > > >> without breaking compatibility.
> > > >> 
> > > >> Specifically, which new functionalities needed for the
> > > >> variants would make the current metadata (attributes) for
> > > >> declare simd non-extensible?
> > > > 
> > > > See first comment.
> > > > 
> > > >>>> The good news is that as far as AArch64 and x86
are concerned, the only thing that will differ in the mangled name is the
“<isa>” token. As far as I can tell, the mangling scheme of the rest of
the vector name is the same, therefore a lot of infrastructure in terms of
mangling and demangling can be reused. In fact, the `mangleVectorParameters`
function in
https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
already be shared among x86 and aarch64.
> > > >>>> 
> > > >>>> TOPIC 2: metadata vs attribute
> > > >>>> 
> > > >>>> From a functionality point of view, I don’t care
whether we use metadata or attributes. The VecClone pass mentioned in TOPIC 1
uses the following:
> > > >>>> 
> > > >>>> attributes #0 = { nounwind uwtable
> > > >>>> "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
> > > >>>> 
> > > >>>> This is an attribute (I thought it was metadata?). I am happy to
> > > >>>> reword the RFC using the right terminology (sorry for messing this up).
> > > >>>> 
> > > >>>> Also, @Renato expressed concern that metadata
might be dropped by optimization passes - would using attributes prevent that?
> > > >>>> 
> > > >>>> TOPIC 3: "There is no way to notify the
backend how conformant the SIMD versions are.”
> > > >>>> 
> > > >>>> @Shawn, I am afraid I don’t understand what you
mean by “conformant” here. Can you elaborate with an example?
> > > >>>> 
> > > >>>> TOPIC 3: interaction of the `omp declare
variant` with `clang
> > > >>>> declare variant`
> > > >>>> 
> > > >>>> I believe this is described in the `Option
behavior, and interaction with OpenMP`. The option `-fclang-declare-variant` is
there to make the OpenMP based one orthogonal. Of course, we might decide to
make -fclang-declare-variant on/off by default, and have default behavior when
interacting with -fopenmp-simd. For the sake of compatibility with other
compilers, we might need to require -fno-clang-declare-variant when targeting
-fopenmp-[simd].
> > > >>>> 
> > > >>>> TOPIC 3: "there are no special arguments /
flags / status regs that are used / changed in the vector version that the
compiler will have to "just know”
> > > >>>> 
> > > >>>> I believe that this concern is raised by the problem of handling
> > > >>>> FP exceptions? If that’s the case, the compiler is not allowed to
> > > >>>> make any assumption about the vector function in that respect, and
> > > >>>> must treat it with the same knowledge as any other function,
> > > >>>> depending on the visibility it has in the compilation unit.
> > > >>>> @Renato, does this answer your question?
> > > >>>> 
> > > >>>> TOPIC 4: attribute in function declaration vs
attribute
> > > >>>> function call site
> > > >>>> 
> > > >>>> We discussed this in the previous version of the proposal. Having
> > > >>>> it in the call sites guarantees that incompatible vector versions
> > > >>>> are not used when merging modules compiled for different targets.
> > > >>>> I don’t have a use case for this; if I remember correctly this was
> > > >>>> asked for by @Hideki Saito. Hideki, any comment on this?
> > > >>>> 
> > > >>>> TOPIC 5: overriding system header (the
discussion on #pragma omp/clang/system variants initiated by @Hal Finkel).
> > > >>>> 
> > > >>>> I thought that the split between #pragma clang declare variant and
> > > >>>> #pragma omp declare variant was already providing the
> > > >>>> orthogonality between system headers and user headers, meaning
> > > >>>> that a user should always prefer the omp version (for portability
> > > >>>> to other compilers) instead of the #pragma clang one, which would
> > > >>>> be relegated to system headers and headers provided by the
> > > >>>> compiler. Am I missing something? If so, I am happy to add a
> > > >>>> “system” version of the directive, as it would be quite easy to do
> > > >>>> given that most of the parsing infrastructure will be shared.
> > > >>>> 
> > > >>>> 
> > > >>>>> On May 30, 2019, at 12:53 PM, Philip Reames
<listmail at philipreames.com> wrote:
> > > >>>>> 
> > > >>>>> 
> > > >>>>> On 5/30/19 9:05 AM, Doerfert, Johannes
wrote:
> > > >>>>>> On 05/29, Finkel, Hal J. via cfe-dev
wrote:
> > > >>>>>>> On 5/29/19 1:52 PM, Philip Reames
wrote:
> > > >>>>>>>> On 5/28/19 7:55 PM, Finkel, Hal
J. wrote:
> > > >>>>>>>>> On 5/28/19 3:31 PM, Philip
Reames via cfe-dev wrote:
> > > >>>>>>>>>> I generally like the
idea of having support in IR for
> > > >>>>>>>>>> vectorization of custom
functions.  I have several use cases which would benefit from this.
> > > >>>>>>>>>> 
> > > >>>>>>>>>> I'd suggest a couple
of reframings to the IR representation though.
> > > >>>>>>>>>> 
> > > >>>>>>>>>> First, this should
probably be specified as
> > > >>>>>>>>>> metadata/attribute on a
function declaration.  Allowing
> > > >>>>>>>>>> the callsite variant is
fine, but it should primarily be
> > > >>>>>>>>>> a property of the called
function, not of the call site.  Being able to specify it once per declaration
is much cleaner.
> > > >>>>>>>>> I agree. We should support
this both on the function
> > > >>>>>>>>> declaration and on the call
sites.
> > > >>>>>>>>> 
> > > >>>>>>>>> 
> > > >>>>>>>>>> Second, I really don't like the mangling use here.  We
> > > >>>>>>>>>> need a better way to specify the properties of the
> > > >>>>>>>>>> function than its mangled name.  One thought to explore
> > > >>>>>>>>>> is to directly use the Value of the function declaration
> > > >>>>>>>>>> (since this is metadata and we can do that), and then tie
> > > >>>>>>>>>> the properties to the function declaration in some way?
> > > >>>>>>>>>> Sorry, I don't really have a specific suggestion here.
> > > >>>>>>>>> Is the problem the mangling
or the fact that the mangling
> > > >>>>>>>>> is ABI/target-specific? One
option is to use LLVM's
> > > >>>>>>>>> mangling scheme (the one we
use for intrinsics) and then
> > > >>>>>>>>> provide some backend
infrastructure to translate later.
> > > >>>>>>>> Well, both honestly.  But
mangling with a non-target specific scheme is
> > > >>>>>>>> a lot better, so I might be okay
with that.   Good idea.
> > > >>>>>>> 
> > > >>>>>>> I liked your idea of directly
encoding the signature in the
> > > >>>>>>> metadata, but I think that we want
to continue to use
> > > >>>>>>> attributes, and not metadata, and
the options for attributes
> > > >>>>>>> seem more limited - unless we allow
attributes to take
> > > >>>>>>> metadata arguments - maybe
that's an enhancement worth considering.
> > > >>>>>> I recently talked to people in the
OpenMP language committee
> > > >>>>>> meeting about this and, thinking forward
to the actual
> > > >>>>>> implementation/use of the OpenMP 5.x
declare variant feature, I'd say:
> > > >>>>>> 
> > > >>>>>> - We will need a mangling scheme if we
want to allow variants
> > > >>>>>> on declarations that are defined
elsewhere.
> > > >>>>>> - We will need a (OpenMP) standardized
mangling scheme if we
> > > >>>>>> want interoperability between compilers.
> > > >>>>>> 
> > > >>>>>> I assume we want both so I think we will
need both.
> > > >>>>> If I'm reading this correctly, this describes a need for the
> > > >>>>> frontend to have a mangling scheme.  Nothing in here would
> > > >>>>> seem to prevent the frontend from generating a declaration for
> > > >>>>> a mangled external symbol and then referencing that
> > > >>>>> declaration.  Am I missing something?
> > > >>>>>> 
> > > >>>>>> That said, I think this should allow us
to avoid
> > > >>>>>> attributes/metadata which seems to me
like a good thing right now.
> > > >>>>>> 
> > > >>>>>> Cheers,
> > > >>>>>> Johannes
> > > >>>>>> 
> > > >>>>>> 
> > > >>>>>>>>>> On 5/28/19 12:44 PM,
Francesco Petrogalli via llvm-dev wrote:
> > > >>>>>>>>>>> Dear all,
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> This RFC is a
proposal to provide auto-vectorization functionality for user provided vector
functions.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The proposal is a
modification of an RFC that I have sent out a couple of months ago, with the
title `[RFC] Re-implementing -fveclib with OpenMP` (see
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The
previous RFC is to be considered abandoned.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The original RFC was
proposing to re-implement the `-fveclib` command line option. This proposal
avoids that, and limits its scope to the mechanics of providing vector function
in user code that the compiler can pick up for auto-vectorization. This narrower
scope limits the impact of changes that are needed in both clang and LLVM.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Please let me know
what you think.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Kind regards,
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Francesco
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> =============================================================================
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Introduction
> > > >>>>>>>>>>> ============
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> This RFC encompasses
the proposal of informing the
> > > >>>>>>>>>>> vectorizer about the
availability of vector functions
> > > >>>>>>>>>>> provided by the
user. The mechanism is based on the use
> > > >>>>>>>>>>> of the directive
`declare variant` introduced in OpenMP
> > > >>>>>>>>>>> 5.0 [^1].
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The mechanism
proposed has the following properties:
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> 1.  Decouples the
compiler front-end that knows about the availability
> > > >>>>>>>>>>>    of vectorized
routines, from the back-end that knows how to make use
> > > >>>>>>>>>>>    of them.
> > > >>>>>>>>>>> 2.  Enable support
for a developer's own vector libraries without
> > > >>>>>>>>>>>    requiring changes
to the compiler.
> > > >>>>>>>>>>> 3.  Enables other
frontends (e.g. f18) to add scalar-to-vector function
> > > >>>>>>>>>>>    mappings as
relevant for their own runtime libraries, etc.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The implementation consists of two separate sets of changes.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The first set is a set of changes in `llvm`, and consists of:
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> 1.  [Changes in LLVM
IR](#llvmIR) to provide information about the
> > > >>>>>>>>>>>    availability of
user-defined vector functions via metadata attached
> > > >>>>>>>>>>>    to an
`llvm::CallInst`.
> > > >>>>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
> > > >>>>>>>>>>>    information about the available vector functions associated with a
> > > >>>>>>>>>>>    `llvm::CallInst`.
> > > >>>>>>>>>>> 3.  [Changes in the
LoopVectorizer](#LV) to use the API to query the
> > > >>>>>>>>>>>    metadata.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The second set consists of the [changes in
> > > >>>>>>>>>>> clang](#clang) that are needed to recognize the
> > > >>>>>>>>>>> `#pragma clang declare variant` directive.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Proposed changes
> > > >>>>>>>>>>> ================
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> We propose an
implementation that uses `#pragma clang
> > > >>>>>>>>>>> declare variant` to
inform the backend components about
> > > >>>>>>>>>>> the availability of
vector version of scalar functions
> > > >>>>>>>>>>> found in IR. The mechanism relies on storing such
> > > >>>>>>>>>>> information in IR metadata, and therefore makes the
> > > >>>>>>>>>>> auto-vectorization of function calls a mid-end (`opt`)
> > > >>>>>>>>>>> process that is independent of the front-end that
> > > >>>>>>>>>>> generated such IR metadata.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> This implementation
provides a generic mechanism that
> > > >>>>>>>>>>> the users of the
LLVM compiler will be able to use for
> > > >>>>>>>>>>> interfacing their
own vector routines for generic code.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The implementation
can also expose
> > > >>>>>>>>>>>
vectorization-specific descriptors -- for example, like
> > > >>>>>>>>>>> the `linear` and
`uniform` clauses of the OpenMP
> > > >>>>>>>>>>> `declare simd`
directive
> > > >>>>>>>>>>> -- that could be
used to finely tune the automatic
> > > >>>>>>>>>>> vectorization of
some functions (think for example the
> > > >>>>>>>>>>> vectorization of
`double sincos(double , double *,
> > > >>>>>>>>>>> double *)`, where
`linear` can be used to give extra information about the memory layout of the 2
pointers parameters in the vector version).
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The directive
`#pragma clang declare variant` follows
> > > >>>>>>>>>>> the syntax of the
`#pragma omp declare variant` directive of OpenMP.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> We define the new
directive in the `clang` namespace
> > > >>>>>>>>>>> instead of using the
`omp` one of OpenMP to allow the
> > > >>>>>>>>>>> compiler to perform
auto-vectorization outside of an OpenMP SIMD context.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The mechanism is based on OpenMP to provide a uniform
> > > >>>>>>>>>>> user experience across the two mechanisms, and to
> > > >>>>>>>>>>> maximise the number of shared components of the
> > > >>>>>>>>>>> infrastructure needed in the compiler frontend to enable the feature.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Changes in LLVM IR
{#llvmIR}
> > > >>>>>>>>>>> ------------------
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The IR is enriched
with metadata that details the
> > > >>>>>>>>>>> availability of
vector versions of an associated scalar
> > > >>>>>>>>>>> function. This
metadata is attached to the call site of the scalar function.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The metadata takes the form of an attribute containing a
> > > >>>>>>>>>>> comma separated list of vector function mappings. Each
> > > >>>>>>>>>>> entry has a unique name that follows the Vector Function
> > > >>>>>>>>>>> ABI[^2] and a real name that is used when generating calls
> > > >>>>>>>>>>> to this vector function.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>    vfunc_name1(real_name1), vfunc_name2(real_name2)
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The Vector Function
ABI name describes the signature of
> > > >>>>>>>>>>> the vector function
so that properties like
> > > >>>>>>>>>>> vectorisation factor
can be queried during compilation.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The `(real name)`
token is optional and assumed to match
> > > >>>>>>>>>>> the Vector Function
ABI name when omitted.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> For example, the
availability of a 2-lane double
> > > >>>>>>>>>>> precision `sin`
function via SVML when targeting AVX on
> > > >>>>>>>>>>> x86 is provided by
the following IR.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>    // ...
> > > >>>>>>>>>>>    ... = call double @sin(double) #0
> > > >>>>>>>>>>>    // ...
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>    #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> > > >>>>>>>>>>>                             _ZGVdN4v_sin(__svml_sin4),
> > > >>>>>>>>>>>                             ..."} }
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The string
`"_ZGVcN2v_sin(__svml_sin2)"` in this
> > > >>>>>>>>>>> vector-variant
attribute provides information on the
> > > >>>>>>>>>>> shape of the vector
function via the string
> > > >>>>>>>>>>> `_ZGVcN2v_sin`,
mangled according to the Vector Function
> > > >>>>>>>>>>> ABI for Intel, and
remaps the standard Vector Function ABI name to the non-standard name
`__svml_sin2`.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> This metadata is
compatible with the proposal "Proposal
> > > >>>>>>>>>>> for function
vectorization and loop vectorization with
> > > >>>>>>>>>>> function
calls",[^3] that uses Vector Function ABI
> > > >>>>>>>>>>> mangled names to
inform the vectorizer about the
> > > >>>>>>>>>>> availability of
vector functions. The proposal extends
> > > >>>>>>>>>>> the original by
allowing the explicit mapping of the Vector Function ABI mangled name to a
non-standard name, which allows the use of existing vector libraries.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The `vector-variant`
attribute needs to be attached on a
> > > >>>>>>>>>>> per-call basis to
avoid conflicts when merging modules with different vector variants.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The query
infrastructure: SVFS {#infrastructure}
> > > >>>>>>>>>>>
------------------------------
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The Search Vector
Function System (SVFS) is constructed
> > > >>>>>>>>>>> from an
`llvm::Module` instance so it can create
> > > >>>>>>>>>>> function
definitions. The SVFS exposes an API with two methods.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> ###
`SVFS::isFunctionVectorizable`
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> This method queries the availability of a vectorized
> > > >>>>>>>>>>> version of a function. The signature of the method is as follows.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>    bool isFunctionVectorizable(llvm::CallInst * Call,
> > > >>>>>>>>>>>                                ParTypeMap Params);
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The method determines the availability of a vector version
> > > >>>>>>>>>>> of the function invoked by the `Call` parameter by
> > > >>>>>>>>>>> looking at the `vector-variant` metadata.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The `Params` argument is a map that associates the
> > > >>>>>>>>>>> position of a parameter in the `CallInst` to its
> > > >>>>>>>>>>> `ParameterType` descriptor. The `ParameterType`
> > > >>>>>>>>>>> descriptor holds information about the shape of the
> > > >>>>>>>>>>> corresponding parameter in the signature of the vector
> > > >>>>>>>>>>> function. This `ParameterType` is used to query the SVFS
> > > >>>>>>>>>>> about the availability of vector versions that have
> > > >>>>>>>>>>> `linear`, `uniform` or `align` parameters (in the sense of
> > > >>>>>>>>>>> OpenMP 4.0 and onwards).
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The method `isFunctionVectorizable`, when invoked with
> > > >>>>>>>>>>> an empty `ParTypeMap`, is equivalent to the
> > > >>>>>>>>>>> `TargetLibraryInfo` method `isFunctionVectorizable(StringRef Name)`.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> ###
`SVFS::getVectorizedFunction`
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> This method returns the vector function declaration that
> > > >>>>>>>>>>> corresponds to the needs of the vectorization technique that is being run.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The signature of the function is as follows.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>    std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
> > > >>>>>>>>>>>      llvm::CallInst * Call, unsigned VF, bool IsMasked,
> > > >>>>>>>>>>>      ParTypeSet Params);
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The `Call` parameter is the call instance that is being
> > > >>>>>>>>>>> vectorized, the `VF` parameter represents the
> > > >>>>>>>>>>> vectorization factor (how many lanes), the `IsMasked`
> > > >>>>>>>>>>> parameter decides whether or not the signature of the
> > > >>>>>>>>>>> vector function is required to have a mask parameter, and
> > > >>>>>>>>>>> the `Params` parameter describes the shape of the vector
> > > >>>>>>>>>>> function as in the `isFunctionVectorizable` method.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The method uses the `vector-variant` metadata and
> > > >>>>>>>>>>> returns the function signature and the name of the function
> > > >>>>>>>>>>> based on the input parameters.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The SVFS can add new
function definitions, in the same
> > > >>>>>>>>>>> module as the
`Call`, to provide vector functions that
> > > >>>>>>>>>>> are not present
within the vector-variant metadata. For
> > > >>>>>>>>>>> example, if a
library provides a vector version of a
> > > >>>>>>>>>>> function with a
vectorization factor of 2, but the
> > > >>>>>>>>>>> vectorizer is
requesting a vectorization factor of 4,
> > > >>>>>>>>>>> the SVFS is allowed
to create a definition that calls
> > > >>>>>>>>>>> the 2-lane version
twice. This capability applies similarly for providing masked and unmasked
versions when the request does not match what is available in the library.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> This method is
equivalent to the TLI method `StringRef
> > > >>>>>>>>>>>
getVectorizedFunction(StringRef F, unsigned VF) const;`.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Notice that to fully
support OpenMP vectorization we
> > > >>>>>>>>>>> need to think about
a fuzzy matching mechanism that is
> > > >>>>>>>>>>> able to select a
candidate in the calling context.
> > > >>>>>>>>>>> However, this
proposal is intended for scalar-to-vector
> > > >>>>>>>>>>> mappings of
math-like functions that are most likely to
> > > >>>>>>>>>>> associate a unique
vector candidate in most contexts.
> > > >>>>>>>>>>> Therefore, extending
this behavior to a generic one is an aspect of the implementation that will be
treated in a separate RFC about the vectorization pass.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> ### Scalable
vectorization
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Both methods of the
SVFS API will be extended with a
> > > >>>>>>>>>>> boolean parameter to
specify whether scalable signatures
> > > >>>>>>>>>>> are needed by the
user of the SVFS.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Changes in clang
{#clang}
> > > >>>>>>>>>>> ----------------
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> We use clang to
generate the metadata described above.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> In the compilation unit, the vector function definition
> > > >>>>>>>>>>> or declaration must be visible and associated with the
> > > >>>>>>>>>>> scalar version via the `#pragma clang declare variant`
> > > >>>>>>>>>>> according to the rule defined by the corresponding
> > > >>>>>>>>>>> `#pragma omp declare variant` defined in OpenMP 5.0, as in
> > > >>>>>>>>>>> the following example.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>    #pragma clang declare variant(vector_sinf) \
> > > >>>>>>>>>>>       match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> > > >>>>>>>>>>>    extern float sinf(float);
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>    float32x4_t vector_sinf(float32x4_t x);
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The `construct` set in the directive, together with the
> > > >>>>>>>>>>> `device` set, is used to generate the vector mangled
> > > >>>>>>>>>>> name to be used in the `vector-variant` attribute, for
> > > >>>>>>>>>>> example `_ZGVnN4v_sinf`, when targeting
> > > >>>>>>>>>>> AArch64 Advanced SIMD code generation. The rules for
> > > >>>>>>>>>>> mangling the name of the scalar function in the vector
> > > >>>>>>>>>>> name are defined in the Vector Function ABI specification of the target.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The part of the
vector-variant attribute that redirects
> > > >>>>>>>>>>> the call to
`vector_sinf` is derived from the
> > > >>>>>>>>>>> `variant-id`
specified in the `variant` clause.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Summary
> > > >>>>>>>>>>> =======
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> New `clang` directive in clang
> > > >>>>>>>>>>> ------------------------------
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> `#pragma clang declare variant`, same as `#pragma omp
> > > >>>>>>>>>>> declare variant` restricted to the `simd` context selector, from OpenMP 5.0+.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> Option behavior, and
interaction with OpenMP
> > > >>>>>>>>>>>
--------------------------------------------
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> The behavior described below makes sure that `#pragma
> > > >>>>>>>>>>> clang declare variant` function vectorization and OpenMP
> > > >>>>>>>>>>> function vectorization are orthogonal.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>
`-fclang-declare-variant`
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> :   The `#pragma
clang declare variant` directives are parsed and used
> > > >>>>>>>>>>>    to populate the
`vector-variant` attribute.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> `-fopenmp[-simd]`
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> :   The `#pragma omp
declare variant` directives are parsed and used to
> > > >>>>>>>>>>>    populate the
`vector-variant` attribute.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
> > > >>>>>>>>>>>    `vector-variant` attribute in IR. The `#pragma clang
> > > >>>>>>>>>>>    declare variant` directives are ignored.
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> [^2]: Vector Function ABI for x86:
> > > >>>>>>>>>>>    <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> > > >>>>>>>>>>>    Vector Function ABI for AArch64:
> > > >>>>>>>>>>>    <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> > > >>>>>>>>>>> 
> > > >>>>>>>>>>>
_______________________________________________
> > > >>>>>>>>>>> LLVM Developers
mailing list llvm-dev at lists.llvm.org
> > > >>>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > >>>>>>>>>>
_______________________________________________
> > > >>>>>>>>>> cfe-dev mailing list
> > > >>>>>>>>>> cfe-dev at
lists.llvm.org
> > > >>>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> > > >>>>>>> --
> > > >>>>>>> Hal Finkel
> > > >>>>>>> Lead, Compiler Technology and
Programming Languages
> > > >>>>>>> Leadership Computing Facility
Argonne National Laboratory
> > > >>>> 
> > > >>> 
> > > >>> --
> > > >>> 
> > > >>> Johannes Doerfert
> > > >>> Researcher
> > > >>> 
> > > >>> Argonne National Laboratory
> > > >>> Lemont, IL 60439, USA
> > > >>> 
> > > >>> jdoerfert at anl.gov
> > > >> 
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Francesco Petrogalli via llvm-dev

2019-Jun-03 17:59 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

Hi All,

The original intent of this thread is to "Expose user provided vector
function for auto-vectorization."

I originally proposed to use OpenMP `declare variant` for the sake of using
something that is defined by a standard. The RFC itself is not about fully
implementing the `declare variant` directive. In fact, given the amount of
complication it is bringing, I would like to move the discussion away from
`declare variant`. Therefore, I kindly ask to move any further discussion about
`declare variant` to a separate thread.

I believe that to "Expose user provided vector function for
auto-vectorization" we need three components.

1. The main component is the IR representation we want to give to this
information. My proposal is to use the `vector-variant` attribute with custom
symbol redirection.

	vector-variant = {"_ZGVnN2v_f(custom_vector_f_2), _ZGVnN4v_f(custom_vector_f_4)"}

The names here are made of the Vector Function ABI mangled name, plus custom
symbol redirection in parentheses. I believe that the names mangled according to
the Vector Function ABI have all the information needed to build the signature
of the vector function and the properties of its parameters (linear, uniform,
aligned…). This format will cover most (if not all) of the cases that are needed
for auto-vectorization. I am not aware of any situation in which this
information might not be sufficient. Please provide such an example if you know
of any.

We can attach the IR attribute to call instructions (preferred, to avoid
conflicts when merging modules that don't see the same attributes), to function
declarations, or to both.
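
To make the proposed string format concrete, here is a minimal sketch of how a
consumer could split the attribute value into (mangled name, redirected symbol)
pairs. The names `VariantEntry` and `parseVectorVariant` are hypothetical and
exist only for this illustration; they are not LLVM APIs, and the sketch
assumes entries are comma-separated with the redirection in parentheses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One entry of the attribute value: a Vector Function ABI mangled name plus
// the optional user-chosen symbol it redirects to. (Hypothetical type.)
struct VariantEntry {
  std::string mangled;   // e.g. "_ZGVnN2v_f"
  std::string redirect;  // text inside the parentheses; empty if absent
};

// Split a comma-separated attribute value into entries. Whitespace around
// each entry is ignored. (Hypothetical helper, not part of LLVM.)
std::vector<VariantEntry> parseVectorVariant(const std::string &attr) {
  std::vector<VariantEntry> out;
  std::size_t pos = 0;
  while (pos < attr.size()) {
    std::size_t comma = attr.find(',', pos);
    if (comma == std::string::npos)
      comma = attr.size();
    std::string item = attr.substr(pos, comma - pos);
    pos = comma + 1;

    // Trim leading and trailing whitespace; skip empty items.
    std::size_t b = item.find_first_not_of(" \t\n");
    if (b == std::string::npos)
      continue;
    std::size_t e = item.find_last_not_of(" \t\n");
    item = item.substr(b, e - b + 1);

    VariantEntry entry;
    std::size_t open = item.find('(');
    if (open != std::string::npos && item.back() == ')') {
      entry.mangled = item.substr(0, open);
      entry.redirect = item.substr(open + 1, item.size() - open - 2);
    } else {
      entry.mangled = item;  // no redirection: the mangled name is the symbol
    }
    out.push_back(entry);
  }
  return out;
}
```

An entry without parentheses is treated as a plain Vector Function ABI name,
which would keep the format compatible with lists that carry no redirection.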

2. The second component is a tool that other parts of LLVM (for example, the
loop vectorizer) can use to query the availability of the vector function, the
SVFS I have described in the original post of the RFC, which is based on
interpreting the `vector-variant` attribute.
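
As a rough illustration of the kind of query such a tool could answer, the
sketch below looks up a variant by ISA token, masking, and lane count in a list
of already-decoded entries. It is not the SVFS interface from the original RFC;
`VectorVariant` and `findVectorFunction` are names made up for this example.

```cpp
#include <string>
#include <vector>

// A decoded vector variant. Field names are illustrative only.
struct VectorVariant {
  char isa;            // <isa> token, e.g. 'n' for AArch64 Advanced SIMD
  bool masked;         // true for "M", false for "N"
  unsigned lanes;      // <VLEN> as a lane count
  std::string symbol;  // symbol the vectorizer would call
};

// Hypothetical query: return the first variant that matches the requested
// ISA, masking, and lane count, or nullptr if none is available.
const VectorVariant *findVectorFunction(
    const std::vector<VectorVariant> &variants, char isa, bool masked,
    unsigned lanes) {
  for (const VectorVariant &v : variants)
    if (v.isa == isa && v.masked == masked && v.lanes == lanes)
      return &v;
  return nullptr;
}
```

A caller such as the loop vectorizer would ask for the vectorization factor it
is considering and fall back to the scalar call when the result is null.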

The final component is the one that seems to have generated most of the
controversies discussed in the thread, and for which I decided to move away from
`declare variant`.

3. The third component is a set of descriptors that can be attached to the
scalar function declaration / definition in the C/C++ source file, to inform
the compiler about the availability of associated vector functions that can be
used when / if needed.

As someone has suggested, we should use a custom attribute. Because the mangling
scheme of the Vector Function ABI provides all the information about the shape
and properties of the vector function, I propose the approach exemplified in the
following code:


```
// AArch64 Advanced SIMD compilation
double foo(double) __attribute__((simd_variant("nN2v", "neon_foo")));
float64x2_t neon_foo(float64x2_t x) {…}

// x86 SSE compilation
double foo(double) __attribute__((simd_variant("bN2v", "sse_foo")));
__m128d sse_foo(__m128d x) {…}
```

The attribute would use the "core" tokens of the mangled names (without the _ZGV
prefix and the scalar function name suffix) to describe the vector function
provided in the redirection.

Formal syntax:

```
__attribute__((simd_variant("<isa><mask><VLEN><par_type_list>",
                            "custom_vector_name")))

<isa> := "b" (SSE), "c" (AVX), …, "n" (NEON), "s" (SVE) (from the Vector
Function ABI specifications of each of the targets that support this, for now
AArch64 and x86)

<mask> := "N" for no mask, or "M" for masking

<VLEN> := number of lanes in a vector | "x" for scalable vectorization
(defined in the AArch64 Vector Function ABI).

<par_type_list> := "v" | "l" | … all these tokens are defined in the
Vector Function ABI of the target (which gets selected by the <isa>). FWIW,
they are the same for x86 and AArch64.
```
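
To show how the core token string relates to the full Vector Function ABI name,
here is a small sketch that decodes `<isa><mask><VLEN>` from the attribute
argument and rebuilds the `_ZGV..._<scalar name>` form. The type `VariantShape`
and the functions `decodeCore` and `mangledName` are hypothetical, and the
parameter list is kept as a raw string rather than being decoded token by token.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decoded shape of a vector variant. (Hypothetical type for illustration.)
struct VariantShape {
  char isa;            // <isa> token, e.g. 'n' or 's'
  bool masked;         // true for "M", false for "N"
  bool scalable;       // true when <VLEN> is "x"
  unsigned lanes;      // lane count; meaningful only when !scalable
  std::string params;  // raw <par_type_list>, e.g. "v" or "vl"
};

// Decode "<isa><mask><VLEN><par_type_list>", e.g. "nN2v".
VariantShape decodeCore(const std::string &core) {
  if (core.size() < 3)
    throw std::invalid_argument("core token string too short");
  VariantShape s{};
  s.isa = core[0];
  if (core[1] != 'N' && core[1] != 'M')
    throw std::invalid_argument("mask token must be 'N' or 'M'");
  s.masked = core[1] == 'M';

  std::size_t i = 2;
  if (core[i] == 'x') {
    s.scalable = true;
    ++i;
  } else {
    if (!std::isdigit(static_cast<unsigned char>(core[i])))
      throw std::invalid_argument("VLEN must be a number or 'x'");
    while (i < core.size() &&
           std::isdigit(static_cast<unsigned char>(core[i]))) {
      s.lanes = s.lanes * 10 + static_cast<unsigned>(core[i] - '0');
      ++i;
    }
  }
  s.params = core.substr(i);
  return s;
}

// Rebuild the full mangled name: _ZGV prefix, core tokens, '_', scalar name.
std::string mangledName(const std::string &core, const std::string &scalar) {
  return "_ZGV" + core + "_" + scalar;
}
```

With this, the AArch64 example above would map `"nN2v"` on `foo` to the name
`_ZGVnN2v_foo`, with `neon_foo` as the redirected symbol.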

Please let me know what you think about this proposal. If it makes things
easier to follow, I will rework the proposal and submit a new RFC, but before
rewriting everything I would like some feedback on this change.

Kind regards,

Francesco

> On May 31, 2019, at 8:17 PM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> 
> On 06/01, Saito, Hideki wrote:
>> 
>> Page 22 of OpenMP 5.0 specification (Lines 13/14):
>> 
>> 	When any thread encounters a simd construct, the iterations of the loop associated with the
>> 	construct may be executed concurrently using the SIMD lanes that are available to the thread
>> 
>> This is the Execution Model. The word here is "may" i.e., not "must".
> 
> The question I'm asking is:
> Can the user observe concurrent execution of loop iterations for part of
> the loop while other parts are executed sequentially? I'm aware that
> this is not necessarily practical but it seems good to be aware of
> nevertheless.
> 
>> Declare simd is not explicitly mentioned here, but requiring
>> vectorization in declare simd when the caller simd construct may not
>> be vectorized is odd. 
> 
> My problem is using a declare simd version inside a not vectorized code
> called from a otherwise vectorized simd context. Thinking about it, that
> is actually less theoretical than I thought.
> 
> 
>> Having said that, ICC implementation is "work
>> extremely hard to vectorize", and we are proud of it. Our
>> recommendation is do the same as ICC (i.e., requires a lot of beefing
>> up in LoopVectorize) and I think that's what you are asking. If so, we
>> are aligned.
> 
> I agree that we should do a better job but I would argue something like
> the region vectorizer (RV) [0] would be a much better starting point.
> Though, that is a different discussion I guess.
> 
> [0] https://github.com/cdl-saarland/rv
> 
> 
>> If anyone strongly goes against that idea (i.e., anyone wanting to keep OpenMP simd as just an optimization hint for auto-vectorizer), please speak up.
>> 
>>> if you think there is a problem to reuse that for OpenCL/SYCL, let us know.
>> 
>> Sure.
>> 
>> -----Original Message-----
>> From: Doerfert, Johannes [mailto:jdoerfert at anl.gov] 
>> Sent: Friday, May 31, 2019 4:58 PM
>> To: Saito, Hideki <hideki.saito at intel.com>
>> Cc: Francesco Petrogalli <Francesco.Petrogalli at arm.com>; Philip Reames <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>; LLVM Development List <llvm-dev at lists.llvm.org>; nd <nd at arm.com>; Clang Dev <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
>> Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for auto-vectorization.
>> 
>> On 05/31, Saito, Hideki wrote:
>>> 
>>>> Is this also the case if the user did require lock-step semantic for the code to be correct?
>>> 
>>> Certainly not, but that part is actually beyond OpenMP specification.
>>> I suggest looking up ICC's "#pragma simd assert" description and see
>>> if the assert feature is something you may be interested in seeing as
>>> an extended part of LLVM implementation of OpenMP (declare) simd.
>>> Else, vectorization report would tell you whether it was vectorized or
>>> not.
>> 
>> Wait, why do you think it is beyond the OpenMP specification? If an OpenMP variant is picked based on the context the user should be able to assume the context requirements are fulfilled. If we agree on that, I don't see how we can do anything else than emitting an error if we optimistically picked a vector variant but failed to vectorize. I don't think that precludes the approach you've taken though.
>> 
>>>> How does OpenCL/SYCL play in this now?
>>> 
>>> Not right now, when we are working to get OpenMP stuff going --- 
>>> except that I don't think we need to change the design (e.g., on
>>> function attribute, VecClone direction, etc.) in the future for those
>>> or similar languages.
>> 
>> OK. Whatever is considered here, if you think there is a problem to reuse that for OpenCL/SYCL, let us know.
>> 
>> 
>>> -----Original Message-----
>>> From: Doerfert, Johannes [mailto:jdoerfert at anl.gov]
>>> Sent: Friday, May 31, 2019 4:16 PM
>>> To: Saito, Hideki <hideki.saito at intel.com>
>>> Cc: Francesco Petrogalli <Francesco.Petrogalli at arm.com>;
Philip Reames
>>> <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at
anl.gov>; LLVM
>>> Development List <llvm-dev at lists.llvm.org>; nd <nd at
arm.com>; Clang Dev
>>> <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
>>> Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector
function for auto-vectorization.
>>> 
>>> On 05/31, Saito, Hideki wrote:
>>>> 
>>>>> VectorClone does more than just mapping a scalar version to a vector one. It builds also the vector version definition by auto-vectorizing the body of the scalar function.
>>>> [...]
>>>> The code is still fully functional w/o LoopVectorize vectorizing that loop.
>>> 
>>> Is this also the case if the user did require lock-step semantic
for the code to be correct?
>>> 
>>>>> I don’t know if the patches related to VecClone also are intended to use the `vector-variant` attribute for function declaration with a #pragma omp declare simd.
>>>> 
>>>> VecClone predated #pragma omp declare variant. So those patches
>>>> don’t know about declare variant. VecClone was written for handling #pragma omp declare simd, as described above. OpenCL/SYCL kernel is similar enough to OpenMP declare simd. Most code can be reused.
>>> 
>>> How does OpenCL/SYCL play in this now?
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Francesco Petrogalli [mailto:Francesco.Petrogalli at
arm.com]
>>>> Sent: Friday, May 31, 2019 3:06 PM
>>>> To: Doerfert, Johannes <jdoerfert at anl.gov>
>>>> Cc: Philip Reames <listmail at philipreames.com>; Finkel,
Hal J.
>>>> <hfinkel at anl.gov>; LLVM Development List <llvm-dev
at lists.llvm.org>;
>>>> nd <nd at arm.com>; Saito, Hideki <hideki.saito at
intel.com>; Clang Dev
>>>> <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
>>>> Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided
vector function for auto-vectorization.
>>>> 
>>>> 
>>>> 
>>>>> On May 31, 2019, at 2:56 PM, Doerfert, Johannes
<jdoerfert at anl.gov> wrote:
>>>>> 
>>>>> I think I did misunderstand what you want to do with
attributes.
>>>>> This is my bad. Let me try to explain:
>>>>> 
>>>>> It seems you want the "vector-variants"
attributes (which I could
>>>>> not find with this name in trunk, correct?) to
"remember" what
>>>>> vector versions can be created (wrt. validity), assuming a 
>>>>> definition is available? Correct?
>>>> 
>>>> Yes.
>>>> 
>>>>> What I was concerned with is the example I sketched
somewhere
>>>>> below which motivates the need for a
generalized/standardized name
>>>>> mangling for OpenMP. I though you wanted to avoid that
somehow but
>>>>> if you don't I misunderstood you. I basically removed
the part
>>>>> where the vector versions have to be created first but I
assumed
>>>>> them to be existent (in the module or somewhere else). That
is, I
>>>>> assumed a call to foo and various symbols available that
are
>>>>> specializations of foo. When we then vectorize foo (or
otherwise
>>>>> specialize at some point in the future), you would scan the
module
>>>>> and pick the best match based on the context of the call.
>>>>> 
>>>> 
>>>> Yes, although the syntax you use below is wrong. Declare variant is attached to the scalar definition, and points to a vector definition (the variant) that is declared/defined in the same compilation unit where the scalar version is visible.
>>>> 
>>>> 
>>>>> Now I don't know if I understood your proposal by now
but let me
>>>>> ask a question anyway:
>>>>> 
>>>>> VecClone.cpp:276-278 mentions that the vectorizer is
supposed to
>>>>> look at the vector-variants functions. This works for
variants
>>>>> that are created from definitions in the module but what
about
>>>>> #omp declare simd declarations?
>>>>> 
>>>> 
>>>> VectorClone does more than just mapping a scalar version to a vector one. It builds also the vector version definition by auto-vectorizing the body of the scalar function.
>>>> 
>>>> I don’t know if the patches related to VecClone also are intended to use the `vector-variant` attribute for function declaration with a #pragma omp declare simd. On aarch64, in Arm compiler for HPC, we do that to support vector math libraries. It works in principle, but `vector variant` allows more context selection (and custom names instead of vector ABI names, which are easier for users).
>>>> 
>>>> 
>>>>> 
>>>>> On 05/31, Francesco Petrogalli wrote:
>>>>>>> On May 31, 2019, at 11:47 AM, Doerfert, Johannes
<jdoerfert at anl.gov> wrote:
>>>>>>> 
>>>>>>> I think we should split this discussion:
>>>>>>> TOPIC 1 & 2 & 4: How do implement all use
cases and OpenMP 5.X
>>>>>>>                 features, including compatibility
with other
>>>>>>>                 compilers and cross module support.
>>>>>> 
>>>>>> Yes, and we have to carefully make this as standard and
compatible as possible.
>>>>> 
>>>>> Agreed.
>>>>> 
>>>>> 
>>>>>>> TOPIC 3b & 5: Interoperability with clang
declare (system vs. user
>>>>>>>               declares)
>>>>>> 
>>>>>> 
>>>>>> I think that Alexey explanation of how the directive
are handled
>>>>>> internally in the frontend makes us propound towards
the attribute.
>>>>> 
>>>>> How things are handled right now, especially given that
declare
>>>>> variant is not handled at all, should not limit our design
space.
>>>>> If the argument is that we cannot reasonably implement a
solution,
>>>>> that is a different story.
>>>>> 
>>>>> 
>>>>>>> TOPIC 3a & 3c: floating point issues?
>>>>>>> 
>>>>>> 
>>>>>> I believe there is no issue there. I have quoted the
openMP standard in reply to Renato:
>>>>>> 
>>>>>> See
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page
118, lines 23-24:
>>>>>> 
>>>>>> “The execution of the function or subroutine cannot
have any side
>>>>>> effects that would alter its execution for concurrent
iterations
>>>>>> of a SIMD chunk."
>>>>> 
>>>>> Great.
>>>>> 
>>>>> 
>>>>>>> I inlined comments for Topic 1 below.
>>>>>>> 
>>>>>>> I hope that we do not have to discuss topic 2 if we
agree
>>>>>>> neither attributes nor metadata is necessary, or
better, will
>>>>>>> solve the actual problem at hand. I don't have
strong feeling on
>>>>>>> topic 4 but I have the feeling this will become
less problematic once we figure out topic 1.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Johannes
>>>>>>> 
>>>>>>> 
>>>>>>> On 05/31, Francesco Petrogalli wrote:
>>>>>>>> # TOPIC 1: concerns about name mangling
>>>>>>>> 
>>>>>>>> I understand that there are concerns in using
the mangling
>>>>>>>> scheme I proposed, and that it would be
preferred to have a
>>>>>>>> mangling scheme that is based on (and
standardized by) OpenMP.
>>>>>>> 
>>>>>>> I still think it will be required to have a
standardized one,
>>>>>>> not only preferred.
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> I am all with you in standardizing. x86 and aarch64 have their own
>>>>>> vector function ABI, which, although “private”, are to be
>>>>>> considered standard. Open-source and commercial compilers are
>>>>>> using them, therefore we have to deal with this mangling scheme,
>>>>>> whether or not OpenMP comes up with a standard mangling scheme.
>>>>> 
>>>>> I don't get the point you are trying to make here. What
do you
>>>>> mean by "we have to deal with"? (I do not suggest
to get rid of
>>>>> them.)
>>>>> 
>>>> 
>>>> That we cannot ignore the fact that the name scheme is already standardized by the vendors, so let’s first deal with what we have, and think about the OpenMP mangling scheme only once there is one available.
>>>> 
>>>>> 
>>>>>>>> I hear the argument on having some common
ground here. In fact,
>>>>>>>> there is already common ground between the x86
and aarch64
>>>>>>>> backend, who have based their respective Vector
Function ABI specifications on OpenMP.
>>>>>>>> 
>>>>>>>> In fact, the mangled name grammar can be summarized as follows:
>>>>>>>> 
>>>>>>>> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
>>>>>>>> 
>>>>>>>> Across vector extensions the only <token> that will differ is
>>>>>>>> the <isa> token.
>>>>>>>> 
>>>>>>>> This might lead people to think that we could drop the
>>>>>>>> _ZGV<isa> prefix and consider the
>>>>>>>> <masking><VLEN><parameter type>_<scalar name> part as a sort of
>>>>>>>> unofficial OpenMP mangling scheme: in fact, the signature of an
>>>>>>>> “unmasked 2-lane vector version of `sin`” will always be
>>>>>>>> `<2 x double>(<2 x double>)`.
>>>>>>>> 
>>>>>>>> The problem with this choice is the number of
vector version
>>>>>>>> available for a target is not unique.
>>>>>>> 
>>>>>>> For me, this simply means this mangling scheme is
not sufficient.
>>>>>>> 
>>>>>> 
>>>>>> Can you explain more why you think the mangling scheme
is not
>>>>>> sufficient? The mangling scheme is shaped to provide
all the
>>>>>> information that the OpenMP directive describes.
>>>>> 
>>>>> I don't know if it is insufficient but I though you
hinted towards that.
>>>> 
>>>> I didn’t mean that, the tokens in the vector function ABI
mangled schemes are sufficient.
>>>> 
>>>>> If we can handle/decode everything we need for declare
variants
>>>>> then I do not object at all. If not, we require respective 
>>>>> extension such that we can. The result should be a superset
of the
>>>>> current SIMD encoding and compatible with the current one.
>>>>> 
>>>>> 
>>>> 
>>>> We can handle/decode everything for a SIMD context. :)
>>>> 
>>>> 
>>>>> 
>>>>>> The fact that x86 and aarch64 realize such information
in
>>>>>> different way (multiple signature/vector extensions) is
something
>>>>>> that cannot be avoided, because it is related to
architectural
>>>>>> aspects that are specific to the vector extension and
transparent
>>>>>> to the OpenMP standard.
>>>>> 
>>>>> I don't think that is a problem (that's why I
"failed to see the
>>>>> problem" in the comment below). I look at it this way:
If #declare
>>>>> simd, or similar, results in N variants, it should at the
end of
>>>>> the day not be different from declaring these N variants 
>>>>> explicitly with the respective declare variant match
clause.
>>>>> 
>>>> 
>>>> That’s not the case. #declare simd should create all the versions that are optimal for the target. We carefully thought about that when writing the vector function ABI. Most of the constraints derive from the fact that each target has a specific register size.
>>>> 
>>>> Example:
>>>> 
>>>> #pragma omp declare simd
>>>> float foo(float);
>>>> 
>>>> x86 -> 8 versions {2, 4, 8, 16 lanes} x {masking, no masking}, see https://godbolt.org/z/m1BUVt
>>>> Arm NEON -> 4 versions {2, 4 lanes} x {masking, no masking}
>>>> Arm SVE -> 1 version
>>>> 
>>>> Therefore, the outcome of declare simd is not target independent. Your expectations are met only inside one target.
>>>> 
>>>> 
>>>>> 
>>>>>>>> In particular, the following declaration generates multiple
>>>>>>>> vector versions, depending on the target:
>>>>>>>> 
>>>>>>>> #pragma omp declare simd simdlen(2) notinbranch
>>>>>>>> double foo(double) {…};
>>>>>>>> 
>>>>>>>> On x86, this generates at least 4 symbols (one for SSE, one for
>>>>>>>> AVX, one for AVX2, and one for AVX512:
>>>>>>>> https://godbolt.org/z/TLYXPi)
>>>>>>>> 
>>>>>>>> On aarch64, the same declaration generates a unique symbol, as
>>>>>>>> specified in the Vector Function ABI.
>>>>>>> 
>>>>>>> I fail to see the problem. We generate X symbols
for X different
>>>>>>> contexts. Once we get to the point where we
vectorize, we
>>>>>>> determine which context fits best and choose the
corresponding symbol version.
>>>>>>> 
>>>>>> 
>>>>>> Yes, this is exactly what we need to do, under the
constrains
>>>>>> that the rules for  generating "X symbols for X
different
>>>>>> contexts” are decided by the Vector Function ABI of the
target.
>>>>> 
>>>>> Sounds good. The vector ABI is used to determine what
contexts
>>>>> exists and what symbols should be created. I would assume
the
>>>>> encoding should be the same as if we specified the versions
>>>>> (/contexts) ourselves via #declare variant.
>>>>> 
>>>> 
>>>> Oh yes, vector functions listed in a declare variant should
obey the vector function ABI rules (other than the function name).
>>>> 
>>>>> 
>>>>>>> Maybe my view is to naive here, please feel free to
correct me.
>>>>>>> 
>>>>>>> 
>>>>>>>> This means that the attribute (or metadata)
that carries the
>>>>>>>> information on the available vector version
needs to deal also
>>>>>>>> with things that are not usually visible at IR
level, but that
>>>>>>>> might still need to be provided to be able to
decide which
>>>>>>>> particular instruction set/ vector extension
needs to be targeted.
>>>>>>> 
>>>>>>> The symbol names should carry all the information
we need. If
>>>>>>> they do not, we need to improve the mangling scheme
such that they do.
>>>>>>> There is no attributes/metadata we could use at
library boundaries.
>>>>>>> 
>>>>>> Hum, I am not sure what you mean by "There is no 
>>>>>> attributes/metadata we could use at library
boundaries."
>>>>> 
>>>>> (This seems to be part of the misunderstanding, I leave my
comment
>>>>> here
>>>>> anyway:)
>>>>> 
>>>>> The simd-related stuff works because it is a uniform
mangling
>>>>> scheme used by all compilers. Take the situation below in
which I
>>>>> think we want to call foo_CTX in the library. If so, we
need a name for it.
>>>>> 
>>>> 
>>>> In the situation below, the mangled name is going to be the
same for both compilers, as long as they adhere to the vector function ABI.
>>>> 
>>>>> 
>>>>> a.c:  // Compiled by gcc into a library
>>>>> #omp declare variant (foo) match(CTX)
>>>>> void foo_CTX(...) {...}
>>>>> 
>>>>> b.c:  // Compiled by clang linked against the library above.
>>>>> #omp declare variant (foo) match(CTX)
>>>>> void foo_CTX(...);
>>>>> 
>>>>> void bar(...) {
>>>>> #pragma omp CTX
>>>>> foo();   // <- What function (symbol) do we call if a.c was compiled
>>>>>          //    by gcc and b.c with clang?
>>>>> }
>>>>> 
>>>> 
>>>> Please notice that `declare variant` needs to be attached to the scalar function, not the vector one.
>>>> 
>>>> ```
>>>> #pragma omp declare variant(foo_CTX) match (context=simd…
>>>> double foo(double) {…}
>>>> 
>>>> vector_double_ty foo_CTX(vector_double_ty) {…}
>>>> ```
>>>> 
>>>> In vectorizing foo in bar, the compiler will not care where foo_CTX would come from (of course, as long as the scalar+declare variant declarations are visible).
>>>> 
>>>>>> In our downstream compiler (Arm compiler for HPC, based
on LLVM),
>>>>>> we use `declare simd` to provide vector math functions
via custom
>>>>>> header file. It works brilliantly, if not for specific
aspects
>>>>>> that would be perfectly covered by the `declare
variant`, which
>>>>>> might be one of the reason why the OpenMP committee
decided to
>>>>>> introduce `declare variant`.
>>>>> 
>>>>> But you (assume that you) control the mangling scheme
across the
>>>>> entire infrastructure. Given that the simd mangling is
de-facto
>>>>> standardized, that works.
>>>>> 
>>>>> Side note:
>>>>> Declare variant, as of 5.0, is not flexible enough for a
sensible
>>>>> inclusion of target specific headers. That will change in
5.1.
>>>>> 
>>>> 
>>>> Could you point me at the discussion in 5.1 on this specific
aspect?
>>>> 
>>>> 
>>>>> 
>>>>>> If your concerns is that by adding an attribute that
somehow
>>>>>> represent something that is available in an external
library is
>>>>>> not enough to guarantee that that symbol is available
in the
>>>>>> library… not even C code can guarantee that? If the
linker is not
>>>>>> pointing to the right library, there is nothing that
can prevent
>>>>>> it to fail if the symbol is not present?
>>>>> 
>>>>> I don't follow the example you describe. I don't
want to change
>>>>> anything in how symbols are looked up or what happens if
they are missing.
>>>>> 
>>>>> 
>>>> 
>>>> I don’t want to change that too :). I think we are
misunderstanding each other here...
>>>> 
>>>>>>>> I used an example based on `declare simd`
instead of `declare
>>>>>>>> variant` because the attribute/metadata needed
for `declare
>>>>>>>> variant` is a modification of the one needed
for `declare
>>>>>>>> simd`, which has already been agreed in a
previous RFC proposed
>>>>>>>> by Intel [1], and for which Intel has already
provided an
>>>>>>>> implementation [2]. The changes proposed in
this RFC are fully
>>>>>>>> compatible with the work that is being done for the VecClone pass in [2].
>>>>>>>> 
>>>>>>>> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
>>>>>>>> [2] VecClone pass: https://reviews.llvm.org/D22792
>>>>>>> 
>>>>>>> Having an agreed upon mangling for the older
feature is not
>>>>>>> necessarily important here. We will need more
functionality for
>>>>>>> variants and keeping the old scheme around with
some metadata is
>>>>>>> not an extensible long-term solution. So, I would
not try to fit
>>>>>>> variants into the existing simd-scheme but instead
do it the
>>>>>>> other way around. We define what we need for
variants and implement simd in that scheme.
>>>>>>> 
>>>>>> 
>>>>>> I kinda think that having agreed on something is
important. It
>>>>>> allows to build other things on top of what have been
agreed
>>>>>> without breaking compatibility.
>>>>>> 
>>>>>> On the specific, which are the new functionalities
needed for the
>>>>>> variants that would make the current metadata
(attributes) for
>>>>>> declare simd non extensible?
>>>>> 
>>>>> See first comment.
>>>>> 
>>>>>>>> The good news is that as far as AArch64 and x86 are concerned, the only thing that will differ in the mangled name is the “<isa>” token. As far as I can tell, the mangling scheme of the rest of the vector name is the same, therefore a lot of infrastructure in terms of mangling and demangling can be reused. In fact, the `mangleVectorParameters` function in https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could already be shared among x86 and aarch64.
>>>>>>>> 
>>>>>>>> TOPIC 2: metadata vs attribute
>>>>>>>> 
>>>>>>>> From a functionality point of view, I don’t
care whether we use metadata or attributes. The VecClone pass mentioned in TOPIC
1 uses the following:
>>>>>>>> 
>>>>>>>> attributes #0 = { nounwind uwtable
>>>>>>>> "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
>>>>>>>> 
>>>>>>>> This is an attribute (I thought it was metadata?), I am happy to reword the RFC using the right terminology (sorry for messing this up).
>>>>>>>> 
>>>>>>>> Also, @Renato expressed concern that metadata
might be dropped by optimization passes - would using attributes prevent that?
>>>>>>>> 
>>>>>>>> TOPIC 3: "There is no way to notify the
backend how conformant the SIMD versions are.”
>>>>>>>> 
>>>>>>>> @Shawn, I am afraid I don’t understand what you
mean by “conformant” here. Can you elaborate with an example?
>>>>>>>> 
>>>>>>>> TOPIC 3: interaction of the `omp declare
variant` with `clang
>>>>>>>> declare variant`
>>>>>>>> 
>>>>>>>> I believe this is described in the `Option
behavior, and interaction with OpenMP`. The option `-fclang-declare-variant` is
there to make the OpenMP based one orthogonal. Of course, we might decide to
make -fclang-declare-variant on/off by default, and have default behavior when
interacting with -fopenmp-simd. For the sake of compatibility with other
compilers, we might need to require -fno-clang-declare-variant when targeting
-fopenmp-[simd].
>>>>>>>> 
>>>>>>>> TOPIC 3: "there are no special arguments /
flags / status regs that are used / changed in the vector version that the
compiler will have to "just know”
>>>>>>>> 
>>>>>>>> I believe that this concern is raised by the
problem of handling FP exceptions? If that’s the case, the compiler is not
allowed to do any assumption on the vector function about that, and treat it
with the same knowledge of any other function, depending on the visibility it
has in the compilation unit. @Renato, does this answer your question?
>>>>>>>> 
>>>>>>>> TOPIC 4: attribute in function declaration vs
attribute
>>>>>>>> function call site
>>>>>>>> 
>>>>>>>> We discussed this in the previous version of the proposal. Having it in the call sites guarantees that incompatible vector versions are not used when merging modules compiled for different targets. I don’t have a use case for this, if I remember correctly this was asked by @Hideki Saito. Hideki, any comment on this?
>>>>>>>> 
>>>>>>>> TOPIC 5: overriding system header (the
discussion on #pragma omp/clang/system variants initiated by @Hal Finkel).
>>>>>>>> 
>>>>>>>> I though that the split among #pragma clang
declare variant and #pragma omp declare variant was already providing the
orthogonality between system header and user header. Meaning that a user should
always prefer the omp version (for portability to other compilers) instead of
the #pragma clang one, which would be relegated to system headers and headers
provided by the compiler. Am I missing something? If so, I am happy to add a
“system” version of the directive, as it would be quite easy to do given most of
the parsing infrastructure will be shared.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On May 30, 2019, at 12:53 PM, Philip Reames
<listmail at philipreames.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 5/30/19 9:05 AM, Doerfert, Johannes
wrote:
>>>>>>>>>> On 05/29, Finkel, Hal J. via cfe-dev
wrote:
>>>>>>>>>>> On 5/29/19 1:52 PM, Philip Reames
wrote:
>>>>>>>>>>>> On 5/28/19 7:55 PM, Finkel, Hal
J. wrote:
>>>>>>>>>>>>> On 5/28/19 3:31 PM, Philip
Reames via cfe-dev wrote:
>>>>>>>>>>>>>> I generally like the
idea of having support in IR for
>>>>>>>>>>>>>> vectorization of custom
functions.  I have several use cases which would benefit from this.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'd suggest a
couple of reframings to the IR representation though.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> First, this should
probably be specified as
>>>>>>>>>>>>>> metadata/attribute on a
function declaration.  Allowing
>>>>>>>>>>>>>> the callsite variant is
fine, but it should primarily be
>>>>>>>>>>>>>> a property of the
called function, not of the call site.  Being able to specify it once per
declaration is much cleaner.
>>>>>>>>>>>>> I agree. We should support
this both on the function
>>>>>>>>>>>>> declaration and on the call
sites.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Second, I really
don't like the mangling use here.  We
>>>>>>>>>>>>>> need a better way to
specify the properties of the
>>>>>>>>>>>>>> function then it's
mangled name.  One thought to explore
>>>>>>>>>>>>>> is to directly use the
Value of the function declaration
>>>>>>>>>>>>>> (since this is metadata
and we can do that), and then tie
>>>>>>>>>>>>>> the properties to the
function declaration in some way?  Sorry, I don't really have a specific
suggestion here.
>>>>>>>>>>>>> Is the problem the mangling
or the fact that the mangling
>>>>>>>>>>>>> is ABI/target-specific? One
option is to use LLVM's
>>>>>>>>>>>>> mangling scheme (the one we
use for intrinsics) and then
>>>>>>>>>>>>> provide some backend
infrastructure to translate later.
>>>>>>>>>>>> Well, both honestly.  But
mangling with a non-target specific scheme is
>>>>>>>>>>>> a lot better, so I might be
okay with that.   Good idea.
>>>>>>>>>>> 
>>>>>>>>>>> I liked your idea of directly
encoding the signature in the
>>>>>>>>>>> metadata, but I think that we want
to continue to use
>>>>>>>>>>> attributes, and not metadata, and
the options for attributes
>>>>>>>>>>> seem more limited - unless we allow
attributes to take
>>>>>>>>>>> metadata arguments - maybe
that's an enhancement worth considering.
>>>>>>>>>> I recently talked to people in the
OpenMP language committee
>>>>>>>>>> meeting about this and, thinking
forward to the actual
>>>>>>>>>> implementation/use of the OpenMP 5.x
declare variant feature, I'd say:
>>>>>>>>>> 
>>>>>>>>>> - We will need a mangling scheme if we
want to allow variants
>>>>>>>>>> on declarations that are defined
elsewhere.
>>>>>>>>>> - We will need a (OpenMP) standardized
mangling scheme if we
>>>>>>>>>> want interoperability between
compilers.
>>>>>>>>>> 
>>>>>>>>>> I assume we want both so I think we
will need both.
>>>>>>>>> If I'm reading this correctly, this
describes a need for the
>>>>>>>>> frontend to have a mangling scheme. 
Nothing in here would
>>>>>>>>> seem to prevent the frontend for generating
a declaration for
>>>>>>>>> a mangled external symbol and then
referencing that declaration.  Am I missing something?
>>>>>>>>>> 
>>>>>>>>>> That said, I think this should allow us
to avoid
>>>>>>>>>> attributes/metadata which seems to me
like a good thing right now.
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Johannes
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The proposal is a modification of an RFC that I have sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The original RFC was proposing to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector function in user code that the compiler can pick up for auto-vectorization. This narrower scope limits the impact of changes that are needed in both clang and LLVM.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Please let me know what you think.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Francesco
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Introduction
>>>>>>>>>>>>>>> ============
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This RFC encompasses the proposal of informing the
>>>>>>>>>>>>>>> vectorizer about the availability of vector functions
>>>>>>>>>>>>>>> provided by the user. The mechanism is based on the use
>>>>>>>>>>>>>>> of the directive `declare variant` introduced in OpenMP
>>>>>>>>>>>>>>> 5.0 [^1].
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The mechanism proposed has the following properties:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 1.  Decouples the compiler front-end that knows about the availability
>>>>>>>>>>>>>>>     of vectorized routines, from the back-end that knows how to make use
>>>>>>>>>>>>>>>     of them.
>>>>>>>>>>>>>>> 2.  Enables support for a developer's own vector libraries without
>>>>>>>>>>>>>>>     requiring changes to the compiler.
>>>>>>>>>>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
>>>>>>>>>>>>>>>     mappings as relevant for their own runtime libraries, etc.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The implementation consists of two separate sets of changes.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The first set is a set of changes in `llvm`, and consists of:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
>>>>>>>>>>>>>>>     availability of user-defined vector functions via metadata attached
>>>>>>>>>>>>>>>     to an `llvm::CallInst`.
>>>>>>>>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
>>>>>>>>>>>>>>>     information about the available vector functions associated to a
>>>>>>>>>>>>>>>     `llvm::CallInst`.
>>>>>>>>>>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
>>>>>>>>>>>>>>>     metadata.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The second set consists of the [changes in clang](#clang) that
>>>>>>>>>>>>>>> are needed to recognize the `#pragma clang declare variant`
>>>>>>>>>>>>>>> directive.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Proposed changes
>>>>>>>>>>>>>>> ================
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We propose an implementation that uses
>>>>>>>>>>>>>>> `#pragma clang declare variant` to inform the backend components
>>>>>>>>>>>>>>> about the availability of vector versions of scalar functions
>>>>>>>>>>>>>>> found in IR. The mechanism relies on storing such
>>>>>>>>>>>>>>> information in IR metadata, and therefore makes the
>>>>>>>>>>>>>>> auto-vectorization of function calls a mid-end (`opt`) process
>>>>>>>>>>>>>>> that is independent of the front-end that generated such IR
>>>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This implementation provides a generic mechanism that
>>>>>>>>>>>>>>> the users of the LLVM compiler will be able to use for
>>>>>>>>>>>>>>> interfacing their own vector routines for generic code.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The implementation can also expose vectorization-specific
>>>>>>>>>>>>>>> descriptors -- for example, like the `linear` and `uniform`
>>>>>>>>>>>>>>> clauses of the OpenMP `declare simd` directive -- that could be
>>>>>>>>>>>>>>> used to finely tune the automatic vectorization of some
>>>>>>>>>>>>>>> functions (think for example of the vectorization of
>>>>>>>>>>>>>>> `double sincos(double , double *, double *)`, where `linear`
>>>>>>>>>>>>>>> can be used to give extra information about the memory layout
>>>>>>>>>>>>>>> of the two pointer parameters in the vector version).
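
To make the `linear` remark concrete, here is a minimal sketch (not part of the RFC) of what a 2-lane `sincos` variant with linear output pointers could look like. The name `sincos_v2_linear` and the use of `std::array` as a stand-in for a target vector type are assumptions made purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 2-lane variant of sincos(double, double *, double *).
// With `linear` pointers (step 1), lane i uses S + i and C + i, so the
// results land in two contiguous runs and no scatter is needed.
void sincos_v2_linear(const std::array<double, 2> &X, double *S, double *C) {
  for (int Lane = 0; Lane < 2; ++Lane) {
    S[Lane] = std::sin(X[Lane]);
    C[Lane] = std::cos(X[Lane]);
  }
}
```

Without the `linear` information the vectorizer would have to assume an arbitrary pointer per lane, which is why the clause matters for picking a variant.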
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The directive `#pragma clang declare variant` follows
>>>>>>>>>>>>>>> the syntax of the `#pragma omp declare variant` directive of
>>>>>>>>>>>>>>> OpenMP.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We define the new directive in the `clang` namespace
>>>>>>>>>>>>>>> instead of using the `omp` one of OpenMP to allow the
>>>>>>>>>>>>>>> compiler to perform auto-vectorization outside of an OpenMP
>>>>>>>>>>>>>>> SIMD context.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The mechanism is based on OpenMP to provide a uniform
>>>>>>>>>>>>>>> user experience across the two mechanisms, and to
>>>>>>>>>>>>>>> maximise the number of shared components of the
>>>>>>>>>>>>>>> infrastructure needed in the compiler frontend to enable the
>>>>>>>>>>>>>>> feature.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Changes in LLVM IR {#llvmIR}
>>>>>>>>>>>>>>> ------------------
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The IR is enriched with metadata that details the
>>>>>>>>>>>>>>> availability of vector versions of an associated scalar
>>>>>>>>>>>>>>> function. This metadata is attached to the call site of the
>>>>>>>>>>>>>>> scalar function.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The metadata takes the form of an attribute containing a
>>>>>>>>>>>>>>> comma separated list of vector function mappings. Each
>>>>>>>>>>>>>>> entry has a unique name that follows the Vector Function
>>>>>>>>>>>>>>> ABI[^2] and a real name that is used when generating calls to
>>>>>>>>>>>>>>> this vector function.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The Vector Function ABI name describes the signature of
>>>>>>>>>>>>>>> the vector function so that properties like
>>>>>>>>>>>>>>> vectorisation factor can be queried during compilation.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The `(real name)` token is optional and assumed to match
>>>>>>>>>>>>>>> the Vector Function ABI name when omitted.
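
As an illustration only, the list format described above could be split into (ABI name, real name) pairs roughly as follows. `VariantMapping`, `trim` and `parseVectorVariant` are hypothetical names, not part of LLVM or of the RFC, and the sketch assumes names never contain commas or parentheses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One entry of the comma-separated `vector-variant` list.
struct VariantMapping {
  std::string AbiName;  // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName; // symbol to call; equals AbiName when "(...)" is omitted
};

// Hypothetical helper: strip leading and trailing whitespace.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n";
  size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Hypothetical helper: split "a(b), c" into {a, b} and {c, c}.
std::vector<VariantMapping> parseVectorVariant(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  size_t Pos = 0;
  while (Pos <= Attr.size()) {
    size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue;
    size_t Open = Entry.find('(');
    size_t Close = Entry.rfind(')');
    VariantMapping M;
    if (Open != std::string::npos && Close != std::string::npos && Close > Open) {
      M.AbiName = trim(Entry.substr(0, Open));
      M.RealName = trim(Entry.substr(Open + 1, Close - Open - 1));
    } else {
      // No "(real name)" token: the real name defaults to the ABI name.
      M.AbiName = Entry;
      M.RealName = Entry;
    }
    Result.push_back(M);
  }
  return Result;
}
```

The fallback branch encodes the rule that an omitted real name is taken to be the ABI name itself.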
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> For example, the availability of a 2-lane double
>>>>>>>>>>>>>>> precision `sin` function via SVML when targeting AVX on
>>>>>>>>>>>>>>> x86 is provided by the following IR.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   // ...
>>>>>>>>>>>>>>>   ... = call double @sin(double) #0
>>>>>>>>>>>>>>>   // ...
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>>>>>>>>>>>>>>                            _ZGVdN4v_sin(__svml_sin4),
>>>>>>>>>>>>>>>                            ..."} }
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this
>>>>>>>>>>>>>>> vector-variant attribute provides information on the
>>>>>>>>>>>>>>> shape of the vector function via the string
>>>>>>>>>>>>>>> `_ZGVcN2v_sin`, mangled according to the Vector Function
>>>>>>>>>>>>>>> ABI for Intel, and remaps the standard Vector Function ABI name
>>>>>>>>>>>>>>> to the non-standard name `__svml_sin2`.
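
A rough sketch of how the shape could be read back from such a name, assuming the layout `_ZGV<isa><mask><vlen><params>_<scalar name>` that the names in this thread follow. `VectorShape` and `decodeVectorAbiName` are hypothetical; only fixed-length names with a decimal lane count are handled, and the parameter string is returned undecoded.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Shape information recovered from a Vector Function ABI name.
struct VectorShape {
  char ISA;               // ISA letter, e.g. 'c' in "_ZGVcN2v_sin"
  bool Masked;            // 'M' = masked, 'N' = unmasked
  unsigned VF;            // number of lanes
  std::string Params;     // raw parameter kinds, e.g. "v"
  std::string ScalarName; // name of the scalar function, e.g. "sin"
};

// Hypothetical decoder; returns false if the name does not fit the layout.
bool decodeVectorAbiName(const std::string &Name, VectorShape &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  size_t I = Prefix.size();
  if (I + 2 > Name.size())
    return false; // need at least the ISA and mask letters
  Out.ISA = Name[I++];
  char Mask = Name[I++];
  if (Mask != 'N' && Mask != 'M')
    return false;
  Out.Masked = (Mask == 'M');
  size_t DigitsStart = I;
  unsigned VF = 0;
  while (I < Name.size() && std::isdigit(static_cast<unsigned char>(Name[I]))) {
    VF = VF * 10 + static_cast<unsigned>(Name[I] - '0');
    ++I;
  }
  if (I == DigitsStart)
    return false; // no decimal lane count
  Out.VF = VF;
  // The first '_' after the lane count separates parameters from the name.
  size_t Sep = Name.find('_', I);
  if (Sep == std::string::npos || Sep == I || Sep + 1 >= Name.size())
    return false;
  Out.Params = Name.substr(I, Sep - I);
  Out.ScalarName = Name.substr(Sep + 1);
  return true;
}
```

For `_ZGVcN2v_sin` this yields ISA `c`, unmasked, two lanes, one vector parameter, scalar name `sin`.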
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This metadata is compatible with the proposal "Proposal
>>>>>>>>>>>>>>> for function vectorization and loop vectorization with
>>>>>>>>>>>>>>> function calls",[^3] that uses Vector Function ABI
>>>>>>>>>>>>>>> mangled names to inform the vectorizer about the
>>>>>>>>>>>>>>> availability of vector functions. The proposal extends
>>>>>>>>>>>>>>> the original by allowing the explicit mapping of the Vector
>>>>>>>>>>>>>>> Function ABI mangled name to a non-standard name, which allows
>>>>>>>>>>>>>>> the use of existing vector libraries.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The `vector-variant` attribute needs to be attached on a
>>>>>>>>>>>>>>> per-call basis to avoid conflicts when merging modules with
>>>>>>>>>>>>>>> different vector variants.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The query infrastructure: SVFS {#infrastructure}
>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The Search Vector Function System (SVFS) is constructed
>>>>>>>>>>>>>>> from an `llvm::Module` instance so it can create
>>>>>>>>>>>>>>> function definitions. The SVFS exposes an API with two methods.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ### `SVFS::isFunctionVectorizable`
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This method queries the availability of a vectorized
>>>>>>>>>>>>>>> version of a function. The signature of the method is as follows.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   bool isFunctionVectorizable(llvm::CallInst * Call,
>>>>>>>>>>>>>>>                               ParTypeMap Params);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The method determines the availability of a vector version
>>>>>>>>>>>>>>> of the function invoked by the `Call` parameter by
>>>>>>>>>>>>>>> looking at the `vector-variant` metadata.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The `Params` argument is a map that associates the
>>>>>>>>>>>>>>> position of a parameter in the `CallInst` to its
>>>>>>>>>>>>>>> `ParameterType` descriptor. The `ParameterType`
>>>>>>>>>>>>>>> descriptor holds information about the shape of the
>>>>>>>>>>>>>>> corresponding parameter in the signature of the vector
>>>>>>>>>>>>>>> function. This `ParameterType` is used to query the SVFS
>>>>>>>>>>>>>>> about the availability of vector versions that have `linear`,
>>>>>>>>>>>>>>> `uniform` or `align` parameters (in the sense of OpenMP 4.0
>>>>>>>>>>>>>>> and onwards).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The method `isFunctionVectorizable`, when invoked with
>>>>>>>>>>>>>>> an empty `ParTypeMap`, is equivalent to the
>>>>>>>>>>>>>>> `TargetLibraryInfo` method
>>>>>>>>>>>>>>> `isFunctionVectorizable(StringRef Name)`.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ### `SVFS::getVectorizedFunction`
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This method returns the vector function declaration that
>>>>>>>>>>>>>>> corresponds to the needs of the vectorization technique that
>>>>>>>>>>>>>>> is being run.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The signature of the function is as follows.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>>>>>>>>>>>>       llvm::CallInst * Call, unsigned VF, bool IsMasked,
>>>>>>>>>>>>>>>       ParTypeSet Params);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The `Call` parameter is the call instance that is being
>>>>>>>>>>>>>>> vectorized, the `VF` parameter represents the
>>>>>>>>>>>>>>> vectorization factor (how many lanes), the `IsMasked`
>>>>>>>>>>>>>>> parameter decides whether or not the signature of the
>>>>>>>>>>>>>>> vector function is required to have a mask parameter, and
>>>>>>>>>>>>>>> the `Params` parameter describes the shape of the vector
>>>>>>>>>>>>>>> function as in the `isFunctionVectorizable` method.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The method uses the `vector-variant` metadata and
>>>>>>>>>>>>>>> returns the function signature and the name of the function
>>>>>>>>>>>>>>> based on the input parameters.
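
The selection step can be pictured with a toy model that involves no LLVM types at all. `Variant` and `pickVariant` are hypothetical names; the real method proposed above would also return an `llvm::FunctionType *` and take the parameter descriptors into account.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of one decoded `vector-variant` entry.
struct Variant {
  unsigned VF;          // number of lanes
  bool Masked;          // true if the signature takes a mask parameter
  std::string RealName; // symbol to call
};

// Hypothetical lookup: return the symbol name of the first variant that
// matches the requested shape, or an empty string if none matches.
std::string pickVariant(const std::vector<Variant> &Variants, unsigned VF,
                        bool IsMasked) {
  for (const Variant &V : Variants)
    if (V.VF == VF && V.Masked == IsMasked)
      return V.RealName;
  return "";
}
```

An empty result here corresponds to the case where the caller either gives up on vectorizing the call or asks for a wrapper to be synthesized.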
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The SVFS can add new function definitions, in the same
>>>>>>>>>>>>>>> module as the `Call`, to provide vector functions that
>>>>>>>>>>>>>>> are not present within the vector-variant metadata. For
>>>>>>>>>>>>>>> example, if a library provides a vector version of a
>>>>>>>>>>>>>>> function with a vectorization factor of 2, but the
>>>>>>>>>>>>>>> vectorizer is requesting a vectorization factor of 4,
>>>>>>>>>>>>>>> the SVFS is allowed to create a definition that calls
>>>>>>>>>>>>>>> the 2-lane version twice. This capability applies similarly
>>>>>>>>>>>>>>> for providing masked and unmasked versions when the request
>>>>>>>>>>>>>>> does not match what is available in the library.
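
The widening case can be sketched in plain C++, with `std::array` standing in for vector types. `sin2` is a hypothetical stand-in for a library-provided 2-lane routine, and `sin4_from_sin2` is the kind of wrapper an SVFS-like component might synthesize; neither name comes from the RFC.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a library-provided 2-lane routine (hypothetical).
Vec2 sin2(Vec2 X) { return {std::sin(X[0]), std::sin(X[1])}; }

// 4-lane wrapper built by calling the 2-lane version twice:
// once on the low half and once on the high half of the input.
Vec4 sin4_from_sin2(Vec4 X) {
  Vec2 Lo = sin2({X[0], X[1]});
  Vec2 Hi = sin2({X[2], X[3]});
  return {Lo[0], Lo[1], Hi[0], Hi[1]};
}
```

An unmasked version could be derived from a masked one in the same spirit, by passing an all-true mask.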
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This method is equivalent to the TLI method
>>>>>>>>>>>>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Notice that to fully support OpenMP vectorization we
>>>>>>>>>>>>>>> need to think about a fuzzy matching mechanism that is
>>>>>>>>>>>>>>> able to select a candidate in the calling context.
>>>>>>>>>>>>>>> However, this proposal is intended for scalar-to-vector
>>>>>>>>>>>>>>> mappings of math-like functions that are most likely to
>>>>>>>>>>>>>>> associate a unique vector candidate in most contexts.
>>>>>>>>>>>>>>> Therefore, extending this behavior to a generic one is an
>>>>>>>>>>>>>>> aspect of the implementation that will be treated in a separate
>>>>>>>>>>>>>>> RFC about the vectorization pass.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ### Scalable vectorization
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Both methods of the SVFS API will be extended with a
>>>>>>>>>>>>>>> boolean parameter to specify whether scalable signatures
>>>>>>>>>>>>>>> are needed by the user of the SVFS.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Changes in clang {#clang}
>>>>>>>>>>>>>>> ----------------
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We use clang to generate the metadata described above.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In the compilation unit, the vector function definition
>>>>>>>>>>>>>>> or declaration must be visible and associated to the
>>>>>>>>>>>>>>> scalar version via the `#pragma clang declare variant`
>>>>>>>>>>>>>>> according to the rule defined by the corresponding
>>>>>>>>>>>>>>> `#pragma omp declare variant` defined in OpenMP 5.0, as in
>>>>>>>>>>>>>>> the following example.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   #pragma clang declare variant(vector_sinf) \
>>>>>>>>>>>>>>>     match(construct={simd(simdlen(4),notinbranch)}, device={isa("simd")})
>>>>>>>>>>>>>>>   extern float sinf(float);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>   float32x4_t vector_sinf(float32x4_t x);
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The `construct` set in the directive, together with the
>>>>>>>>>>>>>>> `device` set, is used to generate the vector mangled
>>>>>>>>>>>>>>> name to be used in the `vector-variant` attribute, for
>>>>>>>>>>>>>>> example `_ZGVnN4v_sinf`, when targeting
>>>>>>>>>>>>>>> AArch64 Advanced SIMD code generation. The rules for
>>>>>>>>>>>>>>> mangling the name of the scalar function in the vector
>>>>>>>>>>>>>>> name are defined in the Vector Function ABI specification of
>>>>>>>>>>>>>>> the target.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The part of the vector-variant attribute that redirects
>>>>>>>>>>>>>>> the call to `vector_sinf` is derived from the
>>>>>>>>>>>>>>> `variant-id` specified in the `variant` clause.
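
As a sketch of that derivation, the pieces of the directive could be assembled into an attribute entry as shown here. `mangleVectorName` and `variantEntry` are hypothetical helpers, and the layout `_ZGV<isa><mask><vlen><params>_<scalar name>` is an assumption based on the names used in this thread; the authoritative rules are in the Vector Function ABI of each target.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: assemble "_ZGV<isa><mask><vlen><params>_<scalar>".
// `notinbranch` maps to the unmasked letter 'N', `inbranch` to 'M'.
std::string mangleVectorName(char ISA, bool InBranch, unsigned SimdLen,
                             const std::string &ParamKinds,
                             const std::string &ScalarName) {
  std::string Name = "_ZGV";
  Name += ISA;
  Name += InBranch ? 'M' : 'N';
  Name += std::to_string(SimdLen);
  Name += ParamKinds;
  Name += '_';
  Name += ScalarName;
  return Name;
}

// Hypothetical helper: one `vector-variant` entry that redirects the
// ABI name to the user's variant-id.
std::string variantEntry(const std::string &AbiName,
                         const std::string &VariantId) {
  return AbiName + "(" + VariantId + ")";
}
```

For the `sinf` directive with `simdlen(4)` and `notinbranch`, using the AArch64 Advanced SIMD letter `n` and one vector parameter `v`, this gives `_ZGVnN4v_sinf(vector_sinf)`.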
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Summary
>>>>>>>>>>>>>>> =======
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> New `clang` directive in clang
>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> `#pragma clang declare variant`, same as
>>>>>>>>>>>>>>> `#pragma omp declare variant` restricted to the `simd` context
>>>>>>>>>>>>>>> selector, from OpenMP 5.0+.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Option behavior, and interaction with OpenMP
>>>>>>>>>>>>>>> --------------------------------------------
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The behavior described below makes sure that
>>>>>>>>>>>>>>> `#pragma clang declare variant` function vectorization and OpenMP
>>>>>>>>>>>>>>> function vectorization are orthogonal.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> `-fclang-declare-variant`
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> :   The `#pragma clang declare variant` directives are parsed and used
>>>>>>>>>>>>>>>     to populate the `vector-variant` attribute.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> `-fopenmp[-simd]`
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> :   The `#pragma omp declare variant` directives are parsed and used to
>>>>>>>>>>>>>>>     populate the `vector-variant` attribute.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> :   The `#pragma omp declare variant` directives are used to populate
>>>>>>>>>>>>>>>     the `vector-variant` attribute in IR. The
>>>>>>>>>>>>>>>     `#pragma clang declare variant` directives are ignored.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [^2]: Vector Function ABI for x86:
>>>>>>>>>>>>>>>   <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>>>>>>>>>>>>   Vector Function ABI for AArch64:
>>>>>>>>>>>>>>>   <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> LLVM Developers mailing list llvm-dev at lists.llvm.org
>>>>>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>> --
>>>>>>>>>>> Hal Finkel
>>>>>>>>>>> Lead, Compiler Technology and
Programming Languages
>>>>>>>>>>> Leadership Computing Facility
Argonne National Laboratory
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> 
>>>>>>> Johannes Doerfert
>>>>>>> Researcher
>>>>>>> 
>>>>>>> Argonne National Laboratory
>>>>>>> Lemont, IL 60439, USA
>>>>>>> 
>>>>>>> jdoerfert at anl.gov
