thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization. [May 2019]

Doerfert, Johannes via llvm-dev

2019-May-31 19:56 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

I think I did misunderstand what you want to do with attributes. This is
my bad. Let me try to explain:

It seems you want the "vector-variants" attributes (which I could not
find with this name in trunk, correct?) to "remember" what vector
versions can be created (wrt. validity), assuming a definition is
available? Correct?
What I was concerned with is the example I sketched somewhere below,
which motivates the need for a generalized/standardized name mangling
for OpenMP. I thought you wanted to avoid that somehow, but if you don't, I
misunderstood you. I basically removed the part where the vector
versions have to be created first and assumed them to exist already (in
the module or somewhere else). That is, I assumed a call to foo and
various symbols available that are specializations of foo. When we then
vectorize foo (or otherwise specialize at some point in the future), you
would scan the module and pick the best match based on the context of
the call.
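
The "scan and pick the best match" step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Variant`, `CallContext` and `pickBestMatch` are hypothetical names that appear in neither the RFC nor LLVM, and the ISA labels are free-form strings rather than any real target description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one specialization of a scalar function.
struct Variant {
  std::string Symbol; // symbol assumed to exist in the module or a library
  std::string Isa;    // illustrative ISA label, e.g. "sse" or "avx2"
  unsigned Lanes;     // number of lanes the variant processes per call
  bool Masked;        // whether the variant takes a mask argument
};

// Hypothetical description of what the call site can use and needs.
struct CallContext {
  std::vector<std::string> PreferredIsas; // least preferred first
  unsigned Lanes;
  bool NeedsMask;
};

// Returns the candidate that matches the requested lanes and masking and
// whose ISA ranks highest in the context, or nullptr if nothing matches.
const Variant *pickBestMatch(const std::vector<Variant> &Candidates,
                             const CallContext &Ctx) {
  const Variant *Best = nullptr;
  std::size_t BestRank = 0;
  for (const Variant &V : Candidates) {
    if (V.Lanes != Ctx.Lanes || V.Masked != Ctx.NeedsMask)
      continue;
    for (std::size_t Rank = 0; Rank < Ctx.PreferredIsas.size(); ++Rank) {
      if (Ctx.PreferredIsas[Rank] != V.Isa)
        continue;
      if (Best == nullptr || Rank > BestRank) {
        Best = &V;
        BestRank = Rank;
      }
    }
  }
  return Best;
}
```

A real implementation would presumably derive the candidates from symbol names or attributes rather than from a hand-built list. The sketch only shows that the selection itself is a simple filter-and-rank over whatever variants are visible.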

Now I don't know if I understood your proposal by now but let me ask a
question anyway:

VecClone.cpp:276-278 mentions that the vectorizer is supposed to look at
the vector-variants functions. This works for variants that are created
from definitions in the module but what about #omp declare simd
declarations?


On 05/31, Francesco Petrogalli wrote:
> > On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> > 
> > I think we should split this discussion:
> >  TOPIC 1 & 2 & 4: How do we implement all use cases and OpenMP 5.X
> >                   features, including compatibility with other
> >                   compilers and cross module support.
> 
> Yes, and we have to carefully make this as standard and compatible as
> possible.
Agreed.

> >  TOPIC 3b & 5: Interoperability with clang declare (system vs. user
> >                 declares)
> 
> 
> I think that Alexey's explanation of how the directives are handled
> internally in the frontend makes us lean towards the attribute.
How things are handled right now, especially given that declare variant
is not handled at all, should not limit our design space. If the
argument is that we cannot reasonably implement a solution, that is a
different story.

> >  TOPIC 3a & 3c: floating point issues?
> > 
> 
> I believe there is no issue there. I have quoted the OpenMP standard in
> reply to Renato:
> 
> See
> https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf,
> page 118, lines 23-24:
> 
> "The execution of the function or subroutine cannot have any side
> effects that would alter its execution for concurrent iterations of a
> SIMD chunk."
Great.

> > I inlined comments for Topic 1 below.
> > 
> > I hope that we do not have to discuss topic 2 if we agree neither
> > attributes nor metadata is necessary, or better, will solve the actual
> > problem at hand. I don't have strong feeling on topic 4 but I have the
> > feeling this will become less problematic once we figure out topic 1.
> > 
> > Thanks,
> >  Johannes
> > 
> > 
> > On 05/31, Francesco Petrogalli wrote:
> >> # TOPIC 1: concerns about name mangling
> >> 
> >> I understand that there are concerns in using the mangling scheme I
> >> proposed, and that it would be preferred to have a mangling scheme
> >> that is based on (and standardized by) OpenMP.
> > 
> > I still think it will be required to have a standardized one, not
> > only preferred.
> > 
> > 
> 
> I am all with you in standardizing. x86 and aarch64 have their own
> vector function ABIs, which, although "private", are to be considered
> standard. Open-source and commercial compilers are using them,
> therefore we have to deal with this mangling scheme, whether or not
> OpenMP comes up with a standard mangling scheme.
I don't get the point you are trying to make here. What do you mean by
"we have to deal with"? (I do not suggest to get rid of them.)

> >> I hear the argument on having some common ground here. In fact, there
> >> is already common ground between the x86 and aarch64 backends, which have
> >> based their respective Vector Function ABI specifications on OpenMP.
> >> 
> >> In fact, the mangled name grammar can be summarized as follows:
> >> 
> >> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> >> 
> >> Across vector extensions the only <token> that will differ is the
> >> <isa> token.
> >> 
> >> This might lead people to think that we could drop the _ZGV<isa>
> >> prefix and consider the <masking><VLEN><parameter type>_<scalar name>
> >> part as a sort of unofficial OpenMP mangling scheme: in fact, the
> >> signature of an "unmasked 2-lane vector version of `sin`" will always
> >> be `<2 x double>(<2 x double>)`.
> >> 
> >> The problem with this choice is that the number of vector versions
> >> available for a target is not unique.
> > 
> > For me, this simply means this mangling scheme is not sufficient.
> > 
> 
> Can you explain more why you think the mangling scheme is not
> sufficient? The mangling scheme is shaped to provide all the
> information that the OpenMP directive describes.
I don't know if it is insufficient but I thought you hinted towards that.
If we can handle/decode everything we need for declare variants then I
do not object at all. If not, we require respective extensions such that
we can. The result should be a superset of the current SIMD encoding and
compatible with the current one.
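
To make "handle/decode" concrete, here is a minimal sketch of a decoder for the grammar quoted above, `_ZGV<isa><masking><VLEN><parameter type>_<scalar name>`. It rests on assumptions that are not taken from any ABI document: the ISA is a single character, the masking token is `M` or `N`, `<VLEN>` is a decimal number, and the parameter tokens contain no underscore. The `VectorName` struct and `decodeVectorName` function are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical container for the pieces of a vector function name.
struct VectorName {
  char Isa = 0;           // single-letter ISA token, e.g. 'b' or 'n'
  bool Masked = false;    // true for 'M', false for 'N'
  unsigned Lanes = 0;     // the <VLEN> token
  std::string Parameters; // raw parameter tokens, e.g. "vv"
  std::string ScalarName; // name of the scalar function
};

// Splits Mangled into its tokens. Returns false if the string does not
// follow the simplified grammar assumed in the text above.
bool decodeVectorName(const std::string &Mangled, VectorName &Out) {
  const std::string Prefix = "_ZGV";
  if (Mangled.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t Pos = Prefix.size();
  if (Pos + 2 > Mangled.size())
    return false;
  Out.Isa = Mangled[Pos++];
  const char Mask = Mangled[Pos++];
  if (Mask != 'M' && Mask != 'N')
    return false;
  Out.Masked = (Mask == 'M');
  const std::size_t DigitsStart = Pos;
  unsigned Lanes = 0;
  while (Pos < Mangled.size() &&
         std::isdigit(static_cast<unsigned char>(Mangled[Pos]))) {
    Lanes = Lanes * 10 + static_cast<unsigned>(Mangled[Pos] - '0');
    ++Pos;
  }
  if (Pos == DigitsStart)
    return false; // no <VLEN> digits found
  Out.Lanes = Lanes;
  // The first underscore after <VLEN> separates parameters from the name.
  const std::size_t Sep = Mangled.find('_', Pos);
  if (Sep == std::string::npos || Sep == Pos || Sep + 1 >= Mangled.size())
    return false;
  Out.Parameters = Mangled.substr(Pos, Sep - Pos);
  Out.ScalarName = Mangled.substr(Sep + 1);
  return true;
}
```

Whatever extra context a `declare variant` match clause needs would have to map onto additional tokens of this kind for the "superset" property to hold. The sketch does not attempt that extension.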


> The fact that x86 and aarch64 realize such information in different
> ways (multiple signature/vector extensions) is something that cannot be
> avoided, because it is related to architectural aspects that are
> specific to the vector extension and transparent to the OpenMP
> standard.
I don't think that is a problem (that's why I "failed to see the
problem" in the comment below). I look at it this way: If #declare simd,
or similar, results in N variants, it should at the end of the day not
be different from declaring these N variants explicitly with the
respective declare variant match clause.

> >> In particular, the following declaration generates multiple vector
> >> versions, depending on the target:
> >> 
> >> #pragma omp declare simd simdlen(2) notinbranch
> >> double foo(double) {…};
> >> 
> >> On x86, this generates at least 4 symbols (one for SSE, one for AVX,
> >> one for AVX2, and one for AVX512: https://godbolt.org/z/TLYXPi)
> >> 
> >> On aarch64, the same declaration generates a unique symbol, as
> >> specified in the Vector Function ABI.
> > 
> > I fail to see the problem. We generate X symbols for X different
> > contexts. Once we get to the point where we vectorize, we determine
> > which context fits best and choose the corresponding symbol version.
> > 
> 
> Yes, this is exactly what we need to do, under the constraint that
> the rules for generating "X symbols for X different contexts" are
> decided by the Vector Function ABI of the target.
Sounds good. The vector ABI is used to determine what contexts exist
and what symbols should be created. I would assume the encoding should
be the same as if we specified the versions (/contexts) ourselves via
#declare variant.
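
As an illustration of "X symbols for X different contexts", the sketch below expands one `declare simd simdlen(2) notinbranch` declaration of `double foo(double)` into one symbol per ISA token. The helper `vectorSymbolsFor` is hypothetical. The letter assignments (`b`, `c`, `d`, `e` for SSE, AVX, AVX2 and AVX512 on x86, `n` for AArch64 Advanced SIMD) follow the mangled names that appear elsewhere in this thread and should be checked against the respective Vector Function ABI documents.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds one mangled name per ISA token using the grammar
// _ZGV<isa><masking><VLEN><parameter type>_<scalar name>.
// IsaLetters holds one character per context the target ABI asks for.
std::vector<std::string> vectorSymbolsFor(const std::string &ScalarName,
                                          const std::string &IsaLetters,
                                          unsigned Lanes, bool Masked,
                                          const std::string &Parameters) {
  std::vector<std::string> Symbols;
  for (char Isa : IsaLetters) {
    std::string Name = "_ZGV";
    Name += Isa;
    Name += Masked ? 'M' : 'N';
    Name += std::to_string(Lanes);
    Name += Parameters;
    Name += '_';
    Name += ScalarName;
    Symbols.push_back(Name);
  }
  return Symbols;
}
```

Calling `vectorSymbolsFor("foo", "bcde", 2, false, "v")` yields four names for the x86 case, while `vectorSymbolsFor("foo", "n", 2, false, "v")` yields the single AArch64 name. This mirrors the difference between the two targets discussed above.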

> > Maybe my view is too naive here, please feel free to correct me.
> > 
> > 
> >> This means that the attribute (or metadata) that carries the
> >> information on the available vector version needs to deal also with
> >> things that are not usually visible at IR level, but that might still
> >> need to be provided to be able to decide which particular instruction
> >> set/vector extension needs to be targeted.
> > 
> > The symbol names should carry all the information we need. If they do
> > not, we need to improve the mangling scheme such that they do. There is
> > no attributes/metadata we could use at library boundaries.
> > 
> Hum, I am not sure what you mean by "There is no attributes/metadata
> we could use at library boundaries."
(This seems to be part of the misunderstanding, I leave my comment here
anyway:)

The simd-related stuff works because it is a uniform mangling scheme
used by all compilers. Take the situation below in which I think we want
to call foo_CTX in the library. If so, we need a name for it.


a.c:  // Compiled by gcc into a library
#omp declare variant (foo) match(CTX)
void foo_CTX(...) {...}

b.c:  // Compiled by clang linked against the library above.
#omp declare variant (foo) match(CTX)
void foo_CTX(...);

void bar(...) {
  #pragma omp CTX
  foo();   // <- What function (symbol) do we call if a.c was compiled
           //    by gcc and b.c with clang?
}
> In our downstream compiler (Arm compiler for HPC, based on LLVM), we
> use `declare simd` to provide vector math functions via custom header
> files. It works brilliantly, if not for specific aspects that would be
> perfectly covered by the `declare variant`, which might be one of the
> reasons why the OpenMP committee decided to introduce `declare
> variant`.
But you (assume that you) control the mangling scheme across the entire
infrastructure. Given that the simd mangling is de-facto standardized,
that works.

Side note:
Declare variant, as of 5.0, is not flexible enough for a sensible
inclusion of target specific headers. That will change in 5.1.

> If your concern is that adding an attribute that somehow represents
> something that is available in an external library is not enough to
> guarantee that that symbol is available in the library… not even C
> code can guarantee that? If the linker is not pointing to the right
> library, there is nothing that can prevent it from failing if the symbol is
> not present?
I don't follow the example you describe. I don't want to change anything
in how symbols are looked up or what happens if they are missing.

> >> I used an example based on `declare simd` instead of `declare variant`
> >> because the attribute/metadata needed for `declare variant` is a
> >> modification of the one needed for `declare simd`, which has already
> >> been agreed in a previous RFC proposed by Intel [1], and for which
> >> Intel has already provided an implementation [2]. The changes proposed
> >> in this RFC are fully compatible with the work that is being done for
> >> the VecClone pass in [2].
> >> 
> >> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
> >> [2] VecClone pass: https://reviews.llvm.org/D22792
> > 
> > Having an agreed upon mangling for the older feature is not necessarily
> > important here. We will need more functionality for variants and keeping
> > the old scheme around with some metadata is not an extensible long-term
> > solution. So, I would not try to fit variants into the existing
> > simd-scheme but instead do it the other way around. We define what we
> > need for variants and implement simd in that scheme.
> > 
> 
> I kinda think that having agreed on something is important. It allows
> us to build other things on top of what has been agreed without breaking
> compatibility.
>
> On the specific, which are the new functionalities needed for the
> variants that would make the current metadata (attributes) for declare
> simd non extensible?
See first comment.
> >> The good news is that as far as AArch64 and x86 are concerned, the
> >> only thing that will differ in the mangled name is the "<isa>" token. As
> >> far as I can tell, the mangling scheme of the rest of the vector name is
> >> the same, therefore a lot of infrastructure in terms of mangling and
> >> demangling can be reused. In fact, the `mangleVectorParameters` function in
> >> https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918
> >> could already be shared among x86 and aarch64.
> >> 
> >> TOPIC 2: metadata vs attribute
> >> 
> >> From a functionality point of view, I don’t care whether we use
> >> metadata or attributes. The VecClone pass mentioned in TOPIC 1 uses the
> >> following:
> >> 
> >> attributes #0 = { nounwind uwtable
> >> "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
> >> 
> >> This is an attribute (I thought it was metadata?), I am happy to
> >> reword the RFC using the right terminology (sorry for messing this up).
> >> 
> >> Also, @Renato expressed concern that metadata might be dropped by
> >> optimization passes - would using attributes prevent that?
> >> 
> >> TOPIC 3: "There is no way to notify the backend how conformant the
> >> SIMD versions are."
> >> 
> >> @Shawn, I am afraid I don’t understand what you mean by "conformant"
> >> here. Can you elaborate with an example?
> >> 
> >> TOPIC 3: interaction of the `omp declare variant` with `clang declare
> >> variant`
> >> 
> >> I believe this is described in the `Option behavior, and interaction
> >> with OpenMP`. The option `-fclang-declare-variant` is there to make
> >> the OpenMP based one orthogonal. Of course, we might decide to make
> >> -fclang-declare-variant on/off by default, and have default behavior
> >> when interacting with -fopenmp-simd. For the sake of compatibility with
> >> other compilers, we might need to require -fno-clang-declare-variant
> >> when targeting -fopenmp-[simd].
> >> 
> >> TOPIC 3: "there are no special arguments / flags / status regs that
> >> are used / changed in the vector version that the compiler will have to
> >> 'just know'"
> >> 
> >> I believe that this concern is raised by the problem of handling FP
> >> exceptions? If that’s the case, the compiler is not allowed to make any
> >> assumption on the vector function about that, and must treat it with the
> >> same knowledge as any other function, depending on the visibility it has
> >> in the compilation unit. @Renato, does this answer your question?
> >> 
> >> TOPIC 4: attribute in function declaration vs attribute function call
> >> site
> >> 
> >> We discussed this in the previous version of the proposal. Having it
> >> in the call sites guarantees that incompatible vector versions are not
> >> used when merging modules compiled for different targets. I don’t have a
> >> use case for this, if I remember correctly this was asked by @Hideki
> >> Saito. Hideki, any comment on this?
> >> 
> >> TOPIC 5: overriding system header (the discussion on #pragma
> >> omp/clang/system variants initiated by @Hal Finkel).
> >> 
> >> I thought that the split among #pragma clang declare variant and
> >> #pragma omp declare variant was already providing the orthogonality
> >> between system header and user header. Meaning that a user should always
> >> prefer the omp version (for portability to other compilers) instead of
> >> the #pragma clang one, which would be relegated to system headers and
> >> headers provided by the compiler. Am I missing something? If so, I am
> >> happy to add a "system" version of the directive, as it would be quite
> >> easy to do given most of the parsing infrastructure will be shared.
> >> 
> >> 
> >>> On May 30, 2019, at 12:53 PM, Philip Reames <listmail at philipreames.com> wrote:
> >>> 
> >>> 
> >>> On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> >>>> On 05/29, Finkel, Hal J. via cfe-dev wrote:
> >>>>> On 5/29/19 1:52 PM, Philip Reames wrote:
> >>>>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> >>>>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> >>>>>>>> I generally like the idea of having support in IR for
> >>>>>>>> vectorization of custom functions.  I have several use cases
> >>>>>>>> which would benefit from this.
> >>>>>>>> 
> >>>>>>>> I'd suggest a couple of reframings to the IR representation though.
> >>>>>>>> 
> >>>>>>>> First, this should probably be specified as metadata/attribute on a
> >>>>>>>> function declaration.  Allowing the callsite variant is fine, but it
> >>>>>>>> should primarily be a property of the called function, not of the call
> >>>>>>>> site.  Being able to specify it once per declaration is much cleaner.
> >>>>>>> I agree. We should support this both on the function declaration and on
> >>>>>>> the call sites.
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> Second, I really don't like the mangling use here.  We need a better way
> >>>>>>>> to specify the properties of the function than its mangled name.  One
> >>>>>>>> thought to explore is to directly use the Value of the function
> >>>>>>>> declaration (since this is metadata and we can do that), and then tie
> >>>>>>>> the properties to the function declaration in some way?  Sorry, I don't
> >>>>>>>> really have a specific suggestion here.
> >>>>>>> Is the problem the mangling or the fact that the mangling is
> >>>>>>> ABI/target-specific? One option is to use LLVM's mangling scheme (the
> >>>>>>> one we use for intrinsics) and then provide some backend infrastructure
> >>>>>>> to translate later.
> >>>>>> Well, both honestly.  But mangling with a non-target specific scheme is
> >>>>>> a lot better, so I might be okay with that.   Good idea.
> >>>>> 
> >>>>> I liked your idea of directly encoding the signature in the metadata,
> >>>>> but I think that we want to continue to use attributes, and not
> >>>>> metadata, and the options for attributes seem more limited - unless we
> >>>>> allow attributes to take metadata arguments - maybe that's an
> >>>>> enhancement worth considering.
> >>>> I recently talked to people in the OpenMP language committee meeting
> >>>> about this and, thinking forward to the actual implementation/use of the
> >>>> OpenMP 5.x declare variant feature, I'd say:
> >>>> 
> >>>> - We will need a mangling scheme if we want to allow variants on
> >>>>   declarations that are defined elsewhere.
> >>>> - We will need a (OpenMP) standardized mangling scheme if we want
> >>>>   interoperability between compilers.
> >>>> 
> >>>> I assume we want both so I think we will need both.
> >>> If I'm reading this correctly, this describes a need for the frontend to
> >>> have a mangling scheme.  Nothing in here would seem to prevent the
> >>> frontend from generating a declaration for a mangled external symbol and
> >>> then referencing that declaration.  Am I missing something?
> >>>> 
> >>>> That said, I think this should allow us to avoid attributes/metadata
> >>>> which seems to me like a good thing right now.
> >>>> 
> >>>> Cheers,
> >>>> Johannes
> >>>> 
> >>>> 
> >>>>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
> >>>>>>>>> Dear all,
> >>>>>>>>> 
> >>>>>>>>> This RFC is a proposal to provide auto-vectorization functionality
> >>>>>>>>> for user provided vector functions.
> >>>>>>>>> 
> >>>>>>>>> The proposal is a modification of an RFC that I have sent out a
> >>>>>>>>> couple of months ago, with the title `[RFC] Re-implementing
> >>>>>>>>> -fveclib with OpenMP` (see
> >>>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html).
> >>>>>>>>> The previous RFC is to be considered abandoned.
> >>>>>>>>> 
> >>>>>>>>> The original RFC was proposing to re-implement the `-fveclib`
> >>>>>>>>> command line option. This proposal avoids that, and limits its
> >>>>>>>>> scope to the mechanics of providing vector function in user code
> >>>>>>>>> that the compiler can pick up for auto-vectorization. This narrower
> >>>>>>>>> scope limits the impact of changes that are needed in both clang
> >>>>>>>>> and LLVM.
> >>>>>>>>> 
> >>>>>>>>> Please let me know what you think.
> >>>>>>>>> 
> >>>>>>>>> Kind regards,
> >>>>>>>>> 
> >>>>>>>>> Francesco
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> ================================================================================
> >>>>>>>>> 
> >>>>>>>>> Introduction
> >>>>>>>>> ============
> >>>>>>>>> 
> >>>>>>>>> This RFC encompasses the proposal of informing the vectorizer about the
> >>>>>>>>> availability of vector functions provided by the user. The mechanism is
> >>>>>>>>> based on the use of the directive `declare variant` introduced in OpenMP
> >>>>>>>>> 5.0 [^1].
> >>>>>>>>> 
> >>>>>>>>> The mechanism proposed has the following properties:
> >>>>>>>>> 
> >>>>>>>>> 1.  Decouples the compiler front-end that knows about the availability
> >>>>>>>>>     of vectorized routines, from the back-end that knows how to make use
> >>>>>>>>>     of them.
> >>>>>>>>> 2.  Enables support for a developer's own vector libraries without
> >>>>>>>>>     requiring changes to the compiler.
> >>>>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
> >>>>>>>>>     mappings as relevant for their own runtime libraries, etc.
> >>>>>>>>> 
> >>>>>>>>> The implementation consists of two separate sets of changes.
> >>>>>>>>> 
> >>>>>>>>> The first set is a set of changes in `llvm`, and consists of:
> >>>>>>>>> 
> >>>>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
> >>>>>>>>>     availability of user-defined vector functions via metadata attached
> >>>>>>>>>     to an `llvm::CallInst`.
> >>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
> >>>>>>>>>     information about the available vector functions associated to a
> >>>>>>>>>     `llvm::CallInst`.
> >>>>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
> >>>>>>>>>     metadata.
> >>>>>>>>> 
> >>>>>>>>> The second set consists of the [changes in clang](#clang) that
> >>>>>>>>> are needed to recognize the `#pragma clang declare variant`
> >>>>>>>>> directive.
> >>>>>>>>> 
> >>>>>>>>> Proposed changes
> >>>>>>>>> ================
> >>>>>>>>> 
> >>>>>>>>> We propose an implementation that uses `#pragma clang declare variant`
> >>>>>>>>> to inform the backend components about the availability of vector
> >>>>>>>>> versions of scalar functions found in IR. The mechanism relies on storing
> >>>>>>>>> such information in IR metadata, and therefore makes the
> >>>>>>>>> auto-vectorization of function calls a mid-end (`opt`) process that is
> >>>>>>>>> independent of the front-end that generated such IR metadata.
> >>>>>>>>> 
> >>>>>>>>> This implementation provides a generic mechanism that the users of the
> >>>>>>>>> LLVM compiler will be able to use for interfacing their own vector
> >>>>>>>>> routines for generic code.
> >>>>>>>>> 
> >>>>>>>>> The implementation can also expose vectorization-specific descriptors --
> >>>>>>>>> for example, like the `linear` and `uniform` clauses of the OpenMP
> >>>>>>>>> `declare simd` directive -- that could be used to finely tune the
> >>>>>>>>> automatic vectorization of some functions (think for example the
> >>>>>>>>> vectorization of `double sincos(double , double *, double *)`, where
> >>>>>>>>> `linear` can be used to give extra information about the memory layout
> >>>>>>>>> of the 2 pointer parameters in the vector version).
> >>>>>>>>> 
> >>>>>>>>> The directive `#pragma clang declare variant` follows the syntax of the
> >>>>>>>>> `#pragma omp declare variant` directive of OpenMP.
> >>>>>>>>> 
> >>>>>>>>> We define the new directive in the `clang` namespace instead of using
> >>>>>>>>> the `omp` one of OpenMP to allow the compiler to perform
> >>>>>>>>> auto-vectorization outside of an OpenMP SIMD context.
> >>>>>>>>> 
> >>>>>>>>> The mechanism is based on OpenMP to provide a uniform user experience
> >>>>>>>>> across the two mechanisms, and to maximise the number of shared
> >>>>>>>>> components of the infrastructure needed in the compiler frontend to
> >>>>>>>>> enable the feature.
> >>>>>>>>> 
> >>>>>>>>> Changes in LLVM IR {#llvmIR}
> >>>>>>>>> ------------------
> >>>>>>>>> 
> >>>>>>>>> The IR is enriched with metadata that details the availability of vector
> >>>>>>>>> versions of an associated scalar function. This metadata is attached to
> >>>>>>>>> the call site of the scalar function.
> >>>>>>>>> 
> >>>>>>>>> The metadata takes the form of an attribute containing a comma separated
> >>>>>>>>> list of vector function mappings. Each entry has a unique name that
> >>>>>>>>> follows the Vector Function ABI[^2] and real name that is used when
> >>>>>>>>> generating calls to this vector function.
> >>>>>>>>> 
> >>>>>>>>>     vfunc_name1(real_name1), vfunc_name2(real_name2)
> >>>>>>>>> 
> >>>>>>>>> The Vector Function ABI name describes the signature of the vector
> >>>>>>>>> function so that properties like vectorisation factor can be queried
> >>>>>>>>> during compilation.
> >>>>>>>>> 
> >>>>>>>>> The `(real name)` token is optional and assumed to match the Vector
> >>>>>>>>> Function ABI name when omitted.
> >>>>>>>>> 
> >>>>>>>>> For example, the availability of a 2-lane double precision `sin`
> >>>>>>>>> function via SVML when targeting AVX on x86 is provided by the following
> >>>>>>>>> IR.
> >>>>>>>>> 
> >>>>>>>>>     // ...
> >>>>>>>>>     ... = call double @sin(double) #0
> >>>>>>>>>     // ...
> >>>>>>>>> 
> >>>>>>>>>     #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> >>>>>>>>>                               _ZGVdN4v_sin(__svml_sin4),
> >>>>>>>>>                               ..."} }
> >>>>>>>>> 
> >>>>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
> >>>>>>>>> attribute provides information on the shape of the vector function via
> >>>>>>>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
> >>>>>>>>> for Intel, and remaps the standard Vector Function ABI name to the
> >>>>>>>>> non-standard name `__svml_sin2`.
> >>>>>>>>> 
> >>>>>>>>> This metadata is compatible with the proposal "Proposal for function
> >>>>>>>>> vectorization and loop vectorization with function calls",[^3] that uses
> >>>>>>>>> Vector Function ABI mangled names to inform the vectorizer about the
> >>>>>>>>> availability of vector functions. The proposal extends the original by
> >>>>>>>>> allowing the explicit mapping of the Vector Function ABI mangled name to
> >>>>>>>>> a non-standard name, which allows the use of existing vector libraries.
> >>>>>>>>> 
> >>>>>>>>> The `vector-variant` attribute needs to be attached on a per-call basis
> >>>>>>>>> to avoid conflicts when merging modules with different vector variants.
> >>>>>>>>> 
> >>>>>>>>> The query infrastructure: SVFS {#infrastructure}
> >>>>>>>>> ------------------------------
> >>>>>>>>> 
> >>>>>>>>> The Search Vector Function System (SVFS) is constructed from an
> >>>>>>>>> `llvm::Module` instance so it can create function definitions. The SVFS
> >>>>>>>>> exposes an API with two methods.
> >>>>>>>>> 
> >>>>>>>>> ### `SVFS::isFunctionVectorizable`
> >>>>>>>>> 
> >>>>>>>>> This method queries the availability of a vectorized version of a
> >>>>>>>>> function. The signature of the method is as follows.
> >>>>>>>>> 
> >>>>>>>>>     bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
> >>>>>>>>> 
> >>>>>>>>> The method determines the availability of a vector version of the
> >>>>>>>>> function invoked by the `Call` parameter by looking at the
> >>>>>>>>> `vector-variant` metadata.
> >>>>>>>>> 
> >>>>>>>>> The `Params` argument is a map that associates the position of a
> >>>>>>>>> parameter in the `CallInst` to its `ParameterType` descriptor. The
> >>>>>>>>> `ParameterType` descriptor holds information about the shape of the
> >>>>>>>>> corresponding parameter in the signature of the vector function. This
> >>>>>>>>> `ParameterType` is used to query the SVFS about the availability of
> >>>>>>>>> vector versions that have `linear`, `uniform` or `align` parameters (in
> >>>>>>>>> the sense of OpenMP 4.0 and onwards).
> >>>>>>>>> 
> >>>>>>>>> The method `isFunctionVectorizable`, when invoked with an empty
> >>>>>>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
> >>>>>>>>> `isFunctionVectorizable(StringRef Name)`.
> >>>>>>>>> ### `SVFS::getVectorizedFunction`
> >>>>>>>>> 
> >>>>>>>>> This method returns the vector function declaration that corresponds to
> >>>>>>>>> the needs of the vectorization technique that is being run.
> >>>>>>>>> 
> >>>>>>>>> The signature of the function is as follows.
> >>>>>>>>> 
> >>>>>>>>>     std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
> >>>>>>>>>       llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
> >>>>>>>>> 
> >>>>>>>>> The `Call` parameter is the call instance that is being vectorized, the
> >>>>>>>>> `VF` parameter represents the vectorization factor (how many lanes), the
> >>>>>>>>> `IsMasked` parameter decides whether or not the signature of the vector
> >>>>>>>>> function is required to have a mask parameter, the `Params` parameter
> >>>>>>>>> describes the shape of the vector function as in the
> >>>>>>>>> `isFunctionVectorizable` method.
> >>>>>>>>> 
> >>>>>>>>> The method uses the `vector-variant` metadata and returns the function
> >>>>>>>>> signature and the name of the function based on the input parameters.
> >>>>>>>>> 
> >>>>>>>>> The SVFS can add new function definitions, in the same module as the
> >>>>>>>>> `Call`, to provide vector functions that are not present within the
> >>>>>>>>> vector-variant metadata. For example, if a library provides a vector
> >>>>>>>>> version of a function with a vectorization factor of 2, but the
> >>>>>>>>> vectorizer is requesting a vectorization factor of 4, the SVFS is
> >>>>>>>>> allowed to create a definition that calls the 2-lane version twice. This
> >>>>>>>>> capability applies similarly for providing masked and unmasked versions
> >>>>>>>>> when the request does not match what is available in the library.
> >>>>>>>>> 
> >>>>>>>>> This method is equivalent to the TLI method
> >>>>>>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
> >>>>>>>>> 
> >>>>>>>>> Notice that to fully support OpenMP vectorization we need to think about
> >>>>>>>>> a fuzzy matching mechanism that is able to select a candidate in the
> >>>>>>>>> calling context. However, this proposal is intended for scalar-to-vector
> >>>>>>>>> mappings of math-like functions that are most likely to associate a
> >>>>>>>>> unique vector candidate in most contexts. Therefore, extending this
> >>>>>>>>> behavior to a generic one is an aspect of the implementation that will
> >>>>>>>>> be treated in a separate RFC about the vectorization pass.
> >>>>>>>>> 
> >>>>>>>>> ### Scalable vectorization
> >>>>>>>>> 
> >>>>>>>>> Both methods of the SVFS API will be extended with a boolean parameter
> >>>>>>>>> to specify whether scalable signatures are needed by the user of the
> >>>>>>>>> SVFS.
> >>>>>>>>> 
> >>>>>>>>> Changes in clang {#clang}
> >>>>>>>>> ----------------
> >>>>>>>>> 
> >>>>>>>>> We use clang to generate the metadata described above.
> >>>>>>>>> 
> >>>>>>>>> In the compilation unit, the vector function definition or declaration
> >>>>>>>>> must be visible and associated to the scalar version via the
> >>>>>>>>> `#pragma clang declare variant` according to the rule defined by the
> >>>>>>>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
> >>>>>>>>> the following example.
> >>>>>>>>> 
> >>>>>>>>>     #pragma clang declare variant(vector_sinf) \
> >>>>>>>>>     match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> >>>>>>>>>     extern float sinf(float);
> >>>>>>>>> 
> >>>>>>>>>     float32x4_t vector_sinf(float32x4_t x);
> >>>>>>>>> 
> >>>>>>>>> The `construct` set in the directive, together with the `device` set, is
> >>>>>>>>> used to generate the vector mangled name to be used in the
> >>>>>>>>> `vector-variant` attribute, for example `_ZGVnN4v_sinf`, when targeting
> >>>>>>>>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
> >>>>>>>>> the scalar function in the vector name are defined in the Vector
> >>>>>>>>> Function ABI specification of the target.
> >>>>>>>>> 
> >>>>>>>>> The part of the vector-variant attribute that redirects the call to
> >>>>>>>>> `vector_sinf` is derived from the `variant-id` specified in the
> >>>>>>>>> `variant` clause.
> >>>>>>>>> 
> >>>>>>>>> Summary
> >>>>>>>>> =======
> >>>>>>>>> 
> >>>>>>>>> New `clang` directive in clang
> >>>>>>>>> ------------------------------
> >>>>>>>>> 
> >>>>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
> >>>>>>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
> >>>>>>>>> 
> >>>>>>>>> Option behavior, and interaction with OpenMP
> >>>>>>>>> --------------------------------------------
> >>>>>>>>> 
> >>>>>>>>> The behavior described below makes sure that
> >>>>>>>>> `#pragma clang declare variant` function vectorization and OpenMP
> >>>>>>>>> function vectorization are orthogonal.
> >>>>>>>>> 
> >>>>>>>>> `-fclang-declare-variant`
> >>>>>>>>> 
> >>>>>>>>> :   The `#pragma clang declare variant` directives are parsed and used
> >>>>>>>>>     to populate the `vector-variant` attribute.
> >>>>>>>>> 
> >>>>>>>>> `-fopenmp[-simd]`
> >>>>>>>>> 
> >>>>>>>>> :   The `#pragma omp declare variant` directives are parsed and used to
> >>>>>>>>>     populate the `vector-variant` attribute.
> >>>>>>>>> 
> >>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
> >>>>>>>>> 
> >>>>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
> >>>>>>>>>     `vector-variant` attribute in IR. The directive
> >>>>>>>>>     `#pragma clang declare variant` is ignored.
> >>>>>>>>> 
> >>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> >>>>>>>>> 
> >>>>>>>>> [^2]: Vector Function ABI for x86:
> >>>>>>>>>     <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> >>>>>>>>>     Vector Function ABI for AArch64:
> >>>>>>>>>     <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
> >>>>>>>>> 
> >>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> >>>>>>>>> 
> >>>>>>>>> _______________________________________________
> >>>>>>>>> LLVM Developers mailing list
> >>>>>>>>> llvm-dev at lists.llvm.org
> >>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>>>> _______________________________________________
> >>>>>>>> cfe-dev mailing list
> >>>>>>>> cfe-dev at lists.llvm.org
> >>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>>>> -- 
> >>>>> Hal Finkel
> >>>>> Lead, Compiler Technology and Programming Languages
> >>>>> Leadership Computing Facility
> >>>>> Argonne National Laboratory
> >>>>> 
> >> 
> > 
> > -- 
> > 
> > Johannes Doerfert
> > Researcher
> > 
> > Argonne National Laboratory
> > Lemont, IL 60439, USA
> > 
> > jdoerfert at anl.gov
> 
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Saito, Hideki via llvm-dev

2019-May-31 20:07 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

> This works for variants that are created from definitions in the module but
> what about #omp declare simd declarations?

I'm sorry that I haven't digested this thread in its entirety, but let
me just deal with this one point for now.
Suppose #pragma omp declare simd is applied to foo(). I'd expect the
corresponding Function Attrs to be attached to
"declare dso_local void @foo(i32)". For Intel vector function ABI, the
attribute comes with a mangled name like "_ZGV........foo".

Thanks,
Hideki
---------------------------------
void foo(int x);
 
void bar(){
  foo(3);
}

==>

; Function Attrs: noinline nounwind optnone uwtable
define dso_local void @bar() #0 {
  call void @foo(i32 3)
  ret void
}
 
declare dso_local void @foo(i32) #1

-----Original Message-----
From: Doerfert, Johannes [mailto:jdoerfert at anl.gov] 
Sent: Friday, May 31, 2019 12:56 PM
To: Francesco Petrogalli <Francesco.Petrogalli at arm.com>
Cc: Philip Reames <listmail at philipreames.com>; Finkel, Hal J.
<hfinkel at anl.gov>; LLVM Development List <llvm-dev at
lists.llvm.org>; nd <nd at arm.com>; Saito, Hideki <hideki.saito at
intel.com>; Clang Dev <cfe-dev at lists.llvm.org>; scogland1 at
llnl.gov
Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for
auto-vectorization.

I think I did misunderstand what you want to do with attributes. This is my bad.
Let me try to explain:

It seems you want the "vector-variants" attributes (which I could not
find with this name in trunk, correct?) to "remember" what vector
versions can be created (wrt. validity), assuming a definition is available?
Correct?
What I was concerned with is the example I sketched somewhere below which
motivates the need for a generalized/standardized name mangling for OpenMP. I
thought you wanted to avoid that somehow but if you don't I misunderstood
you. I basically removed the part where the vector versions have to be created
first but I assumed them to be existent (in the module or somewhere else). That
is, I assumed a call to foo and various symbols available that are
specializations of foo. When we then vectorize foo (or otherwise specialize at
some point in the future), you would scan the module and pick the best match
based on the context of the call.

Now I don't know if I understood your proposal by now but let me ask a
question anyway:

VecClone.cpp:276-278 mentions that the vectorizer is supposed to look at the
vector-variants functions. This works for variants that are created from
definitions in the module but what about #omp declare simd declarations?


On 05/31, Francesco Petrogalli wrote:
> > On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> > 
> > I think we should split this discussion:
> >  TOPIC 1 & 2 & 4: How to implement all use cases and OpenMP 5.X
> >                   features, including compatibility with other
> >                   compilers and cross module support.
> 
> Yes, and we have to carefully make this as standard and compatible as
> possible.

Agreed.

> >  TOPIC 3b & 5: Interoperability with clang declare (system vs. user
> >                 declares)
> 
> 
> I think that Alexey's explanation of how the directives are handled 
> internally in the frontend makes us lean towards the attribute.

How things are handled right now, especially given that declare variant is not
handled at all, should not limit our design space. If the argument is that we
cannot reasonably implement a solution, that is a different story.

> >  TOPIC 3a & 3c: floating point issues?
> > 
> 
> I believe there is no issue there. I have quoted the OpenMP standard in
> reply to Renato:
> 
> See
> https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf,
> page 118, lines 23-24:
> 
> “The execution of the function or subroutine cannot have any side 
> effects that would alter its execution for concurrent iterations of a 
> SIMD chunk.”

Great.

> > I inlined comments for Topic 1 below.
> > 
> > I hope that we do not have to discuss topic 2 if we agree neither 
> > attributes nor metadata is necessary, or better, will solve the 
> > actual problem at hand. I don't have strong feeling on topic 4 but I
> > have the feeling this will become less problematic once we figure out
> > topic 1.
> > 
> > Thanks,
> >  Johannes
> > 
> > 
> > On 05/31, Francesco Petrogalli wrote:
> >> # TOPIC 1: concerns about name mangling
> >> 
> >> I understand that there are concerns in using the mangling scheme I
> >> proposed, and that it would be preferred to have a mangling scheme
> >> that is based on (and standardized by) OpenMP.
> > 
> > I still think it will be required to have a standardized one, not 
> > only preferred.
> > 
> > 
> 
> I am all with you in standardizing. x86 and aarch64 have their own 
> vector function ABI, which, although “private”, are to be considered 
> standard. Open source and commercial compilers are using them, 
> therefore we have to deal with this mangling scheme, whether or not 
> OpenMP comes up with a standard mangling scheme.

I don't get the point you are trying to make here. What do you mean by
"we have to deal with"? (I do not suggest to get rid of them.)

> >> I hear the argument on having some common ground here. In fact, 
> >> there is already common ground between the x86 and aarch64 backend,
> >> who have based their respective Vector Function ABI specifications
> >> on OpenMP.
> >> 
> >> In fact, the mangled name grammar can be summarized as follows:
> >> 
> >> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> >> 
> >> Across vector extensions the only <token> that will differ is the
> >> <isa> token.
> >> 
> >> This might lead people to think that we could drop the _ZGV<isa>
> >> prefix and consider the <masking><VLEN><parameter type>_<scalar
> >> name> part as a sort of unofficial OpenMP mangling scheme: in fact,
> >> the signature of an “unmasked 2-lane vector version of `sin`” will 
> >> always be `<2 x double>(<2 x double>)`.
> >> 
> >> The problem with this choice is that the number of vector versions 
> >> available for a target is not unique.
> > 
> > For me, this simply means this mangling scheme is not sufficient.
> > 
> 
> Can you explain more why you think the mangling scheme is not 
> sufficient? The mangling scheme is shaped to provide all the 
> information that the OpenMP directive describes.
I don't know if it is insufficient but I thought you hinted towards that.
If we can handle/decode everything we need for declare variants then I do not
object at all. If not, we require a respective extension such that we can. The
result should be a superset of the current SIMD encoding and compatible with the
current one.
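
To make "decode everything we need" a bit more concrete, here is a minimal sketch of decoding a name of the `_ZGV<isa><masking><VLEN><parameter type>_<scalar name>` shape. `VectorVariant` and `parseVariant` are hypothetical names used only for illustration, not LLVM API. The sketch assumes a numeric `<VLEN>` and that the first underscore after it ends the parameter tokens; it keeps those tokens as an uninterpreted string.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical record for a decoded name; not an LLVM type.
struct VectorVariant {
  char Isa;               // the <isa> letter, e.g. 'b'..'e' on x86, 'n' on AArch64
  bool Masked;            // <masking>: 'M' = masked, 'N' = unmasked
  unsigned VLen;          // <VLEN>: number of lanes
  std::string Params;     // raw <parameter type> tokens, e.g. "vv"
  std::string ScalarName; // <scalar name>
};

// Returns std::nullopt when Name does not have the expected shape.
std::optional<VectorVariant> parseVariant(const std::string &Name) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  std::size_t Pos = Prefix.size();
  if (Pos + 2 > Name.size()) // need at least <isa> and <masking>
    return std::nullopt;

  VectorVariant V;
  V.Isa = Name[Pos++];
  const char Mask = Name[Pos++];
  if (Mask != 'M' && Mask != 'N')
    return std::nullopt;
  V.Masked = (Mask == 'M');

  // Assumption: <VLEN> is a decimal number (scalable forms are not handled).
  const std::size_t DigitsStart = Pos;
  unsigned VLen = 0;
  while (Pos < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[Pos]))) {
    VLen = VLen * 10 + static_cast<unsigned>(Name[Pos] - '0');
    ++Pos;
  }
  if (Pos == DigitsStart)
    return std::nullopt;
  V.VLen = VLen;

  // Assumption: parameter tokens contain no underscore, so the first
  // underscore after <VLEN> separates them from the scalar name.
  const std::size_t Under = Name.find('_', Pos);
  if (Under == std::string::npos || Under == Pos || Under + 1 == Name.size())
    return std::nullopt;
  V.Params = Name.substr(Pos, Under - Pos);
  V.ScalarName = Name.substr(Under + 1);
  return V;
}
```

Anything a declare variant context needs beyond these five fields would be exactly the "respective extension" mentioned above.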


> The fact that x86 and aarch64 realize such information in different 
> way (multiple signature/vector extensions) is something that cannot be 
> avoided, because it is related to architectural aspects that are 
> specific to the vector extension and transparent to the OpenMP 
> standard.
I don't think that is a problem (that's why I "failed to see the
problem" in the comment below). I look at it this way: If #declare simd, or
similar, results in N variants, it should at the end of the day not be different
from declaring these N variants explicitly with the respective declare variant
match clause.

> >> In particular, the following declaration generates multiple vector
> >> versions, depending on the target:
> >> 
> >> #pragma omp declare simd simdlen(2) notinbranch
> >> double foo(double) {…};
> >> 
> >> On x86, this generates at least 4 symbols (one for SSE, one for 
> >> AVX, one for AVX2, and one for AVX512: 
> >> https://godbolt.org/z/TLYXPi)
> >> 
> >> On aarch64, the same declaration generates a unique symbol, as 
> >> specified in the Vector Function ABI.
> > 
> > I fail to see the problem. We generate X symbols for X different 
> > contexts. Once we get to the point where we vectorize, we determine 
> > which context fits best and choose the corresponding symbol version.
> > 
> 
> Yes, this is exactly what we need to do, under the constraints that the 
> rules for generating “X symbols for X different contexts” are decided
> by the Vector Function ABI of the target.

Sounds good. The vector ABI is used to determine what contexts exist and what
symbols should be created. I would assume the encoding should be the same as if
we specified the versions (/contexts) ourselves via #declare variant.
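
A minimal sketch of the "pick the symbol for the context" step, assuming the candidates have already been decoded from their mangled names. `Candidate` and `pickVariant` are hypothetical names, and the match is exact on ISA letter, lane count and masking; a real implementation would presumably need to rank near matches rather than do this exact lookup.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one available vector version.
struct Candidate {
  char Isa;           // <isa> letter from the mangled name
  bool Masked;        // true if the version takes a mask
  unsigned VLen;      // number of lanes
  std::string Symbol; // symbol to call
};

// Returns the symbol of the first candidate that matches the requested
// context exactly, or std::nullopt if there is none.
std::optional<std::string> pickVariant(const std::vector<Candidate> &Candidates,
                                       char TargetIsa, unsigned VF,
                                       bool NeedMask) {
  for (const Candidate &C : Candidates)
    if (C.Isa == TargetIsa && C.VLen == VF && C.Masked == NeedMask)
      return C.Symbol;
  return std::nullopt;
}
```

With the four x86 versions of `foo` from the example above as candidates, asking for ISA `d`, two lanes, unmasked would select the AVX2 symbol.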

> > Maybe my view is too naive here, please feel free to correct me.
> > 
> > 
> >> This means that the attribute (or metadata) that carries the 
> >> information on the available vector version needs to deal also with
> >> things that are not usually visible at IR level, but that might 
> >> still need to be provided to be able to decide which particular 
> >> instruction set/ vector extension needs to be targeted.
> > 
> > The symbol names should carry all the information we need. If they 
> > do not, we need to improve the mangling scheme such that they do. 
> > There is no attributes/metadata we could use at library boundaries.
> > 
> Hum, I am not sure what you mean by "There is no attributes/metadata 
> we could use at library boundaries."
(This seems to be part of the misunderstanding, I leave my comment here
anyway:)

The simd-related stuff works because it is a uniform mangling scheme used by all
compilers. Take the situation below in which I think we want to call foo_CTX in
the library. If so, we need a name for it.


a.c:  // Compiled by gcc into a library
#omp declare variant (foo) match(CTX)
void foo_CTX(...) {...}

b.c:  // Compiled by clang linked against the library above.
#omp declare variant (foo) match(CTX)
void foo_CTX(...);

void bar(...) {
  #pragma omp CTX
  foo();   // <- What function (symbol) do we call if a.c was compiled
           //    by gcc and b.c with clang?
}
> In our downstream compiler (Arm compiler for HPC, based on LLVM), we 
> use `declare simd` to provide vector math functions via custom header 
> file. It works brilliantly, if not for specific aspects that would be 
> perfectly covered by the `declare variant`, which might be one of the 
> reason why the OpenMP committee decided to introduce `declare 
> variant`.
But you (assume that you) control the mangling scheme across the entire
infrastructure. Given that the simd mangling is de-facto standardized, that
works.

Side note:
Declare variant, as of 5.0, is not flexible enough for a sensible inclusion of
target specific headers. That will change in 5.1.

> If your concern is that adding an attribute that somehow represents 
> something that is available in an external library is not enough to 
> guarantee that that symbol is available in the library… not even C 
> code can guarantee that? If the linker is not pointing to the right 
> library, there is nothing that can prevent it from failing if the symbol is 
> not present?

I don't follow the example you describe. I don't want to change anything
in how symbols are looked up or what happens if they are missing.

> >> I used an example based on `declare simd` instead of `declare 
> >> variant` because the attribute/metadata needed for `declare 
> >> variant` is a modification of the one needed for `declare simd`, 
> >> which has already been agreed in a previous RFC proposed by Intel 
> >> [1], and for which Intel has already provided an implementation 
> >> [2]. The changes proposed in this RFC are fully compatible with the
> >> work that is being done for the VecClone pass in [2].
> >> 
> >> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
> >> [2] VecClone pass: https://reviews.llvm.org/D22792
> > 
> > Having an agreed upon mangling for the older feature is not 
> > necessarily important here. We will need more functionality for 
> > variants and keeping the old scheme around with some metadata is not 
> > an extensible long-term solution. So, I would not try to fit 
> > variants into the existing simd-scheme but instead do it the other 
> > way around. We define what we need for variants and implement simd in
> > that scheme.
> > 
> 
> I kinda think that having agreed on something is important. It allows 
> to build other things on top of what has been agreed without breaking 
> compatibility.
>
> On the specific, which are the new functionalities needed for the 
> variants that would make the current metadata (attributes) for declare 
> simd non extensible?

See first comment.

> >> The good news is that as far as AArch64 and x86 are concerned, the
> >> only thing that will differ in the mangled name is the “<isa>” token. As
> >> far as I can tell, the mangling scheme of the rest of the vector name is the
> >> same, therefore a lot of infrastructure in terms of mangling and demangling can
> >> be reused. In fact, the `mangleVectorParameters` function in
> >> https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
> >> already be shared among x86 and aarch64.
> >> 
> >> TOPIC 2: metadata vs attribute
> >> 
> >> From a functionality point of view, I don’t care whether we use
> >> metadata or attributes. The VecClone pass mentioned in TOPIC 1 uses the
> >> following:
> >> 
> >> attributes #0 = { nounwind uwtable
> >> "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16" }
> >> 
> >> This is an attribute (I thought it was metadata?), I am happy to
> >> reword the RFC using the right terminology (sorry for messing this up).
> >> 
> >> Also, @Renato expressed concern that metadata might be dropped by
> >> optimization passes - would using attributes prevent that?
> >> 
> >> TOPIC 3: "There is no way to notify the backend how conformant the
> >> SIMD versions are."
> >> 
> >> @Shawn, I am afraid I don’t understand what you mean by
> >> “conformant” here. Can you elaborate with an example?
> >> 
> >> TOPIC 3: interaction of the `omp declare variant` with `clang 
> >> declare variant`
> >> 
> >> I believe this is described in the `Option behavior, and
> >> interaction with OpenMP` section. The option `-fclang-declare-variant`
> >> is there to make the OpenMP based one orthogonal. Of course, we might
> >> decide to make -fclang-declare-variant on/off by default, and have
> >> default behavior when interacting with -fopenmp-simd. For the sake of
> >> compatibility with other compilers, we might need to require
> >> -fno-clang-declare-variant when targeting -fopenmp[-simd].
> >> 
> >> TOPIC 3: "there are no special arguments / flags / status
> >> regs that are used / changed in the vector version that the compiler
> >> will have to “just know”"
> >> 
> >> I believe that this concern is raised by the problem of handling
> >> FP exceptions? If that’s the case, the compiler is not allowed to make
> >> any assumption on the vector function about that, and treats it with
> >> the same knowledge of any other function, depending on the visibility
> >> it has in the compilation unit. @Renato, does this answer your question?
> >> 
> >> TOPIC 4: attribute in function declaration vs attribute in function 
> >> call site
> >> 
> >> We discussed this in the previous version of the proposal. Having
> >> it in the call sites guarantees that no incompatible vector versions
> >> are used when merging modules compiled for different targets. I don’t
> >> have a use case for this, if I remember correctly this was asked by
> >> @Hideki Saito. Hideki, any comment on this?
> >> 
> >> TOPIC 5: overriding system header (the discussion on #pragma
> >> omp/clang/system variants initiated by @Hal Finkel).
> >> 
> >> I thought that the split among #pragma clang declare variant and
> >> #pragma omp declare variant was already providing the orthogonality
> >> between system header and user header. Meaning that a user should
> >> always prefer the omp version (for portability to other compilers)
> >> instead of the #pragma clang one, which would be relegated to system
> >> headers and headers provided by the compiler. Am I missing something?
> >> If so, I am happy to add a “system” version of the directive, as it
> >> would be quite easy to do given most of the parsing infrastructure
> >> will be shared.
> >> 
> >> 
> >>> On May 30, 2019, at 12:53 PM, Philip Reames <listmail at philipreames.com> wrote:
> >>> 
> >>> 
> >>> On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> >>>> On 05/29, Finkel, Hal J. via cfe-dev wrote:
> >>>>> On 5/29/19 1:52 PM, Philip Reames wrote:
> >>>>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> >>>>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> >>>>>>>> I generally like the idea of having support in IR for
> >>>>>>>> vectorization of custom functions.  I have several use cases
> >>>>>>>> which would benefit from this.
> >>>>>>>> 
> >>>>>>>> I'd suggest a couple of reframings to the IR representation though.
> >>>>>>>> 
> >>>>>>>> First, this should probably be specified as
> >>>>>>>> metadata/attribute on a function declaration.  Allowing the
> >>>>>>>> callsite variant is fine, but it should primarily be a
> >>>>>>>> property of the called function, not of the call site.  Being
> >>>>>>>> able to specify it once per declaration is much cleaner.
> >>>>>>> I agree. We should support this both on the function
> >>>>>>> declaration and on the call sites.
> >>>>>>> 
> >>>>>>> 
> >>>>>>>> Second, I really don't like the mangling use here.  We need a
> >>>>>>>> better way to specify the properties of the function than
> >>>>>>>> its mangled name.  One thought to explore is to directly use
> >>>>>>>> the Value of the function declaration (since this is metadata
> >>>>>>>> and we can do that), and then tie the properties to the
> >>>>>>>> function declaration in some way?  Sorry, I don't really have
> >>>>>>>> a specific suggestion here.
> >>>>>>> Is the problem the mangling or the fact that the mangling is
> >>>>>>> ABI/target-specific? One option is to use LLVM's mangling
> >>>>>>> scheme (the one we use for intrinsics) and then provide some
> >>>>>>> backend infrastructure to translate later.
> >>>>>> Well, both honestly.  But mangling with a non-target specific
> >>>>>> scheme is a lot better, so I might be okay with that.  Good idea.
> >>>>> 
> >>>>> I liked your idea of directly encoding the signature in the
> >>>>> metadata, but I think that we want to continue to use 
> >>>>> attributes, and not metadata, and the options for attributes
> >>>>> seem more limited - unless we allow attributes to take metadata
> >>>>> arguments - maybe that's an enhancement worth considering.
> >>>> I recently talked to people in the OpenMP language committee
> >>>> meeting about this and, thinking forward to the actual 
> >>>> implementation/use of the OpenMP 5.x declare variant feature,
> >>>> I'd say:
> >>>> 
> >>>> - We will need a mangling scheme if we want to allow variants on
> >>>>   declarations that are defined elsewhere.
> >>>> - We will need a (OpenMP) standardized mangling scheme if we want
> >>>>   interoperability between compilers.
> >>>> 
> >>>> I assume we want both so I think we will need both.
> >>> If I'm reading this correctly, this describes a need for the
> >>> frontend to have a mangling scheme.  Nothing in here would seem to
> >>> prevent the frontend from generating a declaration for a mangled
> >>> external symbol and then referencing that declaration.  Am I
> >>> missing something?
> >>>> 
> >>>> That said, I think this should allow us to avoid 
> >>>> attributes/metadata which seems to me like a good thing right now.
> >>>> 
> >>>> Cheers,
> >>>> Johannes
> >>>> 
> >>>> 
> >>>>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
> >>>>>>>>> Dear all,
> >>>>>>>>> 
> >>>>>>>>> This RFC is a proposal to provide auto-vectorization functionality
> >>>>>>>>> for user provided vector functions.
> >>>>>>>>> 
> >>>>>>>>> The proposal is a modification of an RFC that I have sent out a
> >>>>>>>>> couple of months ago, with the title `[RFC] Re-implementing
> >>>>>>>>> -fveclib with OpenMP` (see
> >>>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html).
> >>>>>>>>> The previous RFC is to be considered abandoned.
> >>>>>>>>> 
> >>>>>>>>> The original RFC was proposing to re-implement the `-fveclib`
> >>>>>>>>> command line option. This proposal avoids that, and limits its
> >>>>>>>>> scope to the mechanics of providing vector functions in user code
> >>>>>>>>> that the compiler can pick up for auto-vectorization. This
> >>>>>>>>> narrower scope limits the impact of changes that are needed in
> >>>>>>>>> both clang and LLVM.
> >>>>>>>>> 
> >>>>>>>>> Please let me know what you think.
> >>>>>>>>> 
> >>>>>>>>> Kind regards,
> >>>>>>>>> 
> >>>>>>>>> Francesco
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> ================================================================================
> >>>>>>>>> 
> >>>>>>>>> Introduction
> >>>>>>>>> ============
> >>>>>>>>> 
> >>>>>>>>> This RFC encompasses the proposal of informing the
> >>>>>>>>> vectorizer about the availability of vector functions
> >>>>>>>>> provided by the user. The mechanism is based on the use of
> >>>>>>>>> the directive `declare variant` introduced in OpenMP
> >>>>>>>>> 5.0 [^1].
> >>>>>>>>> 
> >>>>>>>>> The mechanism proposed has the following properties:
> >>>>>>>>> 
> >>>>>>>>> 1.  Decouples the compiler front-end that knows about the availability
> >>>>>>>>>     of vectorized routines, from the back-end that knows how to make use
> >>>>>>>>>     of them.
> >>>>>>>>> 2.  Enables support for a developer's own vector libraries without
> >>>>>>>>>     requiring changes to the compiler.
> >>>>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
> >>>>>>>>>     mappings as relevant for their own runtime libraries, etc.
> >>>>>>>>> 
> >>>>>>>>> The implementation consists of two separate sets of changes.
> >>>>>>>>> 
> >>>>>>>>> The first set is a set of changes in `llvm`, and consists of:
> >>>>>>>>> 
> >>>>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
> >>>>>>>>>     availability of user-defined vector functions via metadata attached
> >>>>>>>>>     to an `llvm::CallInst`.
> >>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
> >>>>>>>>>     information about the available vector functions associated with a
> >>>>>>>>>     `llvm::CallInst`.
> >>>>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
> >>>>>>>>>     metadata.
> >>>>>>>>> 
> >>>>>>>>> The second set consists of the [changes in
> >>>>>>>>> clang](#clang) that are needed to recognize the `#pragma
> >>>>>>>>> clang declare variant` directive.
> >>>>>>>>> 
> >>>>>>>>> Proposed changes
> >>>>>>>>> ================
> >>>>>>>>> 
> >>>>>>>>> We propose an implementation that uses `#pragma clang
> >>>>>>>>> declare variant` to inform the backend components about the
> >>>>>>>>> availability of vector versions of scalar functions found in
> >>>>>>>>> IR. The mechanism relies on storing such information in IR
> >>>>>>>>> metadata, and therefore makes the auto-vectorization of
> >>>>>>>>> function calls a mid-end (`opt`) process that is independent
> >>>>>>>>> of the front-end that generated such IR metadata.
> >>>>>>>>> 
> >>>>>>>>> This implementation provides a generic mechanism that the
> >>>>>>>>> users of the LLVM compiler will be able to use for
> >>>>>>>>> interfacing their own vector routines for generic code.
> >>>>>>>>> 
> >>>>>>>>> The implementation can also expose vectorization-specific
> >>>>>>>>> descriptors -- for example, like the `linear` and `uniform`
> >>>>>>>>> clauses of the OpenMP `declare simd` directive -- that could
> >>>>>>>>> be used to finely tune the automatic vectorization of some
> >>>>>>>>> functions (think for example the vectorization of `double
> >>>>>>>>> sincos(double , double *, double *)`, where `linear` can be
> >>>>>>>>> used to give extra information about the memory layout of the
> >>>>>>>>> 2 pointer parameters in the vector version).
> >>>>>>>>> 
> >>>>>>>>> The directive `#pragma clang declare variant` follows the
> >>>>>>>>> syntax of the `#pragma omp declare variant` directive of OpenMP.
> >>>>>>>>> 
> >>>>>>>>> We define the new directive in the `clang` namespace instead
> >>>>>>>>> of using the `omp` one of OpenMP to allow the compiler to
> >>>>>>>>> perform auto-vectorization outside of an OpenMP SIMD context.
> >>>>>>>>> 
> >>>>>>>>> The mechanism is based on OpenMP to provide a uniform user
> >>>>>>>>> experience across the two mechanisms, and to maximise the
> >>>>>>>>> number of shared components of the infrastructure needed in
> >>>>>>>>> the compiler frontend to enable the feature.
> >>>>>>>>> 
> >>>>>>>>> Changes in LLVM IR {#llvmIR}
> >>>>>>>>> ------------------
> >>>>>>>>> 
> >>>>>>>>> The IR is enriched with metadata that details the
> >>>>>>>>> availability of vector versions of an associated scalar
> >>>>>>>>> function. This metadata is attached to the call site of the
> >>>>>>>>> scalar function.
> >>>>>>>>> 
> >>>>>>>>> The metadata takes the form of an attribute containing a
> >>>>>>>>> comma separated list of vector function mappings. Each entry
> >>>>>>>>> has a unique name that follows the Vector Function ABI[^2]
> >>>>>>>>> and a real name that is used when generating calls to this
> >>>>>>>>> vector function.
> >>>>>>>>> 
> >>>>>>>>>     vfunc_name1(real_name1), vfunc_name2(real_name2)
> >>>>>>>>> 
> >>>>>>>>> The Vector Function ABI name describes the signature of the
> >>>>>>>>> vector function so that properties like vectorisation factor
> >>>>>>>>> can be queried during compilation.
> >>>>>>>>> 
> >>>>>>>>> The `(real name)` token is optional and assumed to match the
> >>>>>>>>> Vector Function ABI name when omitted.
> >>>>>>>>> 
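As a rough illustration of the list format described above, the following sketch splits such an attribute string into (Vector Function ABI name, real name) pairs and falls back to the ABI name when the `(real name)` part is omitted. `VariantMapping`, `trim` and `parseVectorVariants` are hypothetical names, not part of the RFC or of LLVM, and the sketch assumes names contain no commas or parentheses.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pair: (Vector Function ABI name, name used in the emitted call).
using VariantMapping = std::pair<std::string, std::string>;

// Strips leading and trailing blanks, tabs and newlines.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n";
  const std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  const std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Splits "vfunc_name1(real_name1), vfunc_name2" into mappings.
std::vector<VariantMapping> parseVectorVariants(const std::string &Attr) {
  std::vector<VariantMapping> Result;
  std::size_t Start = 0;
  while (Start <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Start);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    const std::string Entry = trim(Attr.substr(Start, Comma - Start));
    Start = Comma + 1;
    if (Entry.empty())
      continue;
    const std::size_t Open = Entry.find('(');
    if (Open != std::string::npos && Entry.back() == ')') {
      // Explicit remapping: ABI name before '(', real name inside "(...)".
      Result.emplace_back(Entry.substr(0, Open),
                          Entry.substr(Open + 1, Entry.size() - Open - 2));
    } else {
      // No "(real name)": the real name is assumed to equal the ABI name.
      Result.emplace_back(Entry, Entry);
    }
  }
  return Result;
}
```
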
> >>>>>>>>> For example, the availability of a 2-lane double precision
> >>>>>>>>> `sin` function via SVML when targeting AVX on x86 is
> >>>>>>>>> provided by the following IR.
> >>>>>>>>> 
> >>>>>>>>>     // ...
> >>>>>>>>>     ... = call double @sin(double) #0
> >>>>>>>>>     // ...
> >>>>>>>>> 
> >>>>>>>>>     #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> >>>>>>>>>                               _ZGVdN4v_sin(__svml_sin4),
> >>>>>>>>>                               ..."} }
> >>>>>>>>> 
> >>>>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this
> >>>>>>>>> vector-variant attribute provides information on the shape
> >>>>>>>>> of the vector function via the string `_ZGVcN2v_sin`,
> >>>>>>>>> mangled according to the Vector Function ABI for Intel, and
> >>>>>>>>> remaps the standard Vector Function ABI name to the
> >>>>>>>>> non-standard name `__svml_sin2`.
> >>>>>>>>> 
> >>>>>>>>> This metadata is compatible with the proposal "Proposal for
> >>>>>>>>> function vectorization and loop vectorization with function
> >>>>>>>>> calls",[^3] that uses Vector Function ABI mangled names to
> >>>>>>>>> inform the vectorizer about the availability of vector
> >>>>>>>>> functions. The proposal extends the original by allowing the
> >>>>>>>>> explicit mapping of the Vector Function ABI mangled name to a
> >>>>>>>>> non-standard name, which allows the use of existing vector
> >>>>>>>>> libraries.
> >>>>>>>>> 
> >>>>>>>>> The `vector-variant` attribute needs to be attached on a
> >>>>>>>>> per-call basis to avoid conflicts when merging modules with
> >>>>>>>>> different vector variants.
> >>>>>>>>> 
> >>>>>>>>> The query infrastructure: SVFS {#infrastructure}
> >>>>>>>>> ------------------------------
> >>>>>>>>> 
> >>>>>>>>> The Search Vector Function System (SVFS) is constructed from
> >>>>>>>>> an `llvm::Module` instance so it can create function
> >>>>>>>>> definitions. The SVFS exposes an API with two methods.
> >>>>>>>>> 
> >>>>>>>>> ### `SVFS::isFunctionVectorizable`
> >>>>>>>>> 
> >>>>>>>>> This method queries the availability of a vectorized version
> >>>>>>>>> of a function. The signature of the method is as follows.
> >>>>>>>>> 
> >>>>>>>>>     bool isFunctionVectorizable(llvm::CallInst * Call,
> >>>>>>>>>                                 ParTypeMap Params);
> >>>>>>>>> 
> >>>>>>>>> The method determines the availability of a vector version of
> >>>>>>>>> the function invoked by the `Call` parameter by looking at
> >>>>>>>>> the `vector-variant` metadata.
> >>>>>>>>> 
> >>>>>>>>> The `Params` argument is a map that associates the position
> >>>>>>>>> of a parameter in the `CallInst` to its `ParameterType`
> >>>>>>>>> descriptor. The `ParameterType` descriptor holds information
> >>>>>>>>> about the shape of the corresponding parameter in the
> >>>>>>>>> signature of the vector function. This `ParameterType` is
> >>>>>>>>> used to query the SVFS about the availability of vector
> >>>>>>>>> versions that have `linear`, `uniform` or `align` parameters
> >>>>>>>>> (in the sense of OpenMP 4.0 and onwards).
> >>>>>>>>> 
> >>>>>>>>> The method `isFunctionVectorizable`, when invoked with an
> >>>>>>>>> empty `ParTypeMap`, is equivalent to the `TargetLibraryInfo`
> >>>>>>>>> method `isFunctionVectorizable(StringRef Name)`.
> >>>>>>>>> 
> >>>>>>>>> ### `SVFS::getVectorizedFunction`
> >>>>>>>>> 
> >>>>>>>>> This method returns the vector
function declaration that
> >>>>>>>>> correspond to the needs of the
vectorization technique that is being run.
> >>>>>>>>> 
> >>>>>>>>> The signature of the function is as
follows.
> >>>>>>>>> 
> >>>>>>>>>     std::pair<llvm::FunctionType *,
std::string> getVectorizedFunction(
> >>>>>>>>>       llvm::CallInst * Call, unsigned
VF, bool IsMasked,
> >>>>>>>>> ParTypeSet Params);
> >>>>>>>>> 
> >>>>>>>>> The `Call` parameter is the call instance that is being
> >>>>>>>>> vectorized, the `VF` parameter represents the vectorization
> >>>>>>>>> factor (how many lanes), the `IsMasked` parameter decides
> >>>>>>>>> whether or not the signature of the vector function is
> >>>>>>>>> required to have a mask parameter, and the `Params` parameter
> >>>>>>>>> describes the shape of the vector function as in the
> >>>>>>>>> `isFunctionVectorizable` method.
> >>>>>>>>> 
> >>>>>>>>> The method uses the `vector-variant` metadata and returns
> >>>>>>>>> the function signature and the name of the function based on
> >>>>>>>>> the input parameters.
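As a rough illustration of the lookup described above, the following standalone sketch picks a variant by vectorization factor and masking only. It is not the proposed SVFS API: `findVariant` is a hypothetical helper, it works on plain strings rather than on an `llvm::CallInst`, and it ignores the `<isa>` token, the `Params` shape, and the returned `FunctionType`.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of LLVM): return the first Vector Function
// ABI name whose <masking> and <VLEN> tokens match the request.
// Assumed layout: "_ZGV" <isa> <masking> <VLEN> <parameters> "_" <scalar name>
std::optional<std::string> findVariant(const std::vector<std::string> &variants,
                                       unsigned VF, bool isMasked) {
    const std::string wanted = (isMasked ? "M" : "N") + std::to_string(VF);
    for (const std::string &name : variants) {
        // Need the "_ZGV" prefix, one <isa> character, and room for the tokens.
        if (name.size() < 5 + wanted.size() || name.compare(0, 4, "_ZGV") != 0)
            continue;
        if (name.compare(5, wanted.size(), wanted) != 0)
            continue;
        // Reject partial <VLEN> matches, e.g. a request for 1 matching "N16".
        const std::size_t next = 5 + wanted.size();
        if (next < name.size() &&
            std::isdigit(static_cast<unsigned char>(name[next])))
            continue;
        return name;
    }
    return std::nullopt;
}
```

Called with the names from a `vector-variants` list, a request for an unmasked 8-lane version returns the first matching entry, and a request with no match returns `std::nullopt`. A real implementation would also have to rank candidates by target ISA.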
> >>>>>>>>> 
> >>>>>>>>> The SVFS can add new function definitions, in the same
> >>>>>>>>> module as the `Call`, to provide vector functions that are
> >>>>>>>>> not present within the vector-variant metadata. For example,
> >>>>>>>>> if a library provides a vector version of a function with a
> >>>>>>>>> vectorization factor of 2, but the vectorizer is requesting
> >>>>>>>>> a vectorization factor of 4, the SVFS is allowed to create a
> >>>>>>>>> definition that calls the 2-lane version twice. This
> >>>>>>>>> capability applies similarly to providing masked and unmasked
> >>>>>>>>> versions when the request does not match what is available in
> >>>>>>>>> the library.
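The "call the 2-lane version twice" idea can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `lib_sin_v2` stands in for a library-provided 2-lane routine, `sin_v4_from_v2` is the kind of wrapper the SVFS might synthesize, and `std::array` replaces real vector types. Neither name comes from the RFC.

```cpp
#include <array>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a 2-lane vector routine shipped by a library (hypothetical).
Vec2 lib_sin_v2(const Vec2 &x) { return {std::sin(x[0]), std::sin(x[1])}; }

// Wrapper of the kind that could be generated when VF = 4 is requested but
// only a VF = 2 version exists: split the input, call twice, and recombine.
Vec4 sin_v4_from_v2(const Vec4 &x) {
    const Vec2 lo = lib_sin_v2({x[0], x[1]});
    const Vec2 hi = lib_sin_v2({x[2], x[3]});
    return {lo[0], lo[1], hi[0], hi[1]};
}
```

The same pattern would apply in the other direction for masking: an unmasked request could be served by a masked version called with an all-true mask, at some cost in performance.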
> >>>>>>>>> 
> >>>>>>>>> This method is equivalent to the TLI method
> >>>>>>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
> >>>>>>>>> 
> >>>>>>>>> Notice that to fully support OpenMP vectorization we need to
> >>>>>>>>> think about a fuzzy matching mechanism that is able to
> >>>>>>>>> select a candidate in the calling context. However, this
> >>>>>>>>> proposal is intended for scalar-to-vector mappings of
> >>>>>>>>> math-like functions that are most likely to associate a
> >>>>>>>>> unique vector candidate in most contexts. Therefore,
> >>>>>>>>> extending this behavior to a generic one is an aspect of the
> >>>>>>>>> implementation that will be treated in a separate RFC about
> >>>>>>>>> the vectorization pass.
> >>>>>>>>> 
> >>>>>>>>> ### Scalable vectorization
> >>>>>>>>> 
> >>>>>>>>> Both methods of the SVFS API will be extended with a boolean
> >>>>>>>>> parameter to specify whether scalable signatures are needed
> >>>>>>>>> by the user of the SVFS.
> >>>>>>>>> 
> >>>>>>>>> Changes in clang {#clang}
> >>>>>>>>> ----------------
> >>>>>>>>> 
> >>>>>>>>> We use clang to generate the metadata described above.
> >>>>>>>>> 
> >>>>>>>>> In the compilation unit, the vector function definition or
> >>>>>>>>> declaration must be visible and associated with the scalar
> >>>>>>>>> version via the `#pragma clang declare variant` according to
> >>>>>>>>> the rules defined by the corresponding `#pragma omp declare
> >>>>>>>>> variant` in OpenMP 5.0, as in the following example.
> >>>>>>>>> 
> >>>>>>>>>     #pragma clang declare variant(vector_sinf) \
> >>>>>>>>>       match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> >>>>>>>>>     extern float sinf(float);
> >>>>>>>>> 
> >>>>>>>>>     float32x4_t vector_sinf(float32x4_t x);
> >>>>>>>>> 
> >>>>>>>>> The `construct` set in the directive, together with the
> >>>>>>>>> `device` set, is used to generate the vector mangled name to
> >>>>>>>>> be used in the `vector-variant` attribute, for example
> >>>>>>>>> `_ZGVnN2v_sin`, when targeting AArch64 Advanced SIMD code
> >>>>>>>>> generation. The rules for mangling the name of the scalar
> >>>>>>>>> function in the vector name are defined in the Vector
> >>>>>>>>> Function ABI specification of the target.
> >>>>>>>>> 
> >>>>>>>>> The part of the vector-variant attribute that redirects the
> >>>>>>>>> call to `vector_sinf` is derived from the `variant-id`
> >>>>>>>>> specified in the `variant` clause.
> >>>>>>>>> 
> >>>>>>>>> Summary
> >>>>>>>>> =======
> >>>>>>>>> 
> >>>>>>>>> New `clang` directive in clang
> >>>>>>>>> ------------------------------
> >>>>>>>>> 
> >>>>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare
> >>>>>>>>> variant` restricted to the `simd` context selector, from
> >>>>>>>>> OpenMP 5.0+.
> >>>>>>>>> 
> >>>>>>>>> Option behavior, and interaction with OpenMP
> >>>>>>>>> --------------------------------------------
> >>>>>>>>> 
> >>>>>>>>> The behavior described below makes sure that `#pragma clang
> >>>>>>>>> declare variant` function vectorization and OpenMP function
> >>>>>>>>> vectorization are orthogonal.
> >>>>>>>>> 
> >>>>>>>>> `-fclang-declare-variant`
> >>>>>>>>> 
> >>>>>>>>> :   The `#pragma clang declare variant` directives are parsed and used
> >>>>>>>>>     to populate the `vector-variant` attribute.
> >>>>>>>>> 
> >>>>>>>>> `-fopenmp[-simd]`
> >>>>>>>>> 
> >>>>>>>>> :   The `#pragma omp declare variant` directives are parsed and used to
> >>>>>>>>>     populate the `vector-variant` attribute.
> >>>>>>>>> 
> >>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
> >>>>>>>>> 
> >>>>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
> >>>>>>>>>     `vector-variant` attribute in IR. The directive
> >>>>>>>>>     `#pragma clang declare variant` is ignored.
> >>>>>>>>> 
> >>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> >>>>>>>>> 
> >>>>>>>>> [^2]: Vector Function ABI for x86:
> >>>>>>>>>     <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> >>>>>>>>>     Vector Function ABI for AArch64:
> >>>>>>>>>     <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
> >>>>>>>>> 
> >>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> >>>>>>>>> 
> >>>>>>>>> _______________________________________________
> >>>>>>>>> LLVM Developers mailing list llvm-dev at lists.llvm.org
> >>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>>>> _______________________________________________
> >>>>>>>> cfe-dev mailing list
> >>>>>>>> cfe-dev at lists.llvm.org
> >>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>>>> --
> >>>>> Hal Finkel
> >>>>> Lead, Compiler Technology and Programming Languages
> >>>>> Leadership Computing Facility
> >>>>> Argonne National Laboratory
> >>>>> 
> >> 
> > 
> > --
> > 
> > Johannes Doerfert
> > Researcher
> > 
> > Argonne National Laboratory
> > Lemont, IL 60439, USA
> > 
> > jdoerfert at anl.gov
> 
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Doerfert, Johannes via llvm-dev

2019-May-31 20:31 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On 05/31, Saito, Hideki wrote:
> 
> >This works for variants that are created from definitions in the module
> >but what about #omp declare simd declarations?
> 
> I'm sorry that I haven't digested this thread in its entirety, but
> let me just deal with this one point for now.
> Suppose #pragma omp declare simd is applied to foo(). I'd expect the
> corresponding Function Attrs to be attached to
> "declare dso_local void @foo(i32)". For Intel vector function
> ABI, the attribute comes with a mangled name like "_ZGV........foo".
OK. And we will do the same for #omp declare variant match(simd=...), right?
What if we have #omp declare variant match(parallel, simd=...)?

(My syntax might be off; if you don't understand what I mean, please let
 me know and I will correct the syntax.)
> ---------------------------------
> void foo(int x);
>  
> void bar(){
>   foo(3);
> }
> 
> ==>
> 
> ; Function Attrs: noinline nounwind optnone uwtable
> define dso_local void @bar() #0 {
>   call void @foo(i32 3)
>   ret void
> }
>  
> declare dso_local void @foo(i32) #1
> 
> -----Original Message-----
> From: Doerfert, Johannes [mailto:jdoerfert at anl.gov] 
> Sent: Friday, May 31, 2019 12:56 PM
> To: Francesco Petrogalli <Francesco.Petrogalli at arm.com>
> Cc: Philip Reames <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>;
>     LLVM Development List <llvm-dev at lists.llvm.org>; nd <nd at arm.com>;
>     Saito, Hideki <hideki.saito at intel.com>; Clang Dev <cfe-dev at lists.llvm.org>;
>     scogland1 at llnl.gov
> Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for auto-vectorization.
> 
> I think I did misunderstand what you want to do with attributes. This is
> my bad. Let me try to explain:
> 
> It seems you want the "vector-variants" attributes (which I could not
> find with this name in trunk, correct?) to "remember" what vector
> versions can be created (wrt. validity), assuming a definition is
> available? Correct?
> What I was concerned with is the example I sketched somewhere below
> which motivates the need for a generalized/standardized name mangling
> for OpenMP. I thought you wanted to avoid that somehow but if you don't I
> misunderstood you. I basically removed the part where the vector
> versions have to be created first but I assumed them to be existent (in
> the module or somewhere else). That is, I assumed a call to foo and
> various symbols available that are specializations of foo. When we then
> vectorize foo (or otherwise specialize at some point in the future), you
> would scan the module and pick the best match based on the context of
> the call.
> 
> Now I don't know if I understood your proposal by now but let me ask a
> question anyway:
> 
> VecClone.cpp:276-278 mentions that the vectorizer is supposed to look at
> the vector-variants functions. This works for variants that are created
> from definitions in the module but what about #omp declare simd
> declarations?
> 
> 
> On 05/31, Francesco Petrogalli wrote:
> > > On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> > > 
> > > I think we should split this discussion:
> > >  TOPIC 1 & 2 & 4: How to implement all use cases and OpenMP 5.X
> > >                   features, including compatibility with other
> > >                   compilers and cross module support.
> > 
> > Yes, and we have to carefully make this as standard and compatible as
> > possible.
> 
> Agreed.
> 
> 
> > >  TOPIC 3b & 5: Interoperability with clang declare (system vs. user
> > >                 declares)
> > 
> > 
> > I think that Alexey's explanation of how the directives are handled
> > internally in the frontend makes us lean towards the attribute.
> 
> How things are handled right now, especially given that declare variant is
> not handled at all, should not limit our design space. If the argument is that
> we cannot reasonably implement a solution, that is a different story.
> 
> 
> > >  TOPIC 3a & 3c: floating point issues?
> > > 
> > 
> > I believe there is no issue there. I have quoted the OpenMP standard
> > in reply to Renato:
> > 
> > See
> > https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf,
> > page 118, lines 23-24:
> > 
> > "The execution of the function or subroutine cannot have any side
> > effects that would alter its execution for concurrent iterations of a
> > SIMD chunk."
> 
> Great.
> 
> 
> > > I inlined comments for Topic 1 below.
> > > 
> > > I hope that we do not have to discuss topic 2 if we agree neither
> > > attributes nor metadata is necessary, or better, will solve the
> > > actual problem at hand. I don't have strong feelings on topic 4 but I
> > > have the feeling this will become less problematic once we figure out
> > > topic 1.
> > > 
> > > Thanks,
> > >  Johannes
> > > 
> > > 
> > > On 05/31, Francesco Petrogalli wrote:
> > >> # TOPIC 1: concerns about name mangling
> > >> 
> > >> I understand that there are concerns in using the mangling scheme I
> > >> proposed, and that it would be preferred to have a mangling scheme
> > >> that is based on (and standardized by) OpenMP.
> > > 
> > > I still think it will be required to have a standardized one, not
> > > only preferred.
> > > 
> > > 
> > 
> > I am all with you in standardizing. x86 and aarch64 have their own
> > vector function ABI, which, although "private", are to be considered
> > standard. Open-source and commercial compilers are using them,
> > therefore we have to deal with this mangling scheme, whether or not
> > OpenMP comes up with a standard mangling scheme.
> 
> I don't get the point you are trying to make here. What do you mean by
> "we have to deal with"? (I do not suggest to get rid of them.)
> 
>  
> > >> I hear the argument on having some common ground here. In fact,
> > >> there is already common ground between the x86 and aarch64 backends,
> > >> which have based their respective Vector Function ABI specifications
> > >> on OpenMP.
> > >> 
> > >> In fact, the mangled name grammar can be summarized as follows:
> > >> 
> > >> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> > >> 
> > >> Across vector extensions the only <token> that will differ is the
> > >> <isa> token.
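To make the grammar above concrete, here is a small standalone sketch that splits a mangled name into its tokens. `VectorABIName` and `parseVectorABIName` are hypothetical names used only for illustration, not LLVM APIs, and the sketch assumes a decimal `<VLEN>`; it does not handle any other encoding that a target's Vector Function ABI may define.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Tokens of: _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
struct VectorABIName {
    char isa;               // single-character ISA token, e.g. 'c' or 'n'
    bool masked;            // 'M' means masked, 'N' means unmasked
    unsigned vlen;          // number of lanes
    std::string parameters; // parameter tokens, e.g. "v" or "vv"
    std::string scalarName; // e.g. "sin" or "vec_sum"
};

// Hypothetical helper: returns std::nullopt if the name does not fit the
// grammar (under the decimal-<VLEN> assumption stated above).
std::optional<VectorABIName> parseVectorABIName(const std::string &name) {
    if (name.compare(0, 4, "_ZGV") != 0 || name.size() < 6)
        return std::nullopt;
    std::size_t pos = 4;

    VectorABIName out{};
    out.isa = name[pos++];
    const char mask = name[pos++];
    if (mask != 'M' && mask != 'N')
        return std::nullopt;
    out.masked = (mask == 'M');

    // <VLEN>: one or more decimal digits.
    const std::size_t vlenBegin = pos;
    while (pos < name.size() &&
           std::isdigit(static_cast<unsigned char>(name[pos])))
        ++pos;
    if (pos == vlenBegin)
        return std::nullopt;
    out.vlen = static_cast<unsigned>(
        std::stoul(name.substr(vlenBegin, pos - vlenBegin)));

    // <parameter type> runs up to the first '_'. The scalar name may itself
    // contain underscores, so everything after that '_' belongs to it.
    const std::size_t sep = name.find('_', pos);
    if (sep == std::string::npos || sep == pos || sep + 1 >= name.size())
        return std::nullopt;
    out.parameters = name.substr(pos, sep - pos);
    out.scalarName = name.substr(sep + 1);
    return out;
}
```

With this split, two names that differ only in `isa` compare equal on every other field, which is the observation the paragraph above relies on.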
> > >> 
> > >> This might lead people to think that we could drop the _ZGV<isa>
> > >> prefix and consider the <masking><VLEN><parameter type>_<scalar
> > >> name> part as a sort of unofficial OpenMP mangling scheme: in fact,
> > >> the signature of an "unmasked 2-lane vector version of `sin`" will
> > >> always be `<2 x double>(<2 x double>)`.
> > >> 
> > >> The problem with this choice is that the vector versions
> > >> available for a target are not unique.
> > > 
> > > For me, this simply means this mangling scheme is not sufficient.
> > > 
> > 
> > Can you explain more why you think the mangling scheme is not 
> > sufficient? The mangling scheme is shaped to provide all the 
> > information that the OpenMP directive describes.
> 
> I don't know if it is insufficient but I thought you hinted towards
> that.
> If we can handle/decode everything we need for declare variants then I do
> not object at all. If not, we require respective extensions such that we
> can. The result should be a superset of the current SIMD encoding and
> compatible with the current one.
> 
> 
> 
> > The fact that x86 and aarch64 realize such information in different 
> > way (multiple signature/vector extensions) is something that cannot be
> > avoided, because it is related to architectural aspects that are 
> > specific to the vector extension and transparent to the OpenMP 
> > standard.
> 
> I don't think that is a problem (that's why I "failed to see the
> problem" in the comment below). I look at it this way: If #declare
> simd, or similar, results in N variants, it should at the end of the day
> not be different from declaring these N variants explicitly with the
> respective declare variant match clause.
> 
> 
> > >> In particular, the following declaration generates multiple vector
> > >> versions, depending on the target:
> > >> 
> > >> #pragma omp declare simd simdlen(2) notinbranch
> > >> double foo(double) {…};
> > >> 
> > >> On x86, this generates at least 4 symbols (one for SSE, one for
> > >> AVX, one for AVX2, and one for AVX512:
> > >> https://godbolt.org/z/TLYXPi)
> > >> 
> > >> On aarch64, the same declaration generates a unique symbol, as
> > >> specified in the Vector Function ABI.
> > > 
> > > I fail to see the problem. We generate X symbols for X different
> > > contexts. Once we get to the point where we vectorize, we determine
> > > which context fits best and choose the corresponding symbol version.
> > > 
> > 
> > Yes, this is exactly what we need to do, under the constraint that the
> > rules for generating "X symbols for X different contexts" are decided
> > by the Vector Function ABI of the target.
> 
> Sounds good. The vector ABI is used to determine what contexts exist and
> what symbols should be created. I would assume the encoding should be the
> same as if we specified the versions (/contexts) ourselves via #declare
> variant.
> 
> 
> > > Maybe my view is too naive here, please feel free to correct me.
> > > 
> > > 
> > >> This means that the attribute (or metadata) that carries the
> > >> information on the available vector version needs to deal also with
> > >> things that are not usually visible at IR level, but that might
> > >> still need to be provided to be able to decide which particular
> > >> instruction set/ vector extension needs to be targeted.
> > > 
> > > The symbol names should carry all the information we need. If they
> > > do not, we need to improve the mangling scheme such that they do.
> > > There are no attributes/metadata we could use at library boundaries.
> > > 
> > Hum, I am not sure what you mean by "There are no attributes/metadata
> > we could use at library boundaries."
> 
> (This seems to be part of the misunderstanding, I leave my comment here
> anyway:)
> 
> The simd-related stuff works because it is a uniform mangling scheme used
> by all compilers. Take the situation below in which I think we want to
> call foo_CTX in the library. If so, we need a name for it.
> 
> 
> a.c:  // Compiled by gcc into a library
> #omp declare variant (foo) match(CTX)
> void foo_CTX(...) {...}
> 
> b.c:  // Compiled by clang linked against the library above.
> #omp declare variant (foo) match(CTX)
> void foo_CTX(...);
> 
> void bar(...) {
>   #pragma omp CTX
>   foo();   // <- What function (symbol) do we call if a.c was compiled
>            //    by gcc and b.c with clang?
> }
> 
> > In our downstream compiler (Arm compiler for HPC, based on LLVM), we 
> > use `declare simd` to provide vector math functions via custom header 
> > file. It works brilliantly, if not for specific aspects that would be 
> > perfectly covered by the `declare variant`, which might be one of the 
> > reason why the OpenMP committee decided to introduce `declare 
> > variant`.
> 
> But you (assume that you) control the mangling scheme across the entire
> infrastructure. Given that the simd mangling is de-facto standardized,
> that works.
> 
> Side note:
> Declare variant, as of 5.0, is not flexible enough for a sensible
> inclusion of target specific headers. That will change in 5.1.
> 
> 
> > If your concern is that adding an attribute that somehow represents
> > something that is available in an external library is not enough to
> > guarantee that that symbol is available in the library… not even C
> > code can guarantee that? If the linker is not pointing to the right
> > library, there is nothing that can prevent it from failing if the symbol
> > is not present?
> 
> I don't follow the example you describe. I don't want to change
> anything in how symbols are looked up or what happens if they are missing.
> 
> 
> > >> I used an example based on `declare simd` instead of `declare
> > >> variant` because the attribute/metadata needed for `declare
> > >> variant` is a modification of the one needed for `declare simd`,
> > >> which has already been agreed in a previous RFC proposed by Intel
> > >> [1], and for which Intel has already provided an implementation
> > >> [2]. The changes proposed in this RFC are fully compatible with the
> > >> work that is being done for the VecClone pass in [2].
> > >> 
> > >> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
> > >> [2] VecClone pass: https://reviews.llvm.org/D22792
> > > 
> > > Having an agreed upon mangling for the older feature is not
> > > necessarily important here. We will need more functionality for
> > > variants and keeping the old scheme around with some metadata is not
> > > an extensible long-term solution. So, I would not try to fit
> > > variants into the existing simd-scheme but instead do it the other
> > > way around. We define what we need for variants and implement simd
> > > in that scheme.
> > > 
> > 
> > I kinda think that having agreed on something is important. It allows
> > us to build other things on top of what has been agreed without
> > breaking compatibility.
> >
> > Specifically, which are the new functionalities needed for the
> > variants that would make the current metadata (attributes) for declare
> > simd non extensible?
> 
> See first comment.
> 
> > >> The good news is that as far as AArch64 and x86 are concerned, the
> > >> only thing that will differ in the mangled name is the "<isa>"
> > >> token. As far as I can tell, the mangling scheme of the rest of the
> > >> vector name is the same, therefore a lot of infrastructure in terms
> > >> of mangling and demangling can be reused. In fact, the
> > >> `mangleVectorParameters` function in
> > >> https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918
> > >> could already be shared among x86 and aarch64.
> > >> 
> > >> TOPIC 2: metadata vs attribute
> > >> 
> > >> From a functionality point of view, I don’t care whether we use
> > >> metadata or attributes. The VecClone pass mentioned in TOPIC 1 uses
> > >> the following:
> > >> 
> > >> attributes #0 = { nounwind uwtable
> > >> "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16"}
> > >> 
> > >> This is an attribute (I thought it was metadata?), I am happy to
> > >> reword the RFC using the right terminology (sorry for messing this up).
> > >> 
> > >> Also, @Renato expressed concern that metadata might be dropped by
> > >> optimization passes - would using attributes prevent that?
> > >> 
> > >> TOPIC 3: "There is no way to notify the backend how conformant the
> > >> SIMD versions are."
> > >> 
> > >> @Shawn, I am afraid I don’t understand what you mean by
> > >> "conformant" here. Can you elaborate with an example?
> > >> 
> > >> TOPIC 3: interaction of the `omp declare variant` with `clang
> > >> declare variant`
> > >> 
> > >> I believe this is described in the `Option behavior, and interaction
> > >> with OpenMP` section. The option `-fclang-declare-variant` is there
> > >> to make the OpenMP based one orthogonal. Of course, we might decide
> > >> to make -fclang-declare-variant on/off by default, and have a default
> > >> behavior when interacting with -fopenmp-simd. For the sake of
> > >> compatibility with other compilers, we might need to require
> > >> -fno-clang-declare-variant when targeting -fopenmp[-simd].
> > >> 
> > >> TOPIC 3: "there are no special arguments / flags / status regs that
> > >> are used / changed in the vector version that the compiler will have
> > >> to "just know""
> > >> 
> > >> I believe that this concern is raised by the problem of handling FP
> > >> exceptions? If that’s the case, the compiler is not allowed to make
> > >> any assumption about the vector function in that respect, and must
> > >> treat it with the same knowledge as any other function, depending on
> > >> the visibility it has in the compilation unit. @Renato, does this
> > >> answer your question?
> > >> 
> > >> TOPIC 4: attribute in function declaration vs attribute in function
> > >> call site
> > >> 
> > >> We discussed this in the previous version of the proposal. Having it
> > >> in the call sites guarantees that no incompatible vector versions are
> > >> used when merging modules compiled for different targets. I don’t
> > >> have a use case for this, if I remember correctly this was asked by
> > >> @Hideki Saito. Hideki, any comment on this?
> > >> 
> > >> TOPIC 5: overriding system header (the discussion on #pragma
> > >> omp/clang/system variants initiated by @Hal Finkel).
> > >> 
> > >> I thought that the split between #pragma clang declare variant and
> > >> #pragma omp declare variant was already providing the orthogonality
> > >> between system header and user header. Meaning that a user should
> > >> always prefer the omp version (for portability to other compilers)
> > >> instead of the #pragma clang one, which would be relegated to system
> > >> headers and headers provided by the compiler. Am I missing something?
> > >> If so, I am happy to add a "system" version of the directive, as it
> > >> would be quite easy to do given most of the parsing infrastructure
> > >> will be shared.
> > >> 
> > >> 
> > >>> On May 30, 2019, at 12:53 PM, Philip Reames <listmail at philipreames.com> wrote:
> > >>> 
> > >>> 
> > >>> On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
> > >>>> On 05/29, Finkel, Hal J. via cfe-dev wrote:
> > >>>>> On 5/29/19 1:52 PM, Philip Reames wrote:
> > >>>>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
> > >>>>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
> > >>>>>>>> I generally like the idea of having support in IR for
> > >>>>>>>> vectorization of custom functions.  I have several use
> > >>>>>>>> cases which would benefit from this.
> > >>>>>>>> 
> > >>>>>>>> I'd suggest a couple of reframings to the IR
> > >>>>>>>> representation though.
> > >>>>>>>> 
> > >>>>>>>> First, this should probably be specified as
> > >>>>>>>> metadata/attribute on a function declaration.  Allowing the
> > >>>>>>>> callsite variant is fine, but it should primarily be a
> > >>>>>>>> property of the called function, not of the call site.
> > >>>>>>>> Being able to specify it once per declaration is much cleaner.
> > >>>>>>> I agree. We should support this both on the function
> > >>>>>>> declaration and on the call sites.
> > >>>>>>> 
> > >>>>>>> 
> > >>>>>>>> Second, I really don't like the mangling use here.  We need a
> > >>>>>>>> better way to specify the properties of the function than
> > >>>>>>>> its mangled name.  One thought to explore is to directly use
> > >>>>>>>> the Value of the function declaration (since this is metadata
> > >>>>>>>> and we can do that), and then tie the properties to the
> > >>>>>>>> function declaration in some way?  Sorry, I don't really
> > >>>>>>>> have a specific suggestion here.
> > >>>>>>> Is the problem the mangling or the fact that the mangling is
> > >>>>>>> ABI/target-specific? One option is to use LLVM's mangling
> > >>>>>>> scheme (the one we use for intrinsics) and then provide some
> > >>>>>>> backend infrastructure to translate later.
> > >>>>>> Well, both honestly.  But mangling with a non-target specific
> > >>>>>> scheme is a lot better, so I might be okay with that.  Good idea.
> > >>>>> 
> > >>>>> I liked your idea of directly encoding the signature in the
> > >>>>> metadata, but I think that we want to continue to use
> > >>>>> attributes, and not metadata, and the options for attributes
> > >>>>> seem more limited - unless we allow attributes to take metadata
> > >>>>> arguments - maybe that's an enhancement worth considering.
> > >>>> I recently talked to people in the OpenMP language committee
> > >>>> meeting about this and, thinking forward to the actual
> > >>>> implementation/use of the OpenMP 5.x declare variant feature,
> > >>>> I'd say:
> > >>>> 
> > >>>> - We will need a mangling scheme if we want to allow variants on
> > >>>>   declarations that are defined elsewhere.
> > >>>> - We will need a (OpenMP) standardized mangling scheme if we want
> > >>>>   interoperability between compilers.
> > >>>> 
> > >>>> I assume we want both so I think we will need both.
> > >>> If I'm reading this correctly, this describes a need for the
> > >>> frontend to have a mangling scheme.  Nothing in here would seem to
> > >>> prevent the frontend from generating a declaration for a mangled
> > >>> external symbol and then referencing that declaration.  Am I
> > >>> missing something?
> > >>>> 
> > >>>> That said, I think this should allow us to avoid
> > >>>> attributes/metadata which seems to me like a good thing right now.
> > >>>> 
> > >>>> Cheers,
> > >>>> Johannes
> > >>>> 
> > >>>> 
> > >>>>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
> > >>>>>>>>> Dear all,
> > >>>>>>>>> 
> > >>>>>>>>> This RFC is a proposal to provide auto-vectorization
> > >>>>>>>>> functionality for user provided vector functions.
> > >>>>>>>>> 
> > >>>>>>>>> The proposal is a modification of an RFC that I sent out a
> > >>>>>>>>> couple of months ago, with the title `[RFC] Re-implementing
> > >>>>>>>>> -fveclib with OpenMP` (see
> > >>>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html).
> > >>>>>>>>> The previous RFC is to be considered abandoned.
> > >>>>>>>>> 
> > >>>>>>>>> The original RFC was proposing to re-implement the `-fveclib`
> > >>>>>>>>> command line option. This proposal avoids that, and limits
> > >>>>>>>>> its scope to the mechanics of providing vector functions in
> > >>>>>>>>> user code that the compiler can pick up for
> > >>>>>>>>> auto-vectorization. This narrower scope limits the impact of
> > >>>>>>>>> changes that are needed in both clang and LLVM.
> > >>>>>>>>> 
> > >>>>>>>>> Please let me know what you think.
> > >>>>>>>>> Kind regards,
> > >>>>>>>>> 
> > >>>>>>>>> Francesco
> > >>>>>>>>> 
> > >>>>>>>>> 
> > >>>>>>>>> ===============================================================================
> > >>>>>>>>> 
> > >>>>>>>>> Introduction
> > >>>>>>>>> ============
> > >>>>>>>>> 
> > >>>>>>>>> This RFC encompasses the proposal of informing the
> > >>>>>>>>> vectorizer about the availability of vector functions
> > >>>>>>>>> provided by the user. The mechanism is based on the use of
> > >>>>>>>>> the directive `declare variant` introduced in OpenMP
> > >>>>>>>>> 5.0 [^1].
> > >>>>>>>>> 
> > >>>>>>>>> The mechanism proposed has the following properties:
> > >>>>>>>>> 
> > >>>>>>>>> 1.  Decouples the compiler front-end that knows about the availability
> > >>>>>>>>>     of vectorized routines, from the back-end that knows how to make use
> > >>>>>>>>>     of them.
> > >>>>>>>>> 2.  Enables support for a developer's own vector libraries without
> > >>>>>>>>>     requiring changes to the compiler.
> > >>>>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
> > >>>>>>>>>     mappings as relevant for their own runtime libraries, etc.
> > >>>>>>>>> 
> > >>>>>>>>> The implementation consists of two separate sets of changes.
> > >>>>>>>>> 
> > >>>>>>>>> The first set is a set of changes in `llvm`, and consists of:
> > >>>>>>>>> 
> > >>>>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
> > >>>>>>>>>     availability of user-defined vector functions via metadata attached
> > >>>>>>>>>     to an `llvm::CallInst`.
> > >>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
> > >>>>>>>>>     information about the available vector functions associated to a
> > >>>>>>>>>     `llvm::CallInst`.
> > >>>>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
> > >>>>>>>>>     metadata.
> > >>>>>>>>> 
> > >>>>>>>>> The second set consists of the [changes in
> > >>>>>>>>> clang](#clang) that are needed to recognize the `#pragma
> > >>>>>>>>> clang declare variant` directive.
> > >>>>>>>>> 
> > >>>>>>>>> Proposed changes
> > >>>>>>>>> ================
> > >>>>>>>>> 
> > >>>>>>>>> We propose an implementation that uses `#pragma clang
> > >>>>>>>>> declare variant` to inform the backend components about the
> > >>>>>>>>> availability of vector versions of scalar functions found in
> > >>>>>>>>> IR. The mechanism relies on storing such information in IR
> > >>>>>>>>> metadata, and therefore makes the auto-vectorization of
> > >>>>>>>>> function calls a mid-end (`opt`) process that is independent
> > >>>>>>>>> of the front-end that generated such IR metadata.
> > >>>>>>>>> 
> > >>>>>>>>> This implementation provides a generic mechanism that the
> > >>>>>>>>> users of the LLVM compiler will be able to use for
> > >>>>>>>>> interfacing their own vector routines for generic code.
> > >>>>>>>>> 
> > >>>>>>>>> The implementation can also expose vectorization-specific
> > >>>>>>>>> descriptors -- for example, like the `linear` and `uniform`
> > >>>>>>>>> clauses of the OpenMP `declare simd` directive -- that could
> > >>>>>>>>> be used to finely tune the automatic vectorization of some
> > >>>>>>>>> functions (think for example of the vectorization of `double
> > >>>>>>>>> sincos(double , double *, double *)`, where `linear` can be
> > >>>>>>>>> used to give extra information about the memory layout of
> > >>>>>>>>> the 2 pointer parameters in the vector version).
> > >>>>>>>>> 
> > >>>>>>>>> The directive `#pragma clang declare variant` follows the
> > >>>>>>>>> syntax of the `#pragma omp declare variant` directive of OpenMP.
> > >>>>>>>>> 
> > >>>>>>>>> We define the new directive in the `clang` namespace instead
> > >>>>>>>>> of using the `omp` one of OpenMP to allow the compiler to
> > >>>>>>>>> perform auto-vectorization outside of an OpenMP SIMD context.
> > >>>>>>>>> 
> > >>>>>>>>> The mechanism is based on OpenMP to provide a uniform user
> > >>>>>>>>> experience across the two mechanisms, and to maximise the
> > >>>>>>>>> number of shared components of the infrastructure needed in
> > >>>>>>>>> the compiler frontend to enable the feature.
> > >>>>>>>>> 
> > >>>>>>>>> Changes in LLVM IR {#llvmIR}
> > >>>>>>>>> ------------------
> > >>>>>>>>> 
> > >>>>>>>>> The IR is enriched with metadata
that details the
> > >>>>>>>>> availability of vector versions
of an associated scalar
> > >>>>>>>>> function. This metadata is
attached to the call site of the scalar function.
> > >>>>>>>>> 
> > >>>>>>>>> The metadata takes the form of an
attribute containing a
> > >>>>>>>>> comma separated list of vector
function mappings. Each entry
> > >>>>>>>>> has a unique name that follows
the Vector Function ABI[^2]
> > >>>>>>>>> and real name that is used when
generating calls to this vector function.
> > >>>>>>>>> 
> > >>>>>>>>>     vfunc_name1(real_name1),
vfunc_name2(real_name2)
> > >>>>>>>>> 
> > >>>>>>>>> The Vector Function ABI name
describes the signature of the
> > >>>>>>>>> vector function so that
properties like vectorisation factor
> > >>>>>>>>> can be queried during
compilation.
> > >>>>>>>>> 
> > >>>>>>>>> The `(real name)` token is
optional and assumed to match the
> > >>>>>>>>> Vector Function ABI name when
omitted.
> > >>>>>>>>> 
> > >>>>>>>>> For example, the availability of
a 2-lane double precision
> > >>>>>>>>> `sin` function via SVML when
targeting AVX on x86 is
> > >>>>>>>>> provided by the following IR.
> > >>>>>>>>> 
> > >>>>>>>>>     // ...
> > >>>>>>>>>     ... = call double @sin(double) #0
> > >>>>>>>>>     // ...
> > >>>>>>>>> 
> > >>>>>>>>>     #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> > >>>>>>>>>                              _ZGVdN4v_sin(__svml_sin4),
> > >>>>>>>>>                              ..."} }
> > >>>>>>>>> 
> > >>>>>>>>> The string
`"_ZGVcN2v_sin(__svml_sin2)"` in this
> > >>>>>>>>> vector-variant attribute provides
information on the shape
> > >>>>>>>>> of the vector function via the
string `_ZGVcN2v_sin`,
> > >>>>>>>>> mangled according to the Vector
Function ABI for Intel, and
> > >>>>>>>>> remaps the standard Vector
Function ABI name to the non-standard name `__svml_sin2`.
> > >>>>>>>>> 
> > >>>>>>>>> This metadata is compatible with
the proposal "Proposal for
> > >>>>>>>>> function vectorization and loop
vectorization with function
> > >>>>>>>>> calls",[^3] that uses Vector
Function ABI mangled names to
> > >>>>>>>>> inform the vectorizer about the
availability of vector
> > >>>>>>>>> functions. The proposal extends
the original by allowing the
> > >>>>>>>>> explicit mapping of the Vector
Function ABI mangled name to a non-standard name, which allows the use of
existing vector libraries.
> > >>>>>>>>> 
> > >>>>>>>>> The `vector-variant` attribute
needs to be attached on a
> > >>>>>>>>> per-call basis to avoid conflicts
when merging modules with different vector variants.
> > >>>>>>>>> 
> > >>>>>>>>> The query infrastructure: SVFS
{#infrastructure}
> > >>>>>>>>> ------------------------------
> > >>>>>>>>> 
> > >>>>>>>>> The Search Vector Function System
(SVFS) is constructed from
> > >>>>>>>>> an `llvm::Module` instance so it
can create function
> > >>>>>>>>> definitions. The SVFS exposes an
API with two methods.
> > >>>>>>>>> 
> > >>>>>>>>> ###
`SVFS::isFunctionVectorizable`
> > >>>>>>>>> 
> > >>>>>>>>> This method queries the availability of a vectorized version
> > >>>>>>>>> of a function. The signature of the method is as follows.
> > >>>>>>>>> 
> > >>>>>>>>>     bool isFunctionVectorizable(llvm::CallInst * Call,
> > >>>>>>>>>                                 ParTypeMap Params);
> > >>>>>>>>> 
> > >>>>>>>>> The method determines the availability of a vector version of
> > >>>>>>>>> the function invoked by the `Call` parameter by looking at
> > >>>>>>>>> the `vector-variant` metadata.
> > >>>>>>>>> 
> > >>>>>>>>> The `Params` argument is a map that associates the position
> > >>>>>>>>> of a parameter in the `CallInst` to its `ParameterType`
> > >>>>>>>>> descriptor. The `ParameterType` descriptor holds information
> > >>>>>>>>> about the shape of the corresponding parameter in the
> > >>>>>>>>> signature of the vector function. This `ParameterType` is
> > >>>>>>>>> used to query the SVFS about the availability of vector
> > >>>>>>>>> versions that have `linear`, `uniform` or `align` parameters
> > >>>>>>>>> (in the sense of OpenMP 4.0 and onwards).
> > >>>>>>>>> 
> > >>>>>>>>> The method `isFunctionVectorizable`, when invoked with an
> > >>>>>>>>> empty `ParTypeMap`, is equivalent to the `TargetLibraryInfo`
> > >>>>>>>>> method `isFunctionVectorizable(StringRef Name)`.
> > >>>>>>>>> 
> > >>>>>>>>> ### `SVFS::getVectorizedFunction`
> > >>>>>>>>> 
> > >>>>>>>>> This method returns the vector
function declaration that
> > >>>>>>>>> correspond to the needs of the
vectorization technique that is being run.
> > >>>>>>>>> 
> > >>>>>>>>> The signature of the function is as follows.
> > >>>>>>>>> 
> > >>>>>>>>>     std::pair<llvm::FunctionType *, std::string>
> > >>>>>>>>>     getVectorizedFunction(llvm::CallInst * Call, unsigned VF,
> > >>>>>>>>>                           bool IsMasked, ParTypeSet Params);
> > >>>>>>>>> 
> > >>>>>>>>> The `Call` parameter is the call
instance that is being
> > >>>>>>>>> vectorized, the `VF` parameter
represent the vectorization
> > >>>>>>>>> factor (how many lanes), the
`IsMasked` parameter decides
> > >>>>>>>>> whether or not the signature of
the vector function is
> > >>>>>>>>> required to have a mask
parameter, the `Params` parameter
> > >>>>>>>>> describes the shape of the vector
function as in the `isFunctionVectorizable` method.
> > >>>>>>>>> 
> > >>>>>>>>> The method uses the `vector-variant` metadata and returns
> > >>>>>>>>> the function signature and the name of the function based on
> > >>>>>>>>> the input parameters.
> > >>>>>>>>> 
> > >>>>>>>>> The SVFS can add new function
definitions, in the same
> > >>>>>>>>> module as the `Call`, to provide
vector functions that are
> > >>>>>>>>> not present within the
vector-variant metadata. For example,
> > >>>>>>>>> if a library provides a vector
version of a function with a
> > >>>>>>>>> vectorization factor of 2, but
the vectorizer is requesting
> > >>>>>>>>> a vectorization factor of 4, the
SVFS is allowed to create a
> > >>>>>>>>> definition that calls the 2-lane
version twice. This
> > >>>>>>>>> capability applies similarly for
providing masked and unmasked versions when the request does not match what is
available in the library.
> > >>>>>>>>> 
> > >>>>>>>>> This method is equivalent to the
TLI method `StringRef
> > >>>>>>>>> getVectorizedFunction(StringRef
F, unsigned VF) const;`.
> > >>>>>>>>> 
> > >>>>>>>>> Notice that to fully support
OpenMP vectorization we need to
> > >>>>>>>>> think about a fuzzy matching
mechanism that is able to
> > >>>>>>>>> select a candidate in the calling
context. However, this
> > >>>>>>>>> proposal is intended for
scalar-to-vector mappings of
> > >>>>>>>>> math-like functions that are most
likely to associate a
> > >>>>>>>>> unique vector candidate in most
contexts. Therefore,
> > >>>>>>>>> extending this behavior to a
generic one is an aspect of the implementation that will be treated in a
separate RFC about the vectorization pass.
> > >>>>>>>>> 
> > >>>>>>>>> ### Scalable vectorization
> > >>>>>>>>> 
> > >>>>>>>>> Both methods of the SVFS API will
be extended with a boolean
> > >>>>>>>>> parameter to specify whether
scalable signatures are needed
> > >>>>>>>>> by the user of the SVFS.
> > >>>>>>>>> 
> > >>>>>>>>> Changes in clang {#clang}
> > >>>>>>>>> ----------------
> > >>>>>>>>> 
> > >>>>>>>>> We use clang to generate the
metadata described above.
> > >>>>>>>>> 
> > >>>>>>>>> In the compilation unit, the
vector function definition or
> > >>>>>>>>> declaration must be visible and
associated to the scalar
> > >>>>>>>>> version via the `#pragma clang
declare variant` according to
> > >>>>>>>>> the rule defined by the
correspondent `#pragma omp declare
> > >>>>>>>>> variant` defined in OpenMP 5.0,
as in the following example.
> > >>>>>>>>> 
> > >>>>>>>>>     #pragma clang declare variant(vector_sinf) \
> > >>>>>>>>>         match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
> > >>>>>>>>>     extern float sinf(float);
> > >>>>>>>>> 
> > >>>>>>>>>     float32x4_t vector_sinf(float32x4_t x);
> > >>>>>>>>> 
> > >>>>>>>>> The `construct` set in the directive, together with the
> > >>>>>>>>> `device` set, is used to generate the vector mangled name to
> > >>>>>>>>> be used in the `vector-variant` attribute, for example
> > >>>>>>>>> `_ZGVnN2v_sin`, when targeting AArch64 Advanced SIMD code
> > >>>>>>>>> generation. The rules for mangling the name of the scalar
> > >>>>>>>>> function in the vector name are defined in the Vector
> > >>>>>>>>> Function ABI specification of the target.
> > >>>>>>>>> 
> > >>>>>>>>> The part of the vector-variant
attribute that redirects the
> > >>>>>>>>> call to `vector_sinf` is derived
from the `variant-id`
> > >>>>>>>>> specified in the `variant`
clause.
> > >>>>>>>>> 
> > >>>>>>>>> Summary
> > >>>>>>>>> =======
> > >>>>>>>>> 
> > >>>>>>>>> New `clang` directive in clang
> > >>>>>>>>> ------------------------------
> > >>>>>>>>> 
> > >>>>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare
> > >>>>>>>>> variant` restricted to the `simd` context selector, from
> > >>>>>>>>> OpenMP 5.0+.
> > >>>>>>>>> 
> > >>>>>>>>> Option behavior, and interaction with OpenMP
> > >>>>>>>>> --------------------------------------------
> > >>>>>>>>> 
> > >>>>>>>>> The behavior described below makes sure that `#pragma clang
> > >>>>>>>>> declare variant` function vectorization and OpenMP function
> > >>>>>>>>> vectorization are orthogonal.
> > >>>>>>>>> 
> > >>>>>>>>> `-fclang-declare-variant`
> > >>>>>>>>> 
> > >>>>>>>>> :   The `#pragma clang declare variant` directives are parsed
> > >>>>>>>>>     and used to populate the `vector-variant` attribute.
> > >>>>>>>>> 
> > >>>>>>>>> `-fopenmp[-simd]`
> > >>>>>>>>> 
> > >>>>>>>>> :   The `#pragma omp declare variant` directives are parsed
> > >>>>>>>>>     and used to populate the `vector-variant` attribute.
> > >>>>>>>>> 
> > >>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
> > >>>>>>>>> 
> > >>>>>>>>> :   The `#pragma omp declare variant` directives are used to
> > >>>>>>>>>     populate the `vector-variant` attribute in IR. The
> > >>>>>>>>>     `#pragma clang declare variant` directives are ignored.
> > >>>>>>>>> 
> > >>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
> > >>>>>>>>> 
> > >>>>>>>>> [^2]: Vector Function ABI for x86:
> > >>>>>>>>>     <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
> > >>>>>>>>>     Vector Function ABI for AArch64:
> > >>>>>>>>>     <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
> > >>>>>>>>> 
> > >>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
> > >>>>>>>>> 
> > >>>>>>>>>
_______________________________________________
> > >>>>>>>>> LLVM Developers mailing list
llvm-dev at lists.llvm.org
> > >>>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >>>>>>>>
_______________________________________________
> > >>>>>>>> cfe-dev mailing list
> > >>>>>>>> cfe-dev at lists.llvm.org
> > >>>>>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> > >>>>> --
> > >>>>> Hal Finkel
> > >>>>> Lead, Compiler Technology and Programming
Languages Leadership
> > >>>>> Computing Facility Argonne National Laboratory
> > >>>>> 
> > >>>>> _______________________________________________
> > >>>>> cfe-dev mailing list
> > >>>>> cfe-dev at lists.llvm.org
> > >>>>>
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> > >> 
> > > 
> > > --
> > > 
> > > Johannes Doerfert
> > > Researcher
> > > 
> > > Argonne National Laboratory
> > > Lemont, IL 60439, USA
> > > 
> > > jdoerfert at anl.gov
> > 
> 
> -- 
> 
> Johannes Doerfert
> Researcher
> 
> Argonne National Laboratory
> Lemont, IL 60439, USA
> 
> jdoerfert at anl.gov
-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov

Francesco Petrogalli via llvm-dev

2019-May-31 22:05 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

> On May 31, 2019, at 2:56 PM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> 
> I think I did misunderstand what you want to do with attributes. This is
> my bad. Let me try to explain:
> 
> It seems you want the "vector-variants" attributes (which I could not
> find with this name in trunk, correct?) to "remember" what vector
> versions can be created (wrt. validity), assuming a definition is
> available? Correct?
Yes.
> What I was concerned with is the example I sketched somewhere below
> which motivates the need for a generalized/standardized name mangling
> for OpenMP. I thought you wanted to avoid that somehow but if you don't I
> misunderstood you. I basically removed the part where the vector
> versions have to be created first but I assumed them to be existent (in
> the module or somewhere else). That is, I assumed a call to foo and
> various symbols available that are specializations of foo. When we then
> vectorize foo (or otherwise specialize at some point in the future), you
> would scan the module and pick the best match based on the context of
> the call.
> 
Yes, although the syntax you use below is wrong. Declare variant is attached to
the scalar definition, and points to a vector definition (the variant) that is
declared/defined in the same compilation unit where the scalar version is
visible.

> Now I don't know if I understood your proposal by now but let me ask a
> question anyway:
> 
> VecClone.cpp:276-278 mentions that the vectorizer is supposed to look at
> the vector-variants functions. This works for variants that are created
> from definitions in the module but what about #omp declare simd
> declarations?
> 
VecClone does more than just map a scalar version to a vector one. It also
builds the vector version definition by auto-vectorizing the body of the
scalar function.

I don’t know whether the patches related to VecClone are also intended to use the
`vector-variant` attribute for function declarations with a #pragma omp declare
simd. On aarch64, in Arm compiler for HPC, we do that to support vector math
libraries. It works in principle, but `declare variant` allows more context
selection (and custom names instead of vector ABI names, which are easier for
users).

> 
> On 05/31, Francesco Petrogalli wrote:
>>> On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
>>> 
>>> I think we should split this discussion:
>>> TOPIC 1 & 2 & 4: How to implement all use cases and OpenMP 5.X
>>>                  features, including compatibility with other
>>>                  compilers and cross module support.
>> 
>> Yes, and we have to carefully make this as standard and compatible as
>> possible.
> 
> Agreed.
> 
> 
>>> TOPIC 3b & 5: Interoperability with clang declare (system vs. user
>>>                declares)
>> 
>> 
>> I think that Alexey's explanation of how the directives are handled
>> internally in the frontend makes us lean towards the attribute.
> 
> How things are handled right now, especially given that declare variant
> is not handled at all, should not limit our design space. If the
> argument is that we cannot reasonably implement a solution, that is a
> different story.
> 
> 
>>> TOPIC 3a & 3c: floating point issues?
>>> 
>> 
>> I believe there is no issue there. I have quoted the OpenMP standard in
>> reply to Renato:
>> 
>> See https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf,
>> page 118, lines 23-24:
>> 
>> "The execution of the function or subroutine cannot have any side
>> effects that would alter its execution for concurrent iterations of a
>> SIMD chunk."
> 
> Great.
> 
> 
>>> I inlined comments for Topic 1 below.
>>> 
>>> I hope that we do not have to discuss topic 2 if we agree neither
>>> attributes nor metadata is necessary, or better, will solve the actual
>>> problem at hand. I don't have strong feeling on topic 4 but I have the
>>> feeling this will become less problematic once we figure out topic 1.
>>> 
>>> Thanks,
>>> Johannes
>>> 
>>> 
>>> On 05/31, Francesco Petrogalli wrote:
>>>> # TOPIC 1: concerns about name mangling
>>>> 
>>>> I understand that there are concerns in using the mangling scheme I
>>>> proposed, and that it would be preferred to have a mangling scheme
>>>> that is based on (and standardized by) OpenMP. 
>>> 
>>> I still think it will be required to have a standardized one, not
>>> only preferred.
>>> 
>>> 
>> 
>> I am all with you in standardizing. x86 and aarch64 have their own
>> vector function ABIs, which, although “private”, are to be considered
>> standard. Open-source and commercial compilers are using them,
>> therefore we have to deal with this mangling scheme, whether or not
>> OpenMP comes up with a standard mangling scheme.
> 
> I don't get the point you are trying to make here. What do you mean by
> "we have to deal with"? (I do not suggest to get rid of them.)
> 
That we cannot ignore the fact that the name scheme is already standardized by
the vendors, so let’s first deal with what we have, and think about the OpenMP
mangling scheme only once there is one available.
> 
>>>> I hear the argument on having some common ground here. In fact, there
>>>> is already common ground between the x86 and aarch64 backend, who have
>>>> based their respective Vector Function ABI specifications on OpenMP.
>>>> 
>>>> In fact, the mangled name grammar can be summarized as follows:
>>>> 
>>>> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
>>>> 
>>>> Across vector extensions the only <token> that will differ is the
>>>> <isa> token.
>>>> 
>>>> This might lead people to think that we could drop the _ZGV<isa>
>>>> prefix and consider the <masking><VLEN><parameter type>_<scalar name>
>>>> part as a sort of unofficial OpenMP mangling scheme: in fact, the
>>>> signature of an “unmasked 2-lane vector version of `sin`” will always
>>>> be `<2 x double>(<2 x double>)`.
>>>> 
>>>> The problem with this choice is that the number of vector versions
>>>> available for a target is not unique.
>>> 
>>> For me, this simply means this mangling scheme is not sufficient.
>>> 
>> 
>> Can you explain more why you think the mangling scheme is not
>> sufficient? The mangling scheme is shaped to provide all the
>> information that the OpenMP directive describes.
> 
> I don't know if it is insufficient but I thought you hinted towards that.
I didn’t mean that; the tokens in the Vector Function ABI mangling schemes are
sufficient.
> If we can handle/decode everything we need for declare variants then I
> do not object at all. If not, we require respective extension such that
> we can. The result should be a superset of the current SIMD encoding and
> compatible with the current one.
> 
> 
We can handle/decode everything for a SIMD context. :)
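
To make "decode" concrete, here is a minimal sketch of splitting a mangled name into the tokens of the grammar quoted in this thread (`_ZGV<isa><masking><VLEN><parameter type>_<scalar name>`). The struct and function names are hypothetical and not taken from any LLVM patch; the sketch assumes `M`/`N` for masked/unmasked and `x` for a scalable length, as in the x86 and AArch64 Vector Function ABIs, and it leaves the parameter tokens as an uninterpreted string.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Decoded fields of a Vector Function ABI name (hypothetical helper type).
struct VFABIName {
  char isa;            // e.g. 'b', 'c', 'd', 'e' on x86; 'n', 's' on AArch64
  bool masked;         // 'M' means masked, 'N' means unmasked
  unsigned vlen;       // number of lanes; 0 stands for a scalable length ('x')
  std::string params;  // raw parameter tokens, e.g. "vv"
  std::string scalar;  // name of the scalar function
};

// Returns std::nullopt when the name does not follow the grammar.
// Assumption of this sketch: at least one parameter token is present.
std::optional<VFABIName> decodeVFABIName(const std::string &name) {
  const std::string prefix = "_ZGV";
  if (name.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt;
  std::size_t i = prefix.size();
  if (name.size() < i + 3)  // need <isa>, <masking> and the start of <VLEN>
    return std::nullopt;

  VFABIName out;
  out.isa = name[i++];
  const char mask = name[i++];
  if (mask != 'M' && mask != 'N')
    return std::nullopt;
  out.masked = (mask == 'M');

  out.vlen = 0;
  if (name[i] == 'x') {
    ++i;  // scalable vector length
  } else {
    const std::size_t start = i;
    while (i < name.size() &&
           std::isdigit(static_cast<unsigned char>(name[i])))
      out.vlen = out.vlen * 10 + static_cast<unsigned>(name[i++] - '0');
    if (i == start)
      return std::nullopt;  // no digits where <VLEN> was expected
  }

  // The first '_' after <VLEN> separates the parameter tokens from the
  // scalar name, which may itself contain underscores.
  const std::size_t sep = name.find('_', i);
  if (sep == std::string::npos || sep == i || sep + 1 == name.size())
    return std::nullopt;
  out.params = name.substr(i, sep - i);
  out.scalar = name.substr(sep + 1);
  return out;
}
```

For instance, `decodeVFABIName("_ZGVcN8vv_vec_sum")` yields ISA `c`, unmasked, 8 lanes, parameters `vv` and scalar name `vec_sum`, while a plain `sin` is rejected.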

> 
>> The fact that x86 and aarch64 realize such information in different
>> way (multiple signature/vector extensions) is something that cannot be
>> avoided, because it is related to architectural aspects that are
>> specific to the vector extension and transparent to the OpenMP
>> standard.
> 
> I don't think that is a problem (that's why I "failed to see the
> problem" in the comment below). I look at it this way: If #declare simd,
> or similar, results in N variants, it should at the end of the day not
> be different from declaring these N variants explicitly with the
> respective declare variant match clause.
> 
> 
That’s not the case. #declare simd should create all the versions that are
optimal for the target. We carefully thought about that when writing the vector
function ABI. Most of the constraints derive from the fact that each target has a
specific register size.

Example:

#pragma omp declare simd
float foo(float);

x86: 8 versions, {2, 4, 8, 16 lanes} x {masking, no masking}, see
https://godbolt.org/z/m1BUVt
Arm NEON: 4 versions, {2, 4 lanes} x {masking, no masking}
Arm SVE: 1 version

Therefore, the outcome of declare simd is not target independent. Your
expectations are met only inside one target.

> 
>>>> In particular, the following declaration generates multiple vector
>>>> versions, depending on the target:
>>>> 
>>>> #pragma omp declare simd simdlen(2) notinbranch
>>>> double foo(double) {…};
>>>> 
>>>> On x86, this generates at least 4 symbols (one for SSE, one for AVX,
>>>> one for AVX2, and one for AVX512: https://godbolt.org/z/TLYXPi)
>>>> 
>>>> On aarch64, the same declaration generates a unique symbol, as
>>>> specified in the Vector Function ABI.
>>> 
>>> I fail to see the problem. We generate X symbols for X different
>>> contexts. Once we get to the point where we vectorize, we determine
>>> which context fits best and choose the corresponding symbol version.
>>> 
>> 
>> Yes, this is exactly what we need to do, under the constraint that
>> the rules for generating "X symbols for X different contexts" are
>> decided by the Vector Function ABI of the target.
> 
> Sounds good. The vector ABI is used to determine what contexts exists
> and what symbols should be created. I would assume the encoding should
> be the same as if we specified the versions (/contexts) ourselves via
> #declare variant.
> 
Oh yes, vector functions listed in a declare variant should obey the vector
function ABI rules (other than the function name).
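
The "other than the function name" part is what the `vfunc_name(real_name)` entries of the RFC are meant to carry. As a rough sketch only, assuming the comma-separated format described in the RFC and an optional `(real_name)` that defaults to the ABI name, the attribute string could be split like this. The function names `trim` and `parseVectorVariants` are made up for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Strip leading and trailing blanks from one list entry.
std::string trim(const std::string &s) {
  const char *ws = " \t\n";
  const std::size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos)
    return "";
  const std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Each result pair is {Vector Function ABI name, symbol to call}.
// When "(real_name)" is omitted, the ABI name is also the symbol to call.
std::vector<std::pair<std::string, std::string>>
parseVectorVariants(const std::string &attr) {
  std::vector<std::pair<std::string, std::string>> out;
  std::size_t pos = 0;
  while (pos <= attr.size()) {
    std::size_t comma = attr.find(',', pos);
    if (comma == std::string::npos)
      comma = attr.size();
    const std::string entry = trim(attr.substr(pos, comma - pos));
    pos = comma + 1;
    if (entry.empty())
      continue;
    const std::size_t open = entry.find('(');
    if (open != std::string::npos && entry.back() == ')') {
      // "vfunc_name(real_name)": redirect the ABI name to a custom symbol.
      out.emplace_back(entry.substr(0, open),
                       entry.substr(open + 1, entry.size() - open - 2));
    } else {
      out.emplace_back(entry, entry);
    }
  }
  return out;
}
```

With the string from the RFC example, `"_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4)"`, this gives two entries whose call targets are `__svml_sin2` and `__svml_sin4`.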
> 
>>> Maybe my view is too naive here, please feel free to correct me.
>>> 
>>> 
>>>> This means that the attribute (or metadata) that carries the
>>>> information on the available vector version needs to deal also with
>>>> things that are not usually visible at IR level, but that might still
>>>> need to be provided to be able to decide which particular instruction
>>>> set/ vector extension needs to be targeted.
>>> 
>>> The symbol names should carry all the information we need. If they do
>>> not, we need to improve the mangling scheme such that they do. There is
>>> no attributes/metadata we could use at library boundaries.
>>> 
>> Hum, I am not sure what you mean by "There is no attributes/metadata
>> we could use at library boundaries."
> 
> (This seems to be part of the misunderstanding, I leave my comment here
> anyway:)
> 
> The simd-related stuff works because it is a uniform mangling scheme
> used by all compilers. Take the situation below in which I think we want
> to call foo_CTX in the library. If so, we need a name for it.
> 
In the situation below, the mangled name is going to be the same for both
compilers, as long as they adhere to the vector function ABI.
> 
> a.c:  // Compiled by gcc into a library
> #omp declare variant (foo) match(CTX)
> void foo_CTX(...) {...}
> 
> b.c:  // Compiled by clang linked against the library above.
> #omp declare variant (foo) match(CTX)
> void foo_CTX(...);
> 
> void bar(...) {
>  #pragma omp CTX
>  foo();   // <- What function (symbol) do we call if a.c was compiled
>           //    by gcc and b.c with clang?
> }
> 
Please notice that `declare variant` needs to be attached to the scalar
function, not the vector one.

```
#pragma omp declare variant(foo_CTX) match (context=simd…
double foo (double) {…}

vector_double_ty foo_CTX(vector_double_ty) {…}
```

In vectorizing foo in bar, the compiler will not care where foo_CTX would come
from (of course, as long as the scalar+declare variant declarations are
visible).
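
A toy model of that lookup, under the assumption that the compiler has already collected the visible scalar-to-variant associations into a table: the vectorizer only asks for a scalar name plus a shape, and gets back whatever symbol the directive named. The `Variant` record and `pickVariant` function are hypothetical and only illustrate the idea; they are not the SVFS interface from the RFC.

```cpp
#include <optional>
#include <string>
#include <vector>

// One visible `declare variant` association (hypothetical record type).
struct Variant {
  std::string scalar;  // scalar function the directive is attached to
  std::string symbol;  // variant-id named in the directive, e.g. "foo_CTX"
  unsigned vf;         // number of lanes of the variant
  bool masked;         // whether the variant takes a mask
};

// Return the symbol to call for `scalar` at the requested shape, if any.
// Where the symbol is defined (this module, a library) plays no role here.
std::optional<std::string> pickVariant(const std::vector<Variant> &visible,
                                       const std::string &scalar,
                                       unsigned vf, bool masked) {
  for (const Variant &v : visible)
    if (v.scalar == scalar && v.vf == vf && v.masked == masked)
      return v.symbol;
  return std::nullopt;
}
```

If no entry matches, the caller would simply keep the scalar call; anything smarter (fuzzy matching, splitting a wider request into narrower calls) is outside this sketch.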
>> In our downstream compiler (Arm compiler for HPC, based on LLVM), we
>> use `declare simd` to provide vector math functions via custom header
>> file. It works brilliantly, if not for specific aspects that would be
>> perfectly covered by the `declare variant`, which might be one of the
>> reason why the OpenMP committee decided to introduce `declare
>> variant`.
> 
> But you (assume that you) control the mangling scheme across the entire
> infrastructure. Given that the simd mangling is de-facto standardized,
> that works.
> 
> Side note:
> Declare variant, as of 5.0, is not flexible enough for a sensible
> inclusion of target specific headers. That will change in 5.1.
> 
Could you point me at the discussion in 5.1 on this specific aspect?

> 
>> If your concerns is that by adding an attribute that somehow represent
>> something that is available in an external library is not enough to
>> guarantee that that symbol is available in the library… not even C
>> code can guarantee that? If the linker is not pointing to the right
>> library, there is nothing that can prevent it to fail if the symbol is
>> not present? 
> 
> I don't follow the example you describe. I don't want to change anything
> in how symbols are looked up or what happens if they are missing.
> 
> 
I don’t want to change that either :). I think we are misunderstanding each other
here...
>>>> I used an example based on `declare simd` instead of `declare variant`
>>>> because the attribute/metadata needed for `declare variant` is a
>>>> modification of the one needed for `declare simd`, which has already
>>>> been agreed in a previous RFC proposed by Intel [1], and for which
>>>> Intel has already provided an implementation [2]. The changes proposed
>>>> in this RFC are fully compatible with the work that is being done for
>>>> the VecClone pass in [2].
>>>> 
>>>> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
>>>> [2] VecClone pass: https://reviews.llvm.org/D22792
>>> 
>>> Having an agreed upon mangling for the older feature is not necessarily
>>> important here. We will need more functionality for variants and keeping
>>> the old scheme around with some metadata is not an extensible long-term
>>> solution. So, I would not try to fit variants into the existing
>>> simd-scheme but instead do it the other way around. We define what we
>>> need for variants and implement simd in that scheme.
>>> 
>> 
>> I kinda think that having agreed on something is important. It allows
>> to build other things on top of what have been agreed without breaking
>> compatibility.
>> 
>> On the specific, which are the new functionalities needed for the
>> variants that would make the current metadata (attributes) for declare
>> simd non extensible?
> 
> See first comment.
> 
>>>> The good news is that as far as AArch64 and x86 are concerned,
the only thing that will differ in the mangled name is the “<isa>” token.
As far as I can tell, the mangling scheme of the rest of the vector name is the
same, therefore a lot of infrastructure in terms of mangling and demangling can
be reused. In fact, the `mangleVectorParameters` function in
https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could
already be shared among x86 and aarch64.
>>>> 
>>>> TOPIC 2: metadata vs attribute
>>>> 
>>>> From a functionality point of view, I don’t care whether we use
>>>> metadata or attributes. The VecClone pass mentioned in TOPIC 1 uses the
>>>> following:
>>>> 
>>>> attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16" }
>>>> 
>>>> This is an attribute (I thought it was metadata?), I am happy to
>>>> reword the RFC using the right terminology (sorry for messing this up).
>>>> 
>>>> Also, @Renato expressed concern that metadata might be dropped
by optimization passes - would using attributes prevent that?
>>>> 
>>>> TOPIC 3: "There is no way to notify the backend how
conformant the SIMD versions are.”
>>>> 
>>>> @Shawn, I am afraid I don’t understand what you mean by
“conformant” here. Can you elaborate with an example?
>>>> 
>>>> TOPIC 3: interaction of the `omp declare variant` with `clang
declare variant`
>>>> 
>>>> I believe this is described in the `Option behavior, and
interaction with OpenMP`. The option `-fclang-declare-variant` is there to make
the OpenMP based one orthogonal. Of course, we might decide to make
-fclang-declare-variant on/off by default, and have default behavior when
interacting with -fopenmp-simd. For the sake of compatibility with other
compilers, we might need to require -fno-clang-declare-variant when targeting
-fopenmp-[simd].
>>>> 
>>>> TOPIC 3: "there are no special arguments / flags / status
regs that are used / changed in the vector version that the compiler will have
to "just know”
>>>> 
>>>> I believe that this concern is raised by the problem of
handling FP exceptions? If that’s the case, the compiler is not allowed to do
any assumption on the vector function about that, and treat it with the same
knowledge of any other function, depending on the visibility it has in the
compilation unit. @Renato, does this answer your question?
>>>> 
>>>> TOPIC 4: attribute in function declaration vs attribute
function call site
>>>> 
>>>> We discussed this in the previous version of the proposal. Having it
>>>> in the call sites guarantees that incompatible vector versions are not
>>>> used when merging modules compiled for different targets. I don’t have
>>>> a use case for this, if I remember correctly this was asked by
>>>> @Hideki Saito. Hideki, any comment on this?
>>>> 
>>>> TOPIC 5: overriding system header (the discussion on #pragma
omp/clang/system variants initiated by @Hal Finkel).
>>>> 
>>>> I thought that the split between #pragma clang declare variant and
>>>> #pragma omp declare variant was already providing the orthogonality
>>>> between system header and user header. Meaning that a user should always
>>>> prefer the omp version (for portability to other compilers) instead of
>>>> the #pragma clang one, which would be relegated to system headers and
>>>> headers provided by the compiler. Am I missing something? If so, I am
>>>> happy to add a “system” version of the directive, as it would be quite
>>>> easy to do given most of the parsing infrastructure will be shared.
>>>> 
>>>> 
>>>>> On May 30, 2019, at 12:53 PM, Philip Reames <listmail at
philipreames.com> wrote:
>>>>> 
>>>>> 
>>>>> On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
>>>>>> On 05/29, Finkel, Hal J. via cfe-dev wrote:
>>>>>>> On 5/29/19 1:52 PM, Philip Reames wrote:
>>>>>>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
>>>>>>>>> On 5/28/19 3:31 PM, Philip Reames via
cfe-dev wrote:
>>>>>>>>>> I generally like the idea of having
support in IR for vectorization of
>>>>>>>>>> custom functions.  I have several use
cases which would benefit from this.
>>>>>>>>>> 
>>>>>>>>>> I'd suggest a couple of reframings
to the IR representation though.
>>>>>>>>>> 
>>>>>>>>>> First, this should probably be
specified as metadata/attribute on a
>>>>>>>>>> function declaration.  Allowing the
callsite variant is fine, but it
>>>>>>>>>> should primarily be a property of the
called function, not of the call
>>>>>>>>>> site.  Being able to specify it once
per declaration is much cleaner.
>>>>>>>>> I agree. We should support this both on the
function declaration and on
>>>>>>>>> the call sites.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Second, I really don't like the mangling use here.  We need a better way
>>>>>>>>>> to specify the properties of the function than its mangled name.  One
>>>>>>>>>> thought to explore is to directly use the Value of the function
>>>>>>>>>> declaration (since this is metadata and we can do that), and then tie
>>>>>>>>>> the properties to the function declaration in some way?  Sorry, I don't
>>>>>>>>>> really have a specific suggestion here.
>>>>>>>>> Is the problem the mangling or the fact that the mangling is
>>>>>>>>> ABI/target-specific? One option is to use LLVM's mangling scheme (the
>>>>>>>>> one we use for intrinsics) and then provide some backend infrastructure
>>>>>>>>> to translate later.
>>>>>>>> Well, both honestly.  But mangling with a non-target specific scheme is
>>>>>>>> a lot better, so I might be okay with that.  Good idea.
>>>>>>> 
>>>>>>> I liked your idea of directly encoding the signature in the metadata,
>>>>>>> but I think that we want to continue to use attributes, and not
>>>>>>> metadata, and the options for attributes seem more limited - unless we
>>>>>>> allow attributes to take metadata arguments - maybe that's an
>>>>>>> enhancement worth considering.
>>>>>> I recently talked to people in the OpenMP language committee meeting
>>>>>> about this and, thinking forward to the actual implementation/use of the
>>>>>> OpenMP 5.x declare variant feature, I'd say:
>>>>>> 
>>>>>> - We will need a mangling scheme if we want to allow variants on
>>>>>>  declarations that are defined elsewhere.
>>>>>> - We will need a (OpenMP) standardized mangling scheme if we want
>>>>>>  interoperability between compilers.
>>>>>> 
>>>>>> I assume we want both so I think we will need both.
>>>>> If I'm reading this correctly, this describes a need for the frontend to
>>>>> have a mangling scheme.  Nothing in here would seem to prevent the
>>>>> frontend from generating a declaration for a mangled external symbol and
>>>>> then referencing that declaration.  Am I missing something?
>>>>>> 
>>>>>> That said, I think this should allow us to avoid attributes/metadata
>>>>>> which seems to me like a good thing right now.
>>>>>> 
>>>>>> Cheers,
>>>>>> Johannes
>>>>>> 
>>>>>> 
>>>>>>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>>>>>>>>>> Dear all,
>>>>>>>>>>> 
>>>>>>>>>>> This RFC is a proposal to provide auto-vectorization functionality for
>>>>>>>>>>> user provided vector functions.
>>>>>>>>>>> 
>>>>>>>>>>> The proposal is a modification of an RFC that I sent out a couple of
>>>>>>>>>>> months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP`
>>>>>>>>>>> (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html).
>>>>>>>>>>> The previous RFC is to be considered abandoned.
>>>>>>>>>>> 
>>>>>>>>>>> The original RFC proposed to re-implement the `-fveclib` command line
>>>>>>>>>>> option. This proposal avoids that, and limits its scope to the mechanics
>>>>>>>>>>> of providing vector functions in user code that the compiler can pick up
>>>>>>>>>>> for auto-vectorization. This narrower scope limits the impact of changes
>>>>>>>>>>> that are needed in both clang and LLVM.
>>>>>>>>>>> 
>>>>>>>>>>> Please let me know what you think.
>>>>>>>>>>> 
>>>>>>>>>>> Kind regards,
>>>>>>>>>>> 
>>>>>>>>>>> Francesco
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> ================================================================================
>>>>>>>>>>> 
>>>>>>>>>>> Introduction
>>>>>>>>>>> ============
>>>>>>>>>>> 
>>>>>>>>>>> This RFC encompasses the proposal of informing the vectorizer about the
>>>>>>>>>>> availability of vector functions provided by the user. The mechanism is
>>>>>>>>>>> based on the use of the directive `declare variant` introduced in OpenMP
>>>>>>>>>>> 5.0 [^1].
>>>>>>>>>>> 
>>>>>>>>>>> The mechanism proposed has the following properties:
>>>>>>>>>>> 
>>>>>>>>>>> 1.  Decouples the compiler front-end that knows about the availability
>>>>>>>>>>>    of vectorized routines, from the back-end that knows how to make use
>>>>>>>>>>>    of them.
>>>>>>>>>>> 2.  Enables support for a developer's own vector libraries without
>>>>>>>>>>>    requiring changes to the compiler.
>>>>>>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
>>>>>>>>>>>    mappings as relevant for their own runtime libraries, etc.
>>>>>>>>>>> 
>>>>>>>>>>> The implementation consists of two separate sets of changes.
>>>>>>>>>>> 
>>>>>>>>>>> The first set is a set of changes in `llvm`, and consists of:
>>>>>>>>>>> 
>>>>>>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
>>>>>>>>>>>    availability of user-defined vector functions via metadata attached
>>>>>>>>>>>    to an `llvm::CallInst`.
>>>>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
>>>>>>>>>>>    information about the available vector functions associated with an
>>>>>>>>>>>    `llvm::CallInst`.
>>>>>>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
>>>>>>>>>>>    metadata.
>>>>>>>>>>> 
>>>>>>>>>>> The second set consists of the [changes in clang](#clang) that are
>>>>>>>>>>> needed to recognize the `#pragma clang declare variant` directive.
>>>>>>>>>>> 
>>>>>>>>>>> Proposed changes
>>>>>>>>>>> ================
>>>>>>>>>>> 
>>>>>>>>>>> We propose an implementation that uses `#pragma clang declare variant`
>>>>>>>>>>> to inform the backend components about the availability of vector
>>>>>>>>>>> versions of scalar functions found in IR. The mechanism relies on storing
>>>>>>>>>>> such information in IR metadata, and therefore makes the
>>>>>>>>>>> auto-vectorization of function calls a mid-end (`opt`) process that is
>>>>>>>>>>> independent of the front-end that generated such IR metadata.
>>>>>>>>>>> 
>>>>>>>>>>> This implementation provides a generic mechanism that the users of the
>>>>>>>>>>> LLVM compiler will be able to use for interfacing their own vector
>>>>>>>>>>> routines for generic code.
>>>>>>>>>>> 
>>>>>>>>>>> The implementation can also expose vectorization-specific descriptors --
>>>>>>>>>>> for example, like the `linear` and `uniform` clauses of the OpenMP
>>>>>>>>>>> `declare simd` directive -- that could be used to finely tune the
>>>>>>>>>>> automatic vectorization of some functions (think for example of the
>>>>>>>>>>> vectorization of `double sincos(double , double *, double *)`, where
>>>>>>>>>>> `linear` can be used to give extra information about the memory layout
>>>>>>>>>>> of the 2 pointer parameters in the vector version).
>>>>>>>>>>> 
>>>>>>>>>>> The directive `#pragma clang declare variant` follows the syntax of the
>>>>>>>>>>> `#pragma omp declare variant` directive of OpenMP.
>>>>>>>>>>> 
>>>>>>>>>>> We define the new directive in the `clang` namespace instead of using
>>>>>>>>>>> the `omp` one of OpenMP to allow the compiler to perform
>>>>>>>>>>> auto-vectorization outside of an OpenMP SIMD context.
>>>>>>>>>>> 
>>>>>>>>>>> The mechanism is based on OpenMP to provide a uniform user experience
>>>>>>>>>>> across the two mechanisms, and to maximise the number of shared
>>>>>>>>>>> components of the infrastructure needed in the compiler frontend to
>>>>>>>>>>> enable the feature.
>>>>>>>>>>> 
>>>>>>>>>>> Changes in LLVM IR {#llvmIR}
>>>>>>>>>>> ------------------
>>>>>>>>>>> 
>>>>>>>>>>> The IR is enriched with metadata that details the availability of vector
>>>>>>>>>>> versions of an associated scalar function. This metadata is attached to
>>>>>>>>>>> the call site of the scalar function.
>>>>>>>>>>> 
>>>>>>>>>>> The metadata takes the form of an attribute containing a comma separated
>>>>>>>>>>> list of vector function mappings. Each entry has a unique name that
>>>>>>>>>>> follows the Vector Function ABI[^2] and a real name that is used when
>>>>>>>>>>> generating calls to this vector function.
>>>>>>>>>>> 
>>>>>>>>>>>    vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>>>>>>>> 
>>>>>>>>>>> The Vector Function ABI name describes the signature of the vector
>>>>>>>>>>> function so that properties like vectorisation factor can be queried
>>>>>>>>>>> during compilation.
>>>>>>>>>>> 
>>>>>>>>>>> The `(real name)` token is optional and assumed to match the Vector
>>>>>>>>>>> Function ABI name when omitted.
>>>>>>>>>>> 
>>>>>>>>>>> For example, the availability of a 2-lane double precision `sin`
>>>>>>>>>>> function via SVML when targeting AVX on x86 is provided by the following
>>>>>>>>>>> IR.
>>>>>>>>>>> 
>>>>>>>>>>>    // ...
>>>>>>>>>>>    ... = call double @sin(double) #0
>>>>>>>>>>>    // ...
>>>>>>>>>>> 
>>>>>>>>>>>    #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>>>>>>>>>>                              _ZGVdN4v_sin(__svml_sin4),
>>>>>>>>>>>                              ..."} }
>>>>>>>>>>> 
>>>>>>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>>>>>>>>> attribute provides information on the shape of the vector function via
>>>>>>>>>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>>>>>>>>>>> for Intel, and remaps the standard Vector Function ABI name to the
>>>>>>>>>>> non-standard name `__svml_sin2`.
>>>>>>>>>>> 
>>>>>>>>>>> This metadata is compatible with the proposal "Proposal for function
>>>>>>>>>>> vectorization and loop vectorization with function calls",[^3] that uses
>>>>>>>>>>> Vector Function ABI mangled names to inform the vectorizer about the
>>>>>>>>>>> availability of vector functions. The proposal extends the original by
>>>>>>>>>>> allowing the explicit mapping of the Vector Function ABI mangled name to
>>>>>>>>>>> a non-standard name, which allows the use of existing vector libraries.
>>>>>>>>>>> 
>>>>>>>>>>> The `vector-variant` attribute needs to be attached on a per-call basis
>>>>>>>>>>> to avoid conflicts when merging modules with different vector variants.
>>>>>>>>>>> 
>>>>>>>>>>> The query infrastructure: SVFS {#infrastructure}
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> 
>>>>>>>>>>> The Search Vector Function System (SVFS) is constructed from an
>>>>>>>>>>> `llvm::Module` instance so it can create function definitions. The SVFS
>>>>>>>>>>> exposes an API with two methods.
>>>>>>>>>>> 
>>>>>>>>>>> ### `SVFS::isFunctionVectorizable`
>>>>>>>>>>> 
>>>>>>>>>>> This method queries the availability of a vectorized version of a
>>>>>>>>>>> function. The signature of the method is as follows.
>>>>>>>>>>> 
>>>>>>>>>>>    bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeMap Params);
>>>>>>>>>>> 
>>>>>>>>>>> The method determines the availability of a vector version of the function
>>>>>>>>>>> invoked by the `Call` parameter by looking at the `vector-variant`
>>>>>>>>>>> metadata.
>>>>>>>>>>> 
>>>>>>>>>>> The `Params` argument is a map that associates the position of a
>>>>>>>>>>> parameter in the `CallInst` to its `ParameterType` descriptor. The
>>>>>>>>>>> `ParameterType` descriptor holds information about the shape of the
>>>>>>>>>>> corresponding parameter in the signature of the vector function. This
>>>>>>>>>>> `ParameterType` is used to query the SVFS about the availability of
>>>>>>>>>>> vector versions that have `linear`, `uniform` or `align` parameters (in
>>>>>>>>>>> the sense of OpenMP 4.0 and onwards).
>>>>>>>>>>> 
>>>>>>>>>>> The method `isFunctionVectorizable`, when invoked with an empty
>>>>>>>>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
>>>>>>>>>>> `isFunctionVectorizable(StringRef Name)`.
>>>>>>>>>>> 
>>>>>>>>>>> ### `SVFS::getVectorizedFunction`
>>>>>>>>>>> 
>>>>>>>>>>> This method returns the vector function declaration that corresponds to
>>>>>>>>>>> the needs of the vectorization technique that is being run.
>>>>>>>>>>> 
>>>>>>>>>>> The signature of the function is as follows.
>>>>>>>>>>> 
>>>>>>>>>>>    std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>>>>>>>>      llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
>>>>>>>>>>> 
>>>>>>>>>>> The `Call` parameter is the call instance that is being vectorized, the
>>>>>>>>>>> `VF` parameter represents the vectorization factor (how many lanes), the
>>>>>>>>>>> `IsMasked` parameter decides whether or not the signature of the vector
>>>>>>>>>>> function is required to have a mask parameter, and the `Params` parameter
>>>>>>>>>>> describes the shape of the vector function as in the
>>>>>>>>>>> `isFunctionVectorizable` method.
>>>>>>>>>>> 
>>>>>>>>>>> The method uses the `vector-variant` metadata and returns the function
>>>>>>>>>>> signature and the name of the function based on the input parameters.
>>>>>>>>>>> 
>>>>>>>>>>> The SVFS can add new function definitions, in the same module as the
>>>>>>>>>>> `Call`, to provide vector functions that are not present within the
>>>>>>>>>>> vector-variant metadata. For example, if a library provides a vector
>>>>>>>>>>> version of a function with a vectorization factor of 2, but the
>>>>>>>>>>> vectorizer is requesting a vectorization factor of 4, the SVFS is
>>>>>>>>>>> allowed to create a definition that calls the 2-lane version twice. This
>>>>>>>>>>> capability applies similarly for providing masked and unmasked versions
>>>>>>>>>>> when the request does not match what is available in the library.
>>>>>>>>>>> 
>>>>>>>>>>> This method is equivalent to the TLI method
>>>>>>>>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>>>>>>>> 
>>>>>>>>>>> Notice that to fully support OpenMP vectorization we need to think about
>>>>>>>>>>> a fuzzy matching mechanism that is able to select a candidate in the
>>>>>>>>>>> calling context. However, this proposal is intended for scalar-to-vector
>>>>>>>>>>> mappings of math-like functions that are most likely to associate a
>>>>>>>>>>> unique vector candidate in most contexts. Therefore, extending this
>>>>>>>>>>> behavior to a generic one is an aspect of the implementation that will
>>>>>>>>>>> be treated in a separate RFC about the vectorization pass.
>>>>>>>>>>> 
>>>>>>>>>>> ### Scalable vectorization
>>>>>>>>>>> 
>>>>>>>>>>> Both methods of the SVFS API will be extended with a boolean parameter
>>>>>>>>>>> to specify whether scalable signatures are needed by the user of the
>>>>>>>>>>> SVFS.
>>>>>>>>>>> 
>>>>>>>>>>> Changes in clang {#clang}
>>>>>>>>>>> ----------------
>>>>>>>>>>> 
>>>>>>>>>>> We use clang to generate the metadata described above.
>>>>>>>>>>> 
>>>>>>>>>>> In the compilation unit, the vector function definition or declaration
>>>>>>>>>>> must be visible and associated to the scalar version via the
>>>>>>>>>>> `#pragma clang declare variant` according to the rule defined by the
>>>>>>>>>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as in
>>>>>>>>>>> the following example.
>>>>>>>>>>> 
>>>>>>>>>>>    #pragma clang declare variant(vector_sinf) \
>>>>>>>>>>>      match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>>>>>>>>    extern float sinf(float);
>>>>>>>>>>> 
>>>>>>>>>>>    float32x4_t vector_sinf(float32x4_t x);
>>>>>>>>>>> 
>>>>>>>>>>> The `construct` set in the directive, together with the `device` set, is
>>>>>>>>>>> used to generate the vector mangled name to be used in the
>>>>>>>>>>> `vector-variant` attribute, for example `_ZGVnN2v_sin`, when targeting
>>>>>>>>>>> AArch64 Advanced SIMD code generation. The rules for mangling the name of
>>>>>>>>>>> the scalar function in the vector name are defined in the Vector
>>>>>>>>>>> Function ABI specification of the target.
>>>>>>>>>>> 
>>>>>>>>>>> The part of the vector-variant attribute that redirects the call to
>>>>>>>>>>> `vector_sinf` is derived from the `variant-id` specified in the
>>>>>>>>>>> `variant` clause.
>>>>>>>>>>> 
>>>>>>>>>>> Summary
>>>>>>>>>>> =======
>>>>>>>>>>> 
>>>>>>>>>>> New `clang` directive in clang
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> 
>>>>>>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>>>>>>>>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>>>>>>>>>> 
>>>>>>>>>>> Option behavior, and interaction with OpenMP
>>>>>>>>>>> --------------------------------------------
>>>>>>>>>>> 
>>>>>>>>>>> The behavior described below makes sure that
>>>>>>>>>>> `#pragma clang declare variant` function vectorization and OpenMP
>>>>>>>>>>> function vectorization are orthogonal.
>>>>>>>>>>> 
>>>>>>>>>>> `-fclang-declare-variant`
>>>>>>>>>>> 
>>>>>>>>>>> :   The `#pragma clang declare variant` directives are parsed and used
>>>>>>>>>>>    to populate the `vector-variant` attribute.
>>>>>>>>>>> 
>>>>>>>>>>> `-fopenmp[-simd]`
>>>>>>>>>>> 
>>>>>>>>>>> :   The `#pragma omp declare variant` directives are parsed and used to
>>>>>>>>>>>    populate the `vector-variant` attribute.
>>>>>>>>>>> 
>>>>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>>>>>>>>>> 
>>>>>>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
>>>>>>>>>>>    `vector-variant` attribute in IR. The directives
>>>>>>>>>>>    `#pragma clang declare variant` are ignored.
>>>>>>>>>>> 
>>>>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>>>>>>>> 
>>>>>>>>>>> [^2]: Vector Function ABI for x86:
>>>>>>>>>>>    <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>>>>>>>>    Vector Function ABI for AArch64:
>>>>>>>>>>>    <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>
>>>>>>>>>>> 
>>>>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>> _______________________________________________
>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>> -- 
>>>>>>> Hal Finkel
>>>>>>> Lead, Compiler Technology and Programming Languages
>>>>>>> Leadership Computing Facility
>>>>>>> Argonne National Laboratory
>>>>>>> 
>>>> 
>>> 
>>> -- 
>>> 
>>> Johannes Doerfert
>>> Researcher
>>> 
>>> Argonne National Laboratory
>>> Lemont, IL 60439, USA
>>> 
>>> jdoerfert at anl.gov
>> 
> 

Saito, Hideki via llvm-dev

2019-May-31 22:51 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

>VectorClone does more than just mapping a scalar version to a vector one. It
>builds also the vector version definition by auto-vectorizing the body of the
>scalar function.
To be more precise:
VecClone strictly deals with the callee side of the code. Caller side mapping
happens in the vectorizer (LoopVectorize for the most part, but I don't see why
SLPVectorize can't, for example).
Starting from the scalar code, for each "_ZGV..." name it finds in the function
attribute, VecClone will create a new function definition with that _ZGV...
name, with the function body of the scalar code surrounded by a constant trip
loop (trip count is part of the mangled function name), and then massage the
function body. Once the IR for "#pragma omp simd" is well defined, we'd add
#pragma omp simd simdlen() to that constant trip loop. The end result is a
function with a widened interface that can be called from the vectorized
caller. The work of VecClone ends here. Later, LoopVectorize should process the
#pragma omp simd simdlen() and that's the real end of the callee side handling
of #pragma omp declare simd.
The code is still fully functional w/o LoopVectorize vectorizing that loop.
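
As a rough illustration of the callee-side transformation described above, here is a conceptual model in Python. It is only a sketch of the idea, not the VecClone pass: `parse_vlen` and `widen` are hypothetical helpers, and the model assumes a fixed-length `_ZGV<isa><N|M><VLEN>...` name. The scalar body is wrapped in a loop whose constant trip count is read from the mangled name, which yields a function with a widened interface.

```python
import math
import re


def parse_vlen(mangled_name):
    """Extract the constant trip count (VLEN) from a _ZGV... name.

    Assumes a fixed-length name such as _ZGVcN2v_sin; scalable ("x")
    lengths are deliberately not handled in this sketch.
    """
    match = re.match(r"_ZGV[a-z][NM](\d+)", mangled_name)
    if match is None:
        raise ValueError("not a fixed-length vector ABI name: " + mangled_name)
    return int(match.group(1))


def widen(scalar_fn, mangled_name):
    """Model the widened function: scalar body inside a constant trip loop."""
    vlen = parse_vlen(mangled_name)

    def vector_fn(lanes):
        if len(lanes) != vlen:
            raise ValueError("expected %d lanes" % vlen)
        # The loop is correct on its own, whether or not a later pass
        # vectorizes it.
        return [scalar_fn(x) for x in lanes]

    vector_fn.__name__ = mangled_name
    return vector_fn


vsin2 = widen(math.sin, "_ZGVcN2v_sin")
print(vsin2.__name__, vsin2([0.0, 0.5]))
```

The point of the model is the last remark above: the widened function is already correct as a plain loop, and vectorizing that loop is a separate, later step.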
>I don’t know if the patches related to VecClone also are intended to use the
>`vector-variant` attribute for function declaration with a #pragma omp declare
>simd.
VecClone predated #pragma omp declare variant, so those patches don't know
about declare variant. VecClone was written for handling #pragma omp declare
simd, as described above. An OpenCL/SYCL kernel is similar enough to OpenMP
declare simd that most of the code can be reused.

Thanks,
Hideki

-----Original Message-----
From: Francesco Petrogalli [mailto:Francesco.Petrogalli at arm.com] 
Sent: Friday, May 31, 2019 3:06 PM
To: Doerfert, Johannes <jdoerfert at anl.gov>
Cc: Philip Reames <listmail at philipreames.com>; Finkel, Hal J. <hfinkel at anl.gov>; LLVM Development List <llvm-dev at lists.llvm.org>; nd <nd at arm.com>; Saito, Hideki <hideki.saito at intel.com>; Clang Dev <cfe-dev at lists.llvm.org>; scogland1 at llnl.gov
Subject: Re: [cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for auto-vectorization.


> On May 31, 2019, at 2:56 PM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> 
> I think I did misunderstand what you want to do with attributes. This
> is my bad. Let me try to explain:
> 
> It seems you want the "vector-variants" attributes (which I could not
> find with this name in trunk, correct?) to "remember" what vector
> versions can be created (wrt. validity), assuming a definition is
> available? Correct?
Yes.
> What I was concerned with is the example I sketched somewhere below
> which motivates the need for a generalized/standardized name mangling
> for OpenMP. I thought you wanted to avoid that somehow but if you don't
> I misunderstood you. I basically removed the part where the vector
> versions have to be created first but I assumed them to be existent
> (in the module or somewhere else). That is, I assumed a call to foo
> and various symbols available that are specializations of foo. When we
> then vectorize foo (or otherwise specialize at some point in the
> future), you would scan the module and pick the best match based on
> the context of the call.
> 
Yes, although the syntax you use below is wrong. Declare variant is attached to
the scalar definition, and points to a vector definition (the variant) that is
declared/defined in the same compilation unit where the scalar version is
visible.

> Now I don't know if I understood your proposal by now but let me ask a 
> question anyway:
> 
> VecClone.cpp:276-278 mentions that the vectorizer is supposed to look 
> at the vector-variants functions. This works for variants that are 
> created from definitions in the module but what about #omp declare 
> simd declarations?
> 
VectorClone does more than just mapping a scalar version to a vector one. It
also builds the vector version definition by auto-vectorizing the body of the
scalar function.

I don’t know if the patches related to VecClone are also intended to use the
`vector-variant` attribute for function declarations with a #pragma omp declare
simd. On aarch64, in Arm compiler for HPC, we do that to support vector math
libraries. It works in principle, but `vector variant` allows more context
selection (and custom names instead of vector ABI names, which are easier for
users).
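
To make the role of the custom names concrete, here is a small sketch of how the `vector-variant` string from the RFC could be split into (Vector Function ABI name, real name) pairs, with the real name defaulting to the ABI name when the parenthesised part is omitted. `parse_vector_variant` is a hypothetical helper written for this illustration, not an LLVM API, and the input string is the SVML example from the RFC plus one entry without a custom name.

```python
import re

# One entry: a Vector Function ABI name, optionally followed by "(real_name)".
_ENTRY = re.compile(r"^(_ZGV\w+)(?:\((\w+)\))?$")


def parse_vector_variant(attribute):
    """Map each Vector Function ABI name to the symbol that should be called."""
    mapping = {}
    for entry in attribute.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError("malformed vector-variant entry: " + entry)
        abi_name, real_name = match.group(1), match.group(2)
        # The "(real name)" token is optional and defaults to the ABI name.
        mapping[abi_name] = real_name or abi_name
    return mapping


variants = parse_vector_variant(
    "_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin(__svml_sin4), _ZGVnN2v_sin"
)
for abi_name, real_name in variants.items():
    print(abi_name, "->", real_name)
```

The ABI name carries the shape of the vector function, while the optional real name only says which symbol to emit in the call.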

> 
> On 05/31, Francesco Petrogalli wrote:
>>> On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
>>> 
>>> I think we should split this discussion:
>>> TOPIC 1 & 2 & 4: How to implement all use cases and OpenMP 5.X
>>>                  features, including compatibility with other
>>>                  compilers and cross module support.
>> 
>> Yes, and we have to carefully make this as standard and compatible as possible.
> 
> Agreed.
> 
> 
>>> TOPIC 3b & 5: Interoperability with clang declare (system vs. user
>>>                declares)
>> 
>> 
>> I think that Alexey's explanation of how the directives are handled
>> internally in the frontend makes us lean towards the attribute.
> 
> How things are handled right now, especially given that declare 
> variant is not handled at all, should not limit our design space. If 
> the argument is that we cannot reasonably implement a solution, that 
> is a different story.
> 
> 
>>> TOPIC 3a & 3c: floating point issues?
>>> 
>> 
>> I believe there is no issue there. I have quoted the OpenMP standard in reply to Renato:
>> 
>> See https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf,
>> page 118, lines 23-24:
>> 
>> “The execution of the function or subroutine cannot have any side
>> effects that would alter its execution for concurrent iterations of a
>> SIMD chunk.”
> 
> Great.
> 
> 
>>> I inlined comments for Topic 1 below.
>>> 
>>> I hope that we do not have to discuss topic 2 if we agree neither
>>> attributes nor metadata is necessary, or better, will solve the
>>> actual problem at hand. I don't have strong feelings on topic 4 but I
>>> have the feeling this will become less problematic once we figure out topic 1.
>>> 
>>> Thanks,
>>> Johannes
>>> 
>>> 
>>> On 05/31, Francesco Petrogalli wrote:
>>>> # TOPIC 1: concerns about name mangling
>>>> 
>>>> I understand that there are concerns in using the mangling scheme I
>>>> proposed, and that it would be preferred to have a mangling scheme
>>>> that is based on (and standardized by) OpenMP.
>>> 
>>> I still think it will be required to have a standardized one, not 
>>> only preferred.
>>> 
>>> 
>> 
>> I am all with you in standardizing. x86 and aarch64 have their own
>> vector function ABI, which, although “private”, are to be considered
>> standard. Open source and commercial compilers are using them,
>> therefore we have to deal with this mangling scheme, whether or not
>> OpenMP comes up with a standard mangling scheme.
> 
> I don't get the point you are trying to make here. What do you mean by 
> "we have to deal with"? (I do not suggest to get rid of them.)
> 
That we cannot ignore the fact that the name scheme is already standardized by
the vendors, so let’s first deal with what we have, and think about the OpenMP
mangling scheme only once there is one available.
> 
>>>> I hear the argument on having some common ground here. In fact,
>>>> there is already common ground between the x86 and aarch64 backends,
>>>> which have based their respective Vector Function ABI specifications on OpenMP.
>>>> 
>>>> In fact, the mangled name grammar can be summarized as follows:
>>>> 
>>>> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
>>>> 
>>>> Across vector extensions the only <token> that will differ is the
>>>> <isa> token.
>>>> 
>>>> This might lead people to think that we could drop the _ZGV<isa>
>>>> prefix and consider the <masking><VLEN><parameter type>_<scalar
>>>> name> part as a sort of unofficial OpenMP mangling scheme: in fact,
>>>> the signature of an “unmasked 2-lane vector version of `sin`” will
>>>> always be `<2 x double>(<2 x double>)`.
>>>> 
>>>> The problem with this choice is that the number of vector versions
>>>> available for a target is not unique.
>>> 
>>> For me, this simply means this mangling scheme is not sufficient.
>>> 
>> 
>> Can you explain more why you think the mangling scheme is not 
>> sufficient? The mangling scheme is shaped to provide all the 
>> information that the OpenMP directive describes.
> 
> I don't know if it is insufficient but I thought you hinted towards that.
I didn’t mean that; the tokens in the vector function ABI mangling schemes are
sufficient.
> If we can handle/decode everything we need for declare variants then I 
> do not object at all. If not, we require respective extension such 
> that we can. The result should be a superset of the current SIMD 
> encoding and compatible with the current one.
> 
> 
We can handle/decode everything for a SIMD context. :)
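
As a sketch of what "decoding" means here, the grammar quoted above (`_ZGV<isa><masking><VLEN><parameter type>_<scalar name>`) can be split into its tokens with a small parser. This is an illustration only: `decode` and `VectorName` are hypothetical names, and the parser is simplified to the bare parameter letters `v`, `u` and `l`, without the stride and alignment modifiers that the full Vector Function ABI documents define.

```python
import re
from typing import NamedTuple


class VectorName(NamedTuple):
    isa: str       # single ISA letter, e.g. "c" or "n"
    masked: bool   # True for "M", False for "N"
    vlen: str      # decimal lane count, or "x" for a scalable length
    params: str    # one letter per parameter
    scalar: str    # name of the scalar function


_NAME = re.compile(
    r"^_ZGV(?P<isa>[a-z])(?P<mask>[NM])(?P<vlen>\d+|x)"
    r"(?P<params>[vul]+)_(?P<scalar>.+)$"
)


def decode(name):
    """Split a vector function ABI name into its tokens."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError("not a recognised vector ABI name: " + name)
    return VectorName(
        isa=match.group("isa"),
        masked=match.group("mask") == "M",
        vlen=match.group("vlen"),
        params=match.group("params"),
        scalar=match.group("scalar"),
    )


avx = decode("_ZGVcN2v_sin")
neon = decode("_ZGVnN2v_sin")
print(avx)
print(neon)
# Only the <isa> token differs between the two names.
print(avx[1:] == neon[1:])
```

Both example names come from the RFC text; the decoded tuples agree in every field except `isa`, which is the property the grammar discussion relies on.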

> 
>> The fact that x86 and aarch64 realize such information in different 
>> way (multiple signature/vector extensions) is something that cannot 
>> be avoided, because it is related to architectural aspects that are 
>> specific to the vector extension and transparent to the OpenMP 
>> standard.
> 
> I don't think that is a problem (that's why I "failed to see the
> problem" in the comment below). I look at it this way: If #declare
> simd, or similar, results in N variants, it should at the end of the
> day not be different from declaring these N variants explicitly with
> the respective declare variant match clause.
> 
That’s not the case. #declare simd should create all the versions that are
optimal for the target. We carefully thought about that when writing the vector
function ABI. Most of the constraints derive from the fact that each target has a
specific register size.

Example:

#pragma omp declare simd
float foo(float);

x86 -> 8 versions {2, 4, 8, 16 lanes} x {masking, no masking}, see https://godbolt.org/z/m1BUVt
Arm NEON -> 4 versions {2, 4 lanes} x {masking, no masking}
Arm SVE -> 1 version

Therefore, the outcome of declare simd is not target independent. Your
expectations are met only inside one target.
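
The target dependence can be sketched by enumerating the names a `declare simd` would stand for on each target. This is a hedged illustration: `variant_names` is a hypothetical helper, the ISA letters (`b`, `c`, `d`, `e` for SSE, AVX, AVX2, AVX512; `n` for Advanced SIMD; `s` for SVE) follow the vendor Vector Function ABI documents as I understand them, and the lane tables are inputs to the sketch rather than something derived from register sizes.

```python
from itertools import product


def variant_names(scalar, isas, lanes, maskings, params="v"):
    """List _ZGV names for every (isa, masking, lanes) combination."""
    return [
        "_ZGV%s%s%s%s_%s" % (isa, mask, vlen, params, scalar)
        for isa, mask, vlen in product(isas, maskings, lanes)
    ]


# Arm NEON case from the message: {2, 4 lanes} x {masking, no masking}.
neon = variant_names("foo", "n", [2, 4], "NM")
# Arm SVE case: a single scalable ("x"), masked version.
sve = variant_names("foo", "s", ["x"], "M")
# x86 with an explicit simdlen(2) notinbranch: one symbol per ISA.
x86_simdlen2 = variant_names("foo", "bcde", [2], "N")

for label, names in (("NEON", neon), ("SVE", sve), ("x86", x86_simdlen2)):
    print(label, len(names), names)
```

The same source-level directive therefore expands to a different set of symbols on each target, which is why the set of variants cannot be described independently of the target's Vector Function ABI.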

> 
>>>> In particular, the following declaration generates multiple vector
>>>> versions, depending on the target:
>>>> 
>>>> #pragma omp declare simd simdlen(2) notinbranch
>>>> double foo(double) {…};
>>>> 
>>>> On x86, this generates at least 4 symbols (one for SSE, one for
>>>> AVX, one for AVX2, and one for AVX512:
>>>> https://godbolt.org/z/TLYXPi)
>>>> 
>>>> On aarch64, the same declaration generates a unique symbol, as 
>>>> specified in the Vector Function ABI.
>>> 
>>> I fail to see the problem. We generate X symbols for X different
>>> contexts. Once we get to the point where we vectorize, we determine
>>> which context fits best and choose the corresponding symbol version.
>>> 
>> 
>> Yes, this is exactly what we need to do, under the constraint that
>> the rules for generating “X symbols for X different contexts” are
>> decided by the Vector Function ABI of the target.
> 
> Sounds good. The vector ABI is used to determine what contexts exists 
> and what symbols should be created. I would assume the encoding should 
> be the same as if we specified the versions (/contexts) ourselves via 
> #declare variant.
> 
Oh yes, vector functions listed in a declare variant should obey the vector
function ABI rules (other than the function name).
> 
>>> Maybe my view is too naive here, please feel free to correct me.
>>> 
>>> 
>>>> This means that the attribute (or metadata) that carries the
>>>> information on the available vector version needs to deal also with
>>>> things that are not usually visible at IR level, but that might
>>>> still need to be provided to be able to decide which particular
>>>> instruction set/ vector extension needs to be targeted.
>>> 
>>> The symbol names should carry all the information we need. If they
>>> do not, we need to improve the mangling scheme such that they do.
>>> There is no attributes/metadata we could use at library boundaries.
>>> 
>> Hum, I am not sure what you mean by "There is no attributes/metadata
>> we could use at library boundaries."
> 
> (This seems to be part of the misunderstanding, I leave my comment here
> anyway:)
> 
> The simd-related stuff works because it is a uniform mangling scheme
> used by all compilers. Take the situation below in which I think we
> want to call foo_CTX in the library. If so, we need a name for it.
> 
In the situation below, the mangled name is going to be the same for both
compilers, as long as they adhere to the vector function ABI.
> 
> a.c:  // Compiled by gcc into a library
> #omp declare variant (foo) match(CTX)
> void foo_CTX(...) {...}
> 
> b.c:  // Compiled by clang linked against the library above.
> #omp declare variant (foo) match(CTX)
> void foo_CTX(...);
> 
> void bar(...) {
>  #pragma omp CTX
>  foo();   // <- What function (symbol) do we call if a.c was compiled
>           //    by gcc and b.c with clang?
> }
> 
Please notice that `declare variant` needs to be attached to the scalar
function, not the vector one.

```
#pragma omp declare variant(foo_CTX) match (context=simd…
double foo (double) {…}

vector_double_ty foo_CTX(vector_double_ty) {…}
```

In vectorizing foo in bar, the compiler will not care where foo_CTX would come
from (of course, as long as the scalar+declare variant declarations are
visible).
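
A toy model of that lookup, under stated assumptions: `pick_variant` is a hypothetical helper (not the proposed SVFS API), and the table `declared` pretends that the `declare variant` above was recorded for a 2-lane unmasked Advanced SIMD version of `foo`. The caller only needs the requested shape and the recorded mapping; where the definition of `foo_CTX` lives does not enter the decision.

```python
import re


def pick_variant(variants, isa, vf, masked):
    """Return the symbol to call for the requested shape, or None.

    `variants` maps Vector Function ABI names to real symbol names.
    """
    wanted = "_ZGV%s%s%d" % (isa, "M" if masked else "N", vf)
    for abi_name, real_name in variants.items():
        # Match the <isa><masking><VLEN> prefix, then require a non-digit so
        # that a request for 2 lanes does not match a 20-lane name.
        if re.match(re.escape(wanted) + r"\D", abi_name):
            return real_name
    return None


# Hypothetical table derived from the declare variant example above.
declared = {"_ZGVnN2v_foo": "foo_CTX"}

print(pick_variant(declared, "n", 2, masked=False))
print(pick_variant(declared, "n", 4, masked=False))
```

A request that matches no recorded shape simply yields no variant, in which case the call would stay scalar in this model.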
>> In our downstream compiler (Arm compiler for HPC, based on LLVM), we
>> use `declare simd` to provide vector math functions via a custom header
>> file. It works brilliantly, if not for specific aspects that would be
>> perfectly covered by the `declare variant`, which might be one of the
>> reasons why the OpenMP committee decided to introduce `declare
>> variant`.
> 
> But you (assume that you) control the mangling scheme across the 
> entire infrastructure. Given that the simd mangling is de-facto 
> standardized, that works.
> 
> Side note:
> Declare variant, as of 5.0, is not flexible enough for a sensible 
> inclusion of target specific headers. That will change in 5.1.
> 
Could you point me at the discussion in 5.1 on this specific aspect?

> 
>> If your concern is that adding an attribute that somehow represents
>> something that is available in an external library is not enough to
>> guarantee that the symbol is available in the library… not even C
>> code can guarantee that? If the linker is not pointing to the right
>> library, there is nothing that can prevent it from failing if the
>> symbol is not present?
> 
> I don't follow the example you describe. I don't want to change 
> anything in how symbols are looked up or what happens if they are missing.
> 
> 
I don’t want to change that either :). I think we are misunderstanding each
other here...
>>>> I used an example based on `declare simd` instead of `declare
>>>> variant` because the attribute/metadata needed for `declare
>>>> variant` is a modification of the one needed for `declare simd`,
>>>> which has already been agreed in a previous RFC proposed by Intel
>>>> [1], and for which Intel has already provided an implementation
>>>> [2]. The changes proposed in this RFC are fully compatible with the
>>>> work that is being done for the VecClone pass in [2].
>>>> 
>>>> [1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
>>>> [2] VecClone pass: https://reviews.llvm.org/D22792
>>> 
>>> Having an agreed upon mangling for the older feature is not
>>> necessarily important here. We will need more functionality for
>>> variants and keeping the old scheme around with some metadata is not
>>> an extensible long-term solution. So, I would not try to fit
>>> variants into the existing simd-scheme but instead do it the other
>>> way around. We define what we need for variants and implement simd
>>> in that scheme.
>>> 
>> 
>> I kinda think that having agreed on something is important. It allows
>> us to build other things on top of what has been agreed without
>> breaking compatibility.
>> 
>> Specifically, which are the new functionalities needed for the
>> variants that would make the current metadata (attributes) for
>> declare simd non extensible?
> 
> See first comment.
> 
>>>> The good news is that as far as AArch64 and x86 are concerned, the
>>>> only thing that will differ in the mangled name is the "<isa>" token.
>>>> As far as I can tell, the mangling scheme of the rest of the vector
>>>> name is the same, therefore a lot of infrastructure in terms of
>>>> mangling and demangling can be reused. In fact, the
>>>> `mangleVectorParameters` function in
>>>> https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918
>>>> could already be shared among x86 and aarch64.
>>>> 
>>>> TOPIC 2: metadata vs attribute
>>>> 
>>>> From a functionality point of view, I don’t care whether we use
>>>> metadata or attributes. The VecClone pass mentioned in TOPIC 1 uses the
>>>> following:
>>>> 
>>>> attributes #0 = { nounwind uwtable
>>>>   "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16" }
>>>> 
>>>> This is an attribute (I thought it was metadata?), I am happy to
>>>> reword the RFC using the right terminology (sorry for messing this up).
>>>> 
>>>> Also, @Renato expressed concern that metadata might be dropped by
>>>> optimization passes - would using attributes prevent that?
>>>> 
>>>> TOPIC 3: "There is no way to notify the backend how conformant the
>>>> SIMD versions are."
>>>> 
>>>> @Shawn, I am afraid I don’t understand what you mean by "conformant"
>>>> here. Can you elaborate with an example?
>>>> 
>>>> TOPIC 3: interaction of the `omp declare variant` with `clang 
>>>> declare variant`
>>>> 
>>>> I believe this is described in the `Option behavior, and interaction
>>>> with OpenMP`. The option `-fclang-declare-variant` is there to make
>>>> the OpenMP based one orthogonal. Of course, we might decide to make
>>>> -fclang-declare-variant on/off by default, and have default behavior
>>>> when interacting with -fopenmp-simd. For the sake of compatibility with
>>>> other compilers, we might need to require -fno-clang-declare-variant
>>>> when targeting -fopenmp-[simd].
>>>> 
>>>> TOPIC 3: "there are no special arguments / flags / status regs that
>>>> are used / changed in the vector version that the compiler will have
>>>> to "just know"
>>>> 
>>>> I believe that this concern is raised by the problem of handling FP
>>>> exceptions? If that’s the case, the compiler is not allowed to make
>>>> any assumption on the vector function about that, and must treat it
>>>> with the same knowledge of any other function, depending on the
>>>> visibility it has in the compilation unit. @Renato, does this answer
>>>> your question?
>>>> 
>>>> TOPIC 4: attribute in function declaration vs attribute in function
>>>> call site
>>>> 
>>>> We discussed this in the previous version of the proposal. Having it
>>>> in the call sites guarantees that no incompatible vector versions are
>>>> used when merging modules compiled for different targets. I don’t have
>>>> a use case for this, if I remember correctly this was asked by @Hideki
>>>> Saito. Hideki, any comment on this?
>>>> 
>>>> TOPIC 5: overriding system header (the discussion on #pragma
>>>> omp/clang/system variants initiated by @Hal Finkel).
>>>> 
>>>> I thought that the split among #pragma clang declare variant and
>>>> #pragma omp declare variant was already providing the orthogonality
>>>> between system header and user header. Meaning that a user should
>>>> always prefer the omp version (for portability to other compilers)
>>>> instead of the #pragma clang one, which would be relegated to system
>>>> headers and headers provided by the compiler. Am I missing something?
>>>> If so, I am happy to add a "system" version of the directive, as it
>>>> would be quite easy to do given most of the parsing infrastructure will
>>>> be shared.
>>>> 
>>>> 
>>>>> On May 30, 2019, at 12:53 PM, Philip Reames <listmail at philipreames.com> wrote:
>>>>> 
>>>>> 
>>>>> On 5/30/19 9:05 AM, Doerfert, Johannes wrote:
>>>>>> On 05/29, Finkel, Hal J. via cfe-dev wrote:
>>>>>>> On 5/29/19 1:52 PM, Philip Reames wrote:
>>>>>>>> On 5/28/19 7:55 PM, Finkel, Hal J. wrote:
>>>>>>>>> On 5/28/19 3:31 PM, Philip Reames via cfe-dev wrote:
>>>>>>>>>> I generally like the idea of having support in IR for
>>>>>>>>>> vectorization of custom functions.  I have several use cases
>>>>>>>>>> which would benefit from this.
>>>>>>>>>> 
>>>>>>>>>> I'd suggest a couple of reframings to the IR representation
>>>>>>>>>> though.
>>>>>>>>>> 
>>>>>>>>>> First, this should probably be specified as metadata/attribute
>>>>>>>>>> on a function declaration.  Allowing the callsite variant is
>>>>>>>>>> fine, but it should primarily be a property of the called
>>>>>>>>>> function, not of the call site.  Being able to specify it once
>>>>>>>>>> per declaration is much cleaner.
>>>>>>>>> I agree. We should support this both on the function
>>>>>>>>> declaration and on the call sites.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Second, I really don't like the mangling use here.  We need a
>>>>>>>>>> better way to specify the properties of the function than
>>>>>>>>>> its mangled name.  One thought to explore is to directly use
>>>>>>>>>> the Value of the function declaration (since this is metadata
>>>>>>>>>> and we can do that), and then tie the properties to the
>>>>>>>>>> function declaration in some way?  Sorry, I don't really have
>>>>>>>>>> a specific suggestion here.
>>>>>>>>> Is the problem the mangling or the fact that the mangling is
>>>>>>>>> ABI/target-specific? One option is to use LLVM's mangling
>>>>>>>>> scheme (the one we use for intrinsics) and then provide some
>>>>>>>>> backend infrastructure to translate later.
>>>>>>>> Well, both honestly.  But mangling with a non-target specific
>>>>>>>> scheme is a lot better, so I might be okay with that.  Good idea.
>>>>>>> 
>>>>>>> I liked your idea of directly encoding the signature in the
>>>>>>> metadata, but I think that we want to continue to use
>>>>>>> attributes, and not metadata, and the options for attributes
>>>>>>> seem more limited - unless we allow attributes to take metadata
>>>>>>> arguments - maybe that's an enhancement worth considering.
>>>>>> I recently talked to people in the OpenMP language committee
>>>>>> meeting about this and, thinking forward to the actual
>>>>>> implementation/use of the OpenMP 5.x declare variant feature, I'd say:
>>>>>> 
>>>>>> - We will need a mangling scheme if we want to allow variants on
>>>>>> declarations that are defined elsewhere.
>>>>>> - We will need a (OpenMP) standardized mangling scheme if we want
>>>>>> interoperability between compilers.
>>>>>> 
>>>>>> I assume we want both so I think we will need both.
>>>>> If I'm reading this correctly, this describes a need for the
>>>>> frontend to have a mangling scheme.  Nothing in here would seem to
>>>>> prevent the frontend from generating a declaration for a mangled
>>>>> external symbol and then referencing that declaration.  Am I missing
>>>>> something?
>>>>>> 
>>>>>> That said, I think this should allow us to avoid
>>>>>> attributes/metadata which seems to me like a good thing right now.
>>>>>> 
>>>>>> Cheers,
>>>>>> Johannes
>>>>>> 
>>>>>> 
>>>>>>>>>> On 5/28/19 12:44 PM, Francesco Petrogalli via llvm-dev wrote:
>>>>>>>>>>> Dear all,
>>>>>>>>>>> 
>>>>>>>>>>> This RFC is a proposal to provide auto-vectorization functionality for
>>>>>>>>>>> user provided vector functions.
>>>>>>>>>>> 
>>>>>>>>>>> The proposal is a modification of an RFC that I have sent out a couple
>>>>>>>>>>> of months ago, with the title `[RFC] Re-implementing -fveclib with
>>>>>>>>>>> OpenMP` (see
>>>>>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html).
>>>>>>>>>>> The previous RFC is to be considered abandoned.
>>>>>>>>>>> 
>>>>>>>>>>> The original RFC was proposing to re-implement the `-fveclib` command
>>>>>>>>>>> line option. This proposal avoids that, and limits its scope to the
>>>>>>>>>>> mechanics of providing vector functions in user code that the compiler
>>>>>>>>>>> can pick up for auto-vectorization. This narrower scope limits the
>>>>>>>>>>> impact of changes that are needed in both clang and LLVM.
>>>>>>>>>>> 
>>>>>>>>>>> Please let me know what you think.
>>>>>>>>>>> 
>>>>>>>>>>> Kind regards,
>>>>>>>>>>> 
>>>>>>>>>>> Francesco
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> ===============================================================================
>>>>>>>>>>> 
>>>>>>>>>>> Introduction
>>>>>>>>>>> ============
>>>>>>>>>>> 
>>>>>>>>>>> This RFC encompasses the proposal of informing the vectorizer about
>>>>>>>>>>> the availability of vector functions provided by the user. The
>>>>>>>>>>> mechanism is based on the use of the directive `declare variant`
>>>>>>>>>>> introduced in OpenMP 5.0 [^1].
>>>>>>>>>>> 
>>>>>>>>>>> The mechanism proposed has the following properties:
>>>>>>>>>>> 
>>>>>>>>>>> 1.  Decouples the compiler front-end that knows about the availability
>>>>>>>>>>>    of vectorized routines, from the back-end that knows how to make use
>>>>>>>>>>>    of them.
>>>>>>>>>>> 2.  Enables support for a developer's own vector libraries without
>>>>>>>>>>>    requiring changes to the compiler.
>>>>>>>>>>> 3.  Enables other frontends (e.g. f18) to add scalar-to-vector function
>>>>>>>>>>>    mappings as relevant for their own runtime libraries, etc.
>>>>>>>>>>> 
>>>>>>>>>>> The implementation consists of two separate sets of changes.
>>>>>>>>>>> 
>>>>>>>>>>> The first set is a set of changes in `llvm`, and consists of:
>>>>>>>>>>> 
>>>>>>>>>>> 1.  [Changes in LLVM IR](#llvmIR) to provide information about the
>>>>>>>>>>>    availability of user-defined vector functions via metadata attached
>>>>>>>>>>>    to an `llvm::CallInst`.
>>>>>>>>>>> 2.  [An infrastructure](#infrastructure) that can be queried to retrieve
>>>>>>>>>>>    information about the available vector functions associated to a
>>>>>>>>>>>    `llvm::CallInst`.
>>>>>>>>>>> 3.  [Changes in the LoopVectorizer](#LV) to use the API to query the
>>>>>>>>>>>    metadata.
>>>>>>>>>>> 
>>>>>>>>>>> The second set consists of the [changes in clang](#clang) that are
>>>>>>>>>>> needed to recognize the `#pragma clang declare variant` directive.
>>>>>>>>>>> 
>>>>>>>>>>> Proposed changes
>>>>>>>>>>> ================
>>>>>>>>>>> 
>>>>>>>>>>> We propose an implementation that uses `#pragma clang declare variant`
>>>>>>>>>>> to inform the backend components about the availability of vector
>>>>>>>>>>> versions of scalar functions found in IR. The mechanism relies on
>>>>>>>>>>> storing such information in IR metadata, and therefore makes the
>>>>>>>>>>> auto-vectorization of function calls a mid-end (`opt`) process that is
>>>>>>>>>>> independent of the front-end that generated such IR metadata.
>>>>>>>>>>> 
>>>>>>>>>>> This implementation provides a generic mechanism that the users of the
>>>>>>>>>>> LLVM compiler will be able to use for interfacing their own vector
>>>>>>>>>>> routines for generic code.
>>>>>>>>>>> 
>>>>>>>>>>> The implementation can also expose vectorization-specific descriptors
>>>>>>>>>>> -- for example, like the `linear` and `uniform` clauses of the OpenMP
>>>>>>>>>>> `declare simd` directive -- that could be used to finely tune the
>>>>>>>>>>> automatic vectorization of some functions (think for example the
>>>>>>>>>>> vectorization of `double sincos(double , double *, double *)`, where
>>>>>>>>>>> `linear` can be used to give extra information about the memory layout
>>>>>>>>>>> of the 2 pointer parameters in the vector version).
>>>>>>>>>>> 
>>>>>>>>>>> The directive `#pragma clang declare variant` follows the syntax of the
>>>>>>>>>>> `#pragma omp declare variant` directive of OpenMP.
>>>>>>>>>>> 
>>>>>>>>>>> We define the new directive in the `clang` namespace instead of using
>>>>>>>>>>> the `omp` one of OpenMP to allow the compiler to perform
>>>>>>>>>>> auto-vectorization outside of an OpenMP SIMD context.
>>>>>>>>>>> 
>>>>>>>>>>> The mechanism is based on OpenMP to provide a uniform user experience
>>>>>>>>>>> across the two mechanisms, and to maximise the number of shared
>>>>>>>>>>> components of the infrastructure needed in the compiler frontend to
>>>>>>>>>>> enable the feature.
>>>>>>>>>>> 
>>>>>>>>>>> Changes in LLVM IR {#llvmIR}
>>>>>>>>>>> ------------------
>>>>>>>>>>> 
>>>>>>>>>>> The IR is enriched with metadata that details the availability of
>>>>>>>>>>> vector versions of an associated scalar function. This metadata is
>>>>>>>>>>> attached to the call site of the scalar function.
>>>>>>>>>>> 
>>>>>>>>>>> The metadata takes the form of an attribute containing a comma
>>>>>>>>>>> separated list of vector function mappings. Each entry has a unique
>>>>>>>>>>> name that follows the Vector Function ABI[^2] and a real name that is
>>>>>>>>>>> used when generating calls to this vector function.
>>>>>>>>>>> 
>>>>>>>>>>>    vfunc_name1(real_name1), vfunc_name2(real_name2)
>>>>>>>>>>> 
>>>>>>>>>>> The Vector Function ABI name describes the signature of the vector
>>>>>>>>>>> function so that properties like vectorisation factor can be queried
>>>>>>>>>>> during compilation.
>>>>>>>>>>> 
>>>>>>>>>>> The `(real name)` token is optional and assumed to match the Vector
>>>>>>>>>>> Function ABI name when omitted.
>>>>>>>>>>> 
>>>>>>>>>>> For example, the availability of a 2-lane double precision `sin`
>>>>>>>>>>> function via SVML when targeting AVX on x86 is provided by the
>>>>>>>>>>> following IR.
>>>>>>>>>>> 
>>>>>>>>>>>    // ...
>>>>>>>>>>>    ... = call double @sin(double) #0
>>>>>>>>>>>    // ...
>>>>>>>>>>> 
>>>>>>>>>>>    #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
>>>>>>>>>>>                              _ZGVdN4v_sin(__svml_sin4),
>>>>>>>>>>>                              ..."} }
>>>>>>>>>>> 
>>>>>>>>>>> The string `"_ZGVcN2v_sin(__svml_sin2)"` in this vector-variant
>>>>>>>>>>> attribute provides information on the shape of the vector function via
>>>>>>>>>>> the string `_ZGVcN2v_sin`, mangled according to the Vector Function ABI
>>>>>>>>>>> for Intel, and remaps the standard Vector Function ABI name to the
>>>>>>>>>>> non-standard name `__svml_sin2`.
>>>>>>>>>>> 
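
To make the `vfunc_name(real_name)` list format concrete, the following is a minimal C++ sketch of how such an attribute string could be split into (Vector Function ABI name, real name) pairs. It is only an illustration under stated assumptions: `parseVectorVariants` and `trim` are hypothetical helpers, not part of LLVM, and the sketch assumes entries are separated by commas and that names contain no parentheses or commas.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: strips leading and trailing whitespace.
static std::string trim(const std::string &S) {
  const char *WS = " \t\n";
  const std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  const std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Hypothetical helper (not an LLVM API): splits a comma separated list of
// "vfunc_name(real_name)" entries. When "(real_name)" is omitted, the real
// name defaults to the Vector Function ABI name, as described in the RFC.
std::vector<std::pair<std::string, std::string>>
parseVectorVariants(const std::string &Attr) {
  std::vector<std::pair<std::string, std::string>> Result;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    const std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Entry.empty())
      continue; // tolerate empty entries and trailing commas
    const std::size_t Open = Entry.find('(');
    if (Open == std::string::npos) {
      Result.emplace_back(Entry, Entry); // no remapping requested
      continue;
    }
    const std::size_t Close = Entry.find(')', Open);
    assert(Close != std::string::npos && "unbalanced parenthesis in entry");
    Result.emplace_back(Entry.substr(0, Open),
                        Entry.substr(Open + 1, Close - Open - 1));
  }
  return Result;
}
```

With the example above, the entry `_ZGVcN2v_sin(__svml_sin2)` would map the ABI name `_ZGVcN2v_sin` to the real name `__svml_sin2`, while a bare `_ZGVdN4v_sin` would map to itself.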
>>>>>>>>>>> This metadata is compatible with the proposal "Proposal for function
>>>>>>>>>>> vectorization and loop vectorization with function calls",[^3] that
>>>>>>>>>>> uses Vector Function ABI mangled names to inform the vectorizer about
>>>>>>>>>>> the availability of vector functions. The proposal extends the original
>>>>>>>>>>> by allowing the explicit mapping of the Vector Function ABI mangled
>>>>>>>>>>> name to a non-standard name, which allows the use of existing vector
>>>>>>>>>>> libraries.
>>>>>>>>>>> 
>>>>>>>>>>> The `vector-variant` attribute needs to be attached on a per-call basis
>>>>>>>>>>> to avoid conflicts when merging modules with different vector variants.
>>>>>>>>>>> 
>>>>>>>>>>> The query infrastructure: SVFS {#infrastructure}
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> 
>>>>>>>>>>> The Search Vector Function System (SVFS) is constructed from an
>>>>>>>>>>> `llvm::Module` instance so it can create function definitions. The SVFS
>>>>>>>>>>> exposes an API with two methods.
>>>>>>>>>>> 
>>>>>>>>>>> ### `SVFS::isFunctionVectorizable`
>>>>>>>>>>> 
>>>>>>>>>>> This method queries the availability of a vectorized version of a
>>>>>>>>>>> function. The signature of the method is as follows.
>>>>>>>>>>> 
>>>>>>>>>>>    bool isFunctionVectorizable(llvm::CallInst * Call,
>>>>>>>>>>>                                ParTypeMap Params);
>>>>>>>>>>> 
>>>>>>>>>>> The method determines the availability of a vector version of the
>>>>>>>>>>> function invoked by the `Call` parameter by looking at the
>>>>>>>>>>> `vector-variant` metadata.
>>>>>>>>>>> 
>>>>>>>>>>> The `Params` argument is a map that associates the position of a
>>>>>>>>>>> parameter in the `CallInst` to its `ParameterType` descriptor. The
>>>>>>>>>>> `ParameterType` descriptor holds information about the shape of the
>>>>>>>>>>> corresponding parameter in the signature of the vector function. This
>>>>>>>>>>> `ParameterType` is used to query the SVFS about the availability of
>>>>>>>>>>> vector versions that have `linear`, `uniform` or `align` parameters (in
>>>>>>>>>>> the sense of OpenMP 4.0 and onwards).
>>>>>>>>>>> 
>>>>>>>>>>> The method `isFunctionVectorizable`, when invoked with an empty
>>>>>>>>>>> `ParTypeMap`, is equivalent to the `TargetLibraryInfo` method
>>>>>>>>>>> `isFunctionVectorizable(StringRef Name)`.
>>>>>>>>>>> 
>>>>>>>>>>> ### `SVFS::getVectorizedFunction`
>>>>>>>>>>> 
>>>>>>>>>>> This method returns the vector function declaration that corresponds to
>>>>>>>>>>> the needs of the vectorization technique that is being run.
>>>>>>>>>>> 
>>>>>>>>>>> The signature of the function is as follows.
>>>>>>>>>>> 
>>>>>>>>>>>    std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
>>>>>>>>>>>      llvm::CallInst * Call, unsigned VF, bool IsMasked,
>>>>>>>>>>>      ParTypeSet Params);
>>>>>>>>>>> 
>>>>>>>>>>> The `Call` parameter is the call instance that is being vectorized, the
>>>>>>>>>>> `VF` parameter represents the vectorization factor (how many lanes),
>>>>>>>>>>> the `IsMasked` parameter decides whether or not the signature of the
>>>>>>>>>>> vector function is required to have a mask parameter, and the `Params`
>>>>>>>>>>> parameter describes the shape of the vector function as in the
>>>>>>>>>>> `isFunctionVectorizable` method.
>>>>>>>>>>> 
>>>>>>>>>>> The method uses the `vector-variant` metadata and returns the function
>>>>>>>>>>> signature and the name of the function based on the input parameters.
>>>>>>>>>>> 
>>>>>>>>>>> The SVFS can add new function definitions, in the same module as the
>>>>>>>>>>> `Call`, to provide vector functions that are not present within the
>>>>>>>>>>> vector-variant metadata. For example, if a library provides a vector
>>>>>>>>>>> version of a function with a vectorization factor of 2, but the
>>>>>>>>>>> vectorizer is requesting a vectorization factor of 4, the SVFS is
>>>>>>>>>>> allowed to create a definition that calls the 2-lane version twice.
>>>>>>>>>>> This capability applies similarly for providing masked and unmasked
>>>>>>>>>>> versions when the request does not match what is available in the
>>>>>>>>>>> library.
>>>>>>>>>>> 
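
The "call the 2-lane version twice" idea can be sketched in plain C++ as follows. This is only an illustration of the splitting logic, not how the SVFS would emit IR: `libSin2` stands in for a library-provided 2-lane routine and `sin4ViaSin2` is a hypothetical wrapper; both names are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec4 = std::array<double, 4>;

// Stand-in for a 2-lane vector routine that a library would provide.
Vec2 libSin2(const Vec2 &X) {
  Vec2 R = {{std::sin(X[0]), std::sin(X[1])}};
  return R;
}

// Hypothetical 4-lane wrapper: splits the input into two halves, calls the
// 2-lane routine on each half, and concatenates the results.
Vec4 sin4ViaSin2(const Vec4 &X) {
  const Vec2 Lo = {{X[0], X[1]}};
  const Vec2 Hi = {{X[2], X[3]}};
  const Vec2 RLo = libSin2(Lo);
  const Vec2 RHi = libSin2(Hi);
  Vec4 R = {{RLo[0], RLo[1], RHi[0], RHi[1]}};
  return R;
}
```

A generated definition along these lines would let the vectorizer use a vectorization factor of 4 even though the library only ships the 2-lane entry point.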
>>>>>>>>>>> The `getVectorizedFunction` method is equivalent to the TLI method
>>>>>>>>>>> `StringRef getVectorizedFunction(StringRef F, unsigned VF) const;`.
>>>>>>>>>>> 
>>>>>>>>>>> Notice that to fully support OpenMP vectorization we need to think
>>>>>>>>>>> about a fuzzy matching mechanism that is able to select a candidate in
>>>>>>>>>>> the calling context. However, this proposal is intended for
>>>>>>>>>>> scalar-to-vector mappings of math-like functions that are most likely
>>>>>>>>>>> to associate a unique vector candidate in most contexts. Therefore,
>>>>>>>>>>> extending this behavior to a generic one is an aspect of the
>>>>>>>>>>> implementation that will be treated in a separate RFC about the
>>>>>>>>>>> vectorization pass.
>>>>>>>>>>> 
>>>>>>>>>>> ### Scalable vectorization
>>>>>>>>>>> 
>>>>>>>>>>> Both methods of the SVFS API will be extended with a boolean parameter
>>>>>>>>>>> to specify whether scalable signatures are needed by the user of the
>>>>>>>>>>> SVFS.
>>>>>>>>>>> 
>>>>>>>>>>> Changes in clang {#clang}
>>>>>>>>>>> ----------------
>>>>>>>>>>> 
>>>>>>>>>>> We use clang to generate the metadata described above.
>>>>>>>>>>> 
>>>>>>>>>>> In the compilation unit, the vector function definition or declaration
>>>>>>>>>>> must be visible and associated to the scalar version via the `#pragma
>>>>>>>>>>> clang declare variant` according to the rule defined by the
>>>>>>>>>>> corresponding `#pragma omp declare variant` defined in OpenMP 5.0, as
>>>>>>>>>>> in the following example.
>>>>>>>>>>> 
>>>>>>>>>>>    #pragma clang declare variant(vector_sinf) \
>>>>>>>>>>>    match(construct=simd(simdlen(4),notinbranch), device={isa("simd")})
>>>>>>>>>>>    extern float sinf(float);
>>>>>>>>>>> 
>>>>>>>>>>>    float32x4_t vector_sinf(float32x4_t x);
>>>>>>>>>>> 
>>>>>>>>>>> The `construct` set in the directive, together with the `device` set,
>>>>>>>>>>> is used to generate the vector mangled name to be used in the
>>>>>>>>>>> `vector-variant` attribute, for example `_ZGVnN2v_sin`, when targeting
>>>>>>>>>>> AArch64 Advanced SIMD code generation. The rules for mangling the name
>>>>>>>>>>> of the scalar function in the vector name are defined in the Vector
>>>>>>>>>>> Function ABI specification of the target.
>>>>>>>>>>> 
>>>>>>>>>>> The part of the vector-variant attribute that redirects the call to
>>>>>>>>>>> `vector_sinf` is derived from the `variant-id` specified in the
>>>>>>>>>>> `variant` clause.
>>>>>>>>>>> 
>>>>>>>>>>> Summary
>>>>>>>>>>> =======
>>>>>>>>>>> 
>>>>>>>>>>> New `clang` directive in clang
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> 
>>>>>>>>>>> `#pragma clang declare variant`, same as `#pragma omp declare variant`
>>>>>>>>>>> restricted to the `simd` context selector, from OpenMP 5.0+.
>>>>>>>>>>> 
>>>>>>>>>>> Option behavior, and interaction with OpenMP
>>>>>>>>>>> --------------------------------------------
>>>>>>>>>>> 
>>>>>>>>>>> The behavior described below makes sure that `#pragma clang declare
>>>>>>>>>>> variant` function vectorization and OpenMP function vectorization are
>>>>>>>>>>> orthogonal.
>>>>>>>>>>> 
>>>>>>>>>>> `-fclang-declare-variant`
>>>>>>>>>>> 
>>>>>>>>>>> :   The `#pragma clang declare variant` directives are parsed and used
>>>>>>>>>>>    to populate the `vector-variant` attribute.
>>>>>>>>>>> 
>>>>>>>>>>> `-fopenmp[-simd]`
>>>>>>>>>>> 
>>>>>>>>>>> :   The `#pragma omp declare variant` directives are parsed and used to
>>>>>>>>>>>    populate the `vector-variant` attribute.
>>>>>>>>>>> 
>>>>>>>>>>> `-fopenmp[-simd]` and `-fno-clang-declare-variant`
>>>>>>>>>>> 
>>>>>>>>>>> :   The directive `#pragma omp declare variant` is used to populate the
>>>>>>>>>>>    `vector-variant` attribute in IR. The directives
>>>>>>>>>>>    `#pragma clang declare variant` are ignored.
>>>>>>>>>>> 
>>>>>>>>>>> [^1]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
>>>>>>>>>>> 
>>>>>>>>>>> [^2]: Vector Function ABI for x86:
>>>>>>>>>>>    <https://software.intel.com/en-us/articles/vector-simd-function-abi>.
>>>>>>>>>>>    Vector Function ABI for AArch64:
>>>>>>>>>>>    https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
>>>>>>>>>>> 
>>>>>>>>>>> [^3]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>> _______________________________________________
>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>> --
>>>>>>> Hal Finkel
>>>>>>> Lead, Compiler Technology and Programming Languages
>>>>>>> Leadership Computing Facility
>>>>>>> Argonne National Laboratory
>>>>>>> 
>>>> 
>>> 
>>> --
>>> 
>>> Johannes Doerfert
>>> Researcher
>>> 
>>> Argonne National Laboratory
>>> Lemont, IL 60439, USA
>>> 
>>> jdoerfert at anl.gov
>> 
> 
> --
> 
> Johannes Doerfert
> Researcher
> 
> Argonne National Laboratory
> Lemont, IL 60439, USA
> 
> jdoerfert at anl.gov

Doerfert, Johannes via llvm-dev

2019-May-31 23:09 UTC

[llvm-dev] [cfe-dev] [RFC] Expose user provided vector function for auto-vectorization.

On 05/31, Francesco Petrogalli wrote:
> 
> > On May 31, 2019, at 2:56 PM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> > 
> > I think I did misunderstand what you want to do with attributes. This is
> > my bad. Let me try to explain:
> > 
> > It seems you want the "vector-variants" attributes (which I could not
> > find with this name in trunk, correct?) to "remember" what vector
> > versions can be created (wrt. validity), assuming a definition is
> > available? Correct?
> 
> Yes.
> 
> > What I was concerned with is the example I sketched somewhere below
> > which motivates the need for a generalized/standardized name mangling
> > for OpenMP. I thought you wanted to avoid that somehow but if you don't I
> > misunderstood you. I basically removed the part where the vector
> > versions have to be created first but I assumed them to be existent (in
> > the module or somewhere else). That is, I assumed a call to foo and
> > various symbols available that are specializations of foo. When we then
> > vectorize foo (or otherwise specialize at some point in the future), you
> > would scan the module and pick the best match based on the context of
> > the call.
> > 
> 
> Yes, although the syntax you use below is wrong. Declare variant is
> attached to the scalar definition, and points to a vector definition
> (the variant) that is declared/defined in the same compilation unit
> where the scalar version is visible.
Yeah, I do it all the time. They changed that last minute in 5.0,... I'm
emotionally attached to the old one ;)

> > Now I don't know if I understood your proposal by now but let me ask a
> > question anyway:
> > 
> > VecClone.cpp:276-278 mentions that the vectorizer is supposed to look at
> > the vector-variants functions. This works for variants that are created
> > from definitions in the module but what about #omp declare simd
> > declarations?
> > 
> 
> VecClone does more than just mapping a scalar version to a vector
> one. It also builds the vector version definition by auto-vectorizing
> the body of the scalar function.
I get that.

> I don’t know if the patches related to VecClone are also intended to
> use the `vector-variant` attribute for function declarations with a
> #pragma omp declare simd. On aarch64, in Arm compiler for HPC, we do
> that to support vector math libraries. It works in principle, but
> `declare variant` allows more context selection (and custom names
> instead of vector ABI names, which are easier for users).
This seems to be very interesting. What declarations are considered
"vector-variants" in the first place? I could see all of the versions
shown below to be reasonable choices.

#pragma omp declare simd (foo)

#pragma omp declare variant match(simd)
void bar(void);

#pragma omp declare variant match(parallel, simd)
void baz(void);

<2 x double> sin(<2 x double>) {}
<4 x float> sin(<4 x float>) { ... }


The ticket I mentioned earlier (#940, see link below), is proposing a
begin/end version of declare variant that would declare the enclosing
definitions as variants of functions with the same prototype. This goes
in the same direction as the sin definitions above but makes it more
explicit. One then might want to write something like:

#pragma omp begin declare variant match(nvptx, simd)
<4 x float> sin(<4 x float>) { ... }
<2 x double> sin(<2 x double>) { ... }
#pragma omp end declare variant

and expect these methods to be used if we have `sin` in a vectorized
environment. If we would only go by the mangled names available in the
module, this should at least be possible.
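
A rough sketch of what "going by the mangled names available in the module" could look like, assuming the `_ZGV<isa><masking><VLEN><parameter type>_<scalar name>` grammar discussed further down. `VectorVariant`, `parseVectorName` and `findVariants` are hypothetical names for illustration, not existing LLVM APIs. The sketch only handles fixed-width VLEN and treats everything between the VLEN and the first underscore as opaque parameter tokens.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical description of one vector variant, decoded from its name.
struct VectorVariant {
  char Isa;               // single-letter <isa> token
  bool IsMasked;          // 'M' = masked, 'N' = unmasked
  unsigned VLen;          // number of lanes
  std::string ParamKinds; // raw <parameter type> tokens, e.g. "vv"
  std::string ScalarName; // name of the scalar function
};

// Hypothetical parser for _ZGV<isa><masking><VLEN><params>_<scalar name>.
// Returns false when the symbol does not follow that shape.
bool parseVectorName(const std::string &Name, VectorVariant &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t Pos = Prefix.size();
  if (Name.size() < Pos + 2)
    return false;
  Out.Isa = Name[Pos++];
  const char Mask = Name[Pos++];
  if (Mask != 'M' && Mask != 'N')
    return false;
  Out.IsMasked = (Mask == 'M');
  const std::size_t DigitsBegin = Pos;
  Out.VLen = 0;
  while (Pos < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[Pos])))
    Out.VLen = Out.VLen * 10 + static_cast<unsigned>(Name[Pos++] - '0');
  if (Pos == DigitsBegin)
    return false; // scalable ("x") VLEN is not handled in this sketch
  const std::size_t Sep = Name.find('_', Pos);
  if (Sep == std::string::npos || Sep == Pos || Sep + 1 == Name.size())
    return false;
  Out.ParamKinds = Name.substr(Pos, Sep - Pos);
  Out.ScalarName = Name.substr(Sep + 1);
  return true;
}

// Scans a list of symbol names (standing in for the symbols of a module)
// and collects the variants of Scalar that provide exactly VF lanes.
std::vector<VectorVariant> findVariants(const std::vector<std::string> &Symbols,
                                        const std::string &Scalar,
                                        unsigned VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  std::vector<VectorVariant> Found;
  for (const std::string &Sym : Symbols) {
    VectorVariant V;
    if (parseVectorName(Sym, V) && V.ScalarName == Scalar && V.VLen == VF)
      Found.push_back(V);
  }
  return Found;
}
```

Picking the "best match based on the context of the call" would then be a ranking over the returned candidates, which this sketch deliberately leaves out.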

> > On 05/31, Francesco Petrogalli wrote:
> >>> On May 31, 2019, at 11:47 AM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
> >>> 
> >>> I think we should split this discussion:
> >>> TOPIC 1 & 2 & 4: How to implement all use cases and OpenMP 5.X
> >>>                  features, including compatibility with other
> >>>                  compilers and cross module support.
> >> 
> >> Yes, and we have to carefully make this as standard and compatible as
> >> possible.
> > 
> > Agreed.
> > 
> > 
> >>> TOPIC 3b & 5: Interoperability with clang declare (system vs. user
> >>>                declares)
> >> 
> >> 
> >> I think that Alexey's explanation of how the directives are handled
> >> internally in the frontend makes us lean towards the attribute.
> > 
> > How things are handled right now, especially given that declare variant
> > is not handled at all, should not limit our design space. If the
> > argument is that we cannot reasonably implement a solution, that is a
> > different story.
> > 
> > 
> >>> TOPIC 3a & 3c: floating point issues?
> >>> 
> >> 
> >> I believe there is no issue there. I have quoted the OpenMP standard
> >> in reply to Renato:
> >> 
> >> See
> >> https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf,
> >> page 118, lines 23-24:
> >> 
> >> "The execution of the function or subroutine cannot have any side
> >> effects that would alter its execution for concurrent iterations of a
> >> SIMD chunk."
> > 
> > Great.
> > 
> > 
> >>> I inlined comments for Topic 1 below.
> >>> 
> >>> I hope that we do not have to discuss topic 2 if we agree neither
> >>> attributes nor metadata is necessary, or better, will solve the actual
> >>> problem at hand. I don't have strong feelings on topic 4 but I have the
> >>> feeling this will become less problematic once we figure out topic 1.
> >>> 
> >>> Thanks,
> >>> Johannes
> >>> 
> >>> 
> >>> On 05/31, Francesco Petrogalli wrote:
> >>>> # TOPIC 1: concerns about name mangling
> >>>> 
> >>>> I understand that there are concerns in using the mangling scheme I
> >>>> proposed, and that it would be preferred to have a mangling scheme
> >>>> that is based on (and standardized by) OpenMP.
> >>> 
> >>> I still think it will be required to have a standardized one, not
> >>> only preferred.
> >>> 
> >>> 
> >> 
> >> I am all with you in standardizing. x86 and aarch64 have their own
> >> vector function ABI, which, although "private", are to be considered
> >> standard. Open source and commercial compilers are using them,
> >> therefore we have to deal with this mangling scheme, whether or not
> >> OpenMP comes up with a standard mangling scheme.
> > 
> > I don't get the point you are trying to make here. What do you mean by
> > "we have to deal with"? (I do not suggest to get rid of them.)
> > 
> 
> That we cannot ignore the fact that the name scheme is already
> standardized by the vendors, so let’s first deal with what we have,
> and think about the OpenMP mangling scheme only once there is one
> available.
Again, I do not want to get rid of what we have or even replace it if it
is not necessary. I am trying to determine whether the scheme is sufficient
for 5.0 and the later extensions that are undoubtedly coming.

Another example:
If we had target and host code in the same IR module, could we
distinguish a vector version for a target from one for the host? If not,
that would be a problem we should not ignore.

> >>>> I hear the argument on having some common ground here. In fact, there
> >>>> is already common ground between the x86 and aarch64 backends, which have
> >>>> based their respective Vector Function ABI specifications on OpenMP.
> >>>> 
> >>>> In fact, the mangled name grammar can be summarized as follows:
> >>>> 
> >>>> _ZGV<isa><masking><VLEN><parameter type>_<scalar name>
> >>>> 
> >>>> Across vector extensions the only <token> that will differ is the
> >>>> <isa> token.
> >>>> 
> >>>> This might lead people to think that we could drop the _ZGV<isa>
> >>>> prefix and consider the <masking><VLEN><parameter type>_<scalar name>
> >>>> part as a sort of unofficial OpenMP mangling scheme: in fact, the
> >>>> signature of an “unmasked 2-lane vector version of `sin`” will always
> >>>> be `<2 x double>(<2 x double>)`.
> >>>> 
> >>>> The problem with this choice is that the number of vector versions
> >>>> available for a target is not unique.
> >>> 
> >>> For me, this simply means this mangling scheme is not sufficient.
> >>> 
> >> 
> >> Can you explain more why you think the mangling scheme is not
> >> sufficient? The mangling scheme is shaped to provide all the
> >> information that the OpenMP directive describes.
> > 
> > I don't know if it is insufficient but I thought you hinted towards that.
> 
> I didn’t mean that; the tokens in the vector function ABI mangling schemes
> are sufficient.
> 
> > If we can handle/decode everything we need for declare variants then I
> > do not object at all. If not, we require a respective extension such that
> > we can. The result should be a superset of the current SIMD encoding and
> > compatible with the current one.
> > 
> > 
> 
> We can handle/decode everything for a SIMD context. :)
What about a context that combines SIMD and something else, e.g.,
parallel?

> >> The fact that x86 and aarch64 realize such information in different
> >> ways (multiple signatures/vector extensions) is something that cannot be
> >> avoided, because it is related to architectural aspects that are
> >> specific to the vector extension and transparent to the OpenMP
> >> standard.
> > 
> > I don't think that is a problem (that's why I "failed to see the
> > problem" in the comment below). I look at it this way: If #declare simd,
> > or similar, results in N variants, it should at the end of the day not
> > be different from declaring these N variants explicitly with the
> > respective declare variant match clause.
> > 
> 
> That’s not the case. #declare simd should create all the versions that are
> optimal for the target. We carefully thought about that when writing the
> vector function ABI. Most of the constraints derive from the fact that each
> target has a specific register size.
> 
> Example:
> 
> #pragma omp declare simd
> float foo(float);
> 
> x86 -> 8 versions {2, 4, 8, 16 lanes} x {masking, no masking}, see https://godbolt.org/z/m1BUVt
> Arm NEON -> 4 versions {2, 4 lanes} x {masking, no masking}
> Arm SVE -> 1 version
> 
> Therefore, the outcome of declare simd is not target independent. Your
> expectations are met only inside one target.
The above outcome is fine. Which versions exist is target dependent. The
encoding and selection are the interesting part, I guess.
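
As a rough illustration of the encoding side (not taken from any compiler),
a vector function ABI name following the grammar
`_ZGV<isa><masking><VLEN><parameter type>_<scalar name>` quoted above can be
split into its tokens as sketched below. The `vfabi_name` struct and the
`vfabi_decode` helper are hypothetical names. The sketch assumes `N`/`M` for
unmasked/masked, a decimal VLEN or `x` for a scalable length, and that the
parameter tokens contain no underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for the decoded tokens; not an LLVM or GCC type. */
typedef struct {
    char isa;            /* e.g. 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX512,
                            'n' Advanced SIMD, 's' SVE */
    bool masked;         /* 'M' = masked, 'N' = unmasked */
    unsigned vlen;       /* number of lanes; 0 stands for the scalable 'x' */
    char params[32];     /* raw parameter tokens, e.g. "v" */
    const char *scalar;  /* points into the input at the scalar name */
} vfabi_name;

/* Returns true and fills *out if name has the shape
 * _ZGV<isa><masking><VLEN><parameter type>_<scalar name>. */
bool vfabi_decode(const char *name, vfabi_name *out) {
    assert(name != NULL && out != NULL);
    if (strncmp(name, "_ZGV", 4) != 0) return false;
    const char *p = name + 4;

    /* <isa>: a single letter. */
    if (!isalpha((unsigned char)*p)) return false;
    out->isa = *p++;

    /* <masking>: 'M' or 'N'. */
    if (*p != 'M' && *p != 'N') return false;
    out->masked = (*p++ == 'M');

    /* <VLEN>: decimal lane count, or 'x' for a scalable length. */
    if (*p == 'x') {
        out->vlen = 0;
        p++;
    } else if (isdigit((unsigned char)*p)) {
        char *end = NULL;
        out->vlen = (unsigned)strtoul(p, &end, 10);
        p = end;
    } else {
        return false;
    }

    /* <parameter type>: everything up to the first underscore. */
    const char *sep = strchr(p, '_');
    if (sep == NULL || sep[1] == '\0') return false;
    size_t len = (size_t)(sep - p);
    if (len >= sizeof out->params) return false;
    memcpy(out->params, p, len);
    out->params[len] = '\0';

    /* <scalar name>: the rest of the string. */
    out->scalar = sep + 1;
    return true;
}
```

Under these assumptions, `_ZGVbN2v_sin` decodes to ISA `b`, unmasked, 2
lanes, one vector parameter, scalar name `sin`.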

> >>>> In particular, the following declaration generates multiple vector
> >>>> versions, depending on the target:
> >>>> 
> >>>> #pragma omp declare simd simdlen(2) notinbranch
> >>>> double foo(double) {…};
> >>>> 
> >>>> On x86, this generates at least 4 symbols (one for SSE, one for AVX,
> >>>> one for AVX2, and one for AVX512): https://godbolt.org/z/TLYXPi
> >>>> 
> >>>> On aarch64, the same declaration generates a unique symbol, as
> >>>> specified in the Vector Function ABI.
> >>> 
> >>> I fail to see the problem. We generate X symbols for X different
> >>> contexts. Once we get to the point where we vectorize, we determine
> >>> which context fits best and choose the corresponding symbol version.
> >>> 
> >> 
> >> Yes, this is exactly what we need to do, under the constraint that
> >> the rules for generating "X symbols for X different contexts" are
> >> decided by the Vector Function ABI of the target.
> > 
> > Sounds good. The vector ABI is used to determine what contexts exist
> > and what symbols should be created. I would assume the encoding should
> > be the same as if we specified the versions (/contexts) ourselves via
> > #declare variant.
> > 
> 
> Oh yes, vector functions listed in a declare variant should obey the
> vector function ABI rules (other than the function name).
> 
> > 
> >>> Maybe my view is too naive here, please feel free to correct me.
> >>> 
> >>> 
> >>>> This means that the attribute (or metadata) that carries the
> >>>> information on the available vector version needs to deal also with
> >>>> things that are not usually visible at IR level, but that might still
> >>>> need to be provided to be able to decide which particular instruction
> >>>> set/vector extension needs to be targeted.
> >>> 
> >>> The symbol names should carry all the information we need. If they do
> >>> not, we need to improve the mangling scheme such that they do. There are
> >>> no attributes/metadata we could use at library boundaries.
> >>> 
> >> Hum, I am not sure what you mean by "There is no attributes/metadata
> >> we could use at library boundaries."
> > 
> > (This seems to be part of the misunderstanding, I leave my comment here
> > anyway:)
> > 
> > The simd-related stuff works because it is a uniform mangling scheme
> > used by all compilers. Take the situation below in which I think we want
> > to call foo_CTX in the library. If so, we need a name for it.
> > 
> 
> In the situation below, the mangled name is going to be the same for
> both compilers, as long as they adhere to the vector function ABI.
Assuming CTX is only a SIMD context. What if it is more than that?

> > a.c:  // Compiled by gcc into a library
> > #omp declare variant (foo) match(CTX)
> > void foo_CTX(...) {...}
> > 
> > b.c:  // Compiled by clang linked against the library above.
> > #omp declare variant (foo) match(CTX)
> > void foo_CTX(...);
> > 
> > void bar(...) {
> >  #pragma omp CTX
> >  foo();   // <- What function (symbol) do we call if a.c was compiled
> >           //    by gcc and b.c with clang?
> > }
> > 
> 
> Please notice that `declare variant` needs to be attached to the scalar
> function, not the vector one.
> 
> ```
> #pragma omp declare variant(foo_CTX) match (context=simd…
> double foo (double) {…}
> 
> vector_double_ty foo_CTX(vector_double_ty) {…}
> ```
> 
> In vectorizing foo in bar, the compiler will not care where foo_CTX
> would come from (of course, as long as the scalar+declare variant
> declarations are visible).
OK. What happens if you merge two modules, one with foo and declare
variants that resulted in the "vector-version" attribute and one where
foo is just a declaration? (One may not originate in a C/C++ declaration
though as that would potentially be undefined in OpenMP*).

* OpenMP 5.0 Restriction: If the function has any declarations, then the
  declare simd construct for any declaration that has one must be
  equivalent to the one specified for the definition.  Otherwise, the
  result is unspecified.
(I'm not sure if that is supposed to be true across compilation units or
 not, I would have guessed it is not though.)

And what happens if CTX is more complex, e.g., target + SIMD, parallel +
SIMD, in which case we have to take the additional non-SIMD part into
account to select the correct version. Since the SIMD version is
selected late, we might have to remember the original syntactic scope of
the call to avoid picking the wrong version. (I would argue picking the
version late (after inlining), even if it is not the same as early
picking would have resulted in, is actually good but I think the
standard would require syntactic context evaluation.)
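
For the SIMD-only part of such a context, late selection could look roughly
like the sketch below. It is an assumption-laden illustration, not what the
vectorizer does: `simd_variant`, `call_context`, `pick_variant`, and the
numeric `isa_level` ordering are hypothetical, and the preference (unmasked
first, then the newest usable extension) is just one possible policy.
Anything beyond SIMD (target, parallel) is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one available vector version. */
typedef struct {
    const char *symbol;   /* mangled name of the vector version */
    unsigned vlen;        /* number of lanes */
    bool masked;          /* true if the version takes a mask */
    unsigned isa_level;   /* assumed ordering: higher needs a newer extension */
} simd_variant;

/* Hypothetical summary of what the call site needs. */
typedef struct {
    unsigned max_isa_level;  /* newest extension usable at the call site */
    unsigned wanted_vlen;    /* vectorization factor picked for the loop */
    bool needs_mask;         /* the call sits under a condition */
} call_context;

/* Returns the preferred usable variant, or NULL if none fits. */
const simd_variant *pick_variant(const simd_variant *variants, size_t count,
                                 const call_context *ctx) {
    const simd_variant *best = NULL;
    for (size_t i = 0; i < count; ++i) {
        const simd_variant *v = &variants[i];

        /* Filter out versions that cannot be called in this context. */
        if (v->isa_level > ctx->max_isa_level) continue;
        if (v->vlen != ctx->wanted_vlen) continue;
        if (ctx->needs_mask && !v->masked) continue;

        if (best == NULL) {
            best = v;
            continue;
        }
        /* Assumed policy: unmasked beats masked, then newer ISA wins. */
        bool unmasked_wins = !v->masked && best->masked;
        bool same_masking = (v->masked == best->masked);
        if (unmasked_wins ||
            (same_masking && v->isa_level > best->isa_level)) {
            best = v;
        }
    }
    return best;
}
```

With four unmasked 2-lane x86 versions of `foo` (names of the form
`_ZGVbN2v_foo` through `_ZGVeN2v_foo`) and a call site limited to the third
level, this policy picks `_ZGVdN2v_foo`; if the call needs a mask, nothing
fits and the call stays scalar.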
> >> In our downstream compiler (Arm compiler for HPC, based on LLVM), we
> >> use `declare simd` to provide vector math functions via a custom header
> >> file. It works brilliantly, if not for specific aspects that would be
> >> perfectly covered by `declare variant`, which might be one of the
> >> reasons why the OpenMP committee decided to introduce `declare
> >> variant`.
> > 
> > But you (assume that you) control the mangling scheme across the entire
> > infrastructure. Given that the simd mangling is de-facto standardized,
> > that works.
> > 
> > Side note:
> > Declare variant, as of 5.0, is not flexible enough for a sensible
> > inclusion of target specific headers. That will change in 5.1.
> > 
> 
> Could you point me at the discussion in 5.1 on this specific aspect?
A begin/end version of declare variant, described in more detail above:
http://trac.openmp.org/trac/OpenMP/ticket/940

> >> If your concern is that adding an attribute that somehow represents
> >> something that is available in an external library is not enough to
> >> guarantee that that symbol is available in the library… not even C
> >> code can guarantee that? If the linker is not pointing to the right
> >> library, there is nothing that can prevent it from failing if the symbol
> >> is not present?
> > 
> > I don't follow the example you describe. I don't want to change anything
> > in how symbols are looked up or what happens if they are missing.
> > 
> > 
> 
> I don’t want to change that either :). I think we are misunderstanding
> each other here...
Probably.