Francesco Petrogalli via llvm-dev
2016-Nov-30  11:46 UTC
[llvm-dev] Enable "#pragma omp declare simd" in the LoopVectorizer
Dear all,
I have just created a couple of differential reviews to enable the
vectorisation of loops that have function calls to routines marked with
“#pragma omp declare simd”.
They can be (re)viewed here:
* https://reviews.llvm.org/D27249
* https://reviews.llvm.org/D27250
The current implementation allows the loop vectorizer to generate vector
code for a source file such as:
  #pragma omp declare simd
  double f(double x);
  void aaa(double *x, double *y, int N) {
    for (int i = 0; i < N; ++i) {
      x[i] = f(y[i]);
    }
  }
by invoking clang with arguments:
  $> clang -fopenmp -c -O3 file.c […]
This functionality should give vector-library developers a simple way to
inform the loop vectorizer that an external library provides vector
implementations of the scalar functions called in loops. All that is
needed is to mark the function declaration in the library's header file
with “#pragma omp declare simd” and to generate the associated symbols in
the library's object file according to the name scheme of the vector ABI
(see notes below).
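To illustrate the name scheme a library would have to follow, here is a
minimal sketch. It assumes the layout used by the x86 vector function ABI,
"_ZGV" + ISA letter + mask letter + vector length + parameter letters +
"_" + scalar name. The helper mangle_vector_name is hypothetical and is
not part of the patches, and the prefix and letters used for AArch64 may
differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: builds a vector-ABI symbol name, assuming the
 * x86 vector function ABI layout:
 *   _ZGV <isa> <mask> <vlen> <params> _ <scalar name>
 * isa:    'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512
 * mask:   'N' unmasked, 'M' masked
 * params: one letter per parameter, 'v' vector, 'u' uniform
 * Returns what snprintf returns (length of the full name).
 */
int mangle_vector_name(char *buf, size_t size, char isa, int masked,
                       unsigned vlen, const char *params,
                       const char *scalar_name)
{
    assert(buf != NULL && params != NULL && scalar_name != NULL);
    assert(strlen(scalar_name) > 0);
    return snprintf(buf, size, "_ZGV%c%c%u%s_%s",
                    isa, masked ? 'M' : 'N', vlen, params, scalar_name);
}
```

Under these assumptions, an unmasked two-lane SSE variant of
“double f(double x)” would be exported by the library as _ZGVbN2v_f, and
a variant of pow with a uniform second argument as _ZGVbN2vu_pow.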
I am interested in any feedback/suggestion/review the community might have
regarding this behaviour.
Below you will find a description of the implementation and some notes.
Thanks,
Francesco 
-----------
The functionality is implemented as follows:
1. Clang CodeGen generates a set of global external variables for each of
the function declarations marked with the OpenMP pragma. Each such global
is named according to a mangling scheme generated by
llvm::TargetLibraryInfoImpl (TLII), and holds the vector signature of the
associated vector function. (See the examples in the tests of the clang
patch. Each scalar function can generate multiple vector functions,
depending on the clauses of its declare simd directives.)
2. When clang creates the TLII, it processes the llvm::Module and finds
out which globals of the module have the correct mangling and type, so
that they can be added to the TLII as a list of vector functions
associated with the original scalar one.
3. The LoopVectorizer looks up the available vector functions through the
TLII not by scalar name and vectorisation factor but by scalar name and
vector function signature. This lets the vectorizer distinguish a “vector
vpow1(vector x, vector y)” from a “vector vpow2(vector x, scalar y)”. (The
second one corresponds to a “declare simd uniform(y)” for a “scalar
pow(scalar x, scalar y)” declaration.) Note that the changes in the loop
vectorizer are minimal.
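The lookup described in step 3 can be pictured with the following minimal
sketch. It is plain C, not the LLVM implementation: the table, the
param_kind enum and find_vector_variant are all hypothetical names, and
only vpow1/vpow2 come from the description above. The point is that the
key is the scalar name plus the parameter kinds, not the scalar name plus
a vectorisation factor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* How one parameter of a vector variant is passed (hypothetical). */
enum param_kind { PARAM_VECTOR, PARAM_UNIFORM };

#define MAX_PARAMS 4

struct vector_variant {
    const char *scalar_name;            /* e.g. "pow" */
    const char *vector_name;            /* e.g. "vpow1" */
    size_t num_params;
    enum param_kind params[MAX_PARAMS]; /* the "signature" */
};

/* Two variants of pow that differ only in their signature. */
static const struct vector_variant variants[] = {
    { "pow", "vpow1", 2, { PARAM_VECTOR, PARAM_VECTOR  } },
    { "pow", "vpow2", 2, { PARAM_VECTOR, PARAM_UNIFORM } },
};

/* Returns the matching vector function name, or NULL if none matches. */
const char *find_vector_variant(const char *scalar_name,
                                const enum param_kind *params,
                                size_t num_params)
{
    assert(scalar_name != NULL && num_params <= MAX_PARAMS);
    for (size_t i = 0; i < sizeof variants / sizeof variants[0]; ++i) {
        const struct vector_variant *v = &variants[i];
        if (strcmp(v->scalar_name, scalar_name) != 0 ||
            v->num_params != num_params)
            continue;
        size_t p = 0;
        while (p < num_params && v->params[p] == params[p])
            ++p;
        if (p == num_params)
            return v->vector_name;
    }
    return NULL;
}
```

With this table, a call whose second argument is loop-invariant would ask
for {PARAM_VECTOR, PARAM_UNIFORM} and get "vpow2", while a call with two
varying arguments would get "vpow1".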
Notes:
1. To enable only the SIMD part of OpenMP, leaving all the
multithreading/target behaviour behind, we should also enable this with a
new option: -fopenmp-simd
2. The AArch64 vector ABI in the code is essentially the same as the
Intel one (apart from the prefix and the masking argument), and it is
based on the clauses associated with “declare simd” in OpenMP 4.0. For
OpenMP 4.5, the parameters section of the mangled name should be updated.
This update will not change the vectorizer behaviour, as all the vectorizer
needs to detect a vectorizable function is the original scalar name and a
compatible vector function signature. Of course, any changes/updates in
the ABI will have to be reflected in the symbols of the binary file of the
library.
3. Whilst this currently works only for function declarations, the same
functionality can be used when (if) clang implements the declare simd
OpenMP pragma for function definitions.
4. I have enabled this for any loop that calls the scalar function, not
just for those annotated with “#pragma omp for simd”. I don’t have a strong
preference here, but I also don’t see any reason why this shouldn’t be
enabled by default for non-annotated loops. Let me know if you disagree;
I’d happily change the behaviour if there are sound reasons for doing so.
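To make note 4 concrete, here is a minimal sketch of the two kinds of loop
being discussed. The function scale_by and both loop wrappers are
hypothetical. The definition is included only so that the example builds
on its own; in the library scenario only the annotated declaration would
be visible. The annotated loop uses “#pragma omp simd”, the SIMD-only
form, rather than the combined “for simd”.

```c
#include <assert.h>

/* Annotated declaration, as a library header might provide it. */
#pragma omp declare simd uniform(y)
double scale_by(double x, double y);

/* Definition, present only to keep the sketch self-contained. */
#pragma omp declare simd uniform(y)
double scale_by(double x, double y) { return x * y; }

/* Loop with no OpenMP annotation: vectorizable under the proposal. */
void scale_plain(double *out, const double *in, double y, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        out[i] = scale_by(in[i], y);
}

/* Same loop, explicitly annotated by the user. */
void scale_annotated(double *out, const double *in, double y, int n)
{
    assert(n >= 0);
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = scale_by(in[i], y);
}
```

Both loops compute the same result. The question raised in note 4 is only
whether the first one should also be allowed to use the vector variant of
scale_by without the user asking for it.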