Francesco Petrogalli via llvm-dev
2016-Nov-30 11:46 UTC
[llvm-dev] Enable "#pragma omp declare simd" in the LoopVectorizer
Dear all,

I have just created a couple of differential reviews to enable the vectorisation of loops that contain function calls to routines marked with "#pragma omp declare simd". They can be (re)viewed here:

* https://reviews.llvm.org/D27249
* https://reviews.llvm.org/D27250

The current implementation allows the loop vectorizer to generate vector code for a source file such as:

  #pragma omp declare simd
  double f(double x);

  void aaa(double *x, double *y, int N) {
    for (int i = 0; i < N; ++i) {
      x[i] = f(y[i]);
    }
  }

by invoking clang with:

  $> clang -fopenmp -c -O3 file.c [...]

This functionality should provide a convenient interface for vector library developers: it can be used to inform the loop vectorizer that an external library provides vector implementations of the scalar functions called in a loop. All that is needed is to mark the function declarations in the library's header file with "#pragma omp declare simd", and to emit the associated symbols in the library's object file according to the naming scheme of the vector ABI (see the notes below).

I am interested in any feedback/suggestions/reviews the community might have regarding this behaviour. Below you will find a description of the implementation and some notes.

Thanks,

Francesco

-----------

The functionality is implemented as follows:

1. Clang CodeGen generates a set of external global variables for each function declaration marked with the OpenMP pragma. Each such global is named according to a mangling scheme produced by llvm::TargetLibraryInfoImpl (TLII), and holds the vector signature of the associated vector function. (See the examples in the tests of the clang patch. A single scalar function can give rise to multiple vector functions, depending on the clauses of the "declare simd" directives.)

2. When clang creates the TLII, it processes the llvm::Module and determines which globals in the module have the correct mangling and type, so that they can be added to the TLII as the list of vector functions associated with the original scalar one.

3. The LoopVectorizer looks up the available vector functions through the TLII not by scalar name and vectorisation factor, but by scalar name and vector function signature. This enables the vectorizer to distinguish a "vector vpow1(vector x, vector y)" from a "vector vpow2(vector x, scalar y)". (The second one corresponds to a "declare simd uniform(y)" on a "scalar pow(scalar x, scalar y)" declaration.) Notice that the changes in the loop vectorizer are minimal.

Notes:

1. To enable SIMD only for OpenMP, leaving all the multithread/target behaviour behind, we should also enable this with a new option: -fopenmp-simd.

2. The AArch64 vector ABI in the code is essentially the same as the Intel one (apart from the prefix and the masking argument), and it is based on the clauses associated with "declare simd" in OpenMP 4.0. For OpenMP 4.5, the parameter section of the mangled name should be updated. This update will not change the vectorizer's behaviour, as all the vectorizer needs in order to detect a vectorizable function is the original scalar name and a compatible vector function signature. Of course, any changes/updates in the ABI will have to be reflected in the symbols of the library's binary.

3. While this currently works only for function declarations, the same functionality can be used when (if) clang implements the "declare simd" OpenMP pragma for function definitions.

4. I have enabled this for any loop that invokes the scalar function call, not just for those annotated with "#pragma omp for simd". I don't have a strong preference here, but at the same time I don't see any reason why this shouldn't be enabled by default for non-annotated loops.
Let me know if you disagree; I'd happily change the functionality if there are sound reasons to do so.