Johannes Doerfert via llvm-dev
2021-Mar-20 00:10 UTC
[llvm-dev] [RFC] The `implements` attribute, or how to swap functions statically but late
The basic idea is to provide a way to replace a callee statically *inside* the LLVM-core pipeline [0]. The initial use case was the missing translation of certain (math) intrinsics by the NVPTX backend; however, we believe there are other use cases that would benefit from a generic model. One is described below, as well as the benefits over alternatives.

---

Use cases

1a) NVPTX + fast-math + math functions

The original use case was discussed here [1] and the problem is shown here [2]. In a nutshell, `llvm.sin` (and friends) are not supported by the backend and it will yell at you. This is unfortunate but could arguably be resolved in a lot of different ways (below). Note that there is not necessarily a 1:1 mapping for all intrinsics, and we might need to write a wrapper, which would probably live in clang/lib/Headers with the other CUDA wrapper code.

Considered solutions we did not pick:
a) Teach the backend explicitly about the mapping `llvm.sin` -> `__nv_sin`.
b) Teach clang not to emit the intrinsics in the first place.
c) Use an existing overload "hack", e.g., `__asm__`.

Let's discuss those first.
a) Works, but is very specific to this use case, and the mapping does not live with the declarations and definitions, which makes maintenance less appealing.
b) Seems easy enough, but we lose all the benefits of intrinsics; more on that later.
c) Right now, IR does not allow intrinsics to be defined, so there is little we could do better than b) here, I think, short of allowing intrinsic definitions (which is an option).

1b) Users can override the implementation of any intrinsic, regardless of whether the backend supports it or not. They still get the intrinsic benefits during the compilation. This is not limited to NVPTX and math intrinsics but also opens the door for cross-platform tooling, e.g., Tool XYZ will emit intrinsics our backend doesn't support, but we can map them to implementations for our architecture.
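To make 1b) concrete, an intrinsic override could look roughly like the sketch below. This is only an illustration: how exactly an intrinsic is named in the attribute (the overload-suffixed string "llvm.sin.f64" is an assumption of this example) and the wrapper's name are not settled here; [0] is the authoritative place for the spelling. `__nv_sin` is simply the libdevice function from the NVPTX use case.

```c
// Sketch only: the attribute spelling for intrinsics, the "llvm.sin.f64"
// key, and the wrapper name are assumptions of this example, not final.
extern double __nv_sin(double); // CUDA libdevice implementation

__attribute__((implements("llvm.sin.f64")))
double sin_f64_impl(double x) {
  return __nv_sin(x);
}
```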
1c) You can prototype/test different lowerings for functions and intrinsics easily by linking in a bitcode file (which can be compiled from C*).

2) Function specifications and implementations

I would like to embed a function specification with the implementation, e.g.
```
void vec_add(double *A, double *B, int N) {
  for (int i = 0; i < N; ++i)
    A[i] += B[i];
}

__attribute__((implements("vec_add")))
void vec_add_impl(double *A, double *B, int N) {
  #pragma omp parallel for
  for (int i = 0; i < N; ++i)
    A[i] += B[i];
}
```

I think that would open up cool possibilities:
A) Replace only the first N specifications with implementations, which helps with debugging.
B) Use the specification for the first part of the optimization pipeline so you can derive facts the implementation "hides". E.g., `llvm.sin` calls could be annotated as `readnone` while the `__nv_sin` call contains inline assembly and acts as an optimization barrier. Or, in the above example, it is obvious from the "specification" that `B` is readonly, but the implementation could arbitrarily hide this, e.g., if the implementation is a runtime call.
C) Program verification, test synthesis, etc. would be possible on a new level in a defined way. By keeping implementation and specification close together, and written in the same language, we might get somewhere in this area.

---

Thoughts

I initially thought about the reverse attribute, so `vec_add` in the above example would have `__attribute__((specifies(vec_add_impl)))`, for the same reasons (A-C). The problem is that I'm not sure how to attach anything to an intrinsic. Some `__asm__` trickery might help, but it's unclear that this is better than the `implements` version. Maybe we want both for convenience, with the `implements` version being able to handle intrinsics.

~ Johannes

[0] https://reviews.llvm.org/D98516
[1] https://lists.llvm.org/pipermail/llvm-dev/2021-March/149117.html
[2] https://godbolt.org/z/PxsEWs

--
───────────────────
∽ Johannes (he/his)
William Moses via llvm-dev
2021-Mar-20 02:30 UTC
[llvm-dev] [RFC] The `implements` attribute, or how to swap functions statically but late
Super in favor of this, both for ease of GPU codegen and potential BLAS optimizations, as well as for users like Julia potentially being able to take advantage of LLVM's optimizations for sin, etc.

Given the potential for multiple implementations to exist, is it worthwhile adding some notion of a priority, or perhaps a suggested target?
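For the sake of discussion, one entirely hypothetical spelling of such a priority could look like the following; neither the extra argument, the "llvm.sin.f64" key, nor these function names are proposed anywhere in [0], they are just made up to illustrate the question:

```c
// Hypothetical spelling only, to illustrate the priority/target question;
// the extra argument and the function names are invented for this sketch.
__attribute__((implements("llvm.sin.f64", /*priority=*/10)))
double sin_via_libdevice(double x);  // preferred when available

__attribute__((implements("llvm.sin.f64", /*priority=*/1)))
double sin_generic(double x);        // fallback implementation
```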