Jeroen Dobbelaere via llvm-dev
2019-Oct-05 09:25 UTC
[llvm-dev] [repost] RFC: Full 'restrict' support in LLVM
This is a repost of "RFC: Full 'restrict' support in LLVM" of March 19, 2019 https://lists.llvm.org/pipermail/llvm-dev/2019-March/131127.html Hopefully the formatting now looks better. Greetings, Jeroen Dobbelaere --------------- Hi all, during the last US LLVM conference (October last year), I had two 'BoF restrict' sessions. During these sessions, I discussed with a number of persons our view and mechanism on fixing the issues with llvm's restrict implementation and Hal Finkel's local restrict patches. One outcome of these discussions was a request to put all of this together in an RFC and present it to the larger llvm community so that a broader discussion can be started. With thanks to Wouter, Bruno, Dirk, Gert, Troy, David, Johannes for helping with the initial review of this rfc. Greetings, Jeroen Dobbelaere --- 2019-03-19 jeroen.dobbelaere at synopsys.com RFC: Full 'restrict' support in LLVM == Introduction = LLVM only has basic support for restrict on function arguments. All other restrict usages are ignored. This document proposes a mechanism to get full support for restrict in LLVM. == 'restrict' = C99 introduces the 'restrict' keyword for pointer operands [1, 2]. It helps the programmer to specify that memory accesses will not interfere with each other. Based on such information, the compiler can decide to do better optimizations and transformations, as it can assume that certain memory accesses will never alias. This can result in dramatic speed improvements. Restrict pointers can occur as function arguments, local variables (local restrict), struct members (restrict member pointers), global variables, ... == Current status (LLVM-8) = LLVM mostly ignores the 'restrict' keyword. It only has support for restrict pointer arguments of a function. Even there, the C restrict keyword is mapped onto the 'noalias' argument attribute, which is in some way stronger than 'restrict' [5,9]. While inlining, LLVM tries to track the 'noalias' argument information through the '!noalias' and '!alias.scope' metadata. This easily results in transformations that are not correct according to the C restrict rules, as this metadata representation is not sufficient to retain all information that is needed to represent all cases [6]. Another example that demonstrates this problem is shown in [7]: #define SIZE 1024 // Testcase showing that llvm-ir is not able to // differentiate between following two cases. // (restrict outside ('0') or inside ('1') the loop) // Toggle '0' to '1' and see that the metadata // (!noalias, !alias.scope) remains similar. #if 0 static __attribute__((always_inline)) void add_element(int* restrict pDest, int* restrict pA, int* restrict pB) { *pDest = *pA+*pB; } void test_add_array01(int* pDest, int* pA, int* pB) { for (int i=0; i<SIZE; ++i) { // restrict only holds per iteration, not across iterations add_element(pDest++, pA++, pB++); } } #else static __attribute__((always_inline)) void add_array(int* restrict pDest, int* restrict pA, int* restrict pB) { for (int i=0; i<SIZE; ++i) { *pDest++=*pA++ + *pB++; // restrict holds across iterations } } void test_add_array02(int* pDest, int* pA, int* pB) { // restrict info is similar to 'test_add_array01', no differentiation possible add_array(pDest, pA, pB); } #endif == 'local restrict' patches = Around 2014/2015, Hal Finkel proposed a number of patches, fixing some of the observed issues with restrict, while also adding support for local restrict [4]. Hal had the following goals: * Fix the issue with inlined functions, where restrictness inside vs outside the loop body was not decidable * Enable support for local restrict Some of the initial patches have been added to LLVM in the meantime. But, a number of main changes are still pending, due to slow review progress, but also due to a number of issues that popped up in the meantime. How it works: * Whenever a value is assigned to a restrict pointer, a 'llvm.noalias %p, metadata !x' intrinsic is put between the assignment and the value. '!x' is a list of a single scope id that represents the specific restrict pointer. * Whenever a memory instruction is produced, the '!noalias !y' metadata scope is added. '!y' is a list of all scoped id's that are active at the moment of the memory instruction. * The combination of the intrinsic and the !noalias scope allows to check if two memory accesses can/cannot alias according to the restrict rules. Observed issues: * Performance: ** The intrinsic blocks certain optimizations. ** In some combinations of local restrict and inlining, the restrictness is lost, resulting in slow code. * Correctness: ** Some optimizations (like LSR) eliminate the intrinsics, possibly resulting in wrong code. ** Loop unrolling can result in wrong code. [6] == A way out = I had the following goals when trying to fix the observed issues with the local restrict implementation: * Fix the known issues wrt to the local restrict implementation. Both correctness (must) and performance (as much as possible) * Make it safe(r) against the unexpected effect of (new) optimization passes: prefer slow code over incorrect code * Make it easy to add support for restrict member pointers. * Make it feasible to implement the 'alias_set' proposal (n4150) [3] High level strategy: * Keep most of the restrictness annotations out the way of the normal program flow, except when it is needed for correctness. * Follow the restrict specification [2] to the letter. More precisely: ** Track the declaration of a 'object P'. ** Each distinct restrict pointer 'object P' points to its own set of objects (within the valid scope). As such, restrictness is introduced when using a restrict pointer (not when assigning it). Example: void foo(int* restrict rP, int* restrict * pRP) { int* p2=rP; // 'using rP' introduces restrictness *p2=42; // dereferencing a pointer based on a restrict pointer rP=&someInt; // assignment does not introduce (additional) restrictness int* p3=*pRP; // 'using *pRP' introduces restrictness *p3=43; // dereferencing a pointer based on a restrict pointer } ** Track the 'based on' expressions in an accurate way. NOTE: Part of those ideas have been discussed in small group during 2 BOF sessions during the 2018 US LLVM Conference. The following sections explain how to come to a full set of intrinsics that should support restrict: (A) The Basics In principle, two intrinsics should be sufficient: * 'llvm.noalias.decl %p.alloc, metadata !Scope' intrinsic: ** indicates at what place in the program a restrict pointer has been declared. ** %p.alloc: represents the alloca associated with the declaration. ** !Scope: metadata that represents a unique scope, associated with this specific declaration. ** NOTE: the llvm.noalias.decl intrinsic can normally not be moved outside loops. Its purpose is to identify the freedom that a restrict pointer has wrt to the loop bodies and other blocks. As such it helps to resolve problems as seen in [6] and [7]. * 'llvm.noalias %p, %p.decl, %p.addr' intrinsic: ** created when a restrict pointer object P is _read_. It is used to track the 'based-on' dependency. ** %p: is the pointer value that was read. This is also the value that the intrinsic returns. ** %p.decl: refers to the 'llvm.noalias.decl' that is associated with the restrict pointer. ** %p.addr represents the address of 'object P' ** NOTE: sometimes the declaration is not known upfront, in that case '%p.decl' is 'null'. After inlining and/or optimizations, it can be possible to infer the llvm.noalias.decl (based on %p.addr). Example A: int foo(int* pA, int* pB) { int * restrict rpA=pA; *rpA=42; *pB=99; return *rpA; } (pseudo LLVM-IR) define @foo(i32* pA, i32* pB) { %rpA.address = alloca i32* %rpA.decl = llvm.noalias.decl %rpA.address, !metadata !10 ; declaration of a restrict pointer store i32* %pA, %rpA.address, !noalias !10 %rpA = load i32* %rpA.address, !noalias !10 %rpA.1 = i32* llvm.noalias %rpA, %rpA.decl, %rpA.address ; reading of a restrict pointer store i32 42, %rpA.1, !noalias !10 store i32 99, %pB, !noalias !10 %1 = load i32 %rpA.1, !noalias !10 ret i32 %1 } NOTES: * this is slightly different from Hal Finkel's approach, where the 'llvm.noalias' intrinsics is used when storing a value in the pointer. * this representation suffers from possible blocking of optimizations. But, when 'llvm.noalias' is made too transparent, it can result in wrong code. Summary: * llvm.noalias.decl %p.alloc, metadata !Scope * llvm.noalias %p, %p.decl, %p.addr (B) The side channel We introduce a 'side channel' for load/store instructions. The idea is that we use the 'pointer operand' for normal pointer computations, and the 'side channel operand' for restrict related dependencies. Optimizations (like LSR) can modify the 'pointer operand' as they see fit. As long as the 'side channel operand' is not touched, we are still able to deduce the restrict related information. When an optimization introduces a load/store without a side channel operand, we fall back to the fail-safe 'worst case'. NOTE: the name 'side channel' comes from the need to track the restrict information for load/store instructions 'on the side'. In a way that is hidden for normal optimizations. Although the actual pointer computations can be removed in the side channel, it can still contain PHI nodes and 'select' instructions. As it is hard to track the usage of a pointer in clang, we will work on llvm-IR level. Because of that the annotations exist in two states: * Before 'noalias propagation' This state is produced by clang. It is similar to (A). The main difference is that the 'llvm.noalias' intrinsic becomes completely opaque for optimizations. * After 'noalias propagation' A 'noalias propagation' pass is introduced that converts the llvm-IR code to the second state: ** llvm.noalias intrinsics are converted into 'llvm.side_noalias' intrinsics ** their usage is removed from the main pointer computations, and moved to the side channel of load/store instructions ** When a pointer depending on a llvm.noalias intrinsic is passed as an argument, returned from a function or stored into memory, we introduce a 'llvm.noalias.arg.guard'. This combines the original pointer computation with the side channel information. After inlining, it is also used to propagate the noalias information to the load/store instructions. So, we now have two extra intrinsics: * 'llvm.side.noalias %side.p, %p.decl, %p.addr' intrinsic: ** provides restrict information to a side channel. ** %side.p: tracks the side channel associated with the pointer value that was read. ** %p.decl: refers to the 'llvm.noalias.decl' that is associated with the restrict pointer. ** %p.addr: represents the address of 'object P'. * 'llvm.noalias.arg.guard %p, %side.p' intrinsic: ** combines pointer and restrict information when a restrict pointer value escapes. It is normally used for function arguments, returns, or stores to memory. ** %p tracks the pointer computation ** %side.p tracks the side channel associated with the pointer value Example B: After noalias propagation, example A becomes: (pseudo LLVM-IR) define @foo(i32* pA, i32* pB) { %rpA.address = alloca i32* %rpA.decl = llvm.noalias.decl %rpA.address, !metadata !10 ; declaration of a restrict pointer store i32* %pA, %rpA.address, noalias_sidechannel undef, !noalias !10 %rpA = load i32* %rpA.address, noalias_sidechannel undef, !noalias !10 %side.rpA.1 = i32* llvm.side.noalias %rpA, %rpA.decl, %rpA.address ; reading of a restrict pointer store i32 42, %rpA, noalias_sidechannel %side.rpA.1, !noalias !10 store i32 99, %pB, noalias_sidechannel undef, !noalias !10 %1 = load i32 %rpA, noalias_sidechannel %side.rpA.1, !noalias !10 ret i32 %1 } Summary: * llvm.noalias.decl %p.alloc, metadata !Scope * llvm.noalias %p, %p.decl, %p.addr * llvm.side.noalias %side.p, %p.decl, %p.addr * llvm.noalias.arg.guard %p, %side.p (C) Optimizations that eliminate the location of 'object P' from the stack When SROA eliminates a local variable, we do not have an address for 'object P' anymore (the alloca is removed and %p.addr becomes 'null'). At that moment we must depend on the '!Scope' metadata to differentiate restrict objects. For convenience, we also add this information to the llvm.noalias and llvm.side.noalias intrinsics. It is also possible that a single variable declaration contains multiple restrict pointers (think of a struct containing multiple restrict pointers, or an array of restrict pointers). Instead of introducing new scopes when this happens, we introduce an extra object ID (objId) parameter for llvm.noalias.decl, llvm.noalias and llvm.side.noalias. This can be thought of as the 'offset in the variable'. Summary: * llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope * llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope * llvm.side.noalias %side.p, %p.decl, %p.addr, i32 objId, metadata !Scope * llvm.noalias.arg.guard %p, %side.p For alias analysis, this means that two llvm.side.noalias intrinsics represent a different 'object P0', 'object P1', if: * %p0.addr and $p1.addr are different * or, objId0 and objId1 are different * or, !Scope0 and !Scope1 are different (D) Optimizing a restrict pointer pointing to a restrict pointer Example: int * restrict * restrict ppA = ...; int * restrict * restrict ppB = ...; **ppA=42; **ppB=99; return **ppA; // according to C99, 6.7.3.1 paragraph 4, **ppA and **ppB are not aliasing In order to allow this optimization, we also need to track the !noalias scope where the 'llvm.noalias' intrinsic was introduced. The %p.addr parameter in the 'llvm.side.noalias' will also get a noalias side channel, through the %side.p.addr argument. In short, we treat the 'llvm.noalias' and 'llvm.side.noalias' intrinsics as if they are a memory operation. Summary: * llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope * llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope, !noalias !visible_scopes * llvm.side.noalias %side.p, %p.decl, %p.addr, %side.p.addr, i32 objId, metadata !Scope, !noalias !visible_scopes * llvm.noalias.arg.guard %p, %side.p For alias analysis, this means that two llvm.side.noalias intrinsics represent a different 'object P0', 'object P1', if: * %p0.addr and $p1.addr are different * or, objId0 and objId1 are different * or, !Scope0 and !Scope1 are different * or we can prove that (%p0.addr, %side.p0.addr, !visible_scopes0) and (%p1.addr, %side.p1.addr, !visible_scopes1) do not alias for both intrinsics. (as if we treat each of the two 'llvm.side.noalias' as a 'store to %p.addr' and we must prove that the two stores do not alias; also see [8], question 2) (E) 'unknown function' scope When the declaration of a restrict pointer is not visible, C99, 6.7.3.1 paragraph 2, says that the pointer is assumed to start living from 'main'. This case can be handled by the 'unknown function' scope, which is annotated to the function itself. This can be treated as saying: the scope of this restrict pointer starts somewhere outside this function. In such case, the 'llvm.noalias' and 'llvm.side.noalias' will not be associated with a 'llvm.noalias.decl'. It is possible that after inlining, the scopes can be refined to a declaration which became visible. For convenience, each function can have its own unknown scope specified by a 'noalias !UnknownScope' metadata attribute on the function itself. (F) Aggregate copies Restrictness is introduced by 'reading a restrict pointer'. It is not always possible to add the necessary 'llvm.noalias' annotation when this is done. An aggregate containing one or more restrict pointers can be copied with a single load/store pair or a llvm.memcpy. This makes it hard to track when a restrict pointer is copied over. For this, a final intrinsic is introduced: 'llvm.noalias.copy.guard': * 'llvm.noalias.copy.guard %p.addr, %p.decl, metadata !indices, metadata !Scope' intrinsic ** Guards a %p.addr object that is copied as a single aggregate or llvm.memcpy ** %p.addr: the object to guard ** %p.decl: (when available), the llvm.noalias.decl associated with the object ** !indices: The !indices refers to a metadata list. Each element of the list refers to a set of indices where a restrict pointer is located. ** !Scope: the declaration scope of %p.decl This information allows SROA to introduce the needed 'llvm.noalias' intrinsics when a struct is split up. Summary: * potential '!noalias !UnknownScope' annotation at function level * llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope * llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope, !noalias !visible_scopes * llvm.side.noalias %side.p, %p.decl, %p.addr, %side.p.addr, i32 objId, metadata !Scope, !noalias !visible_scopes * llvm.noalias.arg.guard %p, %side.p * llvm.noalias.copy.guard %p.addr, %p.decl, metadata !indices, metadata !Scope (G) Optimization passes For correctness, some optimization passes must be aware of the noalias intrinsics: inlining [7], unrolling [6], loop rotation, ... Whenever a body is duplicated that contains a llvm.noalias.decl, it must be decided how that duplication must be done. Sometimes new unique scopes must be introduced, sometimes not. Other optimization passes can perform better by knowing about the side channels: when new load/store instructions are introduced, adding side channel information can result in better alias analysis for those load/stores. == c++ alias_set = With this framework in place, it should be easy to extend it to support the 'alias_set' proposal [3]. This can be done by tracking a separate 'universe object', instead of 'object P'. == References =* [1] https://en.wikipedia.org/wiki/Restrict * [2] WG14 N1256: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf (Chapter 6.7.3.1 Formal definition of restrict) * [3] WG21 N4150: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4150.pdf * [4] https://reviews.llvm.org/D9375 Hal Finkel's local restrict patches * [5] https://bugs.llvm.org/show_bug.cgi?id=39240 "clang/llvm looses restrictness, resulting in wrong code" * [6] https://bugs.llvm.org/show_bug.cgi?id=39282 "Loop unrolling incorrectly duplicates noalias metadata" * [7] https://www.godbolt.org/z/cUk6To "testcase showing that llvm-ir is not able to differentiate if restrict is done inside or outside the loop" * [8] DR294: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_294.htm * [9] WG14 N2250: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2260.pdf Clarifying the restrict Keyword v2