thr3ads.net - llvm dev - [llvm-dev] [repost] RFC: Full 'restrict' support in LLVM [Oct 2019]

If this information is useful, please help other people find it:
Share via:
Jeroen Dobbelaere via llvm-dev
2019-Oct-05 09:25 UTC
[llvm-dev] [repost] RFC: Full 'restrict' support in LLVM

This is a repost of "RFC: Full 'restrict' support in LLVM" of
March 19, 2019
 https://lists.llvm.org/pipermail/llvm-dev/2019-March/131127.html

Hopefully the formatting now looks better.

Greetings,

Jeroen Dobbelaere

---------------
Hi all,

during the last US LLVM conference (October last year), I had two 'BoF
restrict' sessions.
During these sessions, I discussed with a number of persons our view and
mechanism on fixing the issues
with llvm's  restrict implementation and Hal Finkel's local restrict
patches. One outcome of these discussions
was a request to put all of this together in an RFC and present it to the larger
llvm community so that a
broader discussion can be started.

With thanks to Wouter, Bruno, Dirk, Gert, Troy, David, Johannes for helping with
the initial review of this rfc.

Greetings,

Jeroen Dobbelaere
---

                                      2019-03-19  jeroen.dobbelaere at
synopsys.com
                           
RFC: Full 'restrict' support in LLVM

==  Introduction =
LLVM only has basic support for restrict on function arguments. All other
restrict usages are ignored.  This document proposes a mechanism to get full
support for restrict in LLVM.

== 'restrict' =
C99 introduces the 'restrict' keyword for pointer operands [1, 2]. It
helps the
programmer to specify that memory accesses will not interfere with each
other. Based on such information, the compiler can decide to do better
optimizations and transformations, as it can assume that certain memory accesses
will never alias.  This can result in dramatic speed improvements.
Restrict pointers can occur as function arguments, local variables (local
restrict),
struct members (restrict member pointers), global variables, ...

== Current status (LLVM-8) =
LLVM mostly ignores the 'restrict' keyword. It only has support for
restrict
pointer arguments of a function. Even there, the C restrict keyword is mapped
onto the 'noalias' argument attribute, which is in some way stronger
than
'restrict' [5,9].

While inlining, LLVM tries to track the 'noalias' argument information
through
the '!noalias' and '!alias.scope' metadata.  This easily results
in
transformations that are not correct according to the C restrict rules, as this
metadata representation is not sufficient to retain all information that is
needed to represent all cases [6].

Another example that demonstrates this problem is shown in [7]:

  #define SIZE 1024
   
  // Testcase showing that llvm-ir is not able to 
  // differentiate between following two cases.
  // (restrict outside ('0') or inside ('1') the loop)
  
  // Toggle '0' to '1' and see that the metadata 
  // (!noalias, !alias.scope) remains similar.
  #if 0
  static __attribute__((always_inline))
  void add_element(int* restrict pDest, int* restrict pA, int* restrict pB) {
      *pDest = *pA+*pB;
  }
  
  void test_add_array01(int* pDest, int* pA, int* pB) {
    for (int i=0; i<SIZE; ++i) {
       // restrict only holds per iteration, not across iterations
      add_element(pDest++, pA++, pB++);
    }
  }
  #else
  static __attribute__((always_inline))
  void add_array(int* restrict pDest, int* restrict pA, int* restrict pB) {
    for (int i=0; i<SIZE; ++i) {
      *pDest++=*pA++ + *pB++; // restrict holds across iterations
    }
  }
  
  void test_add_array02(int* pDest, int* pA, int* pB) {
    // restrict info is similar to 'test_add_array01', no
differentiation possible
    add_array(pDest, pA, pB);
  }
  #endif
   
  
== 'local restrict' patches =
Around 2014/2015, Hal Finkel proposed a number of patches, fixing some of the
observed issues with restrict, while also adding support for local restrict [4].

Hal had the following goals:
* Fix the issue with inlined functions, where restrictness inside vs outside
  the loop body was not decidable
* Enable support for local restrict

Some of the initial patches have been added to LLVM in the meantime. But, a
number of main changes are still pending, due to slow review progress, but also
due to a number of issues that popped up in the meantime.

How it works:
* Whenever a value is assigned to a restrict pointer, a 'llvm.noalias %p,
  metadata !x' intrinsic is put between the assignment and the value.
  '!x' is a list of a single scope id that represents the specific
restrict pointer.
* Whenever a memory instruction is produced, the '!noalias !y' metadata
scope is
  added.
  '!y' is a list of all scoped id's that are active at the moment of
the memory
  instruction.
* The combination of the intrinsic and the !noalias scope allows to check if two
  memory accesses can/cannot alias according to the restrict rules.

Observed issues:
* Performance:
** The intrinsic blocks certain optimizations.
** In some combinations of local restrict and inlining, the restrictness is
   lost, resulting in slow code.
* Correctness:
** Some optimizations (like LSR) eliminate the intrinsics, possibly resulting
   in wrong code.
** Loop unrolling can result in wrong code. [6]

== A way out =
I had the following goals when trying to fix the observed issues with the local
restrict implementation:
* Fix the known issues wrt to the local restrict implementation. Both
  correctness (must) and performance (as much as possible)
* Make it safe(r) against the unexpected effect of (new) optimization passes:
  prefer slow code over incorrect code
* Make it easy to add support for restrict member pointers.
* Make it feasible to implement the 'alias_set' proposal (n4150) [3]

High level strategy:
* Keep most of the restrictness annotations out the way of the normal program
  flow, except when it is needed for correctness.
* Follow the restrict specification [2] to the letter. More precisely:
** Track the declaration of a 'object P'.
** Each distinct restrict pointer 'object P' points to its own set of
objects
   (within the valid scope).
   As such, restrictness is introduced when using a restrict pointer
   (not when assigning it).

   Example:
     void foo(int* restrict rP, int* restrict * pRP) {
        int* p2=rP;   // 'using rP' introduces restrictness
        *p2=42;       // dereferencing a pointer based on a restrict pointer
     rP=&someInt;  // assignment does not introduce (additional)
restrictness
     int* p3=*pRP; // 'using *pRP' introduces restrictness
     *p3=43;       // dereferencing a pointer based on a restrict pointer
     }
** Track the 'based on' expressions in an accurate way.

NOTE: Part of those ideas have been discussed in small group during 2 BOF
      sessions during the 2018 US LLVM Conference.

The following sections explain how to come to a full set of intrinsics that
should
support restrict:

(A) The Basics

In principle, two intrinsics should be sufficient:
* 'llvm.noalias.decl %p.alloc, metadata !Scope' intrinsic:
** indicates at what place in the program a restrict pointer has been declared.
** %p.alloc: represents the alloca associated with the declaration.
** !Scope: metadata that represents a unique scope, associated with this
specific
   declaration.
** NOTE: the llvm.noalias.decl intrinsic can normally not be moved outside
   loops. Its purpose is to identify the freedom that a restrict pointer has
   wrt to the loop bodies and other blocks.  As such it helps to resolve
   problems as seen in [6] and [7].


* 'llvm.noalias %p, %p.decl, %p.addr' intrinsic:
** created when a restrict pointer object P is _read_.
   It is used to track the 'based-on' dependency.
** %p: is the pointer value that was read. This is also the value that the
   intrinsic returns.
** %p.decl: refers to the 'llvm.noalias.decl' that is associated with
the
   restrict pointer.
** %p.addr represents the address of 'object P'
** NOTE: sometimes the declaration is not known upfront, in that case
'%p.decl'
   is 'null'. After inlining and/or optimizations, it can be possible to
infer
   the llvm.noalias.decl (based on %p.addr).

Example A:
  int foo(int* pA, int* pB) {
    int * restrict rpA=pA;
    *rpA=42;
    *pB=99;
    return *rpA;
  }

  (pseudo LLVM-IR)
  define @foo(i32* pA, i32* pB) {
    %rpA.address = alloca i32*
    %rpA.decl = llvm.noalias.decl %rpA.address, !metadata !10 ; declaration of a
restrict pointer
    store i32* %pA, %rpA.address, !noalias !10
    %rpA = load i32* %rpA.address, !noalias !10
    %rpA.1 = i32* llvm.noalias %rpA, %rpA.decl, %rpA.address ; reading of a
restrict pointer
    store i32 42, %rpA.1, !noalias !10
    store i32 99, %pB, !noalias !10
    %1 = load i32 %rpA.1, !noalias !10
    ret i32 %1
  }

NOTES:
* this is slightly different from Hal Finkel's approach, where the
  'llvm.noalias' intrinsics is used when storing a value in the pointer.
* this representation suffers from possible blocking of optimizations. But, when
  'llvm.noalias' is made too transparent, it can result in wrong code.

Summary:
* llvm.noalias.decl %p.alloc, metadata !Scope
* llvm.noalias %p, %p.decl, %p.addr

(B) The side channel

We introduce a 'side channel' for load/store instructions. The idea is
that we
use the 'pointer operand' for normal pointer computations, and the
'side channel
operand' for restrict related dependencies. Optimizations (like LSR) can
modify the
'pointer operand' as they see fit. As long as the 'side channel
operand' is not
touched, we are still able to deduce the restrict related information. When an
optimization introduces a load/store without a side channel operand, we fall
back to the fail-safe 'worst case'.

NOTE: the name 'side channel' comes from the need to track the restrict
information
  for load/store instructions 'on the side'. In a way that is hidden for
normal
  optimizations.

Although the actual pointer computations can be removed in the side channel, it
can still contain PHI nodes and 'select' instructions.

As it is hard to track the usage of a pointer in clang, we will work on llvm-IR
level. Because of that the annotations exist in two states:

* Before 'noalias propagation'
  This state is produced by clang. It is similar to (A). The main difference is
  that the 'llvm.noalias' intrinsic becomes completely opaque for
optimizations.

* After 'noalias propagation'
  A 'noalias propagation' pass is introduced that converts the llvm-IR
code to
  the second state:
** llvm.noalias intrinsics are converted into 'llvm.side_noalias'
intrinsics
** their usage is removed from the main pointer computations, and moved to the
   side channel of load/store instructions
** When a pointer depending on a llvm.noalias intrinsic is passed as an
   argument, returned from a function or stored into memory, we introduce a
   'llvm.noalias.arg.guard'.  This combines the original pointer
computation
   with the side channel information. After inlining, it is also used to
   propagate the noalias information to the load/store instructions.

So, we now have two extra intrinsics:
* 'llvm.side.noalias %side.p, %p.decl, %p.addr' intrinsic:
** provides restrict information to a side channel.
** %side.p: tracks the side channel associated with the pointer value that was
read.
** %p.decl: refers to the 'llvm.noalias.decl' that is associated with
the
   restrict pointer.
** %p.addr: represents the address of 'object P'.

* 'llvm.noalias.arg.guard %p, %side.p' intrinsic:
** combines pointer and restrict information when a restrict pointer value
   escapes. It is normally used for function arguments, returns, or stores to
   memory.
** %p tracks the pointer computation
** %side.p tracks the side channel associated with the pointer value

Example B: After noalias propagation, example A becomes:
  (pseudo LLVM-IR)
  define @foo(i32* pA, i32* pB) {
    %rpA.address = alloca i32*
    %rpA.decl = llvm.noalias.decl %rpA.address, !metadata !10 ; declaration of a
restrict pointer
    store i32* %pA, %rpA.address, noalias_sidechannel undef, !noalias !10
    %rpA = load i32* %rpA.address, noalias_sidechannel undef, !noalias !10
    %side.rpA.1 = i32* llvm.side.noalias %rpA, %rpA.decl, %rpA.address ; reading
of a restrict pointer
    store i32 42, %rpA, noalias_sidechannel %side.rpA.1, !noalias !10
    store i32 99, %pB, noalias_sidechannel undef, !noalias !10
    %1 = load i32 %rpA, noalias_sidechannel %side.rpA.1, !noalias !10
    ret i32 %1
  }

Summary:
* llvm.noalias.decl %p.alloc, metadata !Scope
* llvm.noalias %p, %p.decl, %p.addr
* llvm.side.noalias %side.p, %p.decl, %p.addr
* llvm.noalias.arg.guard %p, %side.p

(C) Optimizations that eliminate the location of 'object P' from the
stack

When SROA eliminates a local variable, we do not have an address for 'object
P'
anymore (the alloca is removed and %p.addr becomes 'null'). At that
moment we
must depend on the '!Scope' metadata to differentiate restrict objects.
For convenience, we also add this information to the llvm.noalias and
llvm.side.noalias intrinsics.

It is also possible that a single variable declaration contains multiple
restrict pointers (think of a struct containing multiple restrict pointers, or
an array of restrict pointers).  Instead of introducing new scopes when this
happens, we introduce an extra object ID (objId) parameter for
llvm.noalias.decl, llvm.noalias and llvm.side.noalias. This can be thought of as
the 'offset in the variable'.

Summary:
* llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope
* llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope
* llvm.side.noalias %side.p, %p.decl, %p.addr, i32 objId, metadata !Scope
* llvm.noalias.arg.guard %p, %side.p

For alias analysis, this means that two llvm.side.noalias intrinsics represent a
different 'object P0', 'object P1', if:
* %p0.addr and $p1.addr are different
* or, objId0 and objId1 are different
* or, !Scope0 and !Scope1 are different

(D) Optimizing a restrict pointer pointing to a restrict pointer

Example:
  int * restrict * restrict ppA = ...;
  int * restrict * restrict ppB = ...;

  **ppA=42;
  **ppB=99;
  return **ppA; // according to C99, 6.7.3.1 paragraph 4, **ppA and **ppB are
not aliasing

In order to allow this optimization, we also need to track the !noalias scope
where the 'llvm.noalias' intrinsic was introduced.  The %p.addr
parameter in the
'llvm.side.noalias' will also get a noalias side channel, through the
%side.p.addr argument.

In short, we treat the 'llvm.noalias' and 'llvm.side.noalias'
intrinsics as if
they are a memory operation.

Summary:
* llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope
* llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope, !noalias
!visible_scopes
* llvm.side.noalias %side.p, %p.decl, %p.addr, %side.p.addr, i32 objId, metadata
!Scope, !noalias !visible_scopes
* llvm.noalias.arg.guard %p, %side.p

For alias analysis, this means that two llvm.side.noalias intrinsics represent a
different 'object P0', 'object P1', if:
* %p0.addr and $p1.addr are different
* or, objId0 and objId1 are different
* or, !Scope0 and !Scope1 are different
* or we can prove that (%p0.addr, %side.p0.addr, !visible_scopes0) and
  (%p1.addr, %side.p1.addr, !visible_scopes1) do not alias for both intrinsics.
  (as if we treat each of the two 'llvm.side.noalias' as a 'store to
%p.addr'
  and we must prove that the two stores do not alias; also see [8], question 2)

(E) 'unknown function' scope

When the declaration of a restrict pointer is not visible, C99, 6.7.3.1
paragraph 2, says that the pointer is assumed to start living from
'main'.

This case can be handled by the 'unknown function' scope, which is
annotated to
the function itself. This can be treated as saying: the scope of this restrict
pointer starts somewhere outside this function. In such case, the
'llvm.noalias'
and 'llvm.side.noalias' will not be associated with a
'llvm.noalias.decl'. It is
possible that after inlining, the scopes can be refined to a declaration which
became visible.

For convenience, each function can have its own unknown scope specified by a
'noalias !UnknownScope' metadata attribute on the function itself.

(F) Aggregate copies

Restrictness is introduced by 'reading a restrict pointer'. It is not
always
possible to add the necessary 'llvm.noalias' annotation when this is
done. An
aggregate containing one or more restrict pointers can be copied with a single
load/store pair or a llvm.memcpy. This makes it hard to track when a restrict
pointer is copied over.

For this, a final intrinsic is introduced: 'llvm.noalias.copy.guard':

* 'llvm.noalias.copy.guard %p.addr, %p.decl, metadata !indices, metadata
!Scope' intrinsic
** Guards a %p.addr object that is copied as a single aggregate or llvm.memcpy
** %p.addr: the object to guard
** %p.decl: (when available), the llvm.noalias.decl associated with the object
** !indices: The !indices refers to a metadata list. Each element of the list
   refers to a set of indices where a restrict pointer is located.
** !Scope: the declaration scope of %p.decl

This information allows SROA to introduce the needed 'llvm.noalias'
intrinsics
when a struct is split up.

Summary:
* potential '!noalias !UnknownScope' annotation at function level
* llvm.noalias.decl %p.alloc, i32 objId, metadata !Scope
* llvm.noalias %p, %p.decl, %p.addr, i32 objId, metadata !Scope, !noalias
!visible_scopes
* llvm.side.noalias %side.p, %p.decl, %p.addr, %side.p.addr, i32 objId, metadata
!Scope, !noalias !visible_scopes
* llvm.noalias.arg.guard %p, %side.p
* llvm.noalias.copy.guard %p.addr, %p.decl, metadata !indices, metadata !Scope

(G) Optimization passes

For correctness, some optimization passes must be aware of the noalias
intrinsics:
inlining [7], unrolling [6], loop rotation, ...  Whenever a body is duplicated
that
contains a llvm.noalias.decl, it must be decided how that duplication must be
done.
Sometimes new unique scopes must be introduced, sometimes not.

Other optimization passes can perform better by knowing about the side channels:
when
new load/store instructions are introduced, adding side channel information can
result
in better alias analysis for those load/stores.

== c++ alias_set =
With this framework in place, it should be easy to extend it to support the
'alias_set' proposal [3]. This can be done by tracking a separate
'universe
object', instead of 'object P'.


== References =* [1] https://en.wikipedia.org/wiki/Restrict
* [2] WG14 N1256: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
(Chapter 6.7.3.1 Formal definition of restrict)
* [3] WG21 N4150:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4150.pdf
* [4] https://reviews.llvm.org/D9375   Hal Finkel's local restrict patches
* [5] https://bugs.llvm.org/show_bug.cgi?id=39240 "clang/llvm looses
restrictness, resulting in wrong code"
* [6] https://bugs.llvm.org/show_bug.cgi?id=39282 "Loop unrolling
incorrectly duplicates noalias metadata"
* [7] https://www.godbolt.org/z/cUk6To "testcase showing that llvm-ir is
not able to differentiate if restrict is done inside or outside the loop"
* [8] DR294: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_294.htm
* [9] WG14 N2250: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2260.pdf 
Clarifying the restrict Keyword v2
llvm dev - Oct 2019 - [repost] RFC: Full 'restrict' support in LLVM

[llvm-dev] [repost] RFC: Full 'restrict' support in LLVM