Siddharth Bhat via llvm-dev
2019-Jul-21  22:53 UTC
[llvm-dev] [RFC] A new multidimensional array indexing intrinsic
Hello,
We would like to begin discussions around a new set of intrinsics, to
better express
multi-dimensional array indexing within LLVM. The motivations and a
possible design
are sketched out below.
Rendered RFC link here
<https://github.com/bollu/llvm-multidim-array-indexing-proposal/blob/master/RFC.md>
Raw markdown:
# Introducing a new multidimensional array indexing intrinsic
## The request for change: A new intrinsic, `llvm.multidim.array.index.*`.
We propose the addition of a new family of intrinsics,
`llvm.multidim.array.index.*`.
This will allow us to represent array indexing into an array `A[d1][d2][..][dk]`
as `llvm.multidim.array.index.k A d1, d2, d3, ... dk` instead of flattening
the information into `gep A, d1 * n1 + d2 * n2 + ... + dk * nk`. The former
unflattened representation is advantageous for analysis and
optimisation. It also
allows us to represent array indexing semantics of languages such as Fortran
and Chapel, which differs from that of C based array indexing.
	
## Motivating the need for a multi-dimensional array indexing intrinsic
There are primarily one of two views of array indexing involving
multiple dimensions most languages take, which we discuss to motivate
the need for a multi-dimensional array index. This consideration impacts
the kinds of analysis we can perform on the program. In Polly, we care about
dependence analysis, so the examples here will focus on that particular problem.
Let us consider an array indexing operation of the form:
```cpp
int ex1(int n, int m, B[n][m], int x1, int x2, int y1, int y2) {
	__builtin_assume(x1 != x2);
	__builtin_assume(y1 != y2);
	B[x1][y1] = 1;
	printf("%d", B[x2][y2]);
	exit(0);
}
```
One would like to infer that the array indices _interpreted as tuples_
`(x1, y1)` and `(x2, y2)` do not have the same value, due to the
guarding asserts
that `x1 != x2` and `y1 != y2`. As a result, the write `B[x1][y1] = 1` can
in no way interfere with the value of `B[x2][y2]`. Consquently,
we can optimise the program into:
```cpp
int ex1_opt(int n, int m, B[n][m], int x1, int x2, int y1, int y2) {
    // B[x1][y1] = 1 is omitted because the result
    // cannot be used:
    //  It is not used in the print and then the program exits
	printf("%d", B[x2][y2]);
	exit(0);
}
```
However, alas, this is illegal, for the C language does not provide
semantics that allow the final inference above. It is conceivable that
`x1 != x2, y1 != y2`, but the indices do actually alias, since
according to C semantics, the two indices alias if the _flattened
representation of the indices alias_. Consider the parameter
values:
```
n = m = 3
x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3
x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3
```
Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and
so the transformation proposed in `ex1_opt` is unsound in general.
In contrast, many langagues other than C require that index
expressions for multidimensional arrays have each component within the
array dimension for that component. As a result, in the example above,
the index pair `(0,3)` would be out-of-bounds. In languages with these
semantics, one can infer that the indexing:
`[x1][y1] != [x2][y2]` iff `x1 != x2 || y1 != y2`.
While this particular example is not very interesting, it shows the
spirit of the lack of expressiveness in LLVM we are trying to
improve.
Julia, Fortran, and Chapel are examples of such languages which target
LLVM.
Currently, the LLVM semantics of `getelementptr` talk purely of
the C style flattened views of arrays. This inhibits the ability of
the optimiser
to understand the multidimensional examples as given above, and we
are forced to make conservative assumptions, inhibiting optimisation.
This information must ideally be expressible in LLVM, such that LLVM's
optimisers and
alias analysis can use this information to model multidimensional-array
semantics.
There is a more realistic (and involved) example in [Appendix A](#Appendix-A)
in the same spirit as the above simple example, but one a compiler might
realistically wish to perform.
## Evaluation of the impact of the intrinsic on accuracy of dependence analysis
- This has been implemented in an exprimental branch of Polly, and was used
on the COSMO climate weather model. This greatly helped increase the accuracy
of Polly's analysis, since we eliminated the guessing game from the
array analysis.
- This has also been implemented as part of a GSoC effort to unify
Chapel and Polly.
- Molly, the distributed version of Polly written by Michael Kruse for his PhD
  also implemented a similar scheme. In his use-case, optimistic run-time checks
  with delinearization was not possible, so this kind of intrinsic was
_necessary_
  for the application, not just _good to have_. More details are available
  in his PhD thesis: [Lattice QCD Optimization and Polytopic
Representations of Distributed
Memory](https://tel.archives-ouvertes.fr/tel-01078440).
  In particular, Chapter 9 contains a detailed discussion.
# Representations
## Intrinsic
### Syntax
```
<result> = llvm.multidim.array.index.* <ty> <ty>*
<ptrval> {<stride>, <idx>}*
```
### Overview:
The `llvm.multidim.array.index.*` intrinsic is used to get the address of
an element from an array. It performs address calcuation only and
does not access memory. It is similar to `getelementptr`. However, it imposes
additional semantics which allows the optimiser to provide better optimisations
than `getlementptr`.
### Arguments:
The first argument is always a type used as the basis for the calculations. The
second argument is always a pointer, and is the base address to start the
calculation from. The remaining arguments are a list of pairs. Each pair
contains a dimension stride, and an offset with respect to that stride.
### Semantics:
`llvm.multidim.array.index.*` represents a multi-dimensional array
index, In particular, this will
mean that we will assume that all indices `<idx_i>` are non-negative.
Additionally, we assume that, for each `<str_i>`, `<idx_i>` pair,
that
`0 <= idx_i < str_i`.
Optimizations can assume that, given two llvm.multidim.array.index.*
instructions with matching types:
```
llvm.multidim.array.index.* <ty> <ty>* <ptrvalA>
<strA_1>, <idxA_1>,
..., <strA_N>, <idxA_N>
llvm.multidim.array.index.* <ty> <ty>* <ptrvalB>
<strB_1>, <idxB_1>,
..., <strB_N>, <idxb_N>
```
If `ptrvalA == ptrvalB` and the strides are equal `(strA_1 == strB_1
&& ... && strA_N == strB_N)` then:
- If all the indices are equal (that is, `idxA_1 == idxB_1 && ...
&&
idxA_N == idxB_N`),
  then the resulting pointer values are equal.
- If any index value is not equal (that is, there exists an `i` such
that `idxA_i != idxB_i`),
  then the resulting pointer values are not equal.
##### Address computation:
Consider an invocation of `llvm.multidim.array.index.*`:
```
<result> = call @llvm.multidim.array.index.* <ty> <ty>*
<ptrval>
<str_0>, <idx_0>, <str_1> <idx_1>, ..., <str_n>
<idx_n>
```
If the pairs are denoted by `(str_i, idx_i)`, where `str_i` denotes the stride
and `idx_i` denotes the index of the ith pair, then the final address (in bytes)
is computed as:
```
ptrval + len(ty) * [(str_0 * idx_0) + (str_1 * idx_1) + ... (str_n * idx_n)]
```
## Transitioning to `llvm.multidim.array.index.*`: Allow
`multidim_array_index` to refer to a GEP instruction:
This is a sketch of how we might gradually introduce the
`llvm.multidim.array.index.*`
intrinsic into LLVM without immediately losing the analyses
that are performed on `getelememtptr` instructions. This section
lists out some possible choices that we have, since the authors
do not have a "best" solution.
##### Choice 1: Write a `llvm.multidim.array.index.*` to `GEP` pass,
with the `GEP` annotated with metadata
This pass will flatten all `llvm.multidim.array.index.*` expressions
into a `GEP` annotated with metadata. This metadata will indicate that
the index expression computed by the lowered GEP is guaranteed to be
in a canonical form which allows the analysis
to infer stride and index sizes.
A multidim index of the form:
```
%arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1,
%str_2, %idx_2
```
is lowered to:
```
%mul1 = mul nsw i64 %str_1, %idx_1
%mul2 = mul1 nsw i64 %str_2, %idx_2
%total = add nsw i64 %mul2, %mul1
%arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1
```
with guarantees that the first term in each multiplication is the stride
and the second term in each multiplication is the index. (What happens
if intermediate transformation passes decide to change the order? This seems
complicated).
**TODO:** Lookup how to attach metadata such that the metadata can communicate
which of the values are strides and which are indeces
# Caveats
Currently, we assume that the array shape is immutable. However, we
will need to deal with
being able to express `reshape` like primitives where the array shape
can be mutated. However,
this appears to make expressing this information quite difficult: We
now need to attach the shape
information to an array per "shape-live-range".
## Appendix-A
##### A realistic, more involved example of dependence analysis going wrong
```cpp
// In an array A of size (n0 x n1),
// fill a subarray of size (s0 x s1)
// which starts at an offset (o0 x o1) in the larger array A
void set_subarray(unsigned n0, unsigned n1,
	unsigned o0, unsigned o1,
	unsigned s0, unsigned s1,
	float A[n0][n1]) {
	for (unsigned i = 0; i < s0; i++)
		for (unsigned j = 0; j < s1; j++)
				S: A[i + o0][j + o1] = 1;
}
```
We first reduce this index expression to a sum of products:
```
(i + o0) * n1 + (j + o1) = n1i + n1o0 + j + o1
ix(i, j, n0, n1, o0, o1) = n1i + n1o0 + j + o1
```
`ix` is the index expression which `LLVM` will see, since it is fully
flattened, in comparison with the multi-dimensional index expression
`index:[i + o0][j + o1] | sizes:[][n1]`.
We will now show _why_ this multi-dimensional index is not always correct,
and why guessing for one language will not work for another:
Consider a call `set_subarray(n0=8, n1=9, o0=4, o1=6, s0=3, s1=6)`. At face
value, this is incorrect if we view the array as 2D grid, since the size
of the array is `(n0, n1) = (8, 9)`, but we are writing an array of size
`(s0, s1) = (3, 6)` starting from `(o0, o1) = (4, 6)`. Clearly, we will
exceed the width of the array, since `(s1 + o1 = 6 + 6 = 12) > (n1 = 9)`.
However, now think of the array as a flattened 1D representation. In this
case, the total size of the array is `n1xn2 = 8x9 = 72`, while the largest
element we will access is at the largest value of `(i, j)`. That is,
`i = s0 - 1 = 2`, and `j = s1 - 1 = 5`.
The largest index will be `ix(i=2, j=5, n0=8, n1=9, o0=4, o1=6) 8*2+8*4+5+6=59`.
Since `59 < 72`, we are clearly at _legal_ array indices, by C semantics!
The definition of the semantics of the language **changed the illegal
multidimensional access** (which is illegal since it exceeds the `n1`
dimension), into a **legal flattened 1D access** (which is legal since the
flattened array indices are inbounds).
LLVM has no way of expressing these two different semantics. Hence, we are
forced to:
1. Consider flattened 1D accesses, which makes analysis of index expressions
equivalent to analysis of polynomials over the integers, which is hard.
2. Guess multidimensional representations, and use them at the expense of
soundness bugs as shown above.
3. Guess multidimensional representations, use them, and check their validity
at runtime, causing a runtime performance hit. This implementation follows
the description from the paper [Optimistic Delinearization of
Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays).
Currently, Polly opts for option (3), which is to emit runtime checks. If
the run-time checks fail, then Polly will not run its optimised code. Instead,
It keeps a copy of the unoptimised code around, which is run in this case.
Note that this effectively doubles the amount of performance-sensitive code
which is finally emitted after running Polly.
Ideally, we would like a mechanism to directly express the multidimensional
semantics, which would eliminate this kind of guesswork from Polly/LLVM,
which would both make code faster, and easier to analyze.
## References
- [The chapel language
specification](https://chapel-lang.org/docs/1.13/_downloads/chapelLanguageSpec.pdf)
- [Fortran 2003
standard](http://www.j3-fortran.org/doc/year/04/04-007.pdf}{Fortran
2003 standard)
- [C++ subscripting](http://eel.is/c++draft/expr.sub)
- [Michael Kruse's PhD thesis: Lattice QCD Optimization and Polytopic
Representations of Distributed
Memory](https://tel.archives-ouvertes.fr/tel-01078440).
- [Molly source code link for the
intrinsic](https://github.com/Meinersbur/llvm/blob/molly/include/llvm/IR/IntrinsicsMolly.td#L3)
-[Optimistic Delinearization of Parametrically Sized
Arrays](optim-delin-parametrically-sized-arrays)
[optim-delin-parametrically-sized-arrays]:
http://delivery.acm.org/10.1145/2760000/2751248/p351-grosser.pdf?ip=93.3.109.183&id=2751248&acc=CHORUS&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1557948443_2f56675c6d04796f27b84593535c9f70
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20190722/0a2310cb/attachment.html>
Doerfert, Johannes via llvm-dev
2019-Jul-22  14:50 UTC
[llvm-dev] [RFC] A new multidimensional array indexing intrinsic
First, I agree that there is a need for better multidimensional support. I (only) browsed the RFC (it is long) and I have a high-level question: Why introduce a new intrinsic (family)? It seems that would require us to support GEPs and GEP + "multi-dim" semantics in various places. What is the benefit over a GEP extension? (I also added a comment below regarding you C example.) Thanks, Johannes On 07/22, Siddharth Bhat via llvm-dev wrote:> Hello, > > We would like to begin discussions around a new set of intrinsics, to > better express > multi-dimensional array indexing within LLVM. The motivations and a > possible design > are sketched out below. > > Rendered RFC link here > <https://github.com/bollu/llvm-multidim-array-indexing-proposal/blob/master/RFC.md> > > Raw markdown: > > # Introducing a new multidimensional array indexing intrinsic > > ## The request for change: A new intrinsic, `llvm.multidim.array.index.*`. > > We propose the addition of a new family of intrinsics, > `llvm.multidim.array.index.*`. > This will allow us to represent array indexing into an array `A[d1][d2][..][dk]` > as `llvm.multidim.array.index.k A d1, d2, d3, ... dk` instead of flattening > the information into `gep A, d1 * n1 + d2 * n2 + ... + dk * nk`. The former > unflattened representation is advantageous for analysis and > optimisation. It also > allows us to represent array indexing semantics of languages such as Fortran > and Chapel, which differs from that of C based array indexing. > > ## Motivating the need for a multi-dimensional array indexing intrinsic > > There are primarily one of two views of array indexing involving > multiple dimensions most languages take, which we discuss to motivate > the need for a multi-dimensional array index. This consideration impacts > the kinds of analysis we can perform on the program. In Polly, we care about > dependence analysis, so the examples here will focus on that particular problem. > > Let us consider an array indexing operation of the form: > ```cpp > int ex1(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { > __builtin_assume(x1 != x2); > __builtin_assume(y1 != y2); > B[x1][y1] = 1; > printf("%d", B[x2][y2]); > exit(0); > } > ``` > > One would like to infer that the array indices _interpreted as tuples_ > `(x1, y1)` and `(x2, y2)` do not have the same value, due to the > guarding asserts > that `x1 != x2` and `y1 != y2`. As a result, the write `B[x1][y1] = 1` can > in no way interfere with the value of `B[x2][y2]`. Consquently, > we can optimise the program into: > > > ```cpp > int ex1_opt(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { > // B[x1][y1] = 1 is omitted because the result > // cannot be used: > // It is not used in the print and then the program exits > printf("%d", B[x2][y2]); > exit(0); > } > ``` > > However, alas, this is illegal, for the C language does not provide > semantics that allow the final inference above. It is conceivable that > `x1 != x2, y1 != y2`, but the indices do actually alias, since > according to C semantics, the two indices alias if the _flattened > representation of the indices alias_. Consider the parameter > values: > > ``` > n = m = 3 > x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3 > x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3 > ``` > > Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and > so the transformation proposed in `ex1_opt` is unsound in general.I'm unsure your example actually showcases the problem: C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior: An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration inta[4][5]) (6.5.6).> In contrast, many langagues other than C require that index > expressions for multidimensional arrays have each component within the > array dimension for that component. As a result, in the example above, > the index pair `(0,3)` would be out-of-bounds. In languages with these > semantics, one can infer that the indexing: > > `[x1][y1] != [x2][y2]` iff `x1 != x2 || y1 != y2`. > > While this particular example is not very interesting, it shows the > spirit of the lack of expressiveness in LLVM we are trying to > improve. > > Julia, Fortran, and Chapel are examples of such languages which target > LLVM. > > Currently, the LLVM semantics of `getelementptr` talk purely of > the C style flattened views of arrays. This inhibits the ability of > the optimiser > to understand the multidimensional examples as given above, and we > are forced to make conservative assumptions, inhibiting optimisation. > This information must ideally be expressible in LLVM, such that LLVM's > optimisers and > alias analysis can use this information to model multidimensional-array > semantics. > > There is a more realistic (and involved) example in [Appendix A](#Appendix-A) > in the same spirit as the above simple example, but one a compiler might > realistically wish to perform. > > ## Evaluation of the impact of the intrinsic on accuracy of dependence analysis > > - This has been implemented in an exprimental branch of Polly, and was used > on the COSMO climate weather model. This greatly helped increase the accuracy > of Polly's analysis, since we eliminated the guessing game from the > array analysis. > > - This has also been implemented as part of a GSoC effort to unify > Chapel and Polly. > > - Molly, the distributed version of Polly written by Michael Kruse for his PhD > also implemented a similar scheme. In his use-case, optimistic run-time checks > with delinearization was not possible, so this kind of intrinsic was > _necessary_ > for the application, not just _good to have_. More details are available > in his PhD thesis: [Lattice QCD Optimization and Polytopic > Representations of Distributed > Memory](https://tel.archives-ouvertes.fr/tel-01078440). > In particular, Chapter 9 contains a detailed discussion. > > # Representations > > ## Intrinsic > > ### Syntax > ``` > <result> = llvm.multidim.array.index.* <ty> <ty>* <ptrval> {<stride>, <idx>}* > ``` > > ### Overview: > > The `llvm.multidim.array.index.*` intrinsic is used to get the address of > an element from an array. It performs address calcuation only and > does not access memory. It is similar to `getelementptr`. However, it imposes > additional semantics which allows the optimiser to provide better optimisations > than `getlementptr`. > > > > ### Arguments: > > The first argument is always a type used as the basis for the calculations. The > second argument is always a pointer, and is the base address to start the > calculation from. The remaining arguments are a list of pairs. Each pair > contains a dimension stride, and an offset with respect to that stride. > > > ### Semantics: > > `llvm.multidim.array.index.*` represents a multi-dimensional array > index, In particular, this will > mean that we will assume that all indices `<idx_i>` are non-negative. > > Additionally, we assume that, for each `<str_i>`, `<idx_i>` pair, that > `0 <= idx_i < str_i`. > > Optimizations can assume that, given two llvm.multidim.array.index.* > instructions with matching types: > > ``` > llvm.multidim.array.index.* <ty> <ty>* <ptrvalA> <strA_1>, <idxA_1>, > ..., <strA_N>, <idxA_N> > llvm.multidim.array.index.* <ty> <ty>* <ptrvalB> <strB_1>, <idxB_1>, > ..., <strB_N>, <idxb_N> > ``` > > If `ptrvalA == ptrvalB` and the strides are equal `(strA_1 == strB_1 > && ... && strA_N == strB_N)` then: > > - If all the indices are equal (that is, `idxA_1 == idxB_1 && ... && > idxA_N == idxB_N`), > then the resulting pointer values are equal. > > - If any index value is not equal (that is, there exists an `i` such > that `idxA_i != idxB_i`), > then the resulting pointer values are not equal. > > > ##### Address computation: > Consider an invocation of `llvm.multidim.array.index.*`: > > ``` > <result> = call @llvm.multidim.array.index.* <ty> <ty>* <ptrval> > <str_0>, <idx_0>, <str_1> <idx_1>, ..., <str_n> <idx_n> > ``` > > If the pairs are denoted by `(str_i, idx_i)`, where `str_i` denotes the stride > and `idx_i` denotes the index of the ith pair, then the final address (in bytes) > is computed as: > > ``` > ptrval + len(ty) * [(str_0 * idx_0) + (str_1 * idx_1) + ... (str_n * idx_n)] > ``` > > ## Transitioning to `llvm.multidim.array.index.*`: Allow > `multidim_array_index` to refer to a GEP instruction: > > This is a sketch of how we might gradually introduce the > `llvm.multidim.array.index.*` > intrinsic into LLVM without immediately losing the analyses > that are performed on `getelememtptr` instructions. This section > lists out some possible choices that we have, since the authors > do not have a "best" solution. > > ##### Choice 1: Write a `llvm.multidim.array.index.*` to `GEP` pass, > with the `GEP` annotated with metadata > > This pass will flatten all `llvm.multidim.array.index.*` expressions > into a `GEP` annotated with metadata. This metadata will indicate that > the index expression computed by the lowered GEP is guaranteed to be > in a canonical form which allows the analysis > to infer stride and index sizes. > > A multidim index of the form: > ``` > %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, > %str_2, %idx_2 > ``` > > is lowered to: > > ``` > %mul1 = mul nsw i64 %str_1, %idx_1 > %mul2 = mul1 nsw i64 %str_2, %idx_2 > %total = add nsw i64 %mul2, %mul1 > %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 > ``` > with guarantees that the first term in each multiplication is the stride > and the second term in each multiplication is the index. (What happens > if intermediate transformation passes decide to change the order? This seems > complicated). > > **TODO:** Lookup how to attach metadata such that the metadata can communicate > which of the values are strides and which are indeces > > > > # Caveats > > Currently, we assume that the array shape is immutable. However, we > will need to deal with > being able to express `reshape` like primitives where the array shape > can be mutated. However, > this appears to make expressing this information quite difficult: We > now need to attach the shape > information to an array per "shape-live-range". > > > ## Appendix-A > > ##### A realistic, more involved example of dependence analysis going wrong > > ```cpp > // In an array A of size (n0 x n1), > // fill a subarray of size (s0 x s1) > // which starts at an offset (o0 x o1) in the larger array A > void set_subarray(unsigned n0, unsigned n1, > unsigned o0, unsigned o1, > unsigned s0, unsigned s1, > float A[n0][n1]) { > for (unsigned i = 0; i < s0; i++) > for (unsigned j = 0; j < s1; j++) > S: A[i + o0][j + o1] = 1; > } > ``` > We first reduce this index expression to a sum of products: > > ``` > (i + o0) * n1 + (j + o1) = n1i + n1o0 + j + o1 > ix(i, j, n0, n1, o0, o1) = n1i + n1o0 + j + o1 > ``` > > `ix` is the index expression which `LLVM` will see, since it is fully > flattened, in comparison with the multi-dimensional index expression > `index:[i + o0][j + o1] | sizes:[][n1]`. > > We will now show _why_ this multi-dimensional index is not always correct, > and why guessing for one language will not work for another: > > Consider a call `set_subarray(n0=8, n1=9, o0=4, o1=6, s0=3, s1=6)`. At face > value, this is incorrect if we view the array as 2D grid, since the size > of the array is `(n0, n1) = (8, 9)`, but we are writing an array of size > `(s0, s1) = (3, 6)` starting from `(o0, o1) = (4, 6)`. Clearly, we will > exceed the width of the array, since `(s1 + o1 = 6 + 6 = 12) > (n1 = 9)`. > However, now think of the array as a flattened 1D representation. In this > case, the total size of the array is `n1xn2 = 8x9 = 72`, while the largest > element we will access is at the largest value of `(i, j)`. That is, > `i = s0 - 1 = 2`, and `j = s1 - 1 = 5`. > > The largest index will be `ix(i=2, j=5, n0=8, n1=9, o0=4, o1=6) > 8*2+8*4+5+6=59`. > Since `59 < 72`, we are clearly at _legal_ array indices, by C semantics! > > The definition of the semantics of the language **changed the illegal > multidimensional access** (which is illegal since it exceeds the `n1` > dimension), into a **legal flattened 1D access** (which is legal since the > flattened array indices are inbounds). > > LLVM has no way of expressing these two different semantics. Hence, we are > forced to: > 1. Consider flattened 1D accesses, which makes analysis of index expressions > equivalent to analysis of polynomials over the integers, which is hard. > 2. Guess multidimensional representations, and use them at the expense of > soundness bugs as shown above. > 3. Guess multidimensional representations, use them, and check their validity > at runtime, causing a runtime performance hit. This implementation follows > the description from the paper [Optimistic Delinearization of > Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays). > > > Currently, Polly opts for option (3), which is to emit runtime checks. If > the run-time checks fail, then Polly will not run its optimised code. Instead, > It keeps a copy of the unoptimised code around, which is run in this case. > Note that this effectively doubles the amount of performance-sensitive code > which is finally emitted after running Polly. > > Ideally, we would like a mechanism to directly express the multidimensional > semantics, which would eliminate this kind of guesswork from Polly/LLVM, > which would both make code faster, and easier to analyze. > > ## References > - [The chapel language > specification](https://chapel-lang.org/docs/1.13/_downloads/chapelLanguageSpec.pdf) > - [Fortran 2003 > standard](http://www.j3-fortran.org/doc/year/04/04-007.pdf}{Fortran > 2003 standard) > - [C++ subscripting](http://eel.is/c++draft/expr.sub) > - [Michael Kruse's PhD thesis: Lattice QCD Optimization and Polytopic > Representations of Distributed > Memory](https://tel.archives-ouvertes.fr/tel-01078440). > - [Molly source code link for the > intrinsic](https://github.com/Meinersbur/llvm/blob/molly/include/llvm/IR/IntrinsicsMolly.td#L3) > -[Optimistic Delinearization of Parametrically Sized > Arrays](optim-delin-parametrically-sized-arrays) > > [optim-delin-parametrically-sized-arrays]: > http://delivery.acm.org/10.1145/2760000/2751248/p351-grosser.pdf?ip=93.3.109.183&id=2751248&acc=CHORUS&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1557948443_2f56675c6d04796f27b84593535c9f70> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190722/2751e8af/attachment.sig>
Michael Kruse via llvm-dev
2019-Jul-22  15:59 UTC
[llvm-dev] [RFC] A new multidimensional array indexing intrinsic
Am Mo., 22. Juli 2019 um 10:50 Uhr schrieb Doerfert, Johannes <jdoerfert at anl.gov>:> Why introduce a new intrinsic (family)? It seems that would require us > to support GEPs and GEP + "multi-dim" semantics in various places. What is > the benefit over a GEP extension?Adding an intrinsic is easier than adding or extending an existing instruction, as suggested by https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning Extending GEP would require all passes to understand the added semantics on day 1, while a new intrinsic allows us to gradually add support. We can still make the intrinsic into an instruction when existing passes understand the new functionality.> > However, alas, this is illegal, for the C language does not provide > > semantics that allow the final inference above. It is conceivable that > > `x1 != x2, y1 != y2`, but the indices do actually alias, since > > according to C semantics, the two indices alias if the _flattened > > representation of the indices alias_. Consider the parameter > > values: > > > > ``` > > n = m = 3 > > x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3 > > x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3 > > ``` > > > > Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and > > so the transformation proposed in `ex1_opt` is unsound in general. > > I'm unsure your example actually showcases the problem: > > C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior: > > An array subscript is out of range, even if an object is apparently > accessible with the given subscript (as in the lvalue expression a[1][7] > given the declaration inta[4][5]) (6.5.6).GEP requires an array type, which in LLVM has static size. For dynamically-sized array (In C/C++ language family, this is only supported by C99 VLAs), clang emits the the linearized index expression followed by a single-dimensional GEP. The multi-dimensional GEP instruction has this semantics with the `inrange` modifier, which clang currently does not generate for subscript expressions. Given that there might be code around relying on this behavior, and the C++ standard not containing this paragraph, I am not sure we can change it to undefined behavior, even if the C standard would permit it. I think this is a separate discussion since the main motivation of the RFC are languages that have broader use of runtime-length arrays. But for the showcase, you are right in that according to the C standard, this aliasing might be undefined behavior. Michael
Peter Collingbourne via llvm-dev
2019-Jul-22  17:01 UTC
[llvm-dev] [RFC] A new multidimensional array indexing intrinsic
The restrictions of this new intrinsic seem very similar to those of the inrange attribute on GEPs. It seems that the main advantage of your proposal is that it would allow for non-constant strides (i.e. variable length arrays) in dimensions other than the first one. Do these appear frequently enough in the programs that you're interested in to be worth optimizing for? Peter On Sun, Jul 21, 2019 at 3:54 PM Siddharth Bhat via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hello, > > We would like to begin discussions around a new set of intrinsics, to > better express > multi-dimensional array indexing within LLVM. The motivations and a > possible design > are sketched out below. > > Rendered RFC link here > <https://github.com/bollu/llvm-multidim-array-indexing-proposal/blob/master/RFC.md> > > Raw markdown: > > # Introducing a new multidimensional array indexing intrinsic > > ## The request for change: A new intrinsic, `llvm.multidim.array.index.*`. > > We propose the addition of a new family of intrinsics, `llvm.multidim.array.index.*`. > This will allow us to represent array indexing into an array `A[d1][d2][..][dk]` > as `llvm.multidim.array.index.k A d1, d2, d3, ... dk` instead of flattening > the information into `gep A, d1 * n1 + d2 * n2 + ... + dk * nk`. The former > unflattened representation is advantageous for analysis and optimisation. It also > allows us to represent array indexing semantics of languages such as Fortran > and Chapel, which differs from that of C based array indexing. > > ## Motivating the need for a multi-dimensional array indexing intrinsic > > There are primarily one of two views of array indexing involving > multiple dimensions most languages take, which we discuss to motivate > the need for a multi-dimensional array index. This consideration impacts > the kinds of analysis we can perform on the program. In Polly, we care about > dependence analysis, so the examples here will focus on that particular problem. > > Let us consider an array indexing operation of the form: > ```cpp > int ex1(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { > __builtin_assume(x1 != x2); > __builtin_assume(y1 != y2); > B[x1][y1] = 1; > printf("%d", B[x2][y2]); > exit(0); > } > ``` > > One would like to infer that the array indices _interpreted as tuples_ > `(x1, y1)` and `(x2, y2)` do not have the same value, due to the guarding asserts > that `x1 != x2` and `y1 != y2`. As a result, the write `B[x1][y1] = 1` can > in no way interfere with the value of `B[x2][y2]`. Consquently, > we can optimise the program into: > > > ```cpp > int ex1_opt(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { > // B[x1][y1] = 1 is omitted because the result > // cannot be used: > // It is not used in the print and then the program exits > printf("%d", B[x2][y2]); > exit(0); > } > ``` > > However, alas, this is illegal, for the C language does not provide > semantics that allow the final inference above. It is conceivable that > `x1 != x2, y1 != y2`, but the indices do actually alias, since > according to C semantics, the two indices alias if the _flattened > representation of the indices alias_. Consider the parameter > values: > > ``` > n = m = 3 > x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3 > x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3 > ``` > > Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and > so the transformation proposed in `ex1_opt` is unsound in general. > > > In contrast, many langagues other than C require that index > expressions for multidimensional arrays have each component within the > array dimension for that component. As a result, in the example above, > the index pair `(0,3)` would be out-of-bounds. In languages with these > semantics, one can infer that the indexing: > > `[x1][y1] != [x2][y2]` iff `x1 != x2 || y1 != y2`. > > While this particular example is not very interesting, it shows the > spirit of the lack of expressiveness in LLVM we are trying to > improve. > > Julia, Fortran, and Chapel are examples of such languages which target > LLVM. > > Currently, the LLVM semantics of `getelementptr` talk purely of > the C style flattened views of arrays. This inhibits the ability of the optimiser > to understand the multidimensional examples as given above, and we > are forced to make conservative assumptions, inhibiting optimisation. > This information must ideally be expressible in LLVM, such that LLVM's optimisers and > alias analysis can use this information to model multidimensional-array > semantics. > > There is a more realistic (and involved) example in [Appendix A](#Appendix-A) > in the same spirit as the above simple example, but one a compiler might > realistically wish to perform. > > ## Evaluation of the impact of the intrinsic on accuracy of dependence analysis > > - This has been implemented in an exprimental branch of Polly, and was used > on the COSMO climate weather model. This greatly helped increase the accuracy > of Polly's analysis, since we eliminated the guessing game from the array analysis. > > - This has also been implemented as part of a GSoC effort to unify > Chapel and Polly. > > - Molly, the distributed version of Polly written by Michael Kruse for his PhD > also implemented a similar scheme. In his use-case, optimistic run-time checks > with delinearization was not possible, so this kind of intrinsic was _necessary_ > for the application, not just _good to have_. More details are available > in his PhD thesis: [Lattice QCD Optimization and Polytopic Representations of Distributed Memory](https://tel.archives-ouvertes.fr/tel-01078440). > In particular, Chapter 9 contains a detailed discussion. > > # Representations > > ## Intrinsic > > ### Syntax > ``` > <result> = llvm.multidim.array.index.* <ty> <ty>* <ptrval> {<stride>, <idx>}* > ``` > > ### Overview: > > The `llvm.multidim.array.index.*` intrinsic is used to get the address of > an element from an array. It performs address calcuation only and > does not access memory. It is similar to `getelementptr`. However, it imposes > additional semantics which allows the optimiser to provide better optimisations > than `getlementptr`. > > > > ### Arguments: > > The first argument is always a type used as the basis for the calculations. The > second argument is always a pointer, and is the base address to start the > calculation from. The remaining arguments are a list of pairs. Each pair > contains a dimension stride, and an offset with respect to that stride. > > > ### Semantics: > > `llvm.multidim.array.index.*` represents a multi-dimensional array index, In particular, this will > mean that we will assume that all indices `<idx_i>` are non-negative. > > Additionally, we assume that, for each `<str_i>`, `<idx_i>` pair, that > `0 <= idx_i < str_i`. > > Optimizations can assume that, given two llvm.multidim.array.index.* instructions with matching types: > > ``` > llvm.multidim.array.index.* <ty> <ty>* <ptrvalA> <strA_1>, <idxA_1>, ..., <strA_N>, <idxA_N> > llvm.multidim.array.index.* <ty> <ty>* <ptrvalB> <strB_1>, <idxB_1>, ..., <strB_N>, <idxb_N> > ``` > > If `ptrvalA == ptrvalB` and the strides are equal `(strA_1 == strB_1 && ... && strA_N == strB_N)` then: > > - If all the indices are equal (that is, `idxA_1 == idxB_1 && ... && idxA_N == idxB_N`), > then the resulting pointer values are equal. > > - If any index value is not equal (that is, there exists an `i` such that `idxA_i != idxB_i`), > then the resulting pointer values are not equal. > > > ##### Address computation: > Consider an invocation of `llvm.multidim.array.index.*`: > > ``` > <result> = call @llvm.multidim.array.index.* <ty> <ty>* <ptrval> <str_0>, <idx_0>, <str_1> <idx_1>, ..., <str_n> <idx_n> > ``` > > If the pairs are denoted by `(str_i, idx_i)`, where `str_i` denotes the stride > and `idx_i` denotes the index of the ith pair, then the final address (in bytes) > is computed as: > > ``` > ptrval + len(ty) * [(str_0 * idx_0) + (str_1 * idx_1) + ... (str_n * idx_n)] > ``` > > ## Transitioning to `llvm.multidim.array.index.*`: Allow `multidim_array_index` to refer to a GEP instruction: > > This is a sketch of how we might gradually introduce the `llvm.multidim.array.index.*` > intrinsic into LLVM without immediately losing the analyses > that are performed on `getelememtptr` instructions. This section > lists out some possible choices that we have, since the authors > do not have a "best" solution. > > ##### Choice 1: Write a `llvm.multidim.array.index.*` to `GEP` pass, with the `GEP` annotated with metadata > > This pass will flatten all `llvm.multidim.array.index.*` expressions into a `GEP` annotated with metadata. This metadata will indicate that the index expression computed by the lowered GEP is guaranteed to be in a canonical form which allows the analysis > to infer stride and index sizes. > > A multidim index of the form: > ``` > %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2 > ``` > > is lowered to: > > ``` > %mul1 = mul nsw i64 %str_1, %idx_1 > %mul2 = mul1 nsw i64 %str_2, %idx_2 > %total = add nsw i64 %mul2, %mul1 > %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 > ``` > with guarantees that the first term in each multiplication is the stride > and the second term in each multiplication is the index. (What happens > if intermediate transformation passes decide to change the order? This seems > complicated). > > **TODO:** Lookup how to attach metadata such that the metadata can communicate > which of the values are strides and which are indeces > > > > # Caveats > > Currently, we assume that the array shape is immutable. However, we will need to deal with > being able to express `reshape` like primitives where the array shape can be mutated. However, > this appears to make expressing this information quite difficult: We now need to attach the shape > information to an array per "shape-live-range". > > > ## Appendix-A > > ##### A realistic, more involved example of dependence analysis going wrong > > ```cpp > // In an array A of size (n0 x n1), > // fill a subarray of size (s0 x s1) > // which starts at an offset (o0 x o1) in the larger array A > void set_subarray(unsigned n0, unsigned n1, > unsigned o0, unsigned o1, > unsigned s0, unsigned s1, > float A[n0][n1]) { > for (unsigned i = 0; i < s0; i++) > for (unsigned j = 0; j < s1; j++) > S: A[i + o0][j + o1] = 1; > } > ``` > We first reduce this index expression to a sum of products: > > ``` > (i + o0) * n1 + (j + o1) = n1i + n1o0 + j + o1 > ix(i, j, n0, n1, o0, o1) = n1i + n1o0 + j + o1 > ``` > > `ix` is the index expression which `LLVM` will see, since it is fully > flattened, in comparison with the multi-dimensional index expression > `index:[i + o0][j + o1] | sizes:[][n1]`. > > We will now show _why_ this multi-dimensional index is not always correct, > and why guessing for one language will not work for another: > > Consider a call `set_subarray(n0=8, n1=9, o0=4, o1=6, s0=3, s1=6)`. At face > value, this is incorrect if we view the array as 2D grid, since the size > of the array is `(n0, n1) = (8, 9)`, but we are writing an array of size > `(s0, s1) = (3, 6)` starting from `(o0, o1) = (4, 6)`. Clearly, we will > exceed the width of the array, since `(s1 + o1 = 6 + 6 = 12) > (n1 = 9)`. > However, now think of the array as a flattened 1D representation. In this > case, the total size of the array is `n1xn2 = 8x9 = 72`, while the largest > element we will access is at the largest value of `(i, j)`. That is, > `i = s0 - 1 = 2`, and `j = s1 - 1 = 5`. > > The largest index will be `ix(i=2, j=5, n0=8, n1=9, o0=4, o1=6) = 8*2+8*4+5+6=59`. > Since `59 < 72`, we are clearly at _legal_ array indices, by C semantics! > > The definition of the semantics of the language **changed the illegal > multidimensional access** (which is illegal since it exceeds the `n1` > dimension), into a **legal flattened 1D access** (which is legal since the > flattened array indices are inbounds). > > LLVM has no way of expressing these two different semantics. Hence, we are > forced to: > 1. Consider flattened 1D accesses, which makes analysis of index expressions > equivalent to analysis of polynomials over the integers, which is hard. > 2. Guess multidimensional representations, and use them at the expense of > soundness bugs as shown above. > 3. Guess multidimensional representations, use them, and check their validity > at runtime, causing a runtime performance hit. This implementation follows > the description from the paper [Optimistic Delinearization of Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays). > > > Currently, Polly opts for option (3), which is to emit runtime checks. If > the run-time checks fail, then Polly will not run its optimised code. Instead, > It keeps a copy of the unoptimised code around, which is run in this case. > Note that this effectively doubles the amount of performance-sensitive code > which is finally emitted after running Polly. > > Ideally, we would like a mechanism to directly express the multidimensional > semantics, which would eliminate this kind of guesswork from Polly/LLVM, > which would both make code faster, and easier to analyze. > > ## References > - [The chapel language specification](https://chapel-lang.org/docs/1.13/_downloads/chapelLanguageSpec.pdf) > - [Fortran 2003 standard](http://www.j3-fortran.org/doc/year/04/04-007.pdf}{Fortran 2003 standard) > - [C++ subscripting](http://eel.is/c++draft/expr.sub) > - [Michael Kruse's PhD thesis: Lattice QCD Optimization and Polytopic Representations of Distributed Memory](https://tel.archives-ouvertes.fr/tel-01078440). > - [Molly source code link for the intrinsic](https://github.com/Meinersbur/llvm/blob/molly/include/llvm/IR/IntrinsicsMolly.td#L3) > -[Optimistic Delinearization of Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays) > > [optim-delin-parametrically-sized-arrays]: http://delivery.acm.org/10.1145/2760000/2751248/p351-grosser.pdf?ip=93.3.109.183&id=2751248&acc=CHORUS&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1557948443_2f56675c6d04796f27b84593535c9f70 > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- -- Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190722/e3c6c5b8/attachment.html>
Philip Reames via llvm-dev
2019-Jul-22  17:13 UTC
[llvm-dev] [RFC] A new multidimensional array indexing intrinsic
We could also simply extend the existing inrange mechanism to non-constantexpr GEPs. It would remove an inconsistency in the semantics, be relatively straight forward, and solve the motivating example. (I didn't read the proposal in full, so there may be other examples it doesn't solve.) Philip On 7/22/19 10:01 AM, Peter Collingbourne via llvm-dev wrote:> The restrictions of this new intrinsic seem very similar to those of > the inrange attribute on GEPs. It seems that the main advantage of > your proposal is that it would allow for non-constant strides (i.e. > variable length arrays) in dimensions other than the first one. Do > these appear frequently enough in the programs that you're interested > in to be worth optimizing for? > > Peter > > On Sun, Jul 21, 2019 at 3:54 PM Siddharth Bhat via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > Hello, > > We would like to begin discussions around a new set of intrinsics, > to better express > multi-dimensional array indexing within LLVM. The motivations and > a possible design > are sketched out below. > > Rendered RFC link here > <https://github.com/bollu/llvm-multidim-array-indexing-proposal/blob/master/RFC.md> > > > Raw markdown: > > # Introducing a new multidimensional array indexing intrinsic > > ## The request for change: A new intrinsic, `llvm.multidim.array.index.*`. > > We propose the addition of a new family of intrinsics, `llvm.multidim.array.index.*`. > This will allow us to represent array indexing into an array `A[d1][d2][..][dk]` > as `llvm.multidim.array.index.k A d1, d2, d3, ... dk` instead of flattening > the information into `gep A, d1 * n1 + d2 * n2 + ... + dk * nk`. The former > unflattened representation is advantageous for analysis and optimisation. It also > allows us to represent array indexing semantics of languages such as Fortran > and Chapel, which differs from that of C based array indexing. > > ## Motivating the need for a multi-dimensional array indexing intrinsic > > There are primarily one of two views of array indexing involving > multiple dimensions most languages take, which we discuss to motivate > the need for a multi-dimensional array index. This consideration impacts > the kinds of analysis we can perform on the program. In Polly, we care about > dependence analysis, so the examples here will focus on that particular problem. > > Let us consider an array indexing operation of the form: > ```cpp > int ex1(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { > __builtin_assume(x1 != x2); > __builtin_assume(y1 != y2); > B[x1][y1] = 1; > printf("%d", B[x2][y2]); > exit(0); > } > ``` > > One would like to infer that the array indices _interpreted as tuples_ > `(x1, y1)` and `(x2, y2)` do not have the same value, due to the guarding asserts > that `x1 != x2` and `y1 != y2`. As a result, the write `B[x1][y1] = 1` can > in no way interfere with the value of `B[x2][y2]`. Consquently, > we can optimise the program into: > > > ```cpp > int ex1_opt(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { > // B[x1][y1] = 1 is omitted because the result > // cannot be used: > // It is not used in the print and then the program exits > printf("%d", B[x2][y2]); > exit(0); > } > ``` > > However, alas, this is illegal, for the C language does not provide > semantics that allow the final inference above. It is conceivable that > `x1 != x2, y1 != y2`, but the indices do actually alias, since > according to C semantics, the two indices alias if the _flattened > representation of the indices alias_. Consider the parameter > values: > > ``` > n = m = 3 > x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3 > x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3 > ``` > > Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and > so the transformation proposed in `ex1_opt` is unsound in general. > > > In contrast, many langagues other than C require that index > expressions for multidimensional arrays have each component within the > array dimension for that component. As a result, in the example above, > the index pair `(0,3)` would be out-of-bounds. In languages with these > semantics, one can infer that the indexing: > > `[x1][y1] != [x2][y2]` iff `x1 != x2 || y1 != y2`. > > While this particular example is not very interesting, it shows the > spirit of the lack of expressiveness in LLVM we are trying to > improve. > > Julia, Fortran, and Chapel are examples of such languages which target > LLVM. > > Currently, the LLVM semantics of `getelementptr` talk purely of > the C style flattened views of arrays. This inhibits the ability of the optimiser > to understand the multidimensional examples as given above, and we > are forced to make conservative assumptions, inhibiting optimisation. > This information must ideally be expressible in LLVM, such that LLVM's optimisers and > alias analysis can use this information to model multidimensional-array > semantics. > > There is a more realistic (and involved) example in [Appendix A](#Appendix-A) > in the same spirit as the above simple example, but one a compiler might > realistically wish to perform. > > ## Evaluation of the impact of the intrinsic on accuracy of dependence analysis > > - This has been implemented in an exprimental branch of Polly, and was used > on the COSMO climate weather model. This greatly helped increase the accuracy > of Polly's analysis, since we eliminated the guessing game from the array analysis. > > - This has also been implemented as part of a GSoC effort to unify > Chapel and Polly. > > - Molly, the distributed version of Polly written by Michael Kruse for his PhD > also implemented a similar scheme. In his use-case, optimistic run-time checks > with delinearization was not possible, so this kind of intrinsic was _necessary_ > for the application, not just _good to have_. More details are available > in his PhD thesis: [Lattice QCD Optimization and Polytopic Representations of Distributed Memory](https://tel.archives-ouvertes.fr/tel-01078440). > In particular, Chapter 9 contains a detailed discussion. > > # Representations > > ## Intrinsic > > ### Syntax > ``` > <result> = llvm.multidim.array.index.* <ty> <ty>* <ptrval> {<stride>, <idx>}* > ``` > > ### Overview: > > The `llvm.multidim.array.index.*` intrinsic is used to get the address of > an element from an array. It performs address calcuation only and > does not access memory. It is similar to `getelementptr`. However, it imposes > additional semantics which allows the optimiser to provide better optimisations > than `getlementptr`. > > > > ### Arguments: > > The first argument is always a type used as the basis for the calculations. The > second argument is always a pointer, and is the base address to start the > calculation from. The remaining arguments are a list of pairs. Each pair > contains a dimension stride, and an offset with respect to that stride. > > > ### Semantics: > > `llvm.multidim.array.index.*` represents a multi-dimensional array index, In particular, this will > mean that we will assume that all indices `<idx_i>` are non-negative. > > Additionally, we assume that, for each `<str_i>`, `<idx_i>` pair, that > `0 <= idx_i < str_i`. > > Optimizations can assume that, given two llvm.multidim.array.index.* instructions with matching types: > > ``` > llvm.multidim.array.index.* <ty> <ty>* <ptrvalA> <strA_1>, <idxA_1>, ..., <strA_N>, <idxA_N> > llvm.multidim.array.index.* <ty> <ty>* <ptrvalB> <strB_1>, <idxB_1>, ..., <strB_N>, <idxb_N> > ``` > > If `ptrvalA == ptrvalB` and the strides are equal `(strA_1 == strB_1 && ... && strA_N == strB_N)` then: > > - If all the indices are equal (that is, `idxA_1 == idxB_1 && ... && idxA_N == idxB_N`), > then the resulting pointer values are equal. > > - If any index value is not equal (that is, there exists an `i` such that `idxA_i != idxB_i`), > then the resulting pointer values are not equal. > > > ##### Address computation: > Consider an invocation of `llvm.multidim.array.index.*`: > > ``` > <result> = call @llvm.multidim.array.index.* <ty> <ty>* <ptrval> <str_0>, <idx_0>, <str_1> <idx_1>, ..., <str_n> <idx_n> > ``` > > If the pairs are denoted by `(str_i, idx_i)`, where `str_i` denotes the stride > and `idx_i` denotes the index of the ith pair, then the final address (in bytes) > is computed as: > > ``` > ptrval + len(ty) * [(str_0 * idx_0) + (str_1 * idx_1) + ... (str_n * idx_n)] > ``` > > ## Transitioning to `llvm.multidim.array.index.*`: Allow `multidim_array_index` to refer to a GEP instruction: > > This is a sketch of how we might gradually introduce the `llvm.multidim.array.index.*` > intrinsic into LLVM without immediately losing the analyses > that are performed on `getelememtptr` instructions. This section > lists out some possible choices that we have, since the authors > do not have a "best" solution. > > ##### Choice 1: Write a `llvm.multidim.array.index.*` to `GEP` pass, with the `GEP` annotated with metadata > > This pass will flatten all `llvm.multidim.array.index.*` expressions into a `GEP` annotated with metadata. This metadata will indicate that the index expression computed by the lowered GEP is guaranteed to be in a canonical form which allows the analysis > to infer stride and index sizes. > > A multidim index of the form: > ``` > %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2 > ``` > > is lowered to: > > ``` > %mul1 = mul nsw i64 %str_1, %idx_1 > %mul2 = mul1 nsw i64 %str_2, %idx_2 > %total = add nsw i64 %mul2, %mul1 > %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 > ``` > with guarantees that the first term in each multiplication is the stride > and the second term in each multiplication is the index. (What happens > if intermediate transformation passes decide to change the order? This seems > complicated). > > **TODO:** Lookup how to attach metadata such that the metadata can communicate > which of the values are strides and which are indeces > > > > # Caveats > > Currently, we assume that the array shape is immutable. However, we will need to deal with > being able to express `reshape` like primitives where the array shape can be mutated. However, > this appears to make expressing this information quite difficult: We now need to attach the shape > information to an array per "shape-live-range". > > > ## Appendix-A > > ##### A realistic, more involved example of dependence analysis going wrong > > ```cpp > // In an array A of size (n0 x n1), > // fill a subarray of size (s0 x s1) > // which starts at an offset (o0 x o1) in the larger array A > void set_subarray(unsigned n0, unsigned n1, > unsigned o0, unsigned o1, > unsigned s0, unsigned s1, > float A[n0][n1]) { > for (unsigned i = 0; i < s0; i++) > for (unsigned j = 0; j < s1; j++) > S: A[i + o0][j + o1] = 1; > } > ``` > We first reduce this index expression to a sum of products: > > ``` > (i + o0) * n1 + (j + o1) = n1i + n1o0 + j + o1 > ix(i, j, n0, n1, o0, o1) = n1i + n1o0 + j + o1 > ``` > > `ix` is the index expression which `LLVM` will see, since it is fully > flattened, in comparison with the multi-dimensional index expression > `index:[i + o0][j + o1] | sizes:[][n1]`. > > We will now show _why_ this multi-dimensional index is not always correct, > and why guessing for one language will not work for another: > > Consider a call `set_subarray(n0=8, n1=9, o0=4, o1=6, s0=3, s1=6)`. At face > value, this is incorrect if we view the array as 2D grid, since the size > of the array is `(n0, n1) = (8, 9)`, but we are writing an array of size > `(s0, s1) = (3, 6)` starting from `(o0, o1) = (4, 6)`. Clearly, we will > exceed the width of the array, since `(s1 + o1 = 6 + 6 = 12) > (n1 = 9)`. > However, now think of the array as a flattened 1D representation. In this > case, the total size of the array is `n1xn2 = 8x9 = 72`, while the largest > element we will access is at the largest value of `(i, j)`. That is, > `i = s0 - 1 = 2`, and `j = s1 - 1 = 5`. > > The largest index will be `ix(i=2, j=5, n0=8, n1=9, o0=4, o1=6) = 8*2+8*4+5+6=59`. > Since `59 < 72`, we are clearly at _legal_ array indices, by C semantics! > > The definition of the semantics of the language **changed the illegal > multidimensional access** (which is illegal since it exceeds the `n1` > dimension), into a **legal flattened 1D access** (which is legal since the > flattened array indices are inbounds). > > LLVM has no way of expressing these two different semantics. Hence, we are > forced to: > 1. Consider flattened 1D accesses, which makes analysis of index expressions > equivalent to analysis of polynomials over the integers, which is hard. > 2. Guess multidimensional representations, and use them at the expense of > soundness bugs as shown above. > 3. Guess multidimensional representations, use them, and check their validity > at runtime, causing a runtime performance hit. This implementation follows > the description from the paper [Optimistic Delinearization of Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays). > > > Currently, Polly opts for option (3), which is to emit runtime checks. If > the run-time checks fail, then Polly will not run its optimised code. Instead, > It keeps a copy of the unoptimised code around, which is run in this case. > Note that this effectively doubles the amount of performance-sensitive code > which is finally emitted after running Polly. > > Ideally, we would like a mechanism to directly express the multidimensional > semantics, which would eliminate this kind of guesswork from Polly/LLVM, > which would both make code faster, and easier to analyze. > > ## References > - [The chapel language specification](https://chapel-lang.org/docs/1.13/_downloads/chapelLanguageSpec.pdf) > - [Fortran 2003 standard](http://www.j3-fortran.org/doc/year/04/04-007.pdf}{Fortran <http://www.j3-fortran.org/doc/year/04/04-007.pdf%7D%7BFortran> 2003 standard) > - [C++ subscripting](http://eel.is/c++draft/expr.sub) > - [Michael Kruse's PhD thesis: Lattice QCD Optimization and Polytopic Representations of Distributed Memory](https://tel.archives-ouvertes.fr/tel-01078440). > - [Molly source code link for the intrinsic](https://github.com/Meinersbur/llvm/blob/molly/include/llvm/IR/IntrinsicsMolly.td#L3) > -[Optimistic Delinearization of Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays) > > [optim-delin-parametrically-sized-arrays]:http://delivery.acm.org/10.1145/2760000/2751248/p351-grosser.pdf?ip=93.3.109.183&id=2751248&acc=CHORUS&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1557948443_2f56675c6d04796f27b84593535c9f70 > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > -- > -- > Peter > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190722/62045aac/attachment.html>
Michael Ferguson via llvm-dev
2019-Jul-22  18:18 UTC
[llvm-dev] [RFC] A new multidimensional array indexing intrinsic
> It seems that the main advantage of your proposal is that it would allow for non-constant strides (i.e. variable length arrays) in dimensions other than the first one. Do these appear frequently enough in the programs that you're interested in to be worth optimizing for?Yes - at least in Chapel (which is one of the motivating languages) these are very common. In other words, typical Chapel programs use arrays with a size that is not fixed at compile time. Thanks, -michael On Mon, Jul 22, 2019 at 1:02 PM Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > The restrictions of this new intrinsic seem very similar to those of the inrange attribute on GEPs. It seems that the main advantage of your proposal is that it would allow for non-constant strides (i.e. variable length arrays) in dimensions other than the first one. Do these appear frequently enough in the programs that you're interested in to be worth optimizing for? > > Peter > > On Sun, Jul 21, 2019 at 3:54 PM Siddharth Bhat via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> Hello, >> >> We would like to begin discussions around a new set of intrinsics, to better express >> multi-dimensional array indexing within LLVM. The motivations and a possible design >> are sketched out below. >> >> Rendered RFC link here >> >> Raw markdown: >> >> # Introducing a new multidimensional array indexing intrinsic >> >> ## The request for change: A new intrinsic, `llvm.multidim.array.index.*`. >> >> We propose the addition of a new family of intrinsics, `llvm.multidim.array.index.*`. >> This will allow us to represent array indexing into an array `A[d1][d2][..][dk]` >> as `llvm.multidim.array.index.k A d1, d2, d3, ... dk` instead of flattening >> the information into `gep A, d1 * n1 + d2 * n2 + ... + dk * nk`. The former >> unflattened representation is advantageous for analysis and optimisation. It also >> allows us to represent array indexing semantics of languages such as Fortran >> and Chapel, which differs from that of C based array indexing. >> >> ## Motivating the need for a multi-dimensional array indexing intrinsic >> >> There are primarily one of two views of array indexing involving >> multiple dimensions most languages take, which we discuss to motivate >> the need for a multi-dimensional array index. This consideration impacts >> the kinds of analysis we can perform on the program. In Polly, we care about >> dependence analysis, so the examples here will focus on that particular problem. >> >> Let us consider an array indexing operation of the form: >> ```cpp >> int ex1(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { >> __builtin_assume(x1 != x2); >> __builtin_assume(y1 != y2); >> B[x1][y1] = 1; >> printf("%d", B[x2][y2]); >> exit(0); >> } >> ``` >> >> One would like to infer that the array indices _interpreted as tuples_ >> `(x1, y1)` and `(x2, y2)` do not have the same value, due to the guarding asserts >> that `x1 != x2` and `y1 != y2`. As a result, the write `B[x1][y1] = 1` can >> in no way interfere with the value of `B[x2][y2]`. Consquently, >> we can optimise the program into: >> >> >> ```cpp >> int ex1_opt(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { >> // B[x1][y1] = 1 is omitted because the result >> // cannot be used: >> // It is not used in the print and then the program exits >> printf("%d", B[x2][y2]); >> exit(0); >> } >> ``` >> >> However, alas, this is illegal, for the C language does not provide >> semantics that allow the final inference above. It is conceivable that >> `x1 != x2, y1 != y2`, but the indices do actually alias, since >> according to C semantics, the two indices alias if the _flattened >> representation of the indices alias_. Consider the parameter >> values: >> >> ``` >> n = m = 3 >> x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3 >> x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3 >> ``` >> >> Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and >> so the transformation proposed in `ex1_opt` is unsound in general. >> >> >> In contrast, many langagues other than C require that index >> expressions for multidimensional arrays have each component within the >> array dimension for that component. As a result, in the example above, >> the index pair `(0,3)` would be out-of-bounds. In languages with these >> semantics, one can infer that the indexing: >> >> `[x1][y1] != [x2][y2]` iff `x1 != x2 || y1 != y2`. >> >> While this particular example is not very interesting, it shows the >> spirit of the lack of expressiveness in LLVM we are trying to >> improve. >> >> Julia, Fortran, and Chapel are examples of such languages which target >> LLVM. >> >> Currently, the LLVM semantics of `getelementptr` talk purely of >> the C style flattened views of arrays. This inhibits the ability of the optimiser >> to understand the multidimensional examples as given above, and we >> are forced to make conservative assumptions, inhibiting optimisation. >> This information must ideally be expressible in LLVM, such that LLVM's optimisers and >> alias analysis can use this information to model multidimensional-array >> semantics. >> >> There is a more realistic (and involved) example in [Appendix A](#Appendix-A) >> in the same spirit as the above simple example, but one a compiler might >> realistically wish to perform. >> >> ## Evaluation of the impact of the intrinsic on accuracy of dependence analysis >> >> - This has been implemented in an exprimental branch of Polly, and was used >> on the COSMO climate weather model. This greatly helped increase the accuracy >> of Polly's analysis, since we eliminated the guessing game from the array analysis. >> >> - This has also been implemented as part of a GSoC effort to unify >> Chapel and Polly. >> >> - Molly, the distributed version of Polly written by Michael Kruse for his PhD >> also implemented a similar scheme. In his use-case, optimistic run-time checks >> with delinearization was not possible, so this kind of intrinsic was _necessary_ >> for the application, not just _good to have_. More details are available >> in his PhD thesis: [Lattice QCD Optimization and Polytopic Representations of Distributed Memory](https://tel.archives-ouvertes.fr/tel-01078440). >> In particular, Chapter 9 contains a detailed discussion. >> >> # Representations >> >> ## Intrinsic >> >> ### Syntax >> ``` >> <result> = llvm.multidim.array.index.* <ty> <ty>* <ptrval> {<stride>, <idx>}* >> ``` >> >> ### Overview: >> >> The `llvm.multidim.array.index.*` intrinsic is used to get the address of >> an element from an array. It performs address calcuation only and >> does not access memory. It is similar to `getelementptr`. However, it imposes >> additional semantics which allows the optimiser to provide better optimisations >> than `getlementptr`. >> >> >> >> ### Arguments: >> >> The first argument is always a type used as the basis for the calculations. The >> second argument is always a pointer, and is the base address to start the >> calculation from. The remaining arguments are a list of pairs. Each pair >> contains a dimension stride, and an offset with respect to that stride. >> >> >> ### Semantics: >> >> `llvm.multidim.array.index.*` represents a multi-dimensional array index, In particular, this will >> mean that we will assume that all indices `<idx_i>` are non-negative. >> >> Additionally, we assume that, for each `<str_i>`, `<idx_i>` pair, that >> `0 <= idx_i < str_i`. >> >> Optimizations can assume that, given two llvm.multidim.array.index.* instructions with matching types: >> >> ``` >> llvm.multidim.array.index.* <ty> <ty>* <ptrvalA> <strA_1>, <idxA_1>, ..., <strA_N>, <idxA_N> >> llvm.multidim.array.index.* <ty> <ty>* <ptrvalB> <strB_1>, <idxB_1>, ..., <strB_N>, <idxb_N> >> ``` >> >> If `ptrvalA == ptrvalB` and the strides are equal `(strA_1 == strB_1 && ... && strA_N == strB_N)` then: >> >> - If all the indices are equal (that is, `idxA_1 == idxB_1 && ... && idxA_N == idxB_N`), >> then the resulting pointer values are equal. >> >> - If any index value is not equal (that is, there exists an `i` such that `idxA_i != idxB_i`), >> then the resulting pointer values are not equal. >> >> >> ##### Address computation: >> Consider an invocation of `llvm.multidim.array.index.*`: >> >> ``` >> <result> = call @llvm.multidim.array.index.* <ty> <ty>* <ptrval> <str_0>, <idx_0>, <str_1> <idx_1>, ..., <str_n> <idx_n> >> ``` >> >> If the pairs are denoted by `(str_i, idx_i)`, where `str_i` denotes the stride >> and `idx_i` denotes the index of the ith pair, then the final address (in bytes) >> is computed as: >> >> ``` >> ptrval + len(ty) * [(str_0 * idx_0) + (str_1 * idx_1) + ... (str_n * idx_n)] >> ``` >> >> ## Transitioning to `llvm.multidim.array.index.*`: Allow `multidim_array_index` to refer to a GEP instruction: >> >> This is a sketch of how we might gradually introduce the `llvm.multidim.array.index.*` >> intrinsic into LLVM without immediately losing the analyses >> that are performed on `getelememtptr` instructions. This section >> lists out some possible choices that we have, since the authors >> do not have a "best" solution. >> >> ##### Choice 1: Write a `llvm.multidim.array.index.*` to `GEP` pass, with the `GEP` annotated with metadata >> >> This pass will flatten all `llvm.multidim.array.index.*` expressions into a `GEP` annotated with metadata. This metadata will indicate that the index expression computed by the lowered GEP is guaranteed to be in a canonical form which allows the analysis >> to infer stride and index sizes. >> >> A multidim index of the form: >> ``` >> %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2 >> ``` >> >> is lowered to: >> >> ``` >> %mul1 = mul nsw i64 %str_1, %idx_1 >> %mul2 = mul1 nsw i64 %str_2, %idx_2 >> %total = add nsw i64 %mul2, %mul1 >> %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 >> ``` >> with guarantees that the first term in each multiplication is the stride >> and the second term in each multiplication is the index. (What happens >> if intermediate transformation passes decide to change the order? This seems >> complicated). >> >> **TODO:** Lookup how to attach metadata such that the metadata can communicate >> which of the values are strides and which are indeces >> >> >> >> # Caveats >> >> Currently, we assume that the array shape is immutable. However, we will need to deal with >> being able to express `reshape` like primitives where the array shape can be mutated. However, >> this appears to make expressing this information quite difficult: We now need to attach the shape >> information to an array per "shape-live-range". >> >> >> ## Appendix-A >> >> ##### A realistic, more involved example of dependence analysis going wrong >> >> ```cpp >> // In an array A of size (n0 x n1), >> // fill a subarray of size (s0 x s1) >> // which starts at an offset (o0 x o1) in the larger array A >> void set_subarray(unsigned n0, unsigned n1, >> unsigned o0, unsigned o1, >> unsigned s0, unsigned s1, >> float A[n0][n1]) { >> for (unsigned i = 0; i < s0; i++) >> for (unsigned j = 0; j < s1; j++) >> S: A[i + o0][j + o1] = 1; >> } >> ``` >> We first reduce this index expression to a sum of products: >> >> ``` >> (i + o0) * n1 + (j + o1) = n1i + n1o0 + j + o1 >> ix(i, j, n0, n1, o0, o1) = n1i + n1o0 + j + o1 >> ``` >> >> `ix` is the index expression which `LLVM` will see, since it is fully >> flattened, in comparison with the multi-dimensional index expression >> `index:[i + o0][j + o1] | sizes:[][n1]`. >> >> We will now show _why_ this multi-dimensional index is not always correct, >> and why guessing for one language will not work for another: >> >> Consider a call `set_subarray(n0=8, n1=9, o0=4, o1=6, s0=3, s1=6)`. At face >> value, this is incorrect if we view the array as 2D grid, since the size >> of the array is `(n0, n1) = (8, 9)`, but we are writing an array of size >> `(s0, s1) = (3, 6)` starting from `(o0, o1) = (4, 6)`. Clearly, we will >> exceed the width of the array, since `(s1 + o1 = 6 + 6 = 12) > (n1 = 9)`. >> However, now think of the array as a flattened 1D representation. In this >> case, the total size of the array is `n1xn2 = 8x9 = 72`, while the largest >> element we will access is at the largest value of `(i, j)`. That is, >> `i = s0 - 1 = 2`, and `j = s1 - 1 = 5`. >> >> The largest index will be `ix(i=2, j=5, n0=8, n1=9, o0=4, o1=6) = 8*2+8*4+5+6=59`. >> Since `59 < 72`, we are clearly at _legal_ array indices, by C semantics! >> >> The definition of the semantics of the language **changed the illegal >> multidimensional access** (which is illegal since it exceeds the `n1` >> dimension), into a **legal flattened 1D access** (which is legal since the >> flattened array indices are inbounds). >> >> LLVM has no way of expressing these two different semantics. Hence, we are >> forced to: >> 1. Consider flattened 1D accesses, which makes analysis of index expressions >> equivalent to analysis of polynomials over the integers, which is hard. >> 2. Guess multidimensional representations, and use them at the expense of >> soundness bugs as shown above. >> 3. Guess multidimensional representations, use them, and check their validity >> at runtime, causing a runtime performance hit. This implementation follows >> the description from the paper [Optimistic Delinearization of Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays). >> >> >> Currently, Polly opts for option (3), which is to emit runtime checks. If >> the run-time checks fail, then Polly will not run its optimised code. Instead, >> It keeps a copy of the unoptimised code around, which is run in this case. >> Note that this effectively doubles the amount of performance-sensitive code >> which is finally emitted after running Polly. >> >> Ideally, we would like a mechanism to directly express the multidimensional >> semantics, which would eliminate this kind of guesswork from Polly/LLVM, >> which would both make code faster, and easier to analyze. >> >> ## References >> - [The chapel language specification](https://chapel-lang.org/docs/1.13/_downloads/chapelLanguageSpec.pdf) >> - [Fortran 2003 standard](http://www.j3-fortran.org/doc/year/04/04-007.pdf}{Fortran 2003 standard) >> - [C++ subscripting](http://eel.is/c++draft/expr.sub) >> - [Michael Kruse's PhD thesis: Lattice QCD Optimization and Polytopic Representations of Distributed Memory](https://tel.archives-ouvertes.fr/tel-01078440). >> - [Molly source code link for the intrinsic](https://github.com/Meinersbur/llvm/blob/molly/include/llvm/IR/IntrinsicsMolly.td#L3) >> -[Optimistic Delinearization of Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays) >> >> [optim-delin-parametrically-sized-arrays]: http://delivery.acm.org/10.1145/2760000/2751248/p351-grosser.pdf?ip=93.3.109.183&id=2751248&acc=CHORUS&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1557948443_2f56675c6d04796f27b84593535c9f70 >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > -- > -- > Peter > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Doerfert, Johannes via llvm-dev
2019-Aug-03  23:32 UTC
[llvm-dev] [RFC] A new multidimensional array indexing intrinsic
Have you looked into this? https://llvm.org/docs/LangRef.html#llvm-preserve-array-access-index-intrinsic On 07/22, Siddharth Bhat via llvm-dev wrote:> Hello, > > We would like to begin discussions around a new set of intrinsics, to > better express > multi-dimensional array indexing within LLVM. The motivations and a > possible design > are sketched out below. > > Rendered RFC link here > <https://github.com/bollu/llvm-multidim-array-indexing-proposal/blob/master/RFC.md> > > Raw markdown: > > # Introducing a new multidimensional array indexing intrinsic > > ## The request for change: A new intrinsic, `llvm.multidim.array.index.*`. > > We propose the addition of a new family of intrinsics, > `llvm.multidim.array.index.*`. > This will allow us to represent array indexing into an array `A[d1][d2][..][dk]` > as `llvm.multidim.array.index.k A d1, d2, d3, ... dk` instead of flattening > the information into `gep A, d1 * n1 + d2 * n2 + ... + dk * nk`. The former > unflattened representation is advantageous for analysis and > optimisation. It also > allows us to represent array indexing semantics of languages such as Fortran > and Chapel, which differs from that of C based array indexing. > > ## Motivating the need for a multi-dimensional array indexing intrinsic > > There are primarily one of two views of array indexing involving > multiple dimensions most languages take, which we discuss to motivate > the need for a multi-dimensional array index. This consideration impacts > the kinds of analysis we can perform on the program. In Polly, we care about > dependence analysis, so the examples here will focus on that particular problem. > > Let us consider an array indexing operation of the form: > ```cpp > int ex1(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { > __builtin_assume(x1 != x2); > __builtin_assume(y1 != y2); > B[x1][y1] = 1; > printf("%d", B[x2][y2]); > exit(0); > } > ``` > > One would like to infer that the array indices _interpreted as tuples_ > `(x1, y1)` and `(x2, y2)` do not have the same value, due to the > guarding asserts > that `x1 != x2` and `y1 != y2`. As a result, the write `B[x1][y1] = 1` can > in no way interfere with the value of `B[x2][y2]`. Consquently, > we can optimise the program into: > > > ```cpp > int ex1_opt(int n, int m, B[n][m], int x1, int x2, int y1, int y2) { > // B[x1][y1] = 1 is omitted because the result > // cannot be used: > // It is not used in the print and then the program exits > printf("%d", B[x2][y2]); > exit(0); > } > ``` > > However, alas, this is illegal, for the C language does not provide > semantics that allow the final inference above. It is conceivable that > `x1 != x2, y1 != y2`, but the indices do actually alias, since > according to C semantics, the two indices alias if the _flattened > representation of the indices alias_. Consider the parameter > values: > > ``` > n = m = 3 > x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3 > x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3 > ``` > > Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and > so the transformation proposed in `ex1_opt` is unsound in general. > > > In contrast, many langagues other than C require that index > expressions for multidimensional arrays have each component within the > array dimension for that component. As a result, in the example above, > the index pair `(0,3)` would be out-of-bounds. In languages with these > semantics, one can infer that the indexing: > > `[x1][y1] != [x2][y2]` iff `x1 != x2 || y1 != y2`. > > While this particular example is not very interesting, it shows the > spirit of the lack of expressiveness in LLVM we are trying to > improve. > > Julia, Fortran, and Chapel are examples of such languages which target > LLVM. > > Currently, the LLVM semantics of `getelementptr` talk purely of > the C style flattened views of arrays. This inhibits the ability of > the optimiser > to understand the multidimensional examples as given above, and we > are forced to make conservative assumptions, inhibiting optimisation. > This information must ideally be expressible in LLVM, such that LLVM's > optimisers and > alias analysis can use this information to model multidimensional-array > semantics. > > There is a more realistic (and involved) example in [Appendix A](#Appendix-A) > in the same spirit as the above simple example, but one a compiler might > realistically wish to perform. > > ## Evaluation of the impact of the intrinsic on accuracy of dependence analysis > > - This has been implemented in an exprimental branch of Polly, and was used > on the COSMO climate weather model. This greatly helped increase the accuracy > of Polly's analysis, since we eliminated the guessing game from the > array analysis. > > - This has also been implemented as part of a GSoC effort to unify > Chapel and Polly. > > - Molly, the distributed version of Polly written by Michael Kruse for his PhD > also implemented a similar scheme. In his use-case, optimistic run-time checks > with delinearization was not possible, so this kind of intrinsic was > _necessary_ > for the application, not just _good to have_. More details are available > in his PhD thesis: [Lattice QCD Optimization and Polytopic > Representations of Distributed > Memory](https://tel.archives-ouvertes.fr/tel-01078440). > In particular, Chapter 9 contains a detailed discussion. > > # Representations > > ## Intrinsic > > ### Syntax > ``` > <result> = llvm.multidim.array.index.* <ty> <ty>* <ptrval> {<stride>, <idx>}* > ``` > > ### Overview: > > The `llvm.multidim.array.index.*` intrinsic is used to get the address of > an element from an array. It performs address calcuation only and > does not access memory. It is similar to `getelementptr`. However, it imposes > additional semantics which allows the optimiser to provide better optimisations > than `getlementptr`. > > > > ### Arguments: > > The first argument is always a type used as the basis for the calculations. The > second argument is always a pointer, and is the base address to start the > calculation from. The remaining arguments are a list of pairs. Each pair > contains a dimension stride, and an offset with respect to that stride. > > > ### Semantics: > > `llvm.multidim.array.index.*` represents a multi-dimensional array > index, In particular, this will > mean that we will assume that all indices `<idx_i>` are non-negative. > > Additionally, we assume that, for each `<str_i>`, `<idx_i>` pair, that > `0 <= idx_i < str_i`. > > Optimizations can assume that, given two llvm.multidim.array.index.* > instructions with matching types: > > ``` > llvm.multidim.array.index.* <ty> <ty>* <ptrvalA> <strA_1>, <idxA_1>, > ..., <strA_N>, <idxA_N> > llvm.multidim.array.index.* <ty> <ty>* <ptrvalB> <strB_1>, <idxB_1>, > ..., <strB_N>, <idxb_N> > ``` > > If `ptrvalA == ptrvalB` and the strides are equal `(strA_1 == strB_1 > && ... && strA_N == strB_N)` then: > > - If all the indices are equal (that is, `idxA_1 == idxB_1 && ... && > idxA_N == idxB_N`), > then the resulting pointer values are equal. > > - If any index value is not equal (that is, there exists an `i` such > that `idxA_i != idxB_i`), > then the resulting pointer values are not equal. > > > ##### Address computation: > Consider an invocation of `llvm.multidim.array.index.*`: > > ``` > <result> = call @llvm.multidim.array.index.* <ty> <ty>* <ptrval> > <str_0>, <idx_0>, <str_1> <idx_1>, ..., <str_n> <idx_n> > ``` > > If the pairs are denoted by `(str_i, idx_i)`, where `str_i` denotes the stride > and `idx_i` denotes the index of the ith pair, then the final address (in bytes) > is computed as: > > ``` > ptrval + len(ty) * [(str_0 * idx_0) + (str_1 * idx_1) + ... (str_n * idx_n)] > ``` > > ## Transitioning to `llvm.multidim.array.index.*`: Allow > `multidim_array_index` to refer to a GEP instruction: > > This is a sketch of how we might gradually introduce the > `llvm.multidim.array.index.*` > intrinsic into LLVM without immediately losing the analyses > that are performed on `getelememtptr` instructions. This section > lists out some possible choices that we have, since the authors > do not have a "best" solution. > > ##### Choice 1: Write a `llvm.multidim.array.index.*` to `GEP` pass, > with the `GEP` annotated with metadata > > This pass will flatten all `llvm.multidim.array.index.*` expressions > into a `GEP` annotated with metadata. This metadata will indicate that > the index expression computed by the lowered GEP is guaranteed to be > in a canonical form which allows the analysis > to infer stride and index sizes. > > A multidim index of the form: > ``` > %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, > %str_2, %idx_2 > ``` > > is lowered to: > > ``` > %mul1 = mul nsw i64 %str_1, %idx_1 > %mul2 = mul1 nsw i64 %str_2, %idx_2 > %total = add nsw i64 %mul2, %mul1 > %arrayidx = getelementptr inbounds i64, i64* %A, i64 %total, !multidim !1 > ``` > with guarantees that the first term in each multiplication is the stride > and the second term in each multiplication is the index. (What happens > if intermediate transformation passes decide to change the order? This seems > complicated). > > **TODO:** Lookup how to attach metadata such that the metadata can communicate > which of the values are strides and which are indeces > > > > # Caveats > > Currently, we assume that the array shape is immutable. However, we > will need to deal with > being able to express `reshape` like primitives where the array shape > can be mutated. However, > this appears to make expressing this information quite difficult: We > now need to attach the shape > information to an array per "shape-live-range". > > > ## Appendix-A > > ##### A realistic, more involved example of dependence analysis going wrong > > ```cpp > // In an array A of size (n0 x n1), > // fill a subarray of size (s0 x s1) > // which starts at an offset (o0 x o1) in the larger array A > void set_subarray(unsigned n0, unsigned n1, > unsigned o0, unsigned o1, > unsigned s0, unsigned s1, > float A[n0][n1]) { > for (unsigned i = 0; i < s0; i++) > for (unsigned j = 0; j < s1; j++) > S: A[i + o0][j + o1] = 1; > } > ``` > We first reduce this index expression to a sum of products: > > ``` > (i + o0) * n1 + (j + o1) = n1i + n1o0 + j + o1 > ix(i, j, n0, n1, o0, o1) = n1i + n1o0 + j + o1 > ``` > > `ix` is the index expression which `LLVM` will see, since it is fully > flattened, in comparison with the multi-dimensional index expression > `index:[i + o0][j + o1] | sizes:[][n1]`. > > We will now show _why_ this multi-dimensional index is not always correct, > and why guessing for one language will not work for another: > > Consider a call `set_subarray(n0=8, n1=9, o0=4, o1=6, s0=3, s1=6)`. At face > value, this is incorrect if we view the array as 2D grid, since the size > of the array is `(n0, n1) = (8, 9)`, but we are writing an array of size > `(s0, s1) = (3, 6)` starting from `(o0, o1) = (4, 6)`. Clearly, we will > exceed the width of the array, since `(s1 + o1 = 6 + 6 = 12) > (n1 = 9)`. > However, now think of the array as a flattened 1D representation. In this > case, the total size of the array is `n1xn2 = 8x9 = 72`, while the largest > element we will access is at the largest value of `(i, j)`. That is, > `i = s0 - 1 = 2`, and `j = s1 - 1 = 5`. > > The largest index will be `ix(i=2, j=5, n0=8, n1=9, o0=4, o1=6) > 8*2+8*4+5+6=59`. > Since `59 < 72`, we are clearly at _legal_ array indices, by C semantics! > > The definition of the semantics of the language **changed the illegal > multidimensional access** (which is illegal since it exceeds the `n1` > dimension), into a **legal flattened 1D access** (which is legal since the > flattened array indices are inbounds). > > LLVM has no way of expressing these two different semantics. Hence, we are > forced to: > 1. Consider flattened 1D accesses, which makes analysis of index expressions > equivalent to analysis of polynomials over the integers, which is hard. > 2. Guess multidimensional representations, and use them at the expense of > soundness bugs as shown above. > 3. Guess multidimensional representations, use them, and check their validity > at runtime, causing a runtime performance hit. This implementation follows > the description from the paper [Optimistic Delinearization of > Parametrically Sized Arrays](optim-delin-parametrically-sized-arrays). > > > Currently, Polly opts for option (3), which is to emit runtime checks. If > the run-time checks fail, then Polly will not run its optimised code. Instead, > It keeps a copy of the unoptimised code around, which is run in this case. > Note that this effectively doubles the amount of performance-sensitive code > which is finally emitted after running Polly. > > Ideally, we would like a mechanism to directly express the multidimensional > semantics, which would eliminate this kind of guesswork from Polly/LLVM, > which would both make code faster, and easier to analyze. > > ## References > - [The chapel language > specification](https://chapel-lang.org/docs/1.13/_downloads/chapelLanguageSpec.pdf) > - [Fortran 2003 > standard](http://www.j3-fortran.org/doc/year/04/04-007.pdf}{Fortran > 2003 standard) > - [C++ subscripting](http://eel.is/c++draft/expr.sub) > - [Michael Kruse's PhD thesis: Lattice QCD Optimization and Polytopic > Representations of Distributed > Memory](https://tel.archives-ouvertes.fr/tel-01078440). > - [Molly source code link for the > intrinsic](https://github.com/Meinersbur/llvm/blob/molly/include/llvm/IR/IntrinsicsMolly.td#L3) > -[Optimistic Delinearization of Parametrically Sized > Arrays](optim-delin-parametrically-sized-arrays) > > [optim-delin-parametrically-sized-arrays]: > http://delivery.acm.org/10.1145/2760000/2751248/p351-grosser.pdf?ip=93.3.109.183&id=2751248&acc=CHORUS&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1557948443_2f56675c6d04796f27b84593535c9f70> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190803/a916ac5e/attachment.sig>
Apparently Analagous Threads
- [RFC] A new multidimensional array indexing intrinsic
- [RFC] A new multidimensional array indexing intrinsic
- [RFC] A new multidimensional array indexing intrinsic
- [RFC] A new multidimensional array indexing intrinsic
- [RFC] A new multidimensional array indexing intrinsic