Tobias Grosser
2012-Sep-12 08:38 UTC
[LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts
On 09/11/2012 03:38 PM, Hal Finkel wrote:> This is a general problem that also affects, for example, Preston's > dependence-analysis work. How have you thought about dealing with this?Hi Hal, you are right. Preston's dependence-analysis and Polly are sharing the same problem. As far as I know, non of us implemented a solution yet. Preston, what is your status on multi-dim arrays? As stated earlier the problem is that access functions to multi-dimensional arrays are linearized at LLVM-IR level. Unfortunately it is a lot harder to reason about linearized accesses. Hence, we want to recover the multi-dimensional access function. In LLVM-IR words: When we currently analyze a multi-dimensional variable length memory access, we calculate a single SCEV expression which describes the memory access. However, we would prefer to have for each array dimension a separate SCEV* expression. Those individual expressions will be a lot easier to analyze. I think there are two major directions we can go: 1) Represent multi-dimensional access functions in LLVM-IR Instead of linearizing multi-dimensional array accesses to single dimensional array accesses, we extend LLVM-IR such that it can express multi-dimensional accesses and we ensure that front-ends directly generate LLVM-IR multi-dimensional accesses. 2) Recover multi-dimensionality from linearized accesses We take a linearized single-dimensional SCEV* expression, apply some magic and recover both the number and size of the original array dimensions as well as a set of SCEV* expressions describing the individual access functions of the original dimensions. Both approaches have advantages/disadvantages: Approach 1) Pro: - Provides exact information about multi-dimensional arrays, if the multi-dimensionality is explicit in the source program Contra: - We need to explicitly express multi-dimensionality in LLVM-IR - Front-ends need to explicitly generate the multi-dimensional IR - Existing passes need to maintain and reason about the new IR - Only C99 supports variable length arrays. Most existing C/C++ programs have a manual implementation of variable length arrays, which cannot be automatically translated by the front-end. Approach 2) Pro: - Does not distinguish between manually simulated multi-dimensional arrays and multi-dimensional arrays that have been explicitly expressed in the source code. - Changes are limited to an LLVM-IR analysis Contra: - The general problem is difficult - We may need to add some run-time checks as statically proving the delinearization may not be entirely possible I personally would first have a look at approach '2'. Access functions of multi-dimensional follow often a very regular structure, which appears not only for accesses generated by compiler front-ends, but also for manual implementations of multi-dimensional arrays. One common pattern is that there is a single parameter that represents the size of a dimension [1]. Meaning SCEV* expressions describing access to an array "A[][m][p]" will contain the sizes '%p' and '%m * %p' more or less explicit. Here some examples I committed in r163619: multidim_only_ivs_2d.ll: ; void foo(long n, long m, double A[n][m]) { ; for (long i = 0; i < n; i++) ; for (long j = 0; j < m; j++) ; A[i][j] = 1.0; ; } ; Access function: {{0,+,%m}<%for.i>,+,1}<nw><%for.j> multidim_only_ivs_3d.ll: ; void foo(long n, long m, long o, double A[n][m][o]) { ; for (long i = 0; i < n; i++) ; for (long j = 0; j < m; j++) ; for (long k = 0; k < o; k++) ; A[i][j][k] = 1.0; ; } ; Access function: {{{0,+,(%m * %o)}<%for.i>,+,%o}<%for.j>,+,1}<nw><%for.k> multidim_ivs_and_integer_offsets_3d.ll: ; void foo(long n, long m, long o, double A[n][m][o]) { ; for (long i = 0; i < n; i++) ; for (long j = 0; j < m; j++) ; for (long k = 0; k < o; k++) ; A[i+3][j-4][k+7] = 1.0; ; } ; ; Access function: ; {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+, ; (8 * %o)}<%for.j>,+,8}<%for.k> multidim_only_ivs_3d_cast.ll: ; void foo(int n, int m, int o, double A[n][m][o]) { ; for (int i = 0; i < n; i++) ; for (int j = 0; j < m; j++) ; for (int k = 0; k < o; k++) ; A[i][j][k] = 1.0; ; } ; ; Access function: ; {{{%A,+,(8 * (zext i32 %m to i64) * (zext i32 %o to i64))}<%for.i>,+, ; (8 * (zext i32 %o to i64))}<%for.j>,+,8}<%for.k> multidim_ivs_and_parameteric_offsets_3d.ll: ; void foo(long n, long m, long o, double A[n][m][o], long p, long q, long r) { ; ; for (long i = 0; i < n; i++) ; for (long j = 0; j < m; j++) ; for (long k = 0; k < o; k++) ; A[i+p][j+q][k+r] = 1.0; ; } ; ; Access function: ; {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * %o)}<%for.i>,+, ; (8 * %o)}<%for.j>,+,8}<%for.k> Those examples are obviously not exhaustive, but they already cover many common cases. Analyzing them seems to be possible. As you may realize the size of the array dimensions are all explicitly available in the 'step' of the SCEVAddRec repressions. Even in the last example which contains parameters both for the sizes and the offsets, the parameters for the sizes are the only ones that appear on the right hand sides ('8 * %m * %o' and '8 * %o'). I would guess that we could write a SCEVIterator, that analyzes the SCEVs and guesses the dimension sizes. After having written that, it should be possible to write another SCEVIterator that given the dimension sizes can separate the different dimensions of a SCEV. I doubt we will always be able to prove that our guess is correct, but we could add a run-time check to test the conditions that we can not prove statically. This is also the point, where I think the front-end (or the user with attributes) could help. Accesses could be annotated with meta-data providing the size of the array dimensions, such that our analysis can start from this meta-data. This could allow our analysis to take some short cuts and to avoid some run-time checks. Does this sound like a reasonable approach? Any ideas or suggestions on that topic? Cheers Tobi [1] FORTRAN seems to express array size as "max(0, %dim_size)"
Armin Größlinger
2012-Sep-12 12:18 UTC
[LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts
Hi, I had a few thoughts about recovering multi-dimensional accesses in Polly and maybe it's time to share them. On 09/12/2012 10:38 AM, Tobias Grosser wrote:> [...] Even in the last example which contains parameters both for the > sizes and the offsets, the parameters for the sizes are the only ones that > appear on the right hand sides ('8 * %m * %o' and '8 * %o').I guess this is the key observation. When we have a multi-dimensional access, the coefficients of the iterators have to form an increasing chain w.r.t. divisibility. For the example> multidim_ivs_and_integer_offsets_3d.ll: > ; void foo(long n, long m, long o, double A[n][m][o]) { > ; for (long i = 0; i < n; i++) > ; for (long j = 0; j < m; j++) > ; for (long k = 0; k < o; k++) > ; A[i+3][j-4][k+7] = 1.0; > ; }with access function> ; {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+, > ; (8 * %o)}<%for.j>,+,8}<%for.k>we have (writing the SCEV in ordinary notation and in a suggestive order) 8*%o*%m * i + 8*%o * j + 8 * k + 24*%m*%o - 32*%o + %A + 56 and we see that the coefficients of the iterators are 8, 8*%o, 8*%o*%m and every coefficient in this chain divides it successor. By factoring these coefficients out from all the term they divide, we find the subscripts for the dimensions: 8*%o*%m * (i+3) + 8*%o * (j-4) + 8 * (k+7) + %A So A[i+4][j-4][k-7] is our candidate for a multi-dimensional access. The math tool we'd need to implement this is (multivariate) polynomial division or something similar (coefficients could be arbitrary polynomials, e.g., %o+5*%m+7 if somebody declares something like double A[][o+5*m+7] or even more complex expressions). Note that two iterators can have the same coefficients (e.g., for A[i+j][.][.]) or one iterator can have two coefficients (e.g., A[k][k][.]). Or a "coefficient" may actually be a constant (e.g., A[42][.][.]) or missing in one access (e.g., A[0][.][.]) so determining the coefficients is non-trivial (but not too difficult I guess).> I would guess that we could write a SCEVIterator, that analyzes the SCEVs and guesses > the dimension sizes.I'm not sure if this cane be done on the SCEV directly (instead of polynomials) for cases that are more complex than products of parameters but I haven't thought about it thoroughly. OTOH, add recursions should be of the right structure to check divisibility directly; I'm not sure without more thinking...> After having written that, it should be possible to write > another SCEVIterator that given the dimension sizes can separate the different > dimensions of a SCEV.Here we have to be careful because when there are several accesses to the same array (base pointer) not only the coefficients (8, 8*%o, 8*%o*%m above) have to match but also the range of the subscripts must be right (globally over all arrays). For example with two accesses A[i+4][j-4][k-7] A[...][i-k][3*j] we'd need to check that max{j-4, i-k} - min{j-4, i-k} <= %m - 1 max{k-7, 3*j} - min{k-7, 3*j} <= %o - 1 so there is no (inconsistent) overrun between the dimensions (%m and %o are the factors "between" the dimensions). Note that this also ensures that the factors between the dimensions are positive, i.e., no dimension is collapsed. This check can, of course, be relaxed to checking 0 <= j-4 <= %m - 1 0 <= k-7 <= %o - 1 0 <= i-k <= %m - 1 0 <= 3*j <= %o - 1 (allowing for independent checks for each access) in the assumption that subscripts start from 0 but I'm not sure how often this will hold on the IR (after loop index and array base pointer shifts during optimization). The problem here is that the factors between the dimensions (%m and %o here) could be non-affine.> I doubt we will always be able to prove that our guess is correct, but we could > add a run-time check to test the conditions that we can not prove statically. This is > also the point, where I think the front-end (or the user with attributes) could help. > Accesses could be annotated with meta-data providing the size of the array dimensions, > such that our analysis can start from this meta-data. This could allow our analysis > to take some short cuts and to avoid some run-time checks.Makes sense. I think that it should not be too difficult to implement the above approach for affine factors between the dimensions (which should cover the common cases) and then we should evaluate how often we need run-time checks and whether we need more information from the front-ends because other cases occur in practice. So far my thoughts for now. Regards, Armin
Hal Finkel
2012-Sep-12 14:27 UTC
[LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts
On Wed, 12 Sep 2012 14:18:43 +0200 Armin Größlinger <armin.groesslinger at uni-passau.de> wrote:> Hi, > > I had a few thoughts about recovering multi-dimensional accesses in > Polly and maybe it's time to share them. > > On 09/12/2012 10:38 AM, Tobias Grosser wrote: > > [...] Even in the last example which contains parameters both for > > the sizes and the offsets, the parameters for the sizes are the > > only ones that appear on the right hand sides ('8 * %m * %o' and '8 > > * %o'). > > I guess this is the key observation. When we have a multi-dimensional > access, the coefficients of the iterators have to form an increasing > chain w.r.t. divisibility. For the example > > > multidim_ivs_and_integer_offsets_3d.ll: > > ; void foo(long n, long m, long o, double A[n][m][o]) { > > ; for (long i = 0; i < n; i++) > > ; for (long j = 0; j < m; j++) > > ; for (long k = 0; k < o; k++) > > ; A[i+3][j-4][k+7] = 1.0; > > ; } > > with access function > > > ; {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * > > %o)}<%for.i>,+, ; (8 * %o)}<%for.j>,+,8}<%for.k> > > we have (writing the SCEV in ordinary notation and in a suggestive > order) > > 8*%o*%m * i + 8*%o * j + 8 * k + 24*%m*%o - 32*%o + %A + 56 > > and we see that the coefficients of the iterators are 8, 8*%o, > 8*%o*%m and every coefficient in this chain divides it successor. By > factoring these coefficients out from all the term they divide, we > find the subscripts for the dimensions: > > 8*%o*%m * (i+3) + 8*%o * (j-4) + 8 * (k+7) + %A > > So A[i+4][j-4][k-7] is our candidate for a multi-dimensional access. > The math tool we'd need to implement this is (multivariate) > polynomial division or something similarAgreed. Furthermore, I already have code that does this (I've started working on a transformation pass which re-factors and optimizes integer polynomial expressions, so I can repurpose code from there). It sounds like we just need to extract integer polynomials used by GEPs containing induction variables, and, order-by-order, factor out the non-induction variables. That will give us the 'size' of each dimension. As you've noted below, we need to make sure that no index exceeds its size, and that all accesses to an array within a given loop are consistent. Does that sound reasonable? -Hal> (coefficients could be > arbitrary polynomials, e.g., %o+5*%m+7 if somebody declares something > like double A[][o+5*m+7] or even more complex expressions). > > Note that two iterators can have the same coefficients (e.g., for > A[i+j][.][.]) or one iterator can have two coefficients (e.g., > A[k][k][.]). Or a "coefficient" may actually be a constant (e.g., > A[42][.][.]) or missing in one access (e.g., A[0][.][.]) so > determining the coefficients is non-trivial (but not too difficult I > guess). > > > I would guess that we could write a SCEVIterator, that analyzes the > > SCEVs and guesses the dimension sizes. > > I'm not sure if this cane be done on the SCEV directly (instead of > polynomials) for cases that are more complex than products of > parameters but I haven't thought about it thoroughly. OTOH, add > recursions should be of the right structure to check divisibility > directly; I'm not sure without more thinking... > > > After having written that, it should be possible to write > > another SCEVIterator that given the dimension sizes can separate > > the different dimensions of a SCEV. > > Here we have to be careful because when there are several accesses to > the same array (base pointer) not only the coefficients (8, 8*%o, > 8*%o*%m above) have to match but also the range of the subscripts > must be right (globally over all arrays). For example with two > accesses > > A[i+4][j-4][k-7] > A[...][i-k][3*j] > > we'd need to check that > > max{j-4, i-k} - min{j-4, i-k} <= %m - 1 > max{k-7, 3*j} - min{k-7, 3*j} <= %o - 1 > > so there is no (inconsistent) overrun between the dimensions (%m and > %o are the factors "between" the dimensions). Note that this also > ensures that the factors between the dimensions are positive, i.e., > no dimension is collapsed. > > This check can, of course, be relaxed to checking > > 0 <= j-4 <= %m - 1 > 0 <= k-7 <= %o - 1 > > 0 <= i-k <= %m - 1 > 0 <= 3*j <= %o - 1 > > (allowing for independent checks for each access) in the assumption > that subscripts start from 0 but I'm not sure how often this will > hold on the IR (after loop index and array base pointer shifts during > optimization). > > The problem here is that the factors between the dimensions (%m and > %o here) could be non-affine. > > > I doubt we will always be able to prove that our guess is correct, > > but we could add a run-time check to test the conditions that we > > can not prove statically. This is also the point, where I think the > > front-end (or the user with attributes) could help. Accesses could > > be annotated with meta-data providing the size of the array > > dimensions, such that our analysis can start from this meta-data. > > This could allow our analysis to take some short cuts and to avoid > > some run-time checks. > > Makes sense. I think that it should not be too difficult to implement > the above approach for affine factors between the dimensions (which > should cover the common cases) and then we should evaluate how often > we need run-time checks and whether we need more information from the > front-ends because other cases occur in practice. > > So far my thoughts for now. > > Regards, > Armin-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
Renato Golin
2012-Sep-12 14:56 UTC
[LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts
On 12 September 2012 09:38, Tobias Grosser <tobias at grosser.es> wrote:> I personally would first have a look at approach '2'.While I normally argue to leave the IR as it is (since it's a compiler IR, not a magical one), I can see some trends going on that should not be ignored. This is one example, where the front-end bends its knees to generate IR that LLVM understands, so that you can revert it to what should have been in the first place (to analyse parallelism) and convert it back again to "correct" IR. As you mentioned, there will be cases that the analysis won't work, ex. when receiving an array from a function, it could look like an opaque pointer in some architectures. Last month, in the Cambridge LLVM Social, David Chisnall asked me about a builder that would validate procedure call standards depending on the target, so that the front-end could use that to build the horrible messes we end up doing in that scenario. Another idea is to create a pass that will convert from high-level function declaration to low-level target-dependent declaration during the validation phase, so the front-end produces "good-looking" calls (with types as-is) and the pass makes them target-valid. This technique could also apply to your case. If the front-end supports multi-dimensional access and produces code conforming to that, your pass can analyse and vectorize. If not, no worries, you just ignore them. Which means not all front-ends must change at the same time to accommodate your analysis. Later on, a validation pass would transform all multi- to single-dimensional array access and produce IR that the back-ends can consume. It might not be the easiest route, but would avoid some dead-ends down the road... -- cheers, --renato http://systemcall.org/
David Tweed
2012-Sep-12 16:56 UTC
[LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts
| Here we have to be careful because when there are several accesses to the same | array (base pointer) not only the coefficients (8, 8*%o, 8*%o*%m above) have to | match but also the range of the subscripts must be right (globally over all | arrays). For example with two accesses | | A[i+4][j-4][k-7] | A[...][i-k][3*j] | | we'd need to check that | | max{j-4, i-k} - min{j-4, i-k} <= %m - 1 | max{k-7, 3*j} - min{k-7, 3*j} <= %o - 1 An interjection from the peanut gallery... One thing that would be nice would be a way to indicate in LLVM instructions that _by construction_ a given variable will be a valid index for a given dimension of a given array. This isn't so much use in C-style languages where arrays don't have size data canonically associated with them, but for languages with constructs like (in the simplest case) for i=0:dimension(A,0): acc+=A[i] or other more complicated "index generator" expressions like generating indices from stencils. At the moment (as I understand the LLVM IR) a language can only specify that an entire linearised access expression is within the linearised bounds. This would provide a way to hopefully boost the "safe to apply" rate of LLVM optimisers like polly for languages that are amenable to such constructs. This presumably requires a way to link array pointers, dimensions, bounds and indexes so it's not obvious how to do it within the LLVM context of course... (There's also the issue of exposing transitivity: the source language may know i is a by construction index into A, and B is the same shape as A hence i is also guaranteed to be within-bounds index into B.) Just a vague thought, Regards, Dave
Hal Finkel
2012-Sep-12 22:27 UTC
[LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts
On Wed, 12 Sep 2012 15:56:25 +0100 Renato Golin <rengolin at systemcall.org> wrote:> On 12 September 2012 09:38, Tobias Grosser <tobias at grosser.es> wrote: > > I personally would first have a look at approach '2'. > > While I normally argue to leave the IR as it is (since it's a compiler > IR, not a magical one), I can see some trends going on that should not > be ignored. > > This is one example, where the front-end bends its knees to generate > IR that LLVM understands, so that you can revert it to what should > have been in the first place (to analyse parallelism) and convert it > back again to "correct" IR. As you mentioned, there will be cases that > the analysis won't work, ex. when receiving an array from a function, > it could look like an opaque pointer in some architectures.I agree that this seems suboptimal: information known to the frontend is lost, and then must be guessed by the backend. This might also be a case where metadata is helpful. I also think that it is important to keep in mind that this particular case is one in which guessing is important. This is because there is a lot of existing C/C++ code which does manual indexing of multidimensional arrays, and we should try to support iteration-space transformations involving those arrays as well.> > Last month, in the Cambridge LLVM Social, David Chisnall asked me > about a builder that would validate procedure call standards depending > on the target, so that the front-end could use that to build the > horrible messes we end up doing in that scenario. Another idea is to > create a pass that will convert from high-level function declaration > to low-level target-dependent declaration during the validation phase, > so the front-end produces "good-looking" calls (with types as-is) and > the pass makes them target-valid. > > This technique could also apply to your case. If the front-end > supports multi-dimensional access and produces code conforming to > that,Do you mean that there is some kind of canonical form that all of the frontends will use, and that LLVM provides some kind of utility for generating this canonical form? -Hal> your pass can analyse and vectorize. If not, no worries, you > just ignore them. Which means not all front-ends must change at the > same time to accommodate your analysis. > > Later on, a validation pass would transform all multi- to > single-dimensional array access and produce IR that the back-ends can > consume. > > It might not be the easiest route, but would avoid some dead-ends down > the road... > >-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
Possibly Parallel Threads
- [LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts
- [LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts
- Making 2 dimensional vector from the 3 dimensional one
- two-dimensional integration?
- [LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts