Graham Hunter via llvm-dev
2016-Nov-22 14:49 UTC
[llvm-dev] [RFC] Supporting ARM's SVE in LLVM
Hi Renato,

Sorry for the delay in responding. We've been busy rethinking some of our changes after the feedback we've received thus far (particularly from the devmeeting). The incremental patches will use our revised design (which should be less invasive), and I'll be updating our document to match.

On 16/11/2016, 12:46, "Renato Golin" <renato.golin at linaro.org> wrote:

> This email is long and hard to read. I'm not surprised no one replied
> yet. I think your PDF attached is a good start away from the
> complexity, but we're not going to get far if we try to do things in
> one step.
>
> Based on your repository, the number of changes is so great, and the
> changes so invasive, that we really should look back at what we need
> to do, one step at a time, and only perform the refactoring changes
> that are needed for each step.

We don't intend to do this all in one go; we fully expect that we'll need to refactor a few times based on community feedback as we incrementally add support for scalable vectors.

> > * This is a warts-and-all release of our development tree, with plenty of TODOs and unfinished experiments
> > * We haven't posted our clang changes yet
>
> I don't mind FIXMEs or TODOs, but I did see a lot of spurious name
> changes, enum value moves (breaking old binaries) and a lot of new
> high-level passes (LoopVectorisationAnalysis) which will need a long
> review on their own before we even start thinking about SVE.
>
> I recommend you guys separate the refactoring from the implementation
> and try to upstream the initial and uncontroversial refactorings (name
> changes, etc), as well as move out the current functionality into new
> passes, so then you can extend for SVE as a refactoring, not
> move-and-extend in the same pass.

So our highest priority is getting basic support for SVE into the codebase (types, codegen, assembler/disassembler, simple vectorization); after that is in, we'll be happy to discuss our other changes like separating out loop vectorization legality, controlling loops via predication, or adding search loop vectorization.

> We want to minimise the number of changes, so that we can revert
> breakages more easily, and have a steady progress, rather than a
> break-the-world situation.

Same for us. The individual patches will be relatively small; this repo was just for context if needed when discussing the smaller patches.

> Finally, *every* test change needs to be scrutinised and guaranteed to
> make sense. We really dislike spurious test changes, unless we can
> prove that the test was unstable to begin with, in which case we
> change it to a better test.

Yep, makes sense.

Thanks,

-Graham
Graham Hunter via llvm-dev
2016-Nov-24 15:39 UTC
[llvm-dev] [RFC] Supporting ARM's SVE in LLVM
Hi,

Paul Walker has now uploaded the first set of IR support patches to phabricator, which use our revised design. We managed to remove the need for new instructions for basic scalable vectorization in favor of adding two new constant classes; here's a subset of the revised documentation describing just those constants:

## *vscale*

### Syntax:

> `vscale`

### Overview:

This complex constant represents the runtime value of `n` for any scalable type `<n x m x ty>`. This is primarily used to increment induction variables and generate offsets.

### Interface:

```cpp
Constant *VScaleValue::get(Type *Ty);
```

### Example:

The following shows how an induction variable would be incremented for a scalable vector of type `<n x 4 x i32>`.

```llvm
%index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
```

## *stepvector*

### Syntax:

> `stepvector`

### Overview:

This complex constant represents the runtime value of a vector of increasing integers in the arithmetic series:

> `<0, 1, 2, ... num_elements-1>`

This is the basis for a scalable form of vector constants. Adding a splat changes the effective starting point, and multiplying changes the step. The main uses for this are:

* Predicate creation using vector compares for fully predicated loops (see also: [*propff*](#propff), [*test*](#test)).
* Creating offset vectors for gather/scatter via `getelementptr`.
* Creating masks for `shufflevector`.
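As an illustration only (this is not code from the patch set), the meaning of the two constants can be modelled in plain C++. The sketch assumes a scalable type `<n x 4 x i32>`, treats `vscale` as an ordinary runtime integer of at least 1, and uses the hypothetical helper names `numElements`, `stepVector` and `lanePredicate`; none of these names exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption for this model: the scalable type is <n x 4 x i32>, so the
// lane count is 4 times the runtime value `vscale` (the `n` in the type).
constexpr std::uint64_t kMinLanes = 4;

// Models `mul (i64 vscale, i64 4)`: the amount added to the induction
// variable on each vector iteration.
std::uint64_t numElements(std::uint64_t vscale) {
  assert(vscale > 0 && "this model assumes vscale is at least 1");
  return vscale * kMinLanes;
}

// Models `stepvector`: <0, 1, 2, ... num_elements-1>.
std::vector<std::uint64_t> stepVector(std::uint64_t vscale) {
  std::vector<std::uint64_t> lanes(numElements(vscale));
  for (std::uint64_t i = 0; i < lanes.size(); ++i)
    lanes[i] = i;
  return lanes;
}

// Models predicate creation via a vector compare: lane i is active when
// (splat(index) + stepvector)[i] < splat(limit)[i].
std::vector<bool> lanePredicate(std::uint64_t index, std::uint64_t limit,
                                std::uint64_t vscale) {
  std::vector<bool> active;
  for (std::uint64_t step : stepVector(vscale))
    active.push_back(index + step < limit);
  return active;
}
```

With `vscale` equal to 1 and a loop limit of 10, the iteration starting at index 8 would have only its first two lanes active, which is the situation a fully predicated loop has to handle in its final iteration.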
For the following loop, a `stepvector` constant would be added to a splat of the loop induction variable to create the data vector to store:

```cpp
unsigned a[LIMIT];

for (unsigned i = 0; i < LIMIT; i++) {
  a[i] = i;
}
```

### Interface:

```cpp
Constant *StepVectorValue::get(Type *Ty);
```

### Example:

The following shows the construction of a scalable vector of the form <start, start-2, start-4, ...>:

```llvm
%elt = insertelement <n x 4 x i32> undef, i32 %start, i32 0
%widestart = shufflevector <n x 4 x i32> %elt, <n x 4 x i32> undef, <n x 4 x i32> zeroinitializer
%step = insertelement <n x 4 x i32> undef, i32 -2, i32 0
%widestep = shufflevector <n x 4 x i32> %step, <n x 4 x i32> undef, <n x 4 x i32> zeroinitializer
%stridevec = mul <n x 4 x i32> stepvector, %widestep
%finalvec = add <n x 4 x i32> %widestart, %stridevec
```

Current patch set:
https://reviews.llvm.org/D27101
https://reviews.llvm.org/D27102
https://reviews.llvm.org/D27103
https://reviews.llvm.org/D27105

-Graham
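For readers following the `stepvector` example, here is a rough scalar model in C++ of what the IR computes. It is a sketch under stated assumptions, not the patch's implementation: `vscale` is passed in as a plain integer, the vector type is `<n x 4 x i32>`, and `stridedVector` and `fillWithIndex` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models the IR sequence: %finalvec = splat(start) + stepvector * splat(step)
// for a vector of type <n x 4 x i32>, with `vscale` standing in for `n`.
std::vector<std::int32_t> stridedVector(std::int32_t start, std::int32_t step,
                                        std::uint32_t vscale) {
  assert(vscale > 0 && "this model assumes vscale is at least 1");
  const std::uint32_t lanes = vscale * 4;
  std::vector<std::int32_t> result(lanes);
  for (std::uint32_t i = 0; i < lanes; ++i) {
    const std::int32_t stepLane = static_cast<std::int32_t>(i);  // stepvector[i]
    const std::int32_t strideLane = stepLane * step;  // mul stepvector, %widestep
    result[i] = start + strideLane;                   // add %widestart, %stridevec
  }
  return result;
}

// Models the `a[i] = i` loop after vectorization: each iteration stores
// splat(index) + stepvector, then advances index by vscale * 4. Lanes at or
// past `limit` are skipped, standing in for predication.
std::vector<std::uint32_t> fillWithIndex(std::uint32_t limit,
                                         std::uint32_t vscale) {
  assert(vscale > 0 && "this model assumes vscale is at least 1");
  const std::uint64_t lanes = static_cast<std::uint64_t>(vscale) * 4;
  std::vector<std::uint32_t> a(limit);
  for (std::uint64_t index = 0; index < limit; index += lanes) {
    for (std::uint64_t lane = 0; lane < lanes; ++lane) {
      if (index + lane < limit)
        a[index + lane] = static_cast<std::uint32_t>(index + lane);
    }
  }
  return a;
}
```

The point of the model is that the result does not depend on `vscale`: `fillWithIndex(6, 1)` and `fillWithIndex(6, 2)` produce the same array, only the number of iterations differs.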
James Molloy via llvm-dev
2016-Nov-24 20:49 UTC
[llvm-dev] [RFC] Supporting ARM's SVE in LLVM
Hi Graham,

One high level comment without reading the patchset too much - it seems 'vscale' in particular could be just as easy to implement as an intrinsic, which would be a less invasive patch. Is there a reason you didn't go down the intrinsic route?

James
Renato Golin via llvm-dev
2016-Nov-25 13:39 UTC
[llvm-dev] [RFC] Supporting ARM's SVE in LLVM
Hi Graham,

I'll look into the patches next, but first some questions after reading the available white papers on the net.

On 24 November 2016 at 15:39, Graham Hunter <Graham.Hunter at arm.com> wrote:
> This complex constant represents the runtime value of `n` for any scalable type
> `<n x m x ty>`. This is primarily used to increment induction variables and
> generate offsets.

What do you mean by "complex constant"? Surely not Complex, but this is not really a constant either.

From what I read around (and this is why releasing the spec is important, because I'm basing my reviews on guess work), the length of a vector is not constant, even on the same machine.

In theory, according to a post in the ARM forums (which now I forget), the kernel could choose the vector length per process, meaning this is not known even at link time.

But that's ok, because the SVE instructions completely (I'm guessing, again) bypass the need for that "constant" to be constant at all, ie, the use of `incw/incp`.

Since you can fail half-way through, the width that you need to add to the induction variable is not even known at run time! Meaning, that's not a constant at all!

Example:

    a[i] = b[ c[i] ];

    ld1w z0.s, p0/z, [ c, i, lsl 2 ]
    ld1w z1.s, p0/z, [ b, z0.s, sxtw 2 ]

Now, the z0.s load may have failed with a seg fault somewhere, and it's up to the FFR to tell brka/brkb how to deal with this.

Each iteration will have:
 * The same vector length *per process* for accessing c[]
 * A potentially *different* vector length, *per iteration*, for accessing b[]

So, while <n x m x i32> could be constant on some vectors, even at compile time (if we have a flag that forces a certain length), it could be unknown *per iteration* at run time.

> ```llvm
> %index.next = add nuw nsw i64 %index, mul (i64 vscale, i64 4)
> ```

Right, this would be translated to:

    incw x2

Now, the question is, why do we need "mul (i64 vscale, i64 4)" in the IR?

There is no semantic analysis you can do on a value that can change on every iteration of the loop. You can't elide, hoist, combine or const fold.

If I got it right (from random documents on the web), `incX` relates to a family of "increment induction" instructions. `incw` is probably "increment W", ie. 32-bits, while `incp` is "increment predicate", ie. whatever the size of the predicate you use:

Examples:

    incw x2         # increments x2 by 4*(FFR successful lanes)
    incp x2, p0.b   # increments x2 by 1*(FFR successful lanes)

So, this IR semantics is valid for the first case, but irrelevant for the second.

Also, I'm worried that we'll end up ignoring the multiplier altogether, if we change the vector types (from byte to word, for example), or make the process of doing so more complex.

> The following shows the construction of a scalable vector of the form
> <start, start-2, start-4, ...>:
>
> ```llvm
> %elt = insertelement <n x 4 x i32> undef, i32 %start, i32 0
> %widestart = shufflevector <n x 4 x i32> %elt, <n x 4 x i32> undef, <n x 4 x i32> zeroinitializer
> %step = insertelement <n x 4 x i32> undef, i32 -2, i32 0
> %widestep = shufflevector <n x 4 x i32> %step, <n x 4 x i32> undef, <n x 4 x i32> zeroinitializer
> %stridevec = mul <n x 4 x i32> stepvector, %widestep
> %finalvec = add <n x 4 x i32> %widestart, %stridevec
> ```

This is really fragile and confusing, and I agree with James, an intrinsic here would be *much* better. Something like

    %const_vec = call <n x 4 x i32> @llvm.sve.constant_vector(i32 %start, i32 %step)

cheers,
--renato