Bruce Hoult via llvm-dev
2019-May-24 19:12 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
In the RISC-V V extension, there is no upper limit to the size vector registers can be in a future CPU. (Formally, the upper limit is at least 2^31 bytes) Generic code can enquire the size, dynamically allocate space, and transparently save and restore the contents of a vector register or registers. On Fri, May 24, 2019 at 11:28 AM JinGu Kang via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > Hi Graham, > > I am working on a custom target and it is considering scalable vector type representation in programming language. While I am collecting the information about it, I have met your RFC. I have a question. I think the one of fundamental issues is that we do not know the memory layout of the type at compile time. I am not sure whether the RFC covers this issue or not. Conservatively, I imagined the memory layout of biggest type which the scalable vector type can support. I could miss some discussions about it. If I missed something, please let me know. > > Thanks, > JinGu Kang > > ________________________________ > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Graham Hunter via llvm-dev <llvm-dev at lists.llvm.org> > Sent: 05 June 2018 14:15 > To: Chris Lattner; Hal Finkel; Jones, Joel; dag at cray.com; Renato Golin; Kristof Beyls; Amara Emerson; Florian Hahn; Sander De Smalen; Robin Kruppe; llvm-dev at lists.llvm.org; mkuper at google.com; Sjoerd Meijer; Sam Parker > Cc: nd > Subject: [llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths > > Hi, > > Now that Sander has committed enough MC support for SVE, here's an updated > RFC for variable length vector support with a set of 14 patches (listed at the end) > to demonstrate code generation for SVE using the extensions proposed in the RFC. > > I have some ideas about how to support RISC-V's upcoming extension alongside > SVE; I'll send an email with some additional comments on Robin's RFC later. > > Feedback and questions welcome. > > -Graham > > ============================================================> Supporting SIMD instruction sets with variable vector lengths > ============================================================> > In this RFC we propose extending LLVM IR to support code-generation for variable > length vector architectures like Arm's SVE or RISC-V's 'V' extension. Our > approach is backwards compatible and should be as non-intrusive as possible; the > only change needed in other backends is how size is queried on vector types, and > it only requires a change in which function is called. We have created a set of > proof-of-concept patches to represent a simple vectorized loop in IR and > generate SVE instructions from that IR. These patches (listed in section 7 of > this rfc) can be found on Phabricator and are intended to illustrate the scope > of changes required by the general approach described in this RFC. > > =========> Background > =========> > *ARMv8-A Scalable Vector Extensions* (SVE) is a new vector ISA extension for > AArch64 which is intended to scale with hardware such that the same binary > running on a processor with longer vector registers can take advantage of the > increased compute power without recompilation. > > As the vector length is no longer a compile-time known value, the way in which > the LLVM vectorizer generates code requires modifications such that certain > values are now runtime evaluated expressions instead of compile-time constants. > > Documentation for SVE can be found at > https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a > > =======> Contents > =======> > The rest of this RFC covers the following topics: > > 1. Types -- a proposal to extend VectorType to be able to represent vectors that > have a length which is a runtime-determined multiple of a known base length. > > 2. Size Queries - how to reason about the size of types for which the size isn't > fully known at compile time. > > 3. Representing the runtime multiple of vector length in IR for use in address > calculations and induction variable comparisons. > > 4. Generating 'constant' values in IR for vectors with a runtime-determined > number of elements. > > 5. A brief note on code generation of these new operations for AArch64. > > 6. An example of C code and matching IR using the proposed extensions. > > 7. A list of patches demonstrating the changes required to emit SVE instructions > for a loop that has already been vectorized using the extensions described > in this RFC. > > =======> 1. Types > =======> > To represent a vector of unknown length a boolean `Scalable` property has been > added to the `VectorType` class, which indicates that the number of elements in > the vector is a runtime-determined integer multiple of the `NumElements` field. > Most code that deals with vectors doesn't need to know the exact length, but > does need to know relative lengths -- e.g. get a vector with the same number of > elements but a different element type, or with half or double the number of > elements. > > In order to allow code to transparently support scalable vectors, we introduce > an `ElementCount` class with two members: > > - `unsigned Min`: the minimum number of elements. > - `bool Scalable`: is the element count an unknown multiple of `Min`? > > For non-scalable vectors (``Scalable=false``) the scale is considered to be > equal to one and thus `Min` represents the exact number of elements in the > vector. > > The intent for code working with vectors is to use convenience methods and avoid > directly dealing with the number of elements. If needed, calling > `getElementCount` on a vector type instead of `getVectorNumElements` can be used > to obtain the (potentially scalable) number of elements. Overloaded division and > multiplication operators allow an ElementCount instance to be used in much the > same manner as an integer for most cases. > > This mixture of compile-time and runtime quantities allow us to reason about the > relationship between different scalable vector types without knowing their > exact length. > > The runtime multiple is not expected to change during program execution for SVE, > but it is possible. The model of scalable vectors presented in this RFC assumes > that the multiple will be constant within a function but not necessarily across > functions. As suggested in the recent RISC-V rfc, a new function attribute to > inherit the multiple across function calls will allow for function calls with > vector arguments/return values and inlining/outlining optimizations. > > IR Textual Form > --------------- > > The textual form for a scalable vector is: > > ``<scalable <n> x <type>>`` > > where `type` is the scalar type of each element, `n` is the minimum number of > elements, and the string literal `scalable` indicates that the total number of > elements is an unknown multiple of `n`; `scalable` is just an arbitrary choice > for indicating that the vector is scalable, and could be substituted by another. > For fixed-length vectors, the `scalable` is omitted, so there is no change in > the format for existing vectors. > > Scalable vectors with the same `Min` value have the same number of elements, and > the same number of bytes if `Min * sizeof(type)` is the same (assuming they are > used within the same function): > > ``<scalable 4 x i32>`` and ``<scalable 4 x i8>`` have the same number of > elements. > > ``<scalable x 4 x i32>`` and ``<scalable x 8 x i16>`` have the same number of > bytes. > > IR Bitcode Form > --------------- > > To serialize scalable vectors to bitcode, a new boolean field is added to the > type record. If the field is not present the type will default to a fixed-length > vector type, preserving backwards compatibility. > > Alternatives Considered > ----------------------- > > We did consider one main alternative -- a dedicated target type, like the > x86_mmx type. > > A dedicated target type would either need to extend all existing passes that > work with vectors to recognize the new type, or to duplicate all that code > in order to get reasonable code generation and autovectorization. > > This hasn't been done for the x86_mmx type, and so it is only capable of > providing support for C-level intrinsics instead of being used and recognized by > passes inside llvm. > > Although our current solution will need to change some of the code that creates > new VectorTypes, much of that code doesn't need to care about whether the types > are scalable or not -- they can use preexisting methods like > `getHalfElementsVectorType`. If the code is a little more complex, > `ElementCount` structs can be used instead of an `unsigned` value to represent > the number of elements. > > ==============> 2. Size Queries > ==============> > This is a proposal for how to deal with querying the size of scalable types. > While it has not been implemented in full, the general approach works well > for calculating offsets into structures with scalable types in a modified > version of ComputeValueVTs in our downstream compiler. > > Current IR types that have a known size all return a single integer constant. > For scalable types a second integer is needed to indicate the number of bytes > which need to be scaled by the runtime multiple to obtain the actual length. > > For primitive types, getPrimitiveSizeInBits will function as it does today, > except that it will no longer return a size for vector types (it will return 0, > as it does for other derived types). The majority of calls to this function are > already for scalar rather than vector types. > > For derived types, a function (getSizeExpressionInBits) to return a pair of > integers (one to indicate unscaled bits, the other for bits that need to be > scaled by the runtime multiple) will be added. For backends that do not need to > deal with scalable types, another function (getFixedSizeExpressionInBits) that > only returns unscaled bits will be provided, with a debug assert that the type > isn't scalable. > > Similar functionality will be added to DataLayout. > > Comparing two of these sizes together is straightforward if only unscaled sizes > are used. Comparisons between scaled sizes is also simple when comparing sizes > within a function (or across functions with the inherit flag mentioned in the > changes to the type), but cannot be compared otherwise. If a mix is present, > then any number of unscaled bits will not be considered to have a greater size > than a smaller number of scaled bits, but a smaller number of unscaled bits > will be considered to have a smaller size than a greater number of scaled bits > (since the runtime multiple is at least one). > > Future Work > ----------- > > Since we cannot determine the exact size of a scalable vector, the > existing logic for alias detection won't work when multiple accesses > share a common base pointer with different offsets. > > However, SVE's predication will mean that a dynamic 'safe' vector length > can be determined at runtime, so after initial support has been added we > can work on vectorizing loops using runtime predication to avoid aliasing > problems. > > Alternatives Considered > ----------------------- > > Marking scalable vectors as unsized doesn't work well, as many parts of > llvm dealing with loads and stores assert that 'isSized()' returns true > and make use of the size when calculating offsets. > > We have considered introducing multiple helper functions instead of > using direct size queries, but that doesn't cover all cases. It may > still be a good idea to introduce them to make the purpose in a given > case more obvious, e.g. 'isBitCastableTo(Type*,Type*)'. > > =======================================> 3. Representing Vector Length at Runtime > =======================================> > With a scalable vector type defined, we now need a way to represent the runtime > length in IR in order to generate addresses for consecutive vectors in memory > and determine how many elements have been processed in an iteration of a loop. > > We have added an experimental `vscale` intrinsic to represent the runtime > multiple. Multiplying the result of this intrinsic by the minimum number of > elements in a vector gives the total number of elements in a scalable vector. > > Fixed-Length Code > ----------------- > > Assuming a vector type of <4 x <ty>> > `` > vector.body: > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > ;; <loop body> > ;; Increment induction var > %index.next = add i64 %index, 4 > ;; <check and branch> > `` > Scalable Equivalent > ------------------- > > Assuming a vector type of <scalable 4 x <ty>> > `` > vector.body: > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > ;; <loop body> > ;; Increment induction var > %vscale64 = call i64 @llvm.experimental.vector.vscale.64() > %index.next = add i64 %index, mul (i64 %vscale64, i64 4) > ;; <check and branch> > `` > ==========================> 4. Generating Vector Values > ==========================> For constant vector values, we cannot specify all the elements as we can for > fixed-length vectors; fortunately only a small number of easily synthesized > patterns are required for autovectorization. The `zeroinitializer` constant > can be used in the same manner as fixed-length vectors for a constant zero > splat. This can then be combined with `insertelement` and `shufflevector` > to create arbitrary value splats in the same manner as fixed-length vectors. > > For constants consisting of a sequence of values, an experimental `stepvector` > intrinsic has been added to represent a simple constant of the form > `<0, 1, 2... num_elems-1>`. To change the starting value a splat of the new > start can be added, and changing the step requires multiplying by a splat. > > Fixed-Length Code > ----------------- > `` > ;; Splat a value > %insert = insertelement <4 x i32> undef, i32 %value, i32 0 > %splat = shufflevector <4 x i32> %insert, <4 x i32> undef, <4 x i32> zeroinitializer > ;; Add a constant sequence > %add = add <4 x i32> %splat, <i32 2, i32 4, i32 6, i32 8> > `` > Scalable Equivalent > ------------------- > `` > ;; Splat a value > %insert = insertelement <scalable 4 x i32> undef, i32 %value, i32 0 > %splat = shufflevector <scalable 4 x i32> %insert, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > ;; Splat offset + stride (the same in this case) > %insert2 = insertelement <scalable 4 x i32> under, i32 2, i32 0 > %str_off = shufflevector <scalable 4 x i32> %insert2, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > ;; Create sequence for scalable vector > %stepvector = call <scalable 4 x i32> @llvm.experimental.vector.stepvector.nxv4i32() > %mulbystride = mul <scalable 4 x i32> %stepvector, %str_off > %addoffset = add <scalable 4 x i32> %mulbystride, %str_off > ;; Add the runtime-generated sequence > %add = add <scalable 4 x i32> %splat, %addoffset > `` > Future Work > ----------- > > Intrinsics cannot currently be used for constant folding. Our downstream > compiler (using Constants instead of intrinsics) relies quite heavily on this > for good code generation, so we will need to find new ways to recognize and > fold these values. > > =================> 5. Code Generation > =================> > IR splats will be converted to an experimental splatvector intrinsic in > SelectionDAGBuilder. > > All three intrinsics are custom lowered and legalized in the AArch64 backend. > > Two new AArch64ISD nodes have been added to represent the same concepts > at the SelectionDAG level, while splatvector maps onto the existing > AArch64ISD::DUP. > > GlobalISel > ---------- > > Since GlobalISel was enabled by default on AArch64, it was necessary to add > scalable vector support to the LowLevelType implementation. A single bit was > added to the raw_data representation for vectors and vectors of pointers. > > In addition, types that only exist in destination patterns are planted in > the enumeration of available types for generated code. While this may not be > necessary in future, generating an all-true 'ptrue' value was necessary to > convert a predicated instruction into an unpredicated one. > > =========> 6. Example > =========> > The following example shows a simple C loop which assigns the array index to > the array elements matching that index. The IR shows how vscale and stepvector > are used to create the needed values and to advance the index variable in the > loop. > > C Code > ------ > > `` > void IdentityArrayInit(int *a, int count) { > for (int i = 0; i < count; ++i) > a[i] = i; > } > `` > > Scalable IR Vector Body > ----------------------- > > `` > vector.body.preheader: > ;; Other setup > ;; Stepvector used to create initial identity vector > %stepvector = call <scalable 4 x i32> @llvm.experimental.vector.stepvector.nxv4i32() > br vector.body > > vector.body > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > %0 = phi i64 [ %1, %vector.body ], [ 0, %vector.body.preheader ] > > ;; stepvector used for index identity on entry to loop body ;; > %vec.ind7 = phi <scalable 4 x i32> [ %step.add8, %vector.body ], > [ %stepvector, %vector.body.preheader ] > %vscale64 = call i64 @llvm.experimental.vector.vscale.64() > %vscale32 = trunc i64 %vscale64 to i32 > %1 = add i64 %0, mul (i64 %vscale64, i64 4) > > ;; vscale splat used to increment identity vector ;; > %insert = insertelement <scalable 4 x i32> undef, i32 mul (i32 %vscale32, i32 4), i32 0 > %splat shufflevector <scalable 4 x i32> %insert, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > %step.add8 = add <scalable 4 x i32> %vec.ind7, %splat > %2 = getelementptr inbounds i32, i32* %a, i64 %0 > %3 = bitcast i32* %2 to <scalable 4 x i32>* > store <scalable 4 x i32> %vec.ind7, <scalable 4 x i32>* %3, align 4 > > ;; vscale used to increment loop index > %index.next = add i64 %index, mul (i64 %vscale64, i64 4) > %4 = icmp eq i64 %index.next, %n.vec > br i1 %4, label %middle.block, label %vector.body, !llvm.loop !5 > `` > > =========> 7. Patches > =========> > List of patches: > > 1. Extend VectorType: https://reviews.llvm.org/D32530 > 2. Vector element type Tablegen constraint: https://reviews.llvm.org/D47768 > 3. LLT support for scalable vectors: https://reviews.llvm.org/D47769 > 4. EVT strings and Type mapping: https://reviews.llvm.org/D47770 > 5. SVE Calling Convention: https://reviews.llvm.org/D47771 > 6. Intrinsic lowering cleanup: https://reviews.llvm.org/D47772 > 7. Add VScale intrinsic: https://reviews.llvm.org/D47773 > 8. Add StepVector intrinsic: https://reviews.llvm.org/D47774 > 9. Add SplatVector intrinsic: https://reviews.llvm.org/D47775 > 10. Initial store patterns: https://reviews.llvm.org/D47776 > 11. Initial addition patterns: https://reviews.llvm.org/D47777 > 12. Initial left-shift patterns: https://reviews.llvm.org/D47778 > 13. Implement copy logic for Z regs: https://reviews.llvm.org/D47779 > 14. Prevectorized loop unit test: https://reviews.llvm.org/D47780 > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
JinGu Kang via llvm-dev
2019-May-24 20:47 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Hi Bruce, Thanks for your comment.> Generic code can enquire the size, dynamically allocate space, andtransparently save and restore the contents of a vector register or registers. I am not sure what the Generic code is. It seems it uses similar way as the implementation of variable length array. If so, I think it could be one way to support it. Thanks, JinGu Kang ________________________________ From: Bruce Hoult <brucehoult at sifive.com> Sent: 24 May 2019 20:12 To: JinGu Kang Cc: Chris Lattner; Hal Finkel; Jones, Joel; dag at cray.com; Renato Golin; Kristof Beyls; Amara Emerson; Florian Hahn; Sander De Smalen; Robin Kruppe; llvm-dev at lists.llvm.org; mkuper at google.com; Sjoerd Meijer; Sam Parker; Graham Hunter; nd Subject: Re: [llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths In the RISC-V V extension, there is no upper limit to the size vector registers can be in a future CPU. (Formally, the upper limit is at least 2^31 bytes) Generic code can enquire the size, dynamically allocate space, and transparently save and restore the contents of a vector register or registers. On Fri, May 24, 2019 at 11:28 AM JinGu Kang via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > Hi Graham, > > I am working on a custom target and it is considering scalable vector type representation in programming language. While I am collecting the information about it, I have met your RFC. I have a question. I think the one of fundamental issues is that we do not know the memory layout of the type at compile time. I am not sure whether the RFC covers this issue or not. Conservatively, I imagined the memory layout of biggest type which the scalable vector type can support. I could miss some discussions about it. If I missed something, please let me know. > > Thanks, > JinGu Kang > > ________________________________ > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Graham Hunter via llvm-dev <llvm-dev at lists.llvm.org> > Sent: 05 June 2018 14:15 > To: Chris Lattner; Hal Finkel; Jones, Joel; dag at cray.com; Renato Golin; Kristof Beyls; Amara Emerson; Florian Hahn; Sander De Smalen; Robin Kruppe; llvm-dev at lists.llvm.org; mkuper at google.com; Sjoerd Meijer; Sam Parker > Cc: nd > Subject: [llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths > > Hi, > > Now that Sander has committed enough MC support for SVE, here's an updated > RFC for variable length vector support with a set of 14 patches (listed at the end) > to demonstrate code generation for SVE using the extensions proposed in the RFC. > > I have some ideas about how to support RISC-V's upcoming extension alongside > SVE; I'll send an email with some additional comments on Robin's RFC later. > > Feedback and questions welcome. > > -Graham > > ============================================================> Supporting SIMD instruction sets with variable vector lengths > ============================================================> > In this RFC we propose extending LLVM IR to support code-generation for variable > length vector architectures like Arm's SVE or RISC-V's 'V' extension. Our > approach is backwards compatible and should be as non-intrusive as possible; the > only change needed in other backends is how size is queried on vector types, and > it only requires a change in which function is called. We have created a set of > proof-of-concept patches to represent a simple vectorized loop in IR and > generate SVE instructions from that IR. These patches (listed in section 7 of > this rfc) can be found on Phabricator and are intended to illustrate the scope > of changes required by the general approach described in this RFC. > > =========> Background > =========> > *ARMv8-A Scalable Vector Extensions* (SVE) is a new vector ISA extension for > AArch64 which is intended to scale with hardware such that the same binary > running on a processor with longer vector registers can take advantage of the > increased compute power without recompilation. > > As the vector length is no longer a compile-time known value, the way in which > the LLVM vectorizer generates code requires modifications such that certain > values are now runtime evaluated expressions instead of compile-time constants. > > Documentation for SVE can be found at > https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a > > =======> Contents > =======> > The rest of this RFC covers the following topics: > > 1. Types -- a proposal to extend VectorType to be able to represent vectors that > have a length which is a runtime-determined multiple of a known base length. > > 2. Size Queries - how to reason about the size of types for which the size isn't > fully known at compile time. > > 3. Representing the runtime multiple of vector length in IR for use in address > calculations and induction variable comparisons. > > 4. Generating 'constant' values in IR for vectors with a runtime-determined > number of elements. > > 5. A brief note on code generation of these new operations for AArch64. > > 6. An example of C code and matching IR using the proposed extensions. > > 7. A list of patches demonstrating the changes required to emit SVE instructions > for a loop that has already been vectorized using the extensions described > in this RFC. > > =======> 1. Types > =======> > To represent a vector of unknown length a boolean `Scalable` property has been > added to the `VectorType` class, which indicates that the number of elements in > the vector is a runtime-determined integer multiple of the `NumElements` field. > Most code that deals with vectors doesn't need to know the exact length, but > does need to know relative lengths -- e.g. get a vector with the same number of > elements but a different element type, or with half or double the number of > elements. > > In order to allow code to transparently support scalable vectors, we introduce > an `ElementCount` class with two members: > > - `unsigned Min`: the minimum number of elements. > - `bool Scalable`: is the element count an unknown multiple of `Min`? > > For non-scalable vectors (``Scalable=false``) the scale is considered to be > equal to one and thus `Min` represents the exact number of elements in the > vector. > > The intent for code working with vectors is to use convenience methods and avoid > directly dealing with the number of elements. If needed, calling > `getElementCount` on a vector type instead of `getVectorNumElements` can be used > to obtain the (potentially scalable) number of elements. Overloaded division and > multiplication operators allow an ElementCount instance to be used in much the > same manner as an integer for most cases. > > This mixture of compile-time and runtime quantities allow us to reason about the > relationship between different scalable vector types without knowing their > exact length. > > The runtime multiple is not expected to change during program execution for SVE, > but it is possible. The model of scalable vectors presented in this RFC assumes > that the multiple will be constant within a function but not necessarily across > functions. As suggested in the recent RISC-V rfc, a new function attribute to > inherit the multiple across function calls will allow for function calls with > vector arguments/return values and inlining/outlining optimizations. > > IR Textual Form > --------------- > > The textual form for a scalable vector is: > > ``<scalable <n> x <type>>`` > > where `type` is the scalar type of each element, `n` is the minimum number of > elements, and the string literal `scalable` indicates that the total number of > elements is an unknown multiple of `n`; `scalable` is just an arbitrary choice > for indicating that the vector is scalable, and could be substituted by another. > For fixed-length vectors, the `scalable` is omitted, so there is no change in > the format for existing vectors. > > Scalable vectors with the same `Min` value have the same number of elements, and > the same number of bytes if `Min * sizeof(type)` is the same (assuming they are > used within the same function): > > ``<scalable 4 x i32>`` and ``<scalable 4 x i8>`` have the same number of > elements. > > ``<scalable x 4 x i32>`` and ``<scalable x 8 x i16>`` have the same number of > bytes. > > IR Bitcode Form > --------------- > > To serialize scalable vectors to bitcode, a new boolean field is added to the > type record. If the field is not present the type will default to a fixed-length > vector type, preserving backwards compatibility. > > Alternatives Considered > ----------------------- > > We did consider one main alternative -- a dedicated target type, like the > x86_mmx type. > > A dedicated target type would either need to extend all existing passes that > work with vectors to recognize the new type, or to duplicate all that code > in order to get reasonable code generation and autovectorization. > > This hasn't been done for the x86_mmx type, and so it is only capable of > providing support for C-level intrinsics instead of being used and recognized by > passes inside llvm. > > Although our current solution will need to change some of the code that creates > new VectorTypes, much of that code doesn't need to care about whether the types > are scalable or not -- they can use preexisting methods like > `getHalfElementsVectorType`. If the code is a little more complex, > `ElementCount` structs can be used instead of an `unsigned` value to represent > the number of elements. > > ==============> 2. Size Queries > ==============> > This is a proposal for how to deal with querying the size of scalable types. > While it has not been implemented in full, the general approach works well > for calculating offsets into structures with scalable types in a modified > version of ComputeValueVTs in our downstream compiler. > > Current IR types that have a known size all return a single integer constant. > For scalable types a second integer is needed to indicate the number of bytes > which need to be scaled by the runtime multiple to obtain the actual length. > > For primitive types, getPrimitiveSizeInBits will function as it does today, > except that it will no longer return a size for vector types (it will return 0, > as it does for other derived types). The majority of calls to this function are > already for scalar rather than vector types. > > For derived types, a function (getSizeExpressionInBits) to return a pair of > integers (one to indicate unscaled bits, the other for bits that need to be > scaled by the runtime multiple) will be added. For backends that do not need to > deal with scalable types, another function (getFixedSizeExpressionInBits) that > only returns unscaled bits will be provided, with a debug assert that the type > isn't scalable. > > Similar functionality will be added to DataLayout. > > Comparing two of these sizes together is straightforward if only unscaled sizes > are used. Comparisons between scaled sizes is also simple when comparing sizes > within a function (or across functions with the inherit flag mentioned in the > changes to the type), but cannot be compared otherwise. If a mix is present, > then any number of unscaled bits will not be considered to have a greater size > than a smaller number of scaled bits, but a smaller number of unscaled bits > will be considered to have a smaller size than a greater number of scaled bits > (since the runtime multiple is at least one). > > Future Work > ----------- > > Since we cannot determine the exact size of a scalable vector, the > existing logic for alias detection won't work when multiple accesses > share a common base pointer with different offsets. > > However, SVE's predication will mean that a dynamic 'safe' vector length > can be determined at runtime, so after initial support has been added we > can work on vectorizing loops using runtime predication to avoid aliasing > problems. > > Alternatives Considered > ----------------------- > > Marking scalable vectors as unsized doesn't work well, as many parts of > llvm dealing with loads and stores assert that 'isSized()' returns true > and make use of the size when calculating offsets. > > We have considered introducing multiple helper functions instead of > using direct size queries, but that doesn't cover all cases. It may > still be a good idea to introduce them to make the purpose in a given > case more obvious, e.g. 'isBitCastableTo(Type*,Type*)'. > > =======================================> 3. Representing Vector Length at Runtime > =======================================> > With a scalable vector type defined, we now need a way to represent the runtime > length in IR in order to generate addresses for consecutive vectors in memory > and determine how many elements have been processed in an iteration of a loop. > > We have added an experimental `vscale` intrinsic to represent the runtime > multiple. Multiplying the result of this intrinsic by the minimum number of > elements in a vector gives the total number of elements in a scalable vector. > > Fixed-Length Code > ----------------- > > Assuming a vector type of <4 x <ty>> > `` > vector.body: > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > ;; <loop body> > ;; Increment induction var > %index.next = add i64 %index, 4 > ;; <check and branch> > `` > Scalable Equivalent > ------------------- > > Assuming a vector type of <scalable 4 x <ty>> > `` > vector.body: > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > ;; <loop body> > ;; Increment induction var > %vscale64 = call i64 @llvm.experimental.vector.vscale.64() > %index.next = add i64 %index, mul (i64 %vscale64, i64 4) > ;; <check and branch> > `` > ==========================> 4. Generating Vector Values > ==========================> For constant vector values, we cannot specify all the elements as we can for > fixed-length vectors; fortunately only a small number of easily synthesized > patterns are required for autovectorization. The `zeroinitializer` constant > can be used in the same manner as fixed-length vectors for a constant zero > splat. This can then be combined with `insertelement` and `shufflevector` > to create arbitrary value splats in the same manner as fixed-length vectors. > > For constants consisting of a sequence of values, an experimental `stepvector` > intrinsic has been added to represent a simple constant of the form > `<0, 1, 2... num_elems-1>`. To change the starting value a splat of the new > start can be added, and changing the step requires multiplying by a splat. > > Fixed-Length Code > ----------------- > `` > ;; Splat a value > %insert = insertelement <4 x i32> undef, i32 %value, i32 0 > %splat = shufflevector <4 x i32> %insert, <4 x i32> undef, <4 x i32> zeroinitializer > ;; Add a constant sequence > %add = add <4 x i32> %splat, <i32 2, i32 4, i32 6, i32 8> > `` > Scalable Equivalent > ------------------- > `` > ;; Splat a value > %insert = insertelement <scalable 4 x i32> undef, i32 %value, i32 0 > %splat = shufflevector <scalable 4 x i32> %insert, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > ;; Splat offset + stride (the same in this case) > %insert2 = insertelement <scalable 4 x i32> under, i32 2, i32 0 > %str_off = shufflevector <scalable 4 x i32> %insert2, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > ;; Create sequence for scalable vector > %stepvector = call <scalable 4 x i32> @llvm.experimental.vector.stepvector.nxv4i32() > %mulbystride = mul <scalable 4 x i32> %stepvector, %str_off > %addoffset = add <scalable 4 x i32> %mulbystride, %str_off > ;; Add the runtime-generated sequence > %add = add <scalable 4 x i32> %splat, %addoffset > `` > Future Work > ----------- > > Intrinsics cannot currently be used for constant folding. Our downstream > compiler (using Constants instead of intrinsics) relies quite heavily on this > for good code generation, so we will need to find new ways to recognize and > fold these values. > > =================> 5. Code Generation > =================> > IR splats will be converted to an experimental splatvector intrinsic in > SelectionDAGBuilder. > > All three intrinsics are custom lowered and legalized in the AArch64 backend. > > Two new AArch64ISD nodes have been added to represent the same concepts > at the SelectionDAG level, while splatvector maps onto the existing > AArch64ISD::DUP. > > GlobalISel > ---------- > > Since GlobalISel was enabled by default on AArch64, it was necessary to add > scalable vector support to the LowLevelType implementation. A single bit was > added to the raw_data representation for vectors and vectors of pointers. > > In addition, types that only exist in destination patterns are planted in > the enumeration of available types for generated code. While this may not be > necessary in future, generating an all-true 'ptrue' value was necessary to > convert a predicated instruction into an unpredicated one. > > =========> 6. Example > =========> > The following example shows a simple C loop which assigns the array index to > the array elements matching that index. The IR shows how vscale and stepvector > are used to create the needed values and to advance the index variable in the > loop. > > C Code > ------ > > `` > void IdentityArrayInit(int *a, int count) { > for (int i = 0; i < count; ++i) > a[i] = i; > } > `` > > Scalable IR Vector Body > ----------------------- > > `` > vector.body.preheader: > ;; Other setup > ;; Stepvector used to create initial identity vector > %stepvector = call <scalable 4 x i32> @llvm.experimental.vector.stepvector.nxv4i32() > br vector.body > > vector.body > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > %0 = phi i64 [ %1, %vector.body ], [ 0, %vector.body.preheader ] > > ;; stepvector used for index identity on entry to loop body ;; > %vec.ind7 = phi <scalable 4 x i32> [ %step.add8, %vector.body ], > [ %stepvector, %vector.body.preheader ] > %vscale64 = call i64 @llvm.experimental.vector.vscale.64() > %vscale32 = trunc i64 %vscale64 to i32 > %1 = add i64 %0, mul (i64 %vscale64, i64 4) > > ;; vscale splat used to increment identity vector ;; > %insert = insertelement <scalable 4 x i32> undef, i32 mul (i32 %vscale32, i32 4), i32 0 > %splat shufflevector <scalable 4 x i32> %insert, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > %step.add8 = add <scalable 4 x i32> %vec.ind7, %splat > %2 = getelementptr inbounds i32, i32* %a, i64 %0 > %3 = bitcast i32* %2 to <scalable 4 x i32>* > store <scalable 4 x i32> %vec.ind7, <scalable 4 x i32>* %3, align 4 > > ;; vscale used to increment loop index > %index.next = add i64 %index, mul (i64 %vscale64, i64 4) > %4 = icmp eq i64 %index.next, %n.vec > br i1 %4, label %middle.block, label %vector.body, !llvm.loop !5 > `` > > =========> 7. Patches > =========> > List of patches: > > 1. Extend VectorType: https://reviews.llvm.org/D32530 > 2. Vector element type Tablegen constraint: https://reviews.llvm.org/D47768 > 3. LLT support for scalable vectors: https://reviews.llvm.org/D47769 > 4. EVT strings and Type mapping: https://reviews.llvm.org/D47770 > 5. SVE Calling Convention: https://reviews.llvm.org/D47771 > 6. Intrinsic lowering cleanup: https://reviews.llvm.org/D47772 > 7. Add VScale intrinsic: https://reviews.llvm.org/D47773 > 8. Add StepVector intrinsic: https://reviews.llvm.org/D47774 > 9. Add SplatVector intrinsic: https://reviews.llvm.org/D47775 > 10. Initial store patterns: https://reviews.llvm.org/D47776 > 11. Initial addition patterns: https://reviews.llvm.org/D47777 > 12. Initial left-shift patterns: https://reviews.llvm.org/D47778 > 13. Implement copy logic for Z regs: https://reviews.llvm.org/D47779 > 14. Prevectorized loop unit test: https://reviews.llvm.org/D47780 > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190524/97c6eaf3/attachment-0001.html>
Bruce Hoult via llvm-dev
2019-May-24 21:07 UTC
[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
By "generic code" I mean program instructions that is guaranteed to run correctly on any RISC-V CPU with the V extension, now or in the far future. On Fri, May 24, 2019 at 1:47 PM JinGu Kang <jingu at codeplay.com> wrote:> > Hi Bruce, > > Thanks for your comment. > > > Generic code can enquire the size, dynamically allocate space, and > transparently save and restore the contents of a vector register or > registers. > > I am not sure what the Generic code is. It seems it uses similar way as the implementation of variable length array. If so, I think it could be one way to support it. > > Thanks, > JinGu Kang > > ________________________________ > From: Bruce Hoult <brucehoult at sifive.com> > Sent: 24 May 2019 20:12 > To: JinGu Kang > Cc: Chris Lattner; Hal Finkel; Jones, Joel; dag at cray.com; Renato Golin; Kristof Beyls; Amara Emerson; Florian Hahn; Sander De Smalen; Robin Kruppe; llvm-dev at lists.llvm.org; mkuper at google.com; Sjoerd Meijer; Sam Parker; Graham Hunter; nd > Subject: Re: [llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths > > In the RISC-V V extension, there is no upper limit to the size vector > registers can be in a future CPU. (Formally, the upper limit is at > least 2^31 bytes) > > Generic code can enquire the size, dynamically allocate space, and > transparently save and restore the contents of a vector register or > registers. > > On Fri, May 24, 2019 at 11:28 AM JinGu Kang via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > > Hi Graham, > > > > I am working on a custom target and it is considering scalable vector type representation in programming language. While I am collecting the information about it, I have met your RFC. I have a question. I think the one of fundamental issues is that we do not know the memory layout of the type at compile time. I am not sure whether the RFC covers this issue or not. Conservatively, I imagined the memory layout of biggest type which the scalable vector type can support. I could miss some discussions about it. If I missed something, please let me know. > > > > Thanks, > > JinGu Kang > > > > ________________________________ > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Graham Hunter via llvm-dev <llvm-dev at lists.llvm.org> > > Sent: 05 June 2018 14:15 > > To: Chris Lattner; Hal Finkel; Jones, Joel; dag at cray.com; Renato Golin; Kristof Beyls; Amara Emerson; Florian Hahn; Sander De Smalen; Robin Kruppe; llvm-dev at lists.llvm.org; mkuper at google.com; Sjoerd Meijer; Sam Parker > > Cc: nd > > Subject: [llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths > > > > Hi, > > > > Now that Sander has committed enough MC support for SVE, here's an updated > > RFC for variable length vector support with a set of 14 patches (listed at the end) > > to demonstrate code generation for SVE using the extensions proposed in the RFC. > > > > I have some ideas about how to support RISC-V's upcoming extension alongside > > SVE; I'll send an email with some additional comments on Robin's RFC later. > > > > Feedback and questions welcome. > > > > -Graham > > > > ============================================================> > Supporting SIMD instruction sets with variable vector lengths > > ============================================================> > > > In this RFC we propose extending LLVM IR to support code-generation for variable > > length vector architectures like Arm's SVE or RISC-V's 'V' extension. Our > > approach is backwards compatible and should be as non-intrusive as possible; the > > only change needed in other backends is how size is queried on vector types, and > > it only requires a change in which function is called. We have created a set of > > proof-of-concept patches to represent a simple vectorized loop in IR and > > generate SVE instructions from that IR. These patches (listed in section 7 of > > this rfc) can be found on Phabricator and are intended to illustrate the scope > > of changes required by the general approach described in this RFC. > > > > =========> > Background > > =========> > > > *ARMv8-A Scalable Vector Extensions* (SVE) is a new vector ISA extension for > > AArch64 which is intended to scale with hardware such that the same binary > > running on a processor with longer vector registers can take advantage of the > > increased compute power without recompilation. > > > > As the vector length is no longer a compile-time known value, the way in which > > the LLVM vectorizer generates code requires modifications such that certain > > values are now runtime evaluated expressions instead of compile-time constants. > > > > Documentation for SVE can be found at > > https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a > > > > =======> > Contents > > =======> > > > The rest of this RFC covers the following topics: > > > > 1. Types -- a proposal to extend VectorType to be able to represent vectors that > > have a length which is a runtime-determined multiple of a known base length. > > > > 2. Size Queries - how to reason about the size of types for which the size isn't > > fully known at compile time. > > > > 3. Representing the runtime multiple of vector length in IR for use in address > > calculations and induction variable comparisons. > > > > 4. Generating 'constant' values in IR for vectors with a runtime-determined > > number of elements. > > > > 5. A brief note on code generation of these new operations for AArch64. > > > > 6. An example of C code and matching IR using the proposed extensions. > > > > 7. A list of patches demonstrating the changes required to emit SVE instructions > > for a loop that has already been vectorized using the extensions described > > in this RFC. > > > > =======> > 1. Types > > =======> > > > To represent a vector of unknown length a boolean `Scalable` property has been > > added to the `VectorType` class, which indicates that the number of elements in > > the vector is a runtime-determined integer multiple of the `NumElements` field. > > Most code that deals with vectors doesn't need to know the exact length, but > > does need to know relative lengths -- e.g. get a vector with the same number of > > elements but a different element type, or with half or double the number of > > elements. > > > > In order to allow code to transparently support scalable vectors, we introduce > > an `ElementCount` class with two members: > > > > - `unsigned Min`: the minimum number of elements. > > - `bool Scalable`: is the element count an unknown multiple of `Min`? > > > > For non-scalable vectors (``Scalable=false``) the scale is considered to be > > equal to one and thus `Min` represents the exact number of elements in the > > vector. > > > > The intent for code working with vectors is to use convenience methods and avoid > > directly dealing with the number of elements. If needed, calling > > `getElementCount` on a vector type instead of `getVectorNumElements` can be used > > to obtain the (potentially scalable) number of elements. Overloaded division and > > multiplication operators allow an ElementCount instance to be used in much the > > same manner as an integer for most cases. > > > > This mixture of compile-time and runtime quantities allow us to reason about the > > relationship between different scalable vector types without knowing their > > exact length. > > > > The runtime multiple is not expected to change during program execution for SVE, > > but it is possible. The model of scalable vectors presented in this RFC assumes > > that the multiple will be constant within a function but not necessarily across > > functions. As suggested in the recent RISC-V rfc, a new function attribute to > > inherit the multiple across function calls will allow for function calls with > > vector arguments/return values and inlining/outlining optimizations. > > > > IR Textual Form > > --------------- > > > > The textual form for a scalable vector is: > > > > ``<scalable <n> x <type>>`` > > > > where `type` is the scalar type of each element, `n` is the minimum number of > > elements, and the string literal `scalable` indicates that the total number of > > elements is an unknown multiple of `n`; `scalable` is just an arbitrary choice > > for indicating that the vector is scalable, and could be substituted by another. > > For fixed-length vectors, the `scalable` is omitted, so there is no change in > > the format for existing vectors. > > > > Scalable vectors with the same `Min` value have the same number of elements, and > > the same number of bytes if `Min * sizeof(type)` is the same (assuming they are > > used within the same function): > > > > ``<scalable 4 x i32>`` and ``<scalable 4 x i8>`` have the same number of > > elements. > > > > ``<scalable x 4 x i32>`` and ``<scalable x 8 x i16>`` have the same number of > > bytes. > > > > IR Bitcode Form > > --------------- > > > > To serialize scalable vectors to bitcode, a new boolean field is added to the > > type record. If the field is not present the type will default to a fixed-length > > vector type, preserving backwards compatibility. > > > > Alternatives Considered > > ----------------------- > > > > We did consider one main alternative -- a dedicated target type, like the > > x86_mmx type. > > > > A dedicated target type would either need to extend all existing passes that > > work with vectors to recognize the new type, or to duplicate all that code > > in order to get reasonable code generation and autovectorization. > > > > This hasn't been done for the x86_mmx type, and so it is only capable of > > providing support for C-level intrinsics instead of being used and recognized by > > passes inside llvm. > > > > Although our current solution will need to change some of the code that creates > > new VectorTypes, much of that code doesn't need to care about whether the types > > are scalable or not -- they can use preexisting methods like > > `getHalfElementsVectorType`. If the code is a little more complex, > > `ElementCount` structs can be used instead of an `unsigned` value to represent > > the number of elements. > > > > ==============> > 2. Size Queries > > ==============> > > > This is a proposal for how to deal with querying the size of scalable types. > > While it has not been implemented in full, the general approach works well > > for calculating offsets into structures with scalable types in a modified > > version of ComputeValueVTs in our downstream compiler. > > > > Current IR types that have a known size all return a single integer constant. > > For scalable types a second integer is needed to indicate the number of bytes > > which need to be scaled by the runtime multiple to obtain the actual length. > > > > For primitive types, getPrimitiveSizeInBits will function as it does today, > > except that it will no longer return a size for vector types (it will return 0, > > as it does for other derived types). The majority of calls to this function are > > already for scalar rather than vector types. > > > > For derived types, a function (getSizeExpressionInBits) to return a pair of > > integers (one to indicate unscaled bits, the other for bits that need to be > > scaled by the runtime multiple) will be added. For backends that do not need to > > deal with scalable types, another function (getFixedSizeExpressionInBits) that > > only returns unscaled bits will be provided, with a debug assert that the type > > isn't scalable. > > > > Similar functionality will be added to DataLayout. > > > > Comparing two of these sizes together is straightforward if only unscaled sizes > > are used. Comparisons between scaled sizes is also simple when comparing sizes > > within a function (or across functions with the inherit flag mentioned in the > > changes to the type), but cannot be compared otherwise. If a mix is present, > > then any number of unscaled bits will not be considered to have a greater size > > than a smaller number of scaled bits, but a smaller number of unscaled bits > > will be considered to have a smaller size than a greater number of scaled bits > > (since the runtime multiple is at least one). > > > > Future Work > > ----------- > > > > Since we cannot determine the exact size of a scalable vector, the > > existing logic for alias detection won't work when multiple accesses > > share a common base pointer with different offsets. > > > > However, SVE's predication will mean that a dynamic 'safe' vector length > > can be determined at runtime, so after initial support has been added we > > can work on vectorizing loops using runtime predication to avoid aliasing > > problems. > > > > Alternatives Considered > > ----------------------- > > > > Marking scalable vectors as unsized doesn't work well, as many parts of > > llvm dealing with loads and stores assert that 'isSized()' returns true > > and make use of the size when calculating offsets. > > > > We have considered introducing multiple helper functions instead of > > using direct size queries, but that doesn't cover all cases. It may > > still be a good idea to introduce them to make the purpose in a given > > case more obvious, e.g. 'isBitCastableTo(Type*,Type*)'. > > > > =======================================> > 3. Representing Vector Length at Runtime > > =======================================> > > > With a scalable vector type defined, we now need a way to represent the runtime > > length in IR in order to generate addresses for consecutive vectors in memory > > and determine how many elements have been processed in an iteration of a loop. > > > > We have added an experimental `vscale` intrinsic to represent the runtime > > multiple. Multiplying the result of this intrinsic by the minimum number of > > elements in a vector gives the total number of elements in a scalable vector. > > > > Fixed-Length Code > > ----------------- > > > > Assuming a vector type of <4 x <ty>> > > `` > > vector.body: > > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > > ;; <loop body> > > ;; Increment induction var > > %index.next = add i64 %index, 4 > > ;; <check and branch> > > `` > > Scalable Equivalent > > ------------------- > > > > Assuming a vector type of <scalable 4 x <ty>> > > `` > > vector.body: > > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > > ;; <loop body> > > ;; Increment induction var > > %vscale64 = call i64 @llvm.experimental.vector.vscale.64() > > %index.next = add i64 %index, mul (i64 %vscale64, i64 4) > > ;; <check and branch> > > `` > > ==========================> > 4. Generating Vector Values > > ==========================> > For constant vector values, we cannot specify all the elements as we can for > > fixed-length vectors; fortunately only a small number of easily synthesized > > patterns are required for autovectorization. The `zeroinitializer` constant > > can be used in the same manner as fixed-length vectors for a constant zero > > splat. This can then be combined with `insertelement` and `shufflevector` > > to create arbitrary value splats in the same manner as fixed-length vectors. > > > > For constants consisting of a sequence of values, an experimental `stepvector` > > intrinsic has been added to represent a simple constant of the form > > `<0, 1, 2... num_elems-1>`. To change the starting value a splat of the new > > start can be added, and changing the step requires multiplying by a splat. > > > > Fixed-Length Code > > ----------------- > > `` > > ;; Splat a value > > %insert = insertelement <4 x i32> undef, i32 %value, i32 0 > > %splat = shufflevector <4 x i32> %insert, <4 x i32> undef, <4 x i32> zeroinitializer > > ;; Add a constant sequence > > %add = add <4 x i32> %splat, <i32 2, i32 4, i32 6, i32 8> > > `` > > Scalable Equivalent > > ------------------- > > `` > > ;; Splat a value > > %insert = insertelement <scalable 4 x i32> undef, i32 %value, i32 0 > > %splat = shufflevector <scalable 4 x i32> %insert, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > > ;; Splat offset + stride (the same in this case) > > %insert2 = insertelement <scalable 4 x i32> under, i32 2, i32 0 > > %str_off = shufflevector <scalable 4 x i32> %insert2, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > > ;; Create sequence for scalable vector > > %stepvector = call <scalable 4 x i32> @llvm.experimental.vector.stepvector.nxv4i32() > > %mulbystride = mul <scalable 4 x i32> %stepvector, %str_off > > %addoffset = add <scalable 4 x i32> %mulbystride, %str_off > > ;; Add the runtime-generated sequence > > %add = add <scalable 4 x i32> %splat, %addoffset > > `` > > Future Work > > ----------- > > > > Intrinsics cannot currently be used for constant folding. Our downstream > > compiler (using Constants instead of intrinsics) relies quite heavily on this > > for good code generation, so we will need to find new ways to recognize and > > fold these values. > > > > =================> > 5. Code Generation > > =================> > > > IR splats will be converted to an experimental splatvector intrinsic in > > SelectionDAGBuilder. > > > > All three intrinsics are custom lowered and legalized in the AArch64 backend. > > > > Two new AArch64ISD nodes have been added to represent the same concepts > > at the SelectionDAG level, while splatvector maps onto the existing > > AArch64ISD::DUP. > > > > GlobalISel > > ---------- > > > > Since GlobalISel was enabled by default on AArch64, it was necessary to add > > scalable vector support to the LowLevelType implementation. A single bit was > > added to the raw_data representation for vectors and vectors of pointers. > > > > In addition, types that only exist in destination patterns are planted in > > the enumeration of available types for generated code. While this may not be > > necessary in future, generating an all-true 'ptrue' value was necessary to > > convert a predicated instruction into an unpredicated one. > > > > =========> > 6. Example > > =========> > > > The following example shows a simple C loop which assigns the array index to > > the array elements matching that index. The IR shows how vscale and stepvector > > are used to create the needed values and to advance the index variable in the > > loop. > > > > C Code > > ------ > > > > `` > > void IdentityArrayInit(int *a, int count) { > > for (int i = 0; i < count; ++i) > > a[i] = i; > > } > > `` > > > > Scalable IR Vector Body > > ----------------------- > > > > `` > > vector.body.preheader: > > ;; Other setup > > ;; Stepvector used to create initial identity vector > > %stepvector = call <scalable 4 x i32> @llvm.experimental.vector.stepvector.nxv4i32() > > br vector.body > > > > vector.body > > %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.body.preheader ] > > %0 = phi i64 [ %1, %vector.body ], [ 0, %vector.body.preheader ] > > > > ;; stepvector used for index identity on entry to loop body ;; > > %vec.ind7 = phi <scalable 4 x i32> [ %step.add8, %vector.body ], > > [ %stepvector, %vector.body.preheader ] > > %vscale64 = call i64 @llvm.experimental.vector.vscale.64() > > %vscale32 = trunc i64 %vscale64 to i32 > > %1 = add i64 %0, mul (i64 %vscale64, i64 4) > > > > ;; vscale splat used to increment identity vector ;; > > %insert = insertelement <scalable 4 x i32> undef, i32 mul (i32 %vscale32, i32 4), i32 0 > > %splat shufflevector <scalable 4 x i32> %insert, <scalable 4 x i32> undef, <scalable 4 x i32> zeroinitializer > > %step.add8 = add <scalable 4 x i32> %vec.ind7, %splat > > %2 = getelementptr inbounds i32, i32* %a, i64 %0 > > %3 = bitcast i32* %2 to <scalable 4 x i32>* > > store <scalable 4 x i32> %vec.ind7, <scalable 4 x i32>* %3, align 4 > > > > ;; vscale used to increment loop index > > %index.next = add i64 %index, mul (i64 %vscale64, i64 4) > > %4 = icmp eq i64 %index.next, %n.vec > > br i1 %4, label %middle.block, label %vector.body, !llvm.loop !5 > > `` > > > > =========> > 7. Patches > > =========> > > > List of patches: > > > > 1. Extend VectorType: https://reviews.llvm.org/D32530 > > 2. Vector element type Tablegen constraint: https://reviews.llvm.org/D47768 > > 3. LLT support for scalable vectors: https://reviews.llvm.org/D47769 > > 4. EVT strings and Type mapping: https://reviews.llvm.org/D47770 > > 5. SVE Calling Convention: https://reviews.llvm.org/D47771 > > 6. Intrinsic lowering cleanup: https://reviews.llvm.org/D47772 > > 7. Add VScale intrinsic: https://reviews.llvm.org/D47773 > > 8. Add StepVector intrinsic: https://reviews.llvm.org/D47774 > > 9. Add SplatVector intrinsic: https://reviews.llvm.org/D47775 > > 10. Initial store patterns: https://reviews.llvm.org/D47776 > > 11. Initial addition patterns: https://reviews.llvm.org/D47777 > > 12. Initial left-shift patterns: https://reviews.llvm.org/D47778 > > 13. Implement copy logic for Z regs: https://reviews.llvm.org/D47779 > > 14. Prevectorized loop unit test: https://reviews.llvm.org/D47780 > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev