Amara Emerson via llvm-dev
2017-Jul-06 22:03 UTC
[llvm-dev] [RFC][SVE] Supporting Scalable Vector Architectures in LLVM IR (take 2)
[Sending again to list]

Hi Chris,

Responses inline...

On 6 July 2017 at 21:02, Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Thanks for sending this out Graham. Here are some comments:
>
> This is a clever approach to unifying the two concepts, and I think that the approach is basically reasonable. The primary problem that this will introduce is:
>
> 1) Almost anything touching (e.g. transforming) vector operations will have to be aware of this concept. Given a first class implementation of SVE, I don’t see how that’s avoidable though, and your extension of VectorType is sensible.

Yes, however we have found that the vast majority of vector transforms don't need any modification to deal with scalable types. There are obvious exceptions, such as analysing shuffle vector masks for specific patterns.

> 2) This means that VectorType is sometimes fixed size, and sometimes unknowable. I don’t think we have an existing analog for that in the type system.
>
> Is this type a first class type? Can you PHI them, can you load/store them, can you pass them as function arguments without limitations? If not, that is a serious problem. How does struct layout with a scalable vector in it work? What does an alloca of one of them look like? What does a spill look like in codegen?

Yes, as an extension of VectorType they can be manipulated and passed around like normal vectors: loaded/stored directly, used in phis, put in LLVM structs, etc. Address computation generates expressions in terms of vscale, and it seems to work well.

> I think that a target-specific type (e.g. like we have X86_mmx) is the only reasonable alternative. A subclass of VectorType is just another implementation approach of your design above. This is assuming that scalable vectors are really first class types.
>
> The pros and cons of a separate type are that it avoids you having to touch everything that touches VectorTypes, and if it turns out that the code that needs to handle normal SIMD and scalable SIMD vectors is different, then it is a win to split them into two types. If, on the other hand, most code would treat the two types similarly, then it is better to just have one type.

Fortunately, the latter case is exactly what we've found. Most operations on vectors are not actually concerned with their absolute size; if anything, they are more usually concerned with relative sizes.

> The major concern I have here is that I’m not sure how scalable vectors can be considered to be first class types, given that we don’t know their size. If they can’t be put in an LLVM struct (for example), then this would pose a significant problem with your current approach. It would be a huge problem if VectorType could be in structs in some cases, but not others.

We can have them as first class types, but as you say it does require us to be careful when reasoning about their sizes. In practice there are architectural limits on the sizes of vectors, so it's possible to have an upper bound on the size. To be completely accurate, however, type sizes in LLVM would probably need some symbolic representation so that we can reason about them in terms of, essentially, the vscale constant. The other potential avenue is to make all type size queries in LLVM return optional values. We haven't implemented either of these and we haven't yet hit an issue, though that's not to say there isn't one. I think most uses of type size queries are comparisons against other type sizes, so relative comparisons still work even with scalable types. This is an area where we want community input to build consensus, though.

>> With a scalable vector type defined, we now need a way to generate addresses for consecutive vector values in memory and to be able to create basic constant vector values.
>>
>> For address generation, the `vscale` constant is added to represent the runtime value of `n` in `<n x m x type>`.
>
> This should probably be an intrinsic, not an llvm::Constant. The design of llvm::Constant is already wrong: it shouldn’t have operations like divide, and it would be better to not contribute to the problem.

Could you explain your position on this further? From our perspective, the Constant architecture has been a very natural fit for this concept.

>> Multiplying `vscale` by `m` and the number of bytes in `type` gives the total length of a scalable vector, and the backend can pattern match to the various load and store instructions in SVE that automatically scale with vector length.
>
> It is fine for the intrinsic to turn into a target specific ISD node in selection dag to allow your pattern matching.

>> How do we spill/fill scalable registers on the stack?
>> -----------------------------------------------------
>>
>> SVE registers have a (partially) unknown size at build time, and their associated fill/spill instructions require an offset that is implicitly scaled by the vector length instead of by bytes or element size. To accommodate this we created the concept of Stack Regions: areas on the stack associated with specific data types or register classes.
>
> Ok, that sounds complicated, but can surely be made to work. The bigger problem is that there are various LLVM IR transformations that want to put registers into memory. All of these will be broken with this sort of type.

Could you give an example?

Thanks for taking the time to review this,
Amara
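To make the address-generation idea concrete, a rough sketch in the RFC's proposed `<n x m x type>` notation might look like the following. The exact spelling of `vscale` expressions follows the RFC's proposal rather than any final IR syntax, and the loop body is purely illustrative:

```llvm
; Illustrative only: stride through a buffer one whole scalable vector
; (vscale * 4 i32 elements) per iteration.
define void @stride(i32* %base, i64 %nelts) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %addr = getelementptr i32, i32* %base, i64 %i
  %vptr = bitcast i32* %addr to <n x 4 x i32>*
  %v = load <n x 4 x i32>, <n x 4 x i32>* %vptr
  ; ... process %v ...
  %i.next = add i64 %i, mul (i64 vscale, i64 4)  ; advance vscale*4 elements
  %done = icmp uge i64 %i.next, %nelts
  br i1 %done, label %exit, label %loop
exit:
  ret void
}
```

Here `vscale` appears inside a constant expression, which is the point of contention above: the backend would fold `mul (i64 vscale, i64 4)` into SVE's length-scaled addressing, but the same value could equally be produced by an intrinsic call.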
Chris Lattner via llvm-dev
2017-Jul-06 22:13 UTC
[llvm-dev] [RFC][SVE] Supporting Scalable Vector Architectures in LLVM IR (take 2)
On Jul 6, 2017, at 3:03 PM, Amara Emerson <amara.emerson at gmail.com> wrote:
>> 1) Almost anything touching (e.g. transforming) vector operations will have to be aware of this concept. Given a first class implementation of SVE, I don’t see how that’s avoidable though, and your extension of VectorType is sensible.
>
> Yes, however we have found that the vast majority of vector transforms don't need any modification to deal with scalable types. There are obviously exceptions, things like analysing shuffle vector masks for specific patterns etc.

Ok, great.

>> 2) This means that VectorType is sometimes fixed size, and sometimes unknowable. I don’t think we have an existing analog for that in the type system.
>>
>> Is this type a first class type? Can you PHI them, can you load/store them, can you pass them as function arguments without limitations? If not, that is a serious problem. How does struct layout with a scalable vector in it work? What does an alloca of one of them look like? What does a spill look like in codegen?
>
> Yes, as an extension to VectorType they can be manipulated and passed around like normal vectors, load/stored directly, phis, put in llvm structs etc. Address computation generates expressions in terms of vscale and it seems to work well.

Right, that works out through composition, but what does it mean? I can't have a global variable of a scalable vector type, nor does it make sense for a scalable vector to be embeddable in an LLVM IR struct: nothing that measures the size of a struct is prepared to deal with a non-constant answer.

>>> With a scalable vector type defined, we now need a way to generate addresses for consecutive vector values in memory and to be able to create basic constant vector values.
>>>
>>> For address generation, the `vscale` constant is added to represent the runtime value of `n` in `<n x m x type>`.
>>
>> This should probably be an intrinsic, not an llvm::Constant. The design of llvm::Constant is already wrong: it shouldn’t have operations like divide, and it would be better to not contribute to the problem.
>
> Could you explain your position more on this? The Constant architecture has been a very natural fit for this concept from our perspective.

It is appealing, but it is wrong. Constant should really only model primitive constants (ConstantInt/FP, etc), and we should have one more form for “relocatable” constants. Instead, we have intertwined constant folding and ConstantExpr logic that doesn’t make sense.

A better pattern to follow is intrinsics like (e.g.) llvm.coro.size.i32(), which always return a constant value.

>> Ok, that sounds complicated, but can surely be made to work. The bigger problem is that there are various LLVM IR transformations that want to put registers into memory. All of these will be broken with this sort of type.
>
> Could you give an example?

The concept of “reg2mem” is to put SSA values into allocas for passes that can’t (or don’t want to) update SSA. Similarly, function body extraction can turn SSA values into parameters, and depending on the implementation can pack them into structs. The coroutine logic similarly needs to store registers if they cross suspend points; there are multiple other examples.

-Chris
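For comparison, the intrinsic-based alternative being suggested might look roughly like this; the name `llvm.vscale.i64` is hypothetical, used only to mirror the `llvm.coro.size` pattern of an intrinsic that always yields the same value:

```llvm
; Hypothetical intrinsic modelling the runtime multiple n; like
; llvm.coro.size.i32(), every call returns the same value at runtime.
declare i64 @llvm.vscale.i64()

define i64 @vector_size_in_bytes() {
  %vs = call i64 @llvm.vscale.i64()  ; runtime value of n
  %bytes = mul i64 %vs, 16           ; n * 4 elements * 4 bytes for <n x 4 x i32>
  ret i64 %bytes
}
```

The trade-off is that an ordinary instruction result cannot appear in global initializers or other Constant positions, which is part of why the RFC found the Constant modelling natural.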
Amara Emerson via llvm-dev
2017-Jul-06 22:53 UTC
[llvm-dev] [RFC][SVE] Supporting Scalable Vector Architectures in LLVM IR (take 2)
On 6 July 2017 at 23:13, Chris Lattner <clattner at nondot.org> wrote:
>> Yes, as an extension to VectorType they can be manipulated and passed around like normal vectors, load/stored directly, phis, put in llvm structs etc. Address computation generates expressions in terms of vscale and it seems to work well.
>
> Right, that works out through composition, but what does it mean? I can't have a global variable of a scalable vector type, nor does it make sense for a scalable vector to be embeddable in an LLVM IR struct: nothing that measures the size of a struct is prepared to deal with a non-constant answer.

Although the absolute sizes of the types aren't known at compile time, there are upper bounds which the compiler can assume and use to allow allocation of storage for global variables and the like. The issue with composite type sizes again reduces to type sizes being either symbolic expressions or simply unknown in some cases.

>>> This should probably be an intrinsic, not an llvm::Constant. The design of llvm::Constant is already wrong: it shouldn’t have operations like divide, and it would be better to not contribute to the problem.
>>
>> Could you explain your position more on this? The Constant architecture has been a very natural fit for this concept from our perspective.
>
> It is appealing, but it is wrong. Constant should really only model primitive constants (ConstantInt/FP, etc), and we should have one more form for “relocatable” constants. Instead, we have intertwined constant folding and ConstantExpr logic that doesn’t make sense.
>
> A better pattern to follow are intrinsics like (e.g.) llvm.coro.size.i32(), which always returns a constant value.

Ok, we'll investigate this issue further.

>>> Ok, that sounds complicated, but can surely be made to work. The bigger problem is that there are various LLVM IR transformations that want to put registers into memory. All of these will be broken with this sort of type.
>>
>> Could you give an example?
>
> The concept of “reg2mem” is to put SSA values into allocas for passes that can’t (or don’t want to) update SSA. Similarly, function body extraction can turn SSA values into parameters, and depending on the implementation can pack them into structs. The coroutine logic similarly needs to store registers if they cross suspend points, there are multiple other examples.

I think this should still work. Allocas of scalable vectors are supported, and it's only later, at codegen, that the unknown sizes require more work to compute stack offsets correctly. The caveat is that a direct call to something like getTypeStoreSize() will need to be aware of expressions/sizeless types. If, however, these passes exclusively use allocas to put registers into memory, or use structs with extractvalue etc., then they shouldn't need to care, and codegen deals with the low-level details.

Thanks,
Amara
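As a sketch of why reg2mem-style demotion can still work, again in the RFC's `<n x m x type>` notation: the pass only needs an alloca/store/load round trip at the IR level, and the vscale-dependent slot size is codegen's problem:

```llvm
; Illustrative demotion of an SSA scalable-vector value to a stack slot.
; The alloca's size is not a compile-time constant; codegen sizes the slot
; in vscale-scaled units when laying out the frame.
define <n x 4 x i32> @demote(<n x 4 x i32> %v) {
  %slot = alloca <n x 4 x i32>
  store <n x 4 x i32> %v, <n x 4 x i32>* %slot
  %r = load <n x 4 x i32>, <n x 4 x i32>* %slot
  ret <n x 4 x i32> %r
}
```

A pass that instead queries getTypeStoreSize() to pack such values into a struct is exactly the kind of transformation that would need the symbolic or optional size representation discussed above.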