Chris Tetreault via llvm-dev
2020-Dec-01 12:33 UTC
[llvm-dev] [RFC] Named shuffle intrinsics
Joe,

I suppose my concern is that if we take these paths of least resistance, we'll never get to the goal of scalable vectors being first class citizens. If the goal is to get these experimental intrinsics in now in order to get work done, and to upgrade shufflevector to support these use cases in the long term rather than moving these intrinsics out of the experimental namespace, then I suppose that's fine.

Thanks,
Christopher Tetreault

-----Original Message-----
From: Joe Ellis <Joe.Ellis at arm.com>
Sent: Monday, November 30, 2020 9:04 AM
To: Chris Tetreault <ctetreau at quicinc.com>
Cc: LLVM Dev <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] [RFC] Named shuffle intrinsics

Hi Christopher,

Thanks for your response! I am aware of Eli's RFC. There is definitely some overlap between the two proposals, but I think having named intrinsics to represent common vector shuffles is preferable for our needs.

The original RFC is much wider in scope and more extensive in the changes that are required. We're trying to solve an immediate problem here: expressing shuffles on scalable vectors. The proof-of-concept on Phabricator currently places the intrinsics into the `llvm.experimental` namespace, so if we want to adopt a different way forward in the future that's certainly possible.

Thanks!
Joe

> On 30 Nov 2020, at 13:47, Chris Tetreault <ctetreau at quicinc.com> wrote:
>
> Joe,
>
> Last January, an RFC was posted for this very use case: http://lists.llvm.org/pipermail/llvm-dev/2020-January/138762.html. In that RFC, the proposal was to extend the actual shufflevector instruction to have a "named shuffles" mode. I'm curious, why have you chosen to go with intrinsics for this RFC rather than use an approach similar to the previous RFC?
>
> Thanks,
> Christopher Tetreault
>
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Joe Ellis via llvm-dev
> Sent: Tuesday, November 24, 2020 7:58 AM
> To: llvm-dev at lists.llvm.org
> Subject: [EXT] [llvm-dev] [RFC] Named shuffle intrinsics
>
> Hi there,
>
> For fixed-length vectors, shufflevector can represent all possible shuffles. However, shufflevector uses an ArrayRef for its mask, which cannot work for scalable vectors. Splats are an exception to this in that shufflevector can represent scalable vector splats, but they are inconsistent with all other shuffles because the result's element count is taken from the first source vector rather than the mask.
>
> shufflevector could be extended with more support for scalable types, but it is not clear what advantage this has over using explicitly named shuffles.
>
> We are proposing having named shuffle intrinsics under the llvm.vector namespace that work for both fixed-length and scalable vectors. We do not intend to introduce new types of shuffles, and it is our expectation that all intrinsics will simplify to shufflevector when operating on fixed-length vectors.
>
> We would like to start with the following named shuffle intrinsics:
>
>   llvm.vector.extract (see below)
>   llvm.vector.insert (see below)
>   llvm.vector.reverse (as used by LoopVectorize)
>   llvm.vector.splice (as used by LoopVectorize)
>
> Our immediate interest here is with the llvm.vector.insert and llvm.vector.extract intrinsics, for which there is a proof of concept on Phabricator[1]. The llvm.vector.insert and llvm.vector.extract intrinsics are directly lowered to the INSERT_SUBVECTOR and EXTRACT_SUBVECTOR ISD nodes, and have the same semantics. We plan to simplify fixed-length variants of these intrinsics to shufflevector within LLVM IR to maintain existing code paths/optimisations.
>
> We intend to use the llvm.vector.insert and llvm.vector.extract intrinsics to avoid going through memory when generating IR that translates a C/C++ bitcast from scalable vectors to fixed-width vectors, and vice-versa. As an example, see clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c, which shows that these bitcasts are currently done through memory.
>
> Joe
>
> [1]: https://reviews.llvm.org/D91362

Hi Christopher,

I can definitely appreciate what you are saying. For now, though, these intrinsics satisfy our immediate concerns. That said, I am not opposed to removing these intrinsics and following a different approach (such as Eli's proposal) in the future if it is needed.

Thanks!
Joe

> On 1 Dec 2020, at 12:33, Chris Tetreault <ctetreau at quicinc.com> wrote:
>
> Joe,
>
> I suppose my concern is that if we take these paths of least resistance, we'll never get to the goal of scalable vectors being first class citizens. If the goal is to get these experimental intrinsics in now in order to get work done, and to upgrade shufflevector to support these use cases in the long term rather than moving these intrinsics out of the experimental namespace, then I suppose that's fine.
>
> Thanks,
> Christopher Tetreault
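
To make the RFC's claim concrete that the fixed-length forms of these intrinsics can simplify to shufflevector, here is a rough Python model of the idea. It is only a sketch under stated assumptions: the helper names (`shufflevector`, `vector_extract`, `vector_insert`, `vector_reverse`) are hypothetical and are not LLVM APIs, `None` stands in for an undef lane, and the widen-then-blend expansion used for insert is one possible way to express it as shuffles, not necessarily what the patch in D91362 does.

```python
from typing import List, Optional, Sequence

Lane = Optional[int]  # None models an undef lane (assumption of this sketch)


def shufflevector(a: Sequence[Lane], b: Sequence[Lane],
                  mask: Sequence[int]) -> List[Lane]:
    """Model of a fixed-length shufflevector.

    Each mask entry indexes into the concatenation a ++ b; a negative
    entry produces an undef lane. The result has len(mask) lanes.
    """
    if len(a) != len(b):
        raise ValueError("both operands must have the same length")
    joined = list(a) + list(b)
    return [None if m < 0 else joined[m] for m in mask]


def vector_extract(vec: Sequence[Lane], idx: int, sub_len: int) -> List[Lane]:
    """Extract sub_len contiguous lanes starting at idx, as a single shuffle."""
    if idx < 0 or idx + sub_len > len(vec):
        raise ValueError("subvector out of range")
    undef = [None] * len(vec)
    return shufflevector(vec, undef, list(range(idx, idx + sub_len)))


def vector_insert(vec: Sequence[Lane], sub: Sequence[Lane],
                  idx: int) -> List[Lane]:
    """Overwrite lanes idx..idx+len(sub)-1 of vec with sub, as two shuffles."""
    n, m = len(vec), len(sub)
    if idx < 0 or idx + m > n:
        raise ValueError("subvector out of range")
    # Step 1: widen sub to n lanes, padding the tail with undef lanes,
    # because shufflevector needs both operands to have the same length.
    widened = shufflevector(sub, [None] * m,
                            list(range(m)) + [-1] * (n - m))
    # Step 2: blend. Lanes inside the insert window come from the second
    # operand (indices n..2n-1); all other lanes come from vec.
    mask = [n + (i - idx) if idx <= i < idx + m else i for i in range(n)]
    return shufflevector(vec, widened, mask)


def vector_reverse(vec: Sequence[Lane]) -> List[Lane]:
    """Reverse the lane order, as a single shuffle with a descending mask."""
    n = len(vec)
    return shufflevector(vec, [None] * n, list(range(n - 1, -1, -1)))


if __name__ == "__main__":
    print(vector_extract([10, 11, 12, 13, 14, 15, 16, 17], 4, 4))
    print(vector_insert([0] * 8, [1, 2], 2))
    print(vector_reverse([1, 2, 3, 4]))
```

Every mask above is a compile-time list whose length depends on the vector length. That is the property the RFC points to: for a scalable vector the lane count is not known at compile time, so such a mask cannot be written down, whereas a named operation such as "extract at index 4" or "reverse" stays well defined.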