Hi,

There is now an RFC for a roadmap to native vector predication support in LLVM, along with a prototype implementation:

  https://reviews.llvm.org/D57504

The prototype demonstrates:

- Predicated vector intrinsics with an explicit mask and vector length parameter at the IR level.
- First-class predicated SDNodes at the ISel level. Mask and vector length are value operands.
- An incremental strategy to generalize PatternMatch/InstCombine/InstSimplify and DAGCombiner to work on both regular instructions and EVL intrinsics.
- DAGCombiner example: FMA fusion.
- InstCombine/InstSimplify example: FSub pattern rewrites.
- Early experiments on the LNT test suite (Clang static release, O3 -ffast-math) indicate that compile time on non-EVL IR is not affected by the API abstractions in PatternMatch, etc.

We’d like to get your feedback, in particular on the following questions, to move forward:

- Can we agree on EVL intrinsics as a transitional step to predicated IR instructions?
- Can we agree on native EVL SDNodes for CodeGen?
- Are the changes to InstCombine/InstSimplify/DAGCombiner and the utility classes that go with them acceptable?

Thanks
Simon

--
Simon Moll
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065 : http://compilers.cs.uni-saarland.de/people/moll
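For readers who want the lane-by-lane behaviour spelled out, here is a minimal Python sketch of what a predicated fsub with an explicit mask and vector length computes. It is a reference model only, not code from the prototype: the name `evl_fsub` and the use of `None` for lanes whose result is left unspecified are assumptions made for illustration.

```python
from typing import List, Optional


def evl_fsub(x: List[float], y: List[float],
             mask: List[bool], vlen: int) -> List[Optional[float]]:
    """Model of a predicated fsub.

    Lane i is computed only if it lies below the vector length AND its
    mask bit is set. All other lanes are modelled as None (assumption:
    their value is unspecified).
    """
    assert len(x) == len(y) == len(mask)
    result: List[Optional[float]] = []
    for i in range(len(x)):
        if i < vlen and mask[i]:
            result.append(x[i] - y[i])  # active lane
        else:
            result.append(None)         # disabled lane (hypothetical marker)
    return result


# Lane 1 is masked off; lane 3 is past the vector length of 3.
print(evl_fsub([4.0, 3.0, 2.0, 1.0], [1.0, 1.0, 1.0, 1.0],
               [True, False, True, True], 3))
```

In this model the mask and the vector length are two independent ways of disabling a lane, which is the property the questions in the thread revolve around.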
Can I start w/ some basic questions? I've skimmed your proposal but haven't read it in detail, so if something I ask is already addressed, please feel free to cite existing docs/discussion/whatever.

I'm going to use fsub as my running example, just because it's the first IR test you posted. :)

  %result = call <4 x float> @llvm.evl.fsub.v4f32(<4 x float> %x, <4 x float> %y, <4 x i1> %M, i32 %L)

Question 1 - Why do we need separate mask and lengths? Can't the length be easily folded into the mask operand?

e.g. newmask = (<4 x i1>)((i4)%M & ((1 << %L) - 1))
and then pattern matched in the backend if needed

Question 2 - Have you explored using selects instead? What practical problems do you run into which make you believe explicit predication is required?

e.g. %sub = fsub <4 x float> %x, %y
     %result = select <4 x i1> %M, <4 x float> %sub, <4 x float> undef

My context for these questions is that my recent experience w/ our existing masked intrinsics shows us missing fairly basic optimizations, precisely because they weren't able to reuse all of the existing infrastructure. (I've been working on SimplifyDemandedVectorElts recently for exactly this reason.) My concern is that your EVL proposal will end up in the same state.

Philip

On 1/31/19 7:58 AM, Simon Moll via llvm-dev wrote:
> There is now an RFC for a roadmap to native vector predication support
> in LLVM and a prototype implementation:
>
> https://reviews.llvm.org/D57504
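Question 1 can be made concrete with a small sketch. It assumes the mask is viewed as an integer bitfield with lane 0 in the least significant bit (the thread does not fix a bit order), so that folding a vector length L into a mask M amounts to `M & ((1 << L) - 1)`. The helper names are hypothetical.

```python
def length_to_mask(vlen: int, lanes: int) -> int:
    """Bitmask with the low `vlen` bits set, clamped to `lanes` lanes."""
    vlen = max(0, min(vlen, lanes))
    return (1 << vlen) - 1


def fold_length_into_mask(mask: int, vlen: int, lanes: int) -> int:
    """Combine an explicit mask and a vector length into a single mask."""
    return mask & length_to_mask(vlen, lanes)


def active_lanes(mask: int, lanes: int) -> list:
    """Expand the bitfield into one boolean per lane (lane 0 first)."""
    return [bool((mask >> i) & 1) for i in range(lanes)]


# Mask 0b1101 enables lanes 0, 2 and 3; a length of 3 then disables lane 3.
folded = fold_length_into_mask(0b1101, 3, 4)
print(bin(folded), active_lanes(folded, 4))
```

A backend for hardware with a dedicated vector-length register would have to recognise masks of this shape to recover L, which is the "pattern matched in the backend" step mentioned in the question.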
Philip Reames <listmail at philipreames.com> writes:

> Question 1 - Why do we need separate mask and lengths? Can't the
> length be easily folded into the mask operand?
>
> e.g. newmask = (<4 x i1>)((i4)%M & ((1 << %L) - 1))
> and then pattern matched in the backend if needed

I'm a little concerned about how difficult it will be to maintain enough information throughout compilation to be able to match this on a machine with an explicit vector length value.

> Question 2 - Have you explored using selects instead? What practical
> problems do you run into which make you believe explicit predication
> is required?
>
> e.g. %sub = fsub <4 x float> %x, %y
>      %result = select <4 x i1> %M, <4 x float> %sub, <4 x float> undef

That is semantically incorrect. According to IR semantics, the fsub is fully evaluated before the select comes along. It could trap for elements where %M is 0, whereas a masked intrinsic conveys the proper semantics of masking traps for masked-out elements. We need intrinsics and eventually (IMHO) fully first-class predication to make this work properly.

> My context for these questions is that my recent experience w/ our
> existing masked intrinsics shows us missing fairly basic
> optimizations, precisely because they weren't able to reuse all of the
> existing infrastructure. (I've been working on
> SimplifyDemandedVectorElts recently for exactly this reason.) My
> concern is that your EVL proposal will end up in the same state.

I think that's just the nature of the beast. We need IR-level support for masking and we have to teach LLVM about it.

-David
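The distinction David draws can be modelled with an operation that really does fault in Python. The sketch below uses integer division, with `ZeroDivisionError` standing in for a hardware trap. That substitution and the function names are assumptions for illustration, since Python's float subtraction never raises.

```python
def op_then_select(x, y, mask):
    """Unpredicated op followed by a select: every lane is evaluated first."""
    full = [a // b for a, b in zip(x, y)]  # may fault on ANY lane
    return [v if m else None for v, m in zip(full, mask)]


def predicated_op(x, y, mask):
    """Predicated op: masked-off lanes are never evaluated."""
    return [a // b if m else None for a, b, m in zip(x, y, mask)]


x = [8, 6, 4, 2]
y = [2, 0, 2, 1]                  # lane 1 would divide by zero
mask = [True, False, True, True]  # ...but lane 1 is masked off

print(predicated_op(x, y, mask))
try:
    op_then_select(x, y, mask)
except ZeroDivisionError:
    print("op + select faulted on a masked-off lane")
```

Both functions agree on every lane where neither faults; they differ only in whether the masked-off lane is ever touched, which a select applied after the fact cannot express.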
We're in the process of designing a RISC-V extension (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html) that would have variable-length vectors of short vectors (1 to 4):

  <VL x <4 x float>>

where each predicate bit masks out a whole short vector. We're using this extension to vectorize graphics code where variables in the pre-vectorization code are short vectors. So, vectorizing code like:

  for(int i = 0; i < 1000; i++)
  {
      vec4 color = colors[i];
      vec3 normal = normals[i];
      color.rgb *= fmax(0.0, dot(normal, light_dir));
      colors[i] = color;
  }

I'm planning on passing already vectorized code into LLVM and using LLVM as a backend for optimization and JIT code generation.

Do you think the EVL proposal would support an ISA like this as it's currently written (by pattern matching on predicate expansion and vector-length multiplication)?
Or, do you think the EVL proposal would need modification to effectively support this (by adding an element group size argument to EVL intrinsics or something)?

Jacob Lifshay

On Thu, Jan 31, 2019, 07:58 Simon Moll via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> There is now an RFC for a roadmap to native vector predication support in
> LLVM and a prototype implementation:
>
> https://reviews.llvm.org/D57504
On Thu, Jan 31, 2019 at 9:03 AM Philip Reames via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Question 1 - Why do we need separate mask and lengths? Can't the length be easily folded into the mask operand?

RISC-V has both masks and an active vector length, and the semantics are different.

TLDR: Masked-off elements in the destination register retain their previous value, but elements past the active vector length are zeroed.

I'll quote from the current version of the draft spec:

=========
5.4. Active and Tail Element Definitions

The elements within a vector instruction can be divided into four disjoint subsets.

The prestart elements are those whose element index is less than the initial value in the vstart register. The prestart elements do not raise exceptions and do not update the destination vector register.

The active elements during a vector instruction’s execution are the elements within the current vector length setting and where the current mask is enabled at that element position. The active elements can raise exceptions and update the destination vector register group.

The inactive elements are the elements within the current vector length setting but where the current mask is disabled at that element position. The inactive elements do not raise exceptions and do not update the destination vector register.

The tail elements during a vector instruction’s execution are the elements past the current vector length setting. The tail elements do not raise exceptions, but do zero the results in the destination vector register group.

    for element index x
    prestart    = (0 <= x < vstart)
    active(x)   = (vstart <= x < vl) && (unmasked || mask(x))
    inactive(x) = (vstart <= x < vl) && !(unmasked || mask(x))
    tail(x)     = (vl <= x < VLMAX)

All vector instructions place zeros in the tail elements of the destination vector register group. Some vector arithmetic instructions are not maskable, so have no inactive elements, but still zero the tail elements.
=========

Note: vstart is almost always zero; it exists to support interruptible vector instructions.

"The vstart CSR is writable by unprivileged code, but non-zero vstart values may cause vector instructions to run substantially slower on some implementations, so vstart should not be used by application programmers."
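The four element classes quoted above translate directly into a small Python model. This is a sketch of the quoted definitions only: the names `classify` and `apply_op` are made up here, `mask=None` models an unmasked instruction, and the model assumes vstart <= vl.

```python
def classify(x: int, vstart: int, vl: int, vlmax: int, mask=None) -> str:
    """Classify element index x following the quoted definitions."""
    if not 0 <= x < vlmax:
        raise ValueError("element index out of range")
    if x < vstart:
        return "prestart"
    if x < vl:
        enabled = mask is None or mask[x]
        return "active" if enabled else "inactive"
    return "tail"


def apply_op(dest, src, vstart, vl, mask=None):
    """Write src into dest: active lanes take the new value, prestart and
    inactive lanes keep the old value, tail lanes are zeroed."""
    out = list(dest)
    for x in range(len(dest)):
        kind = classify(x, vstart, vl, len(dest), mask)
        if kind == "active":
            out[x] = src[x]
        elif kind == "tail":
            out[x] = 0
    return out


# Lane 1 is masked off (keeps 9); lane 3 is past vl = 3 (zeroed).
print(apply_op([9, 9, 9, 9], [1, 2, 3, 4], 0, 3, [True, False, True, True]))
```

The masked-off lane and the lane past the vector length end up with different values, which is why a length folded into the mask is not equivalent on this ISA.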
Luke Kenneth Casson Leighton via llvm-dev
2019-Feb-01 07:52 UTC
[llvm-dev] [RFC] Vector Predication
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Thu, Jan 31, 2019 at 10:22 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> Do you think the EVL proposal would support an ISA like this as it's currently
> written (by pattern matching on predicate expansion and vector-length
> multiplication)?

whilst it may be tempting to suggest that a solution is to multiply up the bits in the predicate (into groups of 3 or 4), the problem with that is that if there are operations that require vec3 or vec4 as operands interspersed with predicated operations that do not, that realistically implies a need for two separate predicate registers, otherwise cycles are wasted swapping predicates OR it implies that the architecture *allows* two separate predicate registers to be selected.

consequently, it would be much, much better to be able to have a single bit of a predicate apply to the *entire* vec3 or vec4 type, on each outer loop.

l.
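"Multiplying up the bits in the predicate" can be sketched as follows. This is only an illustration of the expansion step under discussion; the function name `expand_predicate` is hypothetical and nothing here reflects an actual ISA encoding.

```python
def expand_predicate(group_mask, group_size):
    """Repeat each per-group predicate bit `group_size` times, giving one
    predicate bit per scalar element."""
    return [bit for bit in group_mask for _ in range(group_size)]


per_iteration = [True, False, True]        # one bit per outer-loop iteration

for_vec3 = expand_predicate(per_iteration, 3)  # predicate for vec3 operands
for_vec4 = expand_predicate(per_iteration, 4)  # predicate for vec4 operands

print(len(per_iteration), len(for_vec3), len(for_vec4))
```

The same per-iteration predicate yields a different expanded predicate for each group size, so code that mixes plain, vec3 and vec4 operations needs several predicates live at once. That is the cost the message above argues against.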
Hi,

On 1/31/19 11:20 PM, Jacob Lifshay wrote:
> Do you think the EVL proposal would support an ISA like this as it's
> currently written (by pattern matching on predicate expansion and
> vector-length multiplication)?
> Or, do you think the EVL proposal would need modification to
> effectively support this (by adding a element group size argument to
> EVL intrinsics or something)?

We could untie the mask length from the data length:

  %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 1 x i1> %M, i32 %L)

would then indicate that the mask %M applies to groups of "4 / 1" float elements.

- Simon
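A minimal model of the "untied" mask idea, where the mask has fewer elements than the data and each mask bit covers a whole group. Several things here are assumptions: the name `grouped_fsub`, the use of `None` for disabled lanes, and modelling one fixed-size instance of a scalable vector. The vector length operand is left out because the thread does not say whether %L would count elements or groups.

```python
def grouped_fsub(x, y, group_mask):
    """Predicated subtract where each mask bit enables a whole group of
    len(x) // len(group_mask) consecutive lanes."""
    assert len(x) == len(y)
    assert len(group_mask) > 0 and len(x) % len(group_mask) == 0
    group = len(x) // len(group_mask)  # e.g. 4 data lanes per mask bit
    return [a - b if group_mask[i // group] else None
            for i, (a, b) in enumerate(zip(x, y))]


# Two groups of four floats: the first group is enabled, the second is not.
print(grouped_fsub([5.0] * 8, [1.0] * 8, [True, False]))
```

The group size is derived from the ratio of the two lengths rather than passed as a separate argument, mirroring the "4 / 1" reading of the operand types.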