On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Do such architectures frequently have arithmetic operations on the mask
> registers? (i.e. can I reasonably compute a conservative length given a
> mask register value) If I can, then having a mask as the canonical form
> and re-deriving the length register from a mask for a sequence of
> instructions which share a predicate seems fairly reasonable. Note that
> I'm assuming this as a fallback, and that the common case is handled via
> the equivalent of ComputeKnownBits on the mask itself at compile time.

If masking is used (which it usually is not for loops without control
flow inside the vectorised loop) then, yes, logical operations on the
mask registers will happen at every basic block boundary.

But it is NOT the case that you can compute the active vector length
VL from an initial mask value. The active vector length is set by the
hardware based on the remaining application vector length. The VL can
change for each loop iteration -- the normal pattern is for VL to
equal VLMAX for the initial executions of the loop, and then be less
than VLMAX for the final one or two iterations of the loop. For example,
if VLMAX is 16 and there are 19 elements left in the application vector,
then the hardware might choose to use 10 elements for the 2nd to last
iteration and 9 elements for the last iteration. Or not. Other
hardware might choose to perform the last three iterations as 12/12/11
instead of 16/10/9. (It is constrained to be monotonic.)

VL can also be dynamically shortened in the middle of a loop iteration
by an unaligned vector load that crosses a protection boundary, if the
later elements are inaccessible.

I'm curious what SVE will do if there is an if/then/else in the middle
of a vectorised loop with a shorter-than-maximum vector length. You
can't just invert the mask when going from the then-part to the
else-part because that would re-enable elements past the end of the
vector. You'd need to invert the mask and then AND it with the mask
containing the (bitwise representation of) the vector length.
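
The two points above (a VL that is chosen per iteration, and an else-mask
that has to be ANDed with the length mask) can be illustrated with a small
scalar model. This is only a sketch under stated assumptions: the names
choose_vl, strip_mine, length_mask and else_mask are hypothetical, VLMAX = 16
is taken from the example above, and the policy "use min(remaining, VLMAX)"
is just one legal choice; real hardware may split the tail differently.

```python
VLMAX = 16  # assumed maximum vector length, matching the example above


def choose_vl(remaining, vlmax=VLMAX):
    """Hypothetical VL policy: take as many elements as fit.

    Real hardware may pick differently (e.g. 10 then 9 for 19 remaining).
    """
    return min(remaining, vlmax)


def strip_mine(n, vlmax=VLMAX):
    """Return the VL used in each iteration of a strip-mined loop over n elements."""
    vls = []
    remaining = n
    while remaining > 0:
        vl = choose_vl(remaining, vlmax)
        vls.append(vl)
        remaining -= vl
    return vls


def length_mask(vl):
    """Bitmask with the low vl bits set: the bitwise representation of VL."""
    return (1 << vl) - 1


def else_mask(then_mask, vl, vlmax=VLMAX):
    """Mask for the else-part: invert the then-mask, then AND with the length mask."""
    all_lanes = (1 << vlmax) - 1
    inverted = ~then_mask & all_lanes  # naive inversion re-enables lanes >= vl
    return inverted & length_mask(vl)  # the AND switches those lanes off again
```

With this policy, strip_mine(35) gives [16, 16, 3]. For a then-mask of
0b000000101 and VL = 9, else_mask returns 0b111111010, whereas the naive
inversion alone would also set lanes 9 through 15.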
On 1/31/19 4:57 PM, Bruce Hoult wrote:
> On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Do such architectures frequently have arithmetic operations on the
>> mask registers? (i.e. can I reasonably compute a conservative length
>> given a mask register value) If I can, then having a mask as the
>> canonical form and re-deriving the length register from a mask for a
>> sequence of instructions which share a predicate seems fairly
>> reasonable. Note that I'm assuming this as a fallback, and that the
>> common case is handled via the equivalent of ComputeKnownBits on the
>> mask itself at compile time.
> If masking is used (which it usually is not for loops without control
> flow inside the vectorised loop) then, yes, logical operations on the
> mask registers will happen at every basic block boundary.
>
> But it is NOT the case that you can compute the active vector length
> VL from an initial mask value. The active vector length is set by the
> hardware based on the remaining application vector length. The VL can
> change for each loop iteration -- the normal pattern is for VL to
> equal VLMAX for the initial executions of the loop, and then be less
> than VLMAX for the final one or two iterations of the loop. For example,
> if VLMAX is 16 and there are 19 elements left in the application vector,
> then the hardware might choose to use 10 elements for the 2nd to last
> iteration and 9 elements for the last iteration. Or not. Other
> hardware might choose to perform the last three iterations as 12/12/11
> instead of 16/10/9. (It is constrained to be monotonic.)
>
> VL can also be dynamically shortened in the middle of a loop iteration
> by an unaligned vector load that crosses a protection boundary, if the
> later elements are inaccessible.

I can't reconcile this complexity with either the RISC-V snippet that
was shared or the current EVL proposal. Doesn't this imply that the
vector length can change between *every* pair of vector instructions?
If so, how does having it as part of the EVL intrinsics work?

> I'm curious what SVE will do if there is an if/then/else in the middle
> of a vectorised loop with a shorter-than-maximum vector length. You
> can't just invert the mask when going from the then-part to the
> else-part because that would re-enable elements past the end of the
> vector. You'd need to invert the mask and then AND it with the mask
> containing the (bitwise representation of) the vector length.
On 2/5/19 1:27 AM, Philip Reames via llvm-dev wrote:
>
> On 1/31/19 4:57 PM, Bruce Hoult wrote:
>> On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> Do such architectures frequently have arithmetic operations on the
>>> mask registers? (i.e. can I reasonably compute a conservative
>>> length given a mask register value) If I can, then having a mask as
>>> the canonical form and re-deriving the length register from a mask
>>> for a sequence of instructions which share a predicate seems fairly
>>> reasonable. Note that I'm assuming this as a fallback, and that the
>>> common case is handled via the equivalent of ComputeKnownBits on the
>>> mask itself at compile time.
>> If masking is used (which it usually is not for loops without control
>> flow inside the vectorised loop) then, yes, logical operations on the
>> mask registers will happen at every basic block boundary.
>>
>> But it is NOT the case that you can compute the active vector length
>> VL from an initial mask value. The active vector length is set by the
>> hardware based on the remaining application vector length. The VL can
>> change for each loop iteration -- the normal pattern is for VL to
>> equal VLMAX for the initial executions of the loop, and then be less
>> than VLMAX for the final one or two iterations of the loop. For example,
>> if VLMAX is 16 and there are 19 elements left in the application vector,
>> then the hardware might choose to use 10 elements for the 2nd to last
>> iteration and 9 elements for the last iteration. Or not. Other
>> hardware might choose to perform the last three iterations as 12/12/11
>> instead of 16/10/9. (It is constrained to be monotonic.)
>>
>> VL can also be dynamically shortened in the middle of a loop iteration
>> by an unaligned vector load that crosses a protection boundary, if the
>> later elements are inaccessible.
> I can't reconcile this complexity with either the RISC-V snippet that
> was shared or the current EVL proposal. Doesn't this imply that the
> vector length can change between *every* pair of vector instructions?
> If so, how does having it as part of the EVL intrinsics work?

I think this is the usual mixup of AVL and MVL.

AVL: is part of the predicate and can change between vector operations,
     just like a mask can (lightweight).
MVL: is the physical vector register length and can be re-configured
     per function (RVV only atm); this is heavyweight, a stop-the-world
     instruction.

The vectorlen parameter in EVL intrinsics is for the AVL.

>> I'm curious what SVE will do if there is an if/then/else in the middle
>> of a vectorised loop with a shorter-than-maximum vector length. You
>> can't just invert the mask when going from the then-part to the
>> else-part because that would re-enable elements past the end of the
>> vector. You'd need to invert the mask and then AND it with the mask
>> containing the (bitwise representation of) the vector length.

If folks have issues with carrying the vlen around even if the target
only supports masking, we can rephrase EVL using higher-order functions
with varargs (basically prefixing):

ARM SVE, AVX512 (mask-only targets):

    llvm.evl.masked(<16 x i1> mask %M, ...)
    llvm.evl.fsub(<16 x float>, <16 x float>)  ; exists only to get a function handle

    call @llvm.evl.masked.v16f32(%M, @llvm.evl.fsub(v16f32, <16 x float>, <16 x float>)

RISC-V V, SX-Aurora:

    llvm.evl.pred(<16 x i1> mask %M, i32 vlen %VL, ...)

    llvm.evl.pred(%M, %vl, @llvm.evl.fsub, %a, %b)

The problem with this is mostly that the operand positions are now off
compared to regular IR, and the API abstractions that accept both will
have to account for that.
- Simon

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Simon Moll
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll
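
As a rough illustration of the reading above, where the vectorlen
parameter acts as part of the predicate alongside the mask, here is a toy
scalar model of a predicated fsub. It is a sketch only and not the EVL
proposal's definition: the function names evl_fsub and fold_vlen_into_mask
are hypothetical, and returning None for inactive lanes is an assumption
standing in for "result not specified here".

```python
def evl_fsub(a, b, mask, vl):
    """Toy model: lane i is computed only if mask[i] is set and i < vl.

    Inactive lanes yield None (assumption: their value is left unspecified).
    """
    assert len(a) == len(b) == len(mask)
    return [x - y if (m and i < vl) else None
            for i, (x, y, m) in enumerate(zip(a, b, mask))]


def fold_vlen_into_mask(mask, vl):
    """For a mask-only target: express the vector length as extra mask bits."""
    return [bool(m) and i < vl for i, m in enumerate(mask)]


a = [4.0, 5.0, 6.0, 7.0]
b = [1.0, 1.0, 1.0, 1.0]
mask = [True, False, True, True]

# Mask plus explicit vector length (the RISC-V V / SX-Aurora style).
with_vlen = evl_fsub(a, b, mask, vl=3)

# Mask only (the SVE / AVX512 style): fold vl into the mask, use full length.
mask_only = evl_fsub(a, b, fold_vlen_into_mask(mask, 3), vl=len(a))
```

In this model both calls produce the same lanes, which is the sense in
which a mask-only target can absorb the vector length into its mask.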