thr3ads.net - llvm dev - [llvm-dev] [RFC] Vector Predication [Feb 2019]

Bruce Hoult via llvm-dev

2019-Feb-01 00:57 UTC

[llvm-dev] [RFC] Vector Predication

On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Do such architectures frequently have arithmetic operations on the mask
> registers? (i.e. can I reasonably compute a conservative length given a mask
> register value) If I can, then having a mask as the canonical form and
> re-deriving the length register from a mask for a sequence of instructions which
> share a predicate seems fairly reasonable. Note that I'm assuming this as a
> fallback, and that the common case is handled via the equivalent of
> ComputeKnownBits on the mask itself at compile time.

If masking is used (which it is usually not for loops without control
flow inside the vectorised loop) then, yes, logical operations on the
mask registers will happen at every basic block boundary.

But it is NOT the case that you can compute the active vector length
VL from an initial mask value. The active vector length is set by the
hardware based on the remaining application vector length. The VL can
change for each loop iteration -- the normal pattern is for VL to
equal VLMAX for initial executions of the loop, and then be less than
VLMAX for the final one or two iterations of the loop. For example if
VLMAX is 16 and there are 19 elements left in the application vector
then the hardware might choose to use 10 elements for the 2nd to last
iteration and 9 elements for the last iteration. Or not. Other
hardware might choose to perform the last three iterations as 12/12/11
instead of 16/10/9. (It is constrained to be monotonic).
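
The per-iteration choice of VL described above can be sketched as a small simulation. This is only a minimal sketch in Python, not hardware or compiler code: `strip_mine` and the two policy functions are hypothetical names, and the two policies are assumptions picked to reproduce the 16/10/9 and 12/12/11 sequences from the example (35 elements in total, VLMAX of 16).

```python
def greedy_then_split(remaining, vlmax):
    """Hypothetical policy: use VLMAX while at least 2*VLMAX elements remain,
    then split what is left into two roughly equal iterations."""
    if remaining >= 2 * vlmax:
        return vlmax
    if remaining > vlmax:
        return (remaining + 1) // 2  # ceil(remaining / 2)
    return remaining


def even_split(remaining, vlmax):
    """Hypothetical policy: spread the remaining elements evenly over the
    minimum number of iterations."""
    iterations = -(-remaining // vlmax)   # ceil(remaining / vlmax)
    return -(-remaining // iterations)    # ceil(remaining / iterations)


def strip_mine(n, vlmax, policy):
    """Return the VL values a strip-mined loop over n elements would see."""
    lengths = []
    remaining = n
    while remaining > 0:
        vl = policy(remaining, vlmax)     # stands in for the hardware's choice
        assert 0 < vl <= min(remaining, vlmax)
        lengths.append(vl)
        remaining -= vl
    return lengths


print(strip_mine(35, 16, greedy_then_split))
print(strip_mine(35, 16, even_split))
```

Under both assumed policies the sequence never increases, and the loop only learns each VL when the policy (standing in for the hardware) reports it.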

VL can also be dynamically shortened in the middle of a loop iteration
by an unaligned vector load that crosses a protection boundary if the
later elements are inaccessible.

I'm curious what SVE will do if there is an if/then/else in the middle
of a vectorised loop with a shorter-than-maximum vector length. You
can't just invert the mask when going from the then-part to the
else-part because that would re-enable elements past the end of the
vector. You'd need to invert the mask and then AND it with the mask
containing the (bitwise representation of) the vector length.
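
The mask arithmetic in that last paragraph can be illustrated with plain integers used as bit masks. This is a sketch under assumed values (8 lanes, an active length of 5, and an arbitrary then-mask), with hypothetical helper names; it says nothing about what SVE actually does.

```python
VLMAX = 8                      # assumed number of lanes in a vector register
ALL_LANES = (1 << VLMAX) - 1   # every lane enabled


def length_mask(vl):
    """Mask with the low vl bits set: the bitwise representation of VL."""
    return (1 << vl) - 1


def else_mask_naive(then_mask):
    """Plain inversion: also turns on lanes past the end of the vector."""
    return ~then_mask & ALL_LANES


def else_mask(then_mask, vl):
    """Inversion followed by an AND with the length mask."""
    return ~then_mask & length_mask(vl)


vl = 5                         # assumed short final iteration
then_mask = 0b00010011         # lanes 0, 1 and 4 take the then-part

print(format(else_mask_naive(then_mask), "08b"))  # lanes 5..7 wrongly enabled
print(format(else_mask(then_mask, vl), "08b"))    # only lanes 2 and 3
```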

Philip Reames via llvm-dev

2019-Feb-05 00:27 UTC

[llvm-dev] [RFC] Vector Predication

On 1/31/19 4:57 PM, Bruce Hoult wrote:
> On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Do such architectures frequently have arithmetic operations on the mask
>> registers?  (i.e. can I reasonably compute a conservative length given a mask
>> register value)  If I can, then having a mask as the canonical form and
>> re-deriving the length register from a mask for a sequence of instructions which
>> share a predicate seems fairly reasonable.  Note that I'm assuming this as a
>> fallback, and that the common case is handled via the equivalent of
>> ComputeKnownBits on the mask itself at compile time.
> If masking is used (which it is usually not for loops without control
> flow inside the vectorised loop) then, yes, logical operations on the
> mask registers will happen at every basic block boundary.
>
> But it is NOT the case that you can compute the active vector length
> VL from an initial mask value. The active vector length is set by the
> hardware based on the remaining application vector length. The VL can
> change for each loop iteration -- the normal pattern is for VL to
> equal VLMAX for initial executions of the loop, and then be less than
> VLMAX for the final one or two iterations of the loop. For example if
> VLMAX is 16 and there are 19 elements left in the application vector
> then the hardware might choose to use 10 elements for the 2nd to last
> iteration and 9 elements for the last iteration. Or not. Other
> hardware might choose to perform the last three iterations as 12/12/11
> instead of 16/10/9. (It is constrained to be monotonic).
>
> VL can also be dynamically shortened in the middle of a loop iteration
> by an unaligned vector load that crosses a protection boundary if the
> later elements are inaccessible.

I can't reconcile this complexity with either the snippet on RISC-V which
was shared, or the current EVL proposal.  Doesn't this imply that the
vector length can change between *every* pair of vector instructions?
If so, how does having it as part of the EVL intrinsics work?

>
> I'm curious what SVE will do if there is an if/then/else in the middle
> of a vectorised loop with a shorter-than-maximum vector length. You
> can't just invert the mask when going from the then-part to the
> else-part because that would re-enable elements past the end of the
> vector. You'd need to invert the mask and then AND it with the mask
> containing the (bitwise representation of) the vector length.

Simon Moll via llvm-dev

2019-Feb-05 08:11 UTC

[llvm-dev] [RFC] Vector Predication

On 2/5/19 1:27 AM, Philip Reames via llvm-dev wrote:
>
> On 1/31/19 4:57 PM, Bruce Hoult wrote:
>> On Thu, Jan 31, 2019 at 4:05 PM Philip Reames via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> Do such architectures frequently have arithmetic operations on the
>>> mask registers?  (i.e. can I reasonably compute a conservative
>>> length given a mask register value)  If I can, then having a mask as
>>> the canonical form and re-deriving the length register from a mask
>>> for a sequence of instructions which share a predicate seems fairly
>>> reasonable. Note that I'm assuming this as a fallback, and that the
>>> common case is handled via the equivalent of ComputeKnownBits on the
>>> mask itself at compile time.
>> If masking is used (which it is usually not for loops without control
>> flow inside the vectorised loop) then, yes, logical operations on the
>> mask registers will happen at every basic block boundary.
>>
>> But it is NOT the case that you can compute the active vector length
>> VL from an initial mask value. The active vector length is set by the
>> hardware based on the remaining application vector length. The VL can
>> change for each loop iteration -- the normal pattern is for VL to
>> equal VLMAX for initial executions of the loop, and then be less than
>> VLMAX for the final one or two iterations of the loop. For example if
>> VLMAX is 16 and there are 19 elements left in the application vector
>> then the hardware might choose to use 10 elements for the 2nd to last
>> iteration and 9 elements for the last iteration. Or not. Other
>> hardware might choose to perform the last three iterations as 12/12/11
>> instead of 16/10/9. (It is constrained to be monotonic).
>>
>> VL can also be dynamically shortened in the middle of a loop iteration
>> by an unaligned vector load that crosses a protection boundary if the
>> later elements are inaccessible.
> I can't reconcile this complexity with either the snippet on RISC-V
> which was shared, or the current EVL proposal.  Doesn't this imply
> that the vector length can change between *every* pair of vector
> instructions?  If so, how does having it as part of the EVL intrinsics
> work?

I think this is the usual mixup of AVL and MVL.

AVL: part of the predicate; it can change between vector operations
just like a mask can (lightweight).

MVL: the physical vector register length; it can be reconfigured per
function (RVV only at the moment), which is heavyweight (a stop-the-world
instruction).

The vectorlen parameter in EVL intrinsics is for the AVL.
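
One way to picture how a mask and a vectorlen operand combine is a lane-wise model. This is a minimal sketch in Python, not the proposed intrinsic: `evl_fsub` here is a hypothetical stand-in, and using `None` for inactive lanes is an assumption of the sketch (this thread does not state what inactive lanes hold).

```python
def evl_fsub(a, b, mask, vlen):
    """Lane i is computed only if mask[i] is set and i < vlen.

    Inactive lanes are returned as None, which in this sketch simply
    means "no defined result".
    """
    assert len(a) == len(b) == len(mask)
    return [
        x - y if (m and i < vlen) else None
        for i, (x, y, m) in enumerate(zip(a, b, mask))
    ]


a = [5.0, 6.0, 7.0, 8.0]
b = [1.0, 1.0, 1.0, 1.0]
mask = [True, False, True, True]

# The same mask with two different active lengths.
print(evl_fsub(a, b, mask, vlen=4))
print(evl_fsub(a, b, mask, vlen=3))
```

The mask stays the same across both calls; only the vlen argument changes which trailing lanes take part.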
>>
>> I'm curious what SVE will do if there is an if/then/else in the middle
>> of a vectorised loop with a shorter-than-maximum vector length. You
>> can't just invert the mask when going from the then-part to the
>> else-part because that would re-enable elements past the end of the
>> vector. You'd need to invert the mask and then AND it with the mask
>> containing the (bitwise representation of) the vector length.

If folks have issues with carrying the vlen around even if the target 
only supports masking, we can rephrase EVL using higher-order functions 
with varargs (basically prefixing):

ARM SVE, AVX512 (mask only targets):

    llvm.evl.masked(<16 x i1> mask %M, ...)

     llvm.evl.fsub(<16 x float>, <16 x float>)  ; exists only to get a function handle

     call @llvm.evl.masked.v16f32(%M, @llvm.evl.fsub(v16f32, <16 x float>, <16 x float>)

RISC-V V, SX-Aurora:

     llvm.evl.pred(<16 x i1> mask %M, i32 vlen %VL, ...)

     llvm.evl.pred(%M, %vl, @llvm.evl.fsub, %a, %b)

The problem with this is mostly that the operand positions are now off 
compared to regular IR and the API abstractions that accept both will 
have to account for that.
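
The prefixing idea, and the operand shift it causes, can be modelled with ordinary higher-order functions. This is a hedged sketch in Python: `evl_pred` and `evl_masked` are hypothetical stand-ins for the forms written above, `operator.sub` plays the role of the function handle, and `None` marks inactive lanes by assumption.

```python
import operator


def evl_pred(mask, vlen, op, *operands):
    """Mask-and-length form: apply op to lane i only if mask[i] and i < vlen."""
    return [
        op(*lane) if (m and i < vlen) else None
        for i, (m, lane) in enumerate(zip(mask, zip(*operands)))
    ]


def evl_masked(mask, op, *operands):
    """Mask-only form: every lane is within the length, only the mask applies."""
    return evl_pred(mask, len(mask), op, *operands)


a = [5.0, 6.0, 7.0, 8.0]
b = [1.0, 1.0, 1.0, 1.0]
mask = [True, True, False, True]

# The data operands a and b sit at argument positions 2 and 3 here ...
print(evl_masked(mask, operator.sub, a, b))
# ... and at positions 3 and 4 here, versus 0 and 1 in a plain a - b.
print(evl_pred(mask, 2, operator.sub, a, b))
```

Code that treats the plain and the predicated operation uniformly would need to know which of these offsets applies, which is the cost mentioned above.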

- Simon
-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll
