thr3ads.net - llvm dev - [llvm-dev] [RFC] Vector Predication [Feb 2019]


Simon Moll via llvm-dev

2019-Jan-31 15:58 UTC

[llvm-dev] [RFC] Vector Predication

Hi,

There is now an RFC for a roadmap to native vector predication support 
in LLVM and a prototype implementation:

   https://reviews.llvm.org/D57504

The prototype demonstrates:

-  Predicated vector intrinsics with an explicit mask and vector length 
parameter on IR level.
-  First-class predicated SDNodes on ISel level. Mask and vector length 
are value operands.
-  An incremental strategy to generalize 
PatternMatch/InstCombine/InstSimplify and DAGCombiner to work on both 
regular instructions and EVL intrinsics.
-  DAGCombiner example: FMA fusion.
-  InstCombine/InstSimplify example: FSub pattern re-writes.
-  Early experiments on the LNT test suite (Clang static release, O3 
-ffast-math) indicate that compile time on non-EVL IR is not affected by 
the API abstractions in PatternMatch, etc.

We’d like to get your feedback, in particular on the following to move 
forward:

-  Can we agree on EVL intrinsics as a transitional step to predicated 
IR instructions?
-  Can we agree on native EVL SDNodes for CodeGen?
-  Are the changes to InstCombine/InstSimplify/DAGCombiner and utility 
classes that go with it acceptable?

Thanks
Simon

-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll


Philip Reames via llvm-dev

2019-Jan-31 17:02 UTC


Can I start w/some basic questions?   I've skimmed your proposal, but 
haven't read it in detail, so if something I ask is already addressed, 
please feel free to cite existing docs/discussion/whatever.

I'm going to use fsub as my running example, just because it's the first
IR test you posted. :)

%result = call <4 x float> @llvm.evl.fsub.v4f32(<4 x float> %x, <4 x float> %y,
                                                <4 x i1> %M, i32 %L)

Question 1 - Why do we need separate mask and lengths?  Can't the length 
be easily folded into the mask operand?

e.g. newmask = (<4 x i1>)((i4)%M & ((1 << %L) - 1))
and then pattern matched in the backend if needed
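
A minimal sketch of that fold, written in Python rather than IR. It assumes lane 0 maps to the least significant bit and that a length larger than the vector width is clamped; `fold_length_into_mask` is a hypothetical helper name, not something from the patch:

```python
def fold_length_into_mask(mask, length):
    """Return a mask whose lane i is active only if mask[i] is set and i < length."""
    width = len(mask)
    # Pack the <N x i1> mask into an integer, lane 0 in the least significant bit.
    mask_bits = sum(1 << i for i, bit in enumerate(mask) if bit)
    # (1 << length) - 1 sets the low `length` bits; clamp length to the width.
    length_bits = (1 << min(length, width)) - 1
    folded = mask_bits & length_bits
    # Unpack the integer back into a list of per-lane flags.
    return [bool((folded >> i) & 1) for i in range(width)]
```

For example, `fold_length_into_mask([True, False, True, True], 3)` gives `[True, False, True, False]`: lane 3 is dropped because it lies past the length.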

Question 2 - Have you explored using selects instead?  What practical 
problems do you run into which make you believe explicit predication is 
required?

e.g. %sub = fsub <4 x float> %x, %y
     %result = select <4 x i1> %M, <4 x float> %sub, <4 x float> undef

My context for these questions is that my recent experience with the
existing masked intrinsics shows us missing fairly basic optimizations,
precisely because they weren't able to reuse all of the existing 
infrastructure.  (I've been working on SimplifyDemandedVectorElts 
recently for exactly this reason.)  My concern is that your EVL proposal 
will end up in the same state.

Philip


David Greene via llvm-dev

2019-Jan-31 19:03 UTC


Philip Reames <listmail at philipreames.com> writes:
> Question 1 - Why do we need separate mask and lengths? Can't the
> length be easily folded into the mask operand?
>
> e.g. newmask = (<4 x i1>)((i4)%M & ((1 << %L) - 1))
> and then pattern matched in the backend if needed

I'm a little concerned about how difficult it will be to maintain enough
information throughout compilation to be able to match this on a machine
with an explicit vector length value.

> Question 2 - Have you explored using selects instead? What practical
> problems do you run into which make you believe explicit predication
> is required?
>
> e.g. %sub = fsub <4 x float> %x, %y
>      %result = select <4 x i1> %M, <4 x float> %sub, <4 x float> undef

That is semantically incorrect.  According to IR semantics, the fsub is
fully evaluated before the select comes along.  It could trap for
elements where %M is 0, whereas a masked intrinsic conveys the proper
semantics of masking traps for masked-out elements.  We need intrinsics
and eventually (IMHO) fully first-class predication to make this work
properly.
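
The difference can be sketched with a small Python model. It assumes a hypothetical `trapping_fsub` that traps on a NaN operand as a stand-in for a real FP trap, and it uses `None` to model an undef lane; none of these names come from the RFC:

```python
import math


class Trap(Exception):
    """Stand-in for a hardware floating-point trap."""


def trapping_fsub(a, b):
    # Assumption for illustration: a NaN operand traps, roughly as a
    # signaling NaN would with FP exceptions enabled.
    if math.isnan(a) or math.isnan(b):
        raise Trap("invalid operand")
    return a - b


def fsub_then_select(x, y, mask):
    # The unpredicated fsub evaluates every lane first ...
    sub = [trapping_fsub(a, b) for a, b in zip(x, y)]
    # ... and only afterwards does the select discard masked-off lanes.
    return [s if m else None for s, m in zip(sub, mask)]


def predicated_fsub(x, y, mask):
    # Masked-off lanes are never evaluated, so they cannot trap.
    return [trapping_fsub(a, b) if m else None for a, b, m in zip(x, y, mask)]
```

With a NaN in a masked-off lane, `predicated_fsub` returns normally while `fsub_then_select` raises `Trap` before the select is ever reached.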

> My context for these questions is that my recent experience with the
> existing masked intrinsics shows us missing fairly basic
> optimizations, precisely because they weren't able to reuse all of the
> existing infrastructure. (I've been working on
> SimplifyDemandedVectorElts recently for exactly this reason.) My
> concern is that your EVL proposal will end up in the same state.

I think that's just the nature of the beast.  We need IR-level support
for masking and we have to teach LLVM about it.

                           -David

Jacob Lifshay via llvm-dev

2019-Jan-31 22:20 UTC


We're in the process of designing a RISC-V extension
(http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html)
that would have variable-length vectors of short vectors (1 to 4):
<VL x <4 x float>>
where each predicate bit masks out a whole short vector. We're using this
extension to vectorize graphics code where variables in the
pre-vectorization code are short vectors.
So, vectorizing code like:
for(int i = 0; i < 1000; i++)
{
    vec4 color = colors[i];
    vec3 normal = normals[i];
    color.rgb *= fmax(0.0, dot(normal, light_dir));
    colors[i] = color;
}

I'm planning on passing already vectorized code into LLVM and using LLVM as
a backend for optimization and JIT code generation.

Do you think the EVL proposal would support an ISA like this as it's
currently written (by pattern matching on predicate expansion and
vector-length multiplication)?
Or, do you think the EVL proposal would need modification to effectively
support this (by adding an element group size argument to EVL intrinsics or
something)?
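
For reference, the pattern-matching route (predicate expansion plus vector-length multiplication) amounts to roughly the following, sketched in Python. `expand_group_predicate` and its parameters are hypothetical names, and the sketch assumes every short vector has the same size:

```python
def expand_group_predicate(group_pred, group_vl, group_size):
    """Map a per-group predicate and group count to a flat mask and length."""
    # Each predicate bit is repeated once per element of its short vector.
    element_mask = [bit for bit in group_pred for _ in range(group_size)]
    # The flat vector length counts scalar elements, not short vectors.
    element_vl = group_vl * group_size
    return element_mask, element_vl
```

A backend for such an ISA would have to recognize this expanded form and recover the original per-group predicate and group count from it.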

Jacob Lifshay


Bruce Hoult via llvm-dev

2019-Feb-01 00:38 UTC


On Thu, Jan 31, 2019 at 9:03 AM Philip Reames via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Question 1 - Why do we need separate mask and lengths?  Can't the
> length be easily folded into the mask operand?

RISC-V has both masks and an active vector length and the semantics
are different.

TLDR: Masked-off elements in the destination register retain their
previous value, but elements past the active vector length are zeroed.

I'll quote from the current version of the draft spec:

=========
5.4. Active and Tail Element Definitions

The elements within a vector instruction can be divided into four
disjoint subsets.

The prestart elements are those whose element index is less than the
initial value in the vstart register. The prestart elements do not
raise exceptions and do not update the destination vector register.

The active elements during a vector instruction’s execution are the
elements within the current vector length setting and where the
current mask is enabled at that element position. The active elements
can raise exceptions and update the destination vector register group.

The inactive elements are the elements within the current vector
length setting but where the current mask is disabled at that element
position. The inactive elements do not raise exceptions and do not
update the destination vector register.

The tail elements during a vector instruction’s execution are the
elements past the current vector length setting. The tail elements do
not raise exceptions, but do zero the results in the destination
vector register group.

    for element index x
    prestart    = (0 <= x < vstart)
    active(x)   = (vstart <= x < vl) && (unmasked || mask(x))
    inactive(x) = (vstart <= x < vl) && !(unmasked || mask(x))
    tail(x)     = (vl <= x < VLMAX)

All vector instructions place zeros in the tail elements of the
destination vector register group. Some vector arithmetic instructions
are not maskable, so have no inactive elements, but still zero the
tail elements.
=========
Note: vstart is almost always zero; it exists to support interruptible
vector instructions. "The vstart CSR is writable by unprivileged code,
but non-zero vstart values may cause vector instructions to run
substantially slower on some implementations, so vstart should not be
used by application programmers."
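
The four subsets and their effect on the destination register, as quoted from the draft, can be modelled in a few lines of Python. This is only a sketch: `classify` and `write_destination` are hypothetical names, `mask=None` stands for an unmasked instruction, and it assumes vstart <= vl <= VLMAX:

```python
def classify(x, vstart, vl, vlmax, mask=None):
    """Classify element index x according to the quoted definitions."""
    if not 0 <= x < vlmax:
        raise IndexError("element index out of range")
    if x < vstart:
        return "prestart"
    if x >= vl:
        return "tail"
    # Within [vstart, vl): active if unmasked or the mask bit is set.
    enabled = mask is None or mask[x]
    return "active" if enabled else "inactive"


def write_destination(dest, results, vstart, vl, mask=None):
    """Return the destination register contents after one instruction."""
    vlmax = len(dest)
    out = []
    for x in range(vlmax):
        kind = classify(x, vstart, vl, vlmax, mask)
        if kind == "active":
            out.append(results[x])  # active elements are updated
        elif kind == "tail":
            out.append(0)           # tail elements are zeroed
        else:
            out.append(dest[x])     # prestart and inactive keep the old value
    return out
```

This is why the length cannot simply be folded into the mask on this draft: a masked-off element keeps its old value, while an element past the length becomes zero.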

Luke Kenneth Casson Leighton via llvm-dev

2019-Feb-01 07:52 UTC


---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Thu, Jan 31, 2019 at 10:22 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> We're in the process of designing a RISC-V extension
> (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html)
> that would have variable-length vectors of short vectors (1 to 4):
> <VL x <4 x float>>
> where each predicate bit masks out a whole short vector. We're using
> this extension to vectorize graphics code where variables in the
> pre-vectorization code are short vectors.
> So, vectorizing code like:
> for(int i = 0; i < 1000; i++)
> {
>     vec4 color = colors[i];
>     vec3 normal = normals[i];
>     color.rgb *= fmax(0.0, dot(normal, light_dir));
>     colors[i] = color;
> }
>
> I'm planning on passing already vectorized code into LLVM and using
> LLVM as a backend for optimization and JIT code generation.
>
> Do you think the EVL proposal would support an ISA like this as it's
> currently written (by pattern matching on predicate expansion and
> vector-length multiplication)?

whilst it may be tempting to suggest that a solution is to multiply up
the bits in the predicate (into groups of 3 or 4), the problem with
that is that if there are operations that require vec3 or vec4 as
operands interspersed with predicated operations that do not, that
realistically implies a need for two separate predicate registers,
otherwise cycles are wasted swapping predicates OR it implies that the
architecture *allows* two separate predicate registers to be selected.

consequently, it would be much, much better to be able to have a
single bit of a predicate apply to the *entire* vec3 or vec4 type, on
each outer loop.

l.

Simon Moll via llvm-dev

2019-Feb-01 09:54 UTC


Hi,

On 1/31/19 11:20 PM, Jacob Lifshay wrote:
> We're in the process of designing a RISC-V extension
> (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html)
> that would have variable-length vectors of short vectors (1 to 4):
> <VL x <4 x float>>
> where each predicate bit masks out a whole short vector. We're using
> this extension to vectorize graphics code where variables in the
> pre-vectorization code are short vectors.
> So, vectorizing code like:
> for(int i = 0; i < 1000; i++)
> {
>     vec4 color = colors[i];
>     vec3 normal = normals[i];
>     color.rgb *= fmax(0.0, dot(normal, light_dir));
>     colors[i] = color;
> }
>
> I'm planning on passing already vectorized code into LLVM and using
> LLVM as a backend for optimization and JIT code generation.
>
> Do you think the EVL proposal would support an ISA like this as it's
> currently written (by pattern matching on predicate expansion and
> vector-length multiplication)?
> Or, do you think the EVL proposal would need modification to
> effectively support this (by adding an element group size argument to
> EVL intrinsics or something)?

We could untie the mask length from the data length:

   %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(
       <scalable 4 x float> %x, <scalable 4 x float> %y,
       <scalable 1 x i1> %M, i32 %L)

would then indicate that the mask %M applies to groups of "4 / 1" float
elements.
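
A rough Python model of what such an untied mask could mean, using fixed-size lists in place of scalable vectors. `evl_fsub_grouped` is a hypothetical name; the sketch assumes %L counts float elements (not groups) and uses `None` for lanes whose result is undefined:

```python
def evl_fsub_grouped(x, y, group_mask, length):
    """Predicated fsub where each mask bit covers a whole group of elements."""
    n = len(x)
    if len(y) != n or not group_mask or n % len(group_mask) != 0:
        raise ValueError("data length must be a multiple of the mask length")
    # Ratio of data length to mask length, e.g. 4 / 1 in the call above.
    group_size = n // len(group_mask)
    result = []
    for i in range(n):
        # A lane is active if it is below the length and its group is enabled.
        active = i < length and group_mask[i // group_size]
        result.append(x[i] - y[i] if active else None)
    return result
```

With eight floats and a two-bit mask, each mask bit switches four consecutive lanes on or off, while the length still cuts off at element granularity.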

- Simon

