thr3ads.net - llvm dev - [llvm-dev] [RFC] Vector Predication [Feb 2019]

Simon Moll via llvm-dev

2019-Feb-01 09:54 UTC

[llvm-dev] [RFC] Vector Predication

Hi,

On 1/31/19 11:20 PM, Jacob Lifshay wrote:
> We're in the process of designing a RISC-V extension
> (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html)
> that would have variable-length vectors of short vectors (1 to 4):
> <VL x <4 x float>>
> where each predicate bit masks out a whole short vector. We're using 
> this extension to vectorize graphics code where variables in the
> pre-vectorization code are short vectors.
> So, vectorizing code like:
> for(int i = 0; i < 1000; i++)
> {
>     vec4 color = colors[i];
>     vec3 normal = normals[i];
>     color.rgb *= fmax(0.0, dot(normal, light_dir));
>     colors[i] = color;
> }
>
> I'm planning on passing already vectorized code into LLVM and using 
> LLVM as a backend for optimization and JIT code generation.
>
> Do you think the EVL proposal would support an ISA like this as it's 
> currently written (by pattern matching on predicate expansion and 
> vector-length multiplication)?
> Or, do you think the EVL proposal would need modification to 
> effectively support this (by adding an element group size argument to
> EVL intrinsics or something)?

We could untie the mask length from the data length:

   %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(
       <scalable 4 x float> %x, <scalable 4 x float> %y,
       <scalable 1 x i1> %M, i32 %L)

would then indicate the mask %M applies to groups of "4 / 1" float
elements.
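
To make the grouping concrete, here is a minimal Python sketch of how such a call might behave. It is not taken from the prototype: the function name `grouped_masked_fsub` is hypothetical, `vlen` is assumed to count groups (as a later message in this thread suggests), and disabled lanes are shown as `None` only because the thread does not say what value they hold.

```python
from typing import List, Optional


def grouped_masked_fsub(
    x: List[float],
    y: List[float],
    mask: List[bool],
    vlen: int,
) -> List[Optional[float]]:
    """Model x - y where each mask bit covers a whole group of elements.

    Assumptions for this sketch: `vlen` counts groups rather than single
    elements, and disabled lanes are reported as None.
    """
    if len(x) != len(y):
        raise ValueError("operands must have the same length")
    if not mask or len(x) % len(mask) != 0:
        raise ValueError("data length must be a multiple of mask length")
    group = len(x) // len(mask)  # e.g. 4 / 1 = 4 floats per mask bit

    result: List[Optional[float]] = []
    for i in range(len(x)):
        g = i // group  # index of the group (and mask bit) for element i
        active = g < vlen and mask[g]
        result.append(x[i] - y[i] if active else None)
    return result


# Two groups of four floats; only the first group is enabled.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0.5] * 8
out = grouped_masked_fsub(x, y, mask=[True, False], vlen=2)
```

With a mask as long as the data, the group size becomes 1 and the same function models the ordinary one-bit-per-element case.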

- Simon

> Jacob Lifshay
>
> On Thu, Jan 31, 2019, 07:58 Simon Moll via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     Hi,
>
>     There is now an RFC for a roadmap to native vector predication
>     support in LLVM and a prototype implementation:
>
>     https://reviews.llvm.org/D57504
>
>     The prototype demonstrates:
>
>     -  Predicated vector intrinsics with an explicit mask and vector
>     length parameter on IR level.
>     -  First-class predicated SDNodes on ISel level. Mask and vector
>     length are value operands.
>     -  An incremental strategy to generalize
>     PatternMatch/InstCombine/InstSimplify and DAGCombiner to work on
>     both regular instructions and EVL intrinsics.
>     -  DAGCombiner example: FMA fusion.
>     -  InstCombine/InstSimplify example: FSub pattern re-writes.
>     -  Early experiments on the LNT test suite (Clang static release,
>     O3 -ffast-math) indicate that compile time on non-EVL IR is not
>     affected by the API abstractions in PatternMatch, etc.
>
>     We’d like to get your feedback, in particular on the following to
>     move forward:
>
>     -  Can we agree on EVL intrinsics as a transitional step to
>     predicated IR instructions?
>     -  Can we agree on native EVL SDNodes for CodeGen?
>     -  Are the changes to InstCombine/InstSimplify/DAGCombiner and
>     utility classes that go with it acceptable?
>
>     Thanks
>     Simon
>
>     -- 
>
>     Simon Moll
>     Researcher / PhD Student
>
>     Compiler Design Lab (Prof. Hack)
>     Saarland University, Computer Science
>     Building E1.3, Room 4.31
>
>     Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
>     Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll
>
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll

Jacob Lifshay via llvm-dev

2019-Feb-01 10:10 UTC

[llvm-dev] [RFC] Vector Predication

On Fri, Feb 1, 2019 at 1:54 AM Simon Moll <moll at cs.uni-saarland.de> wrote:
> Hi,
> On 1/31/19 11:20 PM, Jacob Lifshay wrote:
>
> We're in the process of designing a RISC-V extension
> (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html)
> that would have variable-length vectors of short vectors (1 to 4):
> <VL x <4 x float>>
> where each predicate bit masks out a whole short vector. We're using this
> extension to vectorize graphics code where variables in the
> pre-vectorization code are short vectors.
> So, vectorizing code like:
> for(int i = 0; i < 1000; i++)
> {
>     vec4 color = colors[i];
>     vec3 normal = normals[i];
>     color.rgb *= fmax(0.0, dot(normal, light_dir));
>     colors[i] = color;
> }
>
> I'm planning on passing already vectorized code into LLVM and using LLVM
> as a backend for optimization and JIT code generation.
>
> Do you think the EVL proposal would support an ISA like this as it's
> currently written (by pattern matching on predicate expansion and
> vector-length multiplication)?
> Or, do you think the EVL proposal would need modification to effectively
> support this (by adding an element group size argument to EVL intrinsics or
> something)?
>
> We could untie the mask length from the data length:
>
>   %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(
>       <scalable 4 x float> %x, <scalable 4 x float> %y,
>       <scalable 1 x i1> %M, i32 %L)
>
> would then indicate the mask %M applies to groups of "4 / 1" float
> elements.
>

Sounds good to me. I haven't checked if the current code allows for that.

Jacob

Luke Kenneth Casson Leighton via llvm-dev

2019-Feb-02 00:39 UTC

[llvm-dev] [RFC] Vector Predication

On Friday, February 1, 2019, Simon Moll <moll at cs.uni-saarland.de> wrote:
>
> We could untie the mask length from the data length:
>
>   %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(
>       <scalable 4 x float> %x, <scalable 4 x float> %y,
>       <scalable 1 x i1> %M, i32 %L)
>
> would then indicate the mask %M applies to groups of "4 / 1" float
> elements.
>
That would provide the greatest flexibility, as a 1:1 ratio could mean 1
bit per element, covering the normal case.

Question: are there any circumstances under which it is desirable to
underspecify or overspecify the number of bits in the predicate?

I.e., to deliberately have an FP vector of length 11 and a mask of length 9 or
13?

Or is that just a runtime error?

L.


-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

Simon Moll via llvm-dev

2019-Feb-04 11:46 UTC

[llvm-dev] [RFC] Vector Predication

On 2/2/19 1:39 AM, Luke Kenneth Casson Leighton wrote:
>
> On Friday, February 1, 2019, Simon Moll <moll at cs.uni-saarland.de> wrote:
>
>     We could untie the mask length from the data length:
>
>       %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(
>           <scalable 4 x float> %x, <scalable 4 x float> %y,
>           <scalable 1 x i1> %M, i32 %L)
>
>     would then indicate the mask %M applies to groups of "4 / 1" float
>     elements.
>
>
> That would provide the greatest flexibility, as a 1:1 ratio could mean 
> 1 bit per element, covering the normal case.
>
> Question: are there any circumstances under which it is desirable to 
> underspecify or overspecify the number of bits in the predicate?
>
> ie to deliberately have a FP vector of length 11 and a mask of length 
> 9 or 13?

You are referring to the sub-vector sizes, if I am understanding
correctly. I'd assume that the mask sub-vector length always has to be
either 1 or the same as the data sub-vector length. For example, this is OK:

    %result = call <scalable 3 x float> @llvm.evl.fsub.v4f32(
        <scalable 3 x float> %x, <scalable 3 x float> %y,
        <scalable 1 x i1> %M, i32 %L)

    %result = call <scalable 5 x float> @llvm.evl.fsub.v4f32(
        <scalable 5 x float> %x, <scalable 5 x float> %y,
        <scalable 1 x i1> %M, i32 %L)

    %result = call <16 x float> @llvm.evl.fsub.v4f32(
        <16 x float> %x, <4 x float> %y, <4 x i1> %M, i32 %L)

This is invalid IR:

    %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(
        <scalable 4 x float> %x, <scalable 4 x float> %y,
        <scalable 2 x i1> %M, i32 %L)

    %result = call <scalable 11 x float> @llvm.evl.fsub.v4f32(
        <scalable 11 x float> %x, <scalable 11 x float> %y,
        <scalable 9 x i1> %M, i32 %L)

    %result = call <5 x float> @llvm.evl.fsub.v4f32(
        <5 x float> %x, <5 x float> %y, <7 x i1> %M, i32 %L)
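
The rule stated above for the `<scalable N x ...>` cases can be sketched as a small check. This is only an illustration in Python, not verifier code from the prototype. The function name `mask_type_is_valid` is hypothetical, and the sketch deliberately ignores the fixed-width `<16 x float>` / `<4 x i1>` example, whose grouping the rule as worded does not cover.

```python
def mask_type_is_valid(data_subvec_len: int, mask_subvec_len: int) -> bool:
    """Check the rule from this message for scalable vector types.

    The mask sub-vector length must be either 1 (one bit per whole
    sub-vector) or equal to the data sub-vector length (one bit per
    element). Anything else is treated as invalid IR.
    """
    if data_subvec_len < 1 or mask_subvec_len < 1:
        return False
    return mask_subvec_len == 1 or mask_subvec_len == data_subvec_len


# The scalable examples from this message as (data, mask) sub-vector lengths.
ok_examples = [(3, 1), (5, 1)]
invalid_examples = [(4, 2), (11, 9)]
```

Under this reading, `<scalable 4 x i1>` with `<scalable 4 x float>` is the ordinary per-element mask and is also accepted.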


In case you are talking about the dynamic vector length (e.g. what happens
if the dynamic lengths don't match at runtime), I think the key here is
to regard the vector length parameter "vlen %L" as a contract: the
semantics of the EVL operation are undefined if the runtime lengths of
the vectors are shorter than indicated by %L. That is, the mask has a
minimum element count of %L * mask sub-vector length, and the data has a
minimum element count of %L * data sub-vector length.
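
As a rough illustration of this contract, the two minimum element counts can be written as a predicate. The Python below is a sketch only: `satisfies_vlen_contract` and its parameter names are hypothetical, and real hardware or IR would not perform such a check, since violating the contract is simply undefined.

```python
def satisfies_vlen_contract(
    runtime_mask_elems: int,
    runtime_data_elems: int,
    vlen: int,
    mask_subvec_len: int,
    data_subvec_len: int,
) -> bool:
    """Return True if the runtime vectors are long enough for %L.

    The mask needs at least vlen * mask_subvec_len elements and the data
    needs at least vlen * data_subvec_len elements; otherwise the
    operation's semantics are undefined.
    """
    return (
        runtime_mask_elems >= vlen * mask_subvec_len
        and runtime_data_elems >= vlen * data_subvec_len
    )


# <scalable 4 x float> data with a <scalable 1 x i1> mask, four sub-vectors
# available at runtime: vlen = 4 is within the contract, vlen = 5 is not.
within = satisfies_vlen_contract(4, 16, vlen=4, mask_subvec_len=1, data_subvec_len=4)
beyond = satisfies_vlen_contract(4, 16, vlen=5, mask_subvec_len=1, data_subvec_len=4)
```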

- Simon
>
> Or, is that just a runtime error.
>
> L.
>
>
> -- 
> ---
> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
>

