thr3ads.net - llvm dev - [llvm-dev] [RFC] Vector Predication [Feb 2019]

Simon Moll via llvm-dev

2019-Feb-04 11:46 UTC

[llvm-dev] [RFC] Vector Predication

On 2/2/19 1:39 AM, Luke Kenneth Casson Leighton wrote:
>
> On Friday, February 1, 2019, Simon Moll <moll at cs.uni-saarland.de> wrote:
>
>     We could untie the mask length from the data length:
>
>       %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 1 x i1> %M, i32 %L)
>
>     would then indicate the mask %M applies to groups of "4 / 1" float
>     elements.
>
>
> That would provide the greatest flexibility, as a 1:1 ratio could mean 
> 1 bit per element, covering the normal case.
>
> Question: are there any circumstances under which it is desirable to 
> underspecify or overspecify the number of bits in the predicate?
>
> ie to deliberately have a FP vector of length 11 and a mask of length 
> 9 or 13?

You are referring to the sub-vector sizes, if I am understanding
correctly. I'd assume that the mask sub-vector length always has to be
either 1 or the same as the data sub-vector length. For example, this is ok:

    %result = call <scalable 3 x float> @llvm.evl.fsub.v4f32(<scalable 3 x float> %x, <scalable 3 x float> %y, <scalable 1 x i1> %M, i32 %L)

    %result = call <scalable 5 x float> @llvm.evl.fsub.v4f32(<scalable 5 x float> %x, <scalable 5 x float> %y, <scalable 1 x i1> %M, i32 %L)

    %result = call <16 x float> @llvm.evl.fsub.v4f32(<16 x float> %x, <4 x float> %y, <4 x i1> %M, i32 %L)

This is invalid IR:

    %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 2 x i1> %M, i32 %L)

    %result = call <scalable 11 x float> @llvm.evl.fsub.v4f32(<scalable 11 x float> %x, <scalable 11 x float> %y, <scalable 9 x i1> %M, i32 %L)

    %result = call <5 x float> @llvm.evl.fsub.v4f32(<5 x float> %x, <5 x float> %y, <7 x i1> %M, i32 %L)
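
The "either 1 or the same" rule for the sub-vector lengths could be sketched as a small check. This is only an illustration in Python, not part of any proposed IR verifier: the function name `mask_is_valid` is hypothetical, and the sketch covers only the n in `<scalable n x ty>` for the scalable examples.

```python
def mask_is_valid(data_n: int, mask_n: int) -> bool:
    """Return True if a <scalable mask_n x i1> mask may guard
    <scalable data_n x float> data under the "1 or equal" rule.

    Hypothetical helper: it models the rule as stated in the message,
    not an actual LLVM verifier check.
    """
    if data_n <= 0 or mask_n <= 0:
        raise ValueError("sub-vector lengths must be positive")
    return mask_n == 1 or mask_n == data_n


# Scalable examples from the message: (data sub-vector, mask sub-vector).
for data_n, mask_n in [(3, 1), (5, 1), (4, 2), (11, 9)]:
    verdict = "ok" if mask_is_valid(data_n, mask_n) else "invalid"
    print(f"<scalable {data_n} x float> with <scalable {mask_n} x i1>: {verdict}")
```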


In case you are talking about the dynamic vector length (e.g. what happens
if the dynamic lengths don't match at runtime), I think the key here is
to regard the vector length parameter "vlen %L" as a contract: the
semantics of the EVL operation are undefined if the runtime lengths of
the vectors are shorter than indicated by %L. That is, the mask has a
minimum element count of %L * mask sub-vector length, and the data has a
minimum element count of %L * data sub-vector length.
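
The arithmetic behind this contract can be written out as a minimal sketch. The helper names below are hypothetical, and a `False` result only marks the case where the operation would have undefined semantics; it does not model a runtime error.

```python
def min_element_counts(vlen: int, mask_n: int, data_n: int) -> tuple[int, int]:
    """Minimum runtime element counts (mask, data) implied by %L = vlen."""
    return vlen * mask_n, vlen * data_n


def contract_holds(vlen: int, mask_n: int, data_n: int,
                   mask_len: int, data_len: int) -> bool:
    """True if the runtime lengths are at least what %L promises.

    Assumption for illustration: mask_len and data_len are the actual
    runtime element counts of the mask and data vectors.
    """
    min_mask, min_data = min_element_counts(vlen, mask_n, data_n)
    return mask_len >= min_mask and data_len >= min_data


# <scalable 3 x float> data with a <scalable 1 x i1> mask and %L = 4:
print(min_element_counts(4, 1, 3))      # (4, 12)
print(contract_holds(4, 1, 3, 4, 12))   # True
print(contract_holds(4, 1, 3, 4, 11))   # False: data vector too short
```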

- Simon
>
> Or, is that just a runtime error.
>
> L.
>
>
> -- 
> ---
> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
>

-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll


Jacob Lifshay via llvm-dev

2019-Feb-04 12:14 UTC


On Mon, Feb 4, 2019, 03:46 Simon Moll <moll at cs.uni-saarland.de> wrote:
> On 2/2/19 1:39 AM, Luke Kenneth Casson Leighton wrote:
>
> On Friday, February 1, 2019, Simon Moll <moll at cs.uni-saarland.de> wrote:
>>
>> We could untie the mask length from the data length:
>>
>>   %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 1 x i1> %M, i32 %L)
>>
>> would then indicate the mask %M applies to groups of "4 / 1" float
>> elements.
>>
>
> That would provide the greatest flexibility, as a 1:1 ratio could mean 1
> bit per element, covering the normal case.
>
> Question: are there any circumstances under which it is desirable to
> underspecify or overspecify the number of bits in the predicate?
>
> ie to deliberately have a FP vector of length 11 and a mask of length 9 or
> 13?
>
> You are referring to the sub-vector sizes, if i am understanding
> correctly. I'd assume that the mask sub-vector length always has to be
> either 1 or the same as the data sub-vector length.
>

I think that allowing the mask sub-vector length to be any divisor of the
data sub-vector length will allow the most flexible instructions, enabling
scalable vectors to be emulated by fixed-length SIMD operations more
easily, simplifying frontends. I don't think non-divisible lengths should
be allowed.

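
As a rough sketch of the divisor rule and of the fixed-length emulation it is meant to enable (the function names and the fixed multiplier `vscale` are assumptions made for illustration, not anything defined in the RFC):

```python
def mask_divides_data(data_n: int, mask_n: int) -> bool:
    """Divisor rule: the mask sub-vector length must divide the data one."""
    return data_n % mask_n == 0


def emulate_fixed(data_n: int, mask_n: int, vscale: int) -> tuple[int, int]:
    """Fixed-length (data, mask) element counts for a chosen multiplier.

    Hypothetical lowering: <scalable data_n x float> becomes
    <data_n * vscale x float>, and likewise for the mask.
    """
    if not mask_divides_data(data_n, mask_n):
        raise ValueError("mask sub-vector length must divide data sub-vector length")
    return data_n * vscale, mask_n * vscale


# <scalable 4 x float> with <scalable 1 x i1>, emulated with vscale = 4,
# becomes <16 x float> with <4 x i1>.
print(emulate_fixed(4, 1, 4))       # (16, 4)
print(mask_divides_data(4, 2))      # True under the divisor rule
print(mask_divides_data(11, 9))     # False
```
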
> For example, this is ok:
>
>    %result = call <scalable 3 x float> @llvm.evl.fsub.v4f32(<scalable 3 x float> %x, <scalable 3 x float> %y, <scalable 1 x i1> %M, i32 %L)
>
>    %result = call <scalable 5 x float> @llvm.evl.fsub.v4f32(<scalable 5 x float> %x, <scalable 5 x float> %y, <scalable 1 x i1> %M, i32 %L)
>
>    %result = call <16 x float> @llvm.evl.fsub.v4f32(<16 x float> %x, <4 x float> %y, <4 x i1> %M, i32 %L)
>
> This is invalid IR:
>
>    %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 2 x i1> %M, i32 %L)
>
>    %result = call <scalable 11 x float> @llvm.evl.fsub.v4f32(<scalable 11 x float> %x, <scalable 11 x float> %y, <scalable 9 x i1> %M, i32 %L)
>
>    %result = call <5 x float> @llvm.evl.fsub.v4f32(<5 x float> %x, <5 x float> %y, <7 x i1> %M, i32 %L)
>
>
> In case you are talking about the dynamic vector length (eg what happens
> if the dynamic length's don't match at runtime), i think the key here is
> to regard the vector length parameter "vlen %L" as a contract: the
> semantics of the EVL operation is undefined if the runtime lengths of the
> vectors are shorter than indicated by %L. That is the mask has a minimum
> element count of %L * mask sub-vector length, the data has a minimum
> element count of %L * data sub-vector length.
>
> - Simon
>
>
> Or, is that just a runtime error.
>
> L.
>
>

David Greene via llvm-dev

2019-Feb-04 17:15 UTC


Simon Moll <moll at cs.uni-saarland.de> writes:
> You are referring to the sub-vector sizes, if i am understanding
> correctly. I'd assume that the mask sub-vector length always has to be
> either 1 or the same as the data sub-vector length. For example, this
> is ok:
>
> %result = call <scalable 3 x float> @llvm.evl.fsub.v4f32(<scalable 3 x float> %x, <scalable 3 x float> %y, <scalable 1 x i1> %M, i32 %L)

What does <scalable 1 x i1> applied to <scalable 3 x float> mean?  I
would expect a requirement of <scalable 3 x i1>.  At least that's how I
understood the SVE proposal [1].  The n's in <scalable n x type> have to
match.

                           -David

[1] http://lists.llvm.org/pipermail/llvm-dev/2016-November/106819.html

Simon Moll via llvm-dev

2019-Feb-04 20:13 UTC


On 2/4/19 6:15 PM, David Greene wrote:
> Simon Moll <moll at cs.uni-saarland.de> writes:
>
>> You are referring to the sub-vector sizes, if i am understanding
>> correctly. I'd assume that the mask sub-vector length always has to be
>> either 1 or the same as the data sub-vector length. For example, this
>> is ok:
>>
>> %result = call <scalable 3 x float> @llvm.evl.fsub.v4f32(<scalable 3 x float> %x, <scalable 3 x float> %y, <scalable 1 x i1> %M, i32 %L)
>
> What does <scalable 1 x i1> applied to <scalable 3 x float> mean?  I
> would expect a requirement of <scalable 3 x i1>.  At least that's how I
> understood the SVE proposal [1].  The n's in <scalable n x type> have to
> match.

It would mean that each mask bit M[i] applies to data lanes D[3*i] to
D[3*i+2]. It has applications in graphics code, where the vector element type
would be a short vector, as in [2]:

for(int i = 0; i < 1000; i++)
{
     vec4 color = colors[i];    // <scalable 4 x float>
     vec3 normal = normals[i];  // <scalable 3 x float>
     color.rgb *= fmax(0.0, dot(normal, light_dir));
     colors[i] = color;
}
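
The lane grouping described above, where mask bit M[i] guards lanes D[3*i] to D[3*i+2], can be simulated in a few lines of Python. This is a sketch under assumptions: the function name and the `passthru` value written to masked-off lanes are hypothetical, since the thread does not say what disabled lanes produce.

```python
def grouped_masked_fsub(x, y, mask, group, passthru=0.0):
    """Lane-wise x - y where mask[i] enables lanes i*group .. i*group+group-1.

    `passthru` fills disabled lanes; that choice is an assumption made
    only so the sketch returns a complete vector.
    """
    if len(x) != len(y) or len(x) != len(mask) * group:
        raise ValueError("data length must equal mask length * group")
    result = []
    for lane, (a, b) in enumerate(zip(x, y)):
        enabled = mask[lane // group]   # one mask bit per group of lanes
        result.append(a - b if enabled else passthru)
    return result


# Two vec3 elements; only the first is enabled by the mask.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.5] * 6
print(grouped_masked_fsub(x, y, [True, False], group=3))
# [0.5, 1.5, 2.5, 0.0, 0.0, 0.0]
```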



I don't see any direct conflict with LLVM-SVE [1] but it will add complexity
to the vectorizer, legal and TTI to choose a good sub-vector size/strategy for
each target.

- Simon


[2] https://lists.llvm.org/pipermail/llvm-dev/2019-January/129822.html
>
>                             -David
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2016-November/106819.html


Robin Kruppe via llvm-dev

2019-Feb-04 20:18 UTC


On Mon, 4 Feb 2019 at 18:15, David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Simon Moll <moll at cs.uni-saarland.de> writes:
>
> > You are referring to the sub-vector sizes, if i am understanding
> > correctly. I'd assume that the mask sub-vector length always has to be
> > either 1 or the same as the data sub-vector length. For example, this
> > is ok:
> >
> > %result = call <scalable 3 x float> @llvm.evl.fsub.v4f32(<scalable 3 x float> %x, <scalable 3 x float> %y, <scalable 1 x i1> %M, i32 %L)
>
> What does <scalable 1 x i1> applied to <scalable 3 x float> mean?  I
> would expect a requirement of <scalable 3 x i1>.  At least that's how I
> understood the SVE proposal [1].  The n's in <scalable n x type> have to
> match.
>
I believe the idea is to allow each single mask bit to control multiple
consecutive lanes at once, effectively interpreting the vector being
operated on as "many short fixed-length vectors, concatenated" rather than
a single long vector of scalars. This is a different interpretation of that
type than usual, but it's not crazy, e.g. a similar reinterpretation of
vector types seems to be the favored approach for adding matrix operations
to LLVM IR. It somewhat obscures the point to discuss this only for
scalable vectors, there's no conceptual reason why one couldn't do the same
with fixed size vectors.

In fact, I would recommend against making almost any new feature or
intrinsic exclusive to scalable vectors, including this one: there
shouldn't be much extra code required to allow and support it, and not
doing so makes the IR less orthogonal. For example, if a <scalable 4 x
float> fadd with a <scalable 1 x i1> mask works, then <4 x float> fadd with
a <1 x i1> mask, a <8 x float> fadd with a <2 x i1> mask, etc. should also
be possible overloads of the same intrinsic.

So far, so good. A bit odd, when I think about it, but if hardware out
there has that capability, maybe this is a good way to encode it in IR
(other options might work too, though). The crux, however, is the
interaction with the dynamic vector length: is it in terms of the mask? the
longer data vector? if the latter, what happens if it isn't divisible by
the mask length? There are multiple options and it's not clear to me which
one is "the right one", both for architectures with native support
(hopefully the one brought up here won't be the only one) and for internal
consistency of the IR. If there was an established architecture with this
kind of feature where people have gathered lots of practical experience
with it, we could use that to inform the decision (just as we have for
ordinary predication and dynamic vector length). But I'm not aware of any
architecture that does this other than the one Jacob and lkcl are working
on, and as far as I know their project is still in the early stages.
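
To make the two readings of the dynamic vector length concrete, here is a minimal sketch for an <8 x float> operation with a <2 x i1> mask (groups of 4 lanes). The function names are hypothetical and neither reading is claimed to be the intended semantics; the sketch only shows where they differ.

```python
def active_if_counting_mask_elements(vlen: int, group: int, num_lanes: int) -> list[bool]:
    """Reading 1: %L counts mask elements, so whole groups are enabled."""
    return [lane // group < vlen for lane in range(num_lanes)]


def active_if_counting_data_lanes(vlen: int, num_lanes: int) -> list[bool]:
    """Reading 2: %L counts data lanes directly."""
    return [lane < vlen for lane in range(num_lanes)]


def splits_a_group(vlen: int, group: int) -> bool:
    """Under reading 2, True if %L ends in the middle of a mask group."""
    return vlen % group != 0


GROUP, LANES = 4, 8
print(active_if_counting_mask_elements(1, GROUP, LANES))  # first group only
print(active_if_counting_data_lanes(6, LANES))            # lanes 0..5
print(splits_a_group(6, GROUP))                           # True: open question
```

With the second reading and %L = 6, the second group of four lanes is only half covered, which is the non-divisible case the paragraph above asks about.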

Cheers,
Robin