> On Dec 19, 2018, at 11:09 AM, Stephen Canon via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On Dec 18, 2018, at 10:18 PM, Adam Nemet <anemet at apple.com> wrote:
>>
>>> I don’t understand this. What is the benefit of providing layout info to element-wise operations? This defeats the goal of having simple lowering and representation: you are encoding an ND vector form into the IR in a really ugly way, and this will cause a proliferation of intrinsics that are redundant with the core ops.
>>
>> The reason we need that information is so that, for example, we can lower an operation on a 3-element column into a 2-wide vector op plus a scalar op. This should be beneficial for power consumption: in the case of a 3x3 matrix with a single element of padding per column, rather than operating on 12 elements you’d operate on only 9 (vector ops consume more power than their scalar counterparts).
>>
>> That said, we should be able to remove these intrinsics in the long term. Once we have masking on the core ops in the IR, we should be able to express the same semantics without dedicated intrinsics.
>
> There may be some cases where this holds (maybe with 5x5 or something), but most of the time I would expect to get better power from doing a four-element vector op with one wasted lane than from doing two arithmetic ops (plus possibly extracts and inserts, depending on physical layout details).
>
> Explicit masking, or arranging for zero in the padding lanes, seems like a better way forward to me.
> – Steve

I spent some time chatting with Adam about this and have a better understanding of his concerns here. It seems to me that if masking intrinsics are the long-term solution we want, we should do that now (for add and sub) rather than building arbitrary matrix layout info into intrinsics, since a mask carries all the information that we actually need.

– Steve
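To make the trade-off concrete, here is a small IR sketch of the two lowerings being compared, assuming one 3-element matrix column held in lanes 0-2 of a <4 x float> with lane 3 as padding: a single padded 4-wide add versus a 2-wide add plus a scalar add. The function names and layout are illustrative only.

  ; Variant 1: one 4-wide add; the padding lane is computed but ignored.
  define <4 x float> @col_add_padded(<4 x float> %a, <4 x float> %b) {
    %sum = fadd <4 x float> %a, %b
    ret <4 x float> %sum
  }

  ; Variant 2: a 2-wide add on lanes 0-1 plus a scalar add on lane 2,
  ; with the extracts/inserts needed to split and reassemble the column.
  define <4 x float> @col_add_split(<4 x float> %a, <4 x float> %b) {
    %a.lo = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 0, i32 1>
    %b.lo = shufflevector <4 x float> %b, <4 x float> undef, <2 x i32> <i32 0, i32 1>
    %lo   = fadd <2 x float> %a.lo, %b.lo
    %a2   = extractelement <4 x float> %a, i32 2
    %b2   = extractelement <4 x float> %b, i32 2
    %e2   = fadd float %a2, %b2
    %wide = shufflevector <2 x float> %lo, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
    %res  = insertelement <4 x float> %wide, float %e2, i32 2
    ret <4 x float> %res
  }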
> On Dec 19, 2018, at 1:31 PM, Stephen Canon <scanon at apple.com> wrote:
>
> [...]
>
> I spent some time chatting with Adam about this and have a better understanding of his concerns here. It seems to me that if masking intrinsics are the long-term solution we want, we should do that now (for add and sub) rather than building arbitrary matrix layout info into intrinsics, since a mask carries all the information that we actually need.

I think that sounds like a reasonable compromise. We already have masked load/store intrinsics, so adding masked add and sub just follows that precedent. If the decision is later made to move masking onto the core operations, the new intrinsics would simply move as well.

So an add->multiply sequence for option B plus masking intrinsics would look like this:

%a = load <12 x float>, <12 x float>* %A, align 16
%b = load <12 x float>, <12 x float>* %B, align 16
%c = load <8 x float>, <8 x float>* %C, align 16

%add = call <12 x float> @llvm.masked.fadd(<12 x float> %a, <12 x float> %b,
         ; mask: where false, the element is taken from the passthrough operand
         <12 x i1> <i1 true, i1 true, i1 true, i1 false,
                    i1 true, i1 true, i1 true, i1 false,
                    i1 true, i1 true, i1 true, i1 false>,
         ; passthrough:
         <12 x float> <float undef, float undef, float undef, float undef,
                       float undef, float undef, float undef, float undef,
                       float undef, float undef, float undef, float undef>)

%mul = call <8 x float> @llvm.matrix.multiply(<12 x float> %add, <8 x float> %c,
         ; 3 x 3 times 3 x 2, column-major:
         i32 3, i32 3, i32 3, i32 2, i1 true)

store <8 x float> %mul, <8 x float>* %MUL, align 16
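Following the llvm.masked.load/store precedent, declarations for such intrinsics might look roughly like the sketch below. The overloaded names and the position of the passthrough operand are assumptions of this sketch, not a settled signature; only the masked load/store/gather/scatter intrinsics exist today.

  ; Hypothetical masked arithmetic intrinsics, modeled on the existing
  ; llvm.masked.load/store intrinsics: lanes whose mask bit is false take
  ; their value from the final passthrough operand.
  declare <12 x float> @llvm.masked.fadd.v12f32(<12 x float>, <12 x float>,
                                                <12 x i1>, <12 x float>)
  declare <12 x float> @llvm.masked.fsub.v12f32(<12 x float>, <12 x float>,
                                                <12 x i1>, <12 x float>)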
Adam Nemet via llvm-dev <llvm-dev at lists.llvm.org> writes:

>> I spent some time chatting with Adam about this and have a better
>> understanding of his concerns here. It seems to me that if having
>> masking intrinsics is the long-term solution we want, we should do
>> that now (for add and sub) rather than building arbitrary matrix
>> layout info into intrinsics, since a mask has all the information
>> that we actually need.
>
> I think that sounds like a reasonable compromise. We already have
> masked load/store intrinsics so adding add and sub just follows that
> precedent. If the decision is made to move masking to the core
> operations, the new intrinsics would just move as well.

How will existing passes be taught about the new intrinsics? For example, what would have to be done to teach instcombine about them?

Let's suppose every existing operation had an equivalent masked intrinsic. Would it be easier to teach all of the passes about the intrinsics, or to teach them about a mask operand on the existing Instructions? Likewise, would it be easier to teach isel about all of the intrinsics, or about a mask operand?

I honestly don't know the answers to these questions, but I think they are important to consider, especially if intrinsics are seen as a bridge to first-class IR support for masking.

-David
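One small instance of what would have to be re-taught: instcombine/InstSimplify already folds fadd X, -0.0 into X, but the same identity expressed through a masked intrinsic is just an opaque call until a pass is given dedicated matching code. A sketch, using the hypothetical @llvm.masked.fadd from the proposal above on a <4 x float> for brevity:

  ; Folded today: fadd X, -0.0 ==> X.
  %plain = fadd <4 x float> %x, <float -0.0, float -0.0, float -0.0, float -0.0>

  ; The masked equivalent computes the same value (active lanes add -0.0,
  ; inactive lanes take the passthrough %x), but instcombine would need
  ; explicit knowledge of the intrinsic to simplify the call to %x.
  %masked = call <4 x float> @llvm.masked.fadd.v4f32(
              <4 x float> %x,
              <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>,
              <4 x i1> %mask,
              <4 x float> %x)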
Hi,

On 12/19/18 11:07 PM, Adam Nemet via llvm-dev wrote:
> [...]
>
> I think that sounds like a reasonable compromise. We already have
> masked load/store intrinsics so adding add and sub just follows that
> precedent. If the decision is made to move masking to the core
> operations, the new intrinsics would just move as well.
> So an add->multiply for option B + masking intrinsics would look like
> this:
>
> [...]

We've started an RFC that proposes exactly this: https://reviews.llvm.org/D53613

The RFC proposes intrinsics that take a mask and an explicit vector length argument. The explicit vector length is aimed at RISC-V V and NEC SX-Aurora, and it can be legalized away for targets that do not support it (e.g. AVX512). We also propose a couple of new attributes that should help with function call vectorization.

I'll present this at the upcoming LLVM Social in Zurich on January 10th for people who are interested. I also talked a bit about this at the last DevMtg (from ~15:00 in https://youtu.be/BAZClv6nMxY).

- Simon

-- 
Simon Moll
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll
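For a rough idea of the shape of such intrinsics, a sketch of an fadd carrying both a per-lane mask and an explicit vector length might look like the following. The name and exact signature here are assumptions of this sketch; the RFC linked above is the authoritative description.

  ; Sketch: fadd with a mask and an explicit vector length. Lanes at or
  ; beyond the vector length operand, and lanes whose mask bit is false,
  ; do not produce an active result.
  declare <8 x double> @llvm.evl.fadd.v8f64(<8 x double>, <8 x double>,
                                            <8 x i1>, i32)

  ; Example call: add the first %n lanes of %x and %y under mask %m.
  %r = call <8 x double> @llvm.evl.fadd.v8f64(<8 x double> %x, <8 x double> %y,
                                              <8 x i1> %m, i32 %n)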