Simon Moll <moll at cs.uni-saarland.de> writes:

>> How will existing passes be taught about the new intrinsics? For
>> example, what would have to be done to instcombine to teach it about
>> these intrinsics? Let's suppose every existing operation had an
>> equivalent masked intrinsic. Would it be easier to teach all of the
>> passes about them or would it be easier to teach the passes about a mask
>> operand on the existing Instructions? Would it be easier to teach isel
>> about all the intrinsics or would it be easier to teach isel about a
>> mask operand?
>
> Consider that overnight we introduce optional mask parameters to
> vector instructions. Then, since you cannot safely ignore the mask,
> every transformation and analysis that is somehow concerned with
> vector instructions is potentially broken and needs to be fixed.

True, but is there a way we could do this incrementally? Even if we
start with intrinsics and then migrate to first-class support, at some
point passes are going to be broken with respect to masks on
Instructions.

> If you go with masking intrinsics, and set the attributes right, it is
> clear that transformations won't break your code, and you will need to
> teach InstCombine, DAGCombiner, etc. that a `masked.fadd` is just an
> `fadd` with a mask. However, this gives you the opportunity to
> "re-enable" one optimization at a time, each time making sure that the
> mask is handled correctly. In the case of InstCombine, the vector
> instruction patterns transfer to masked intrinsics: if all masked
> intrinsics in the pattern have the same mask parameter, you can apply
> the transformation; the resulting masked intrinsics will again take the
> same mask parameter.

Right.

> Also, this need not be a hard transition from vector instructions to
> masking intrinsics: you can add new types of masking intrinsics in
> batches along with the required transformations. Masking intrinsics
> and vector instructions can live side by side (as they do today,
> anyway).

Of course.

>> I honestly don't know the answers to these questions. But I think they
>> are important to consider, especially if intrinsics are seen as a bridge
>> to first-class IR support for masking.
>
> I think it's sensible to use masking intrinsics (or EVL,
> https://reviews.llvm.org/D53613) at the IR level and masked SD nodes in
> the backend. However, I agree that intrinsics should just be a bridge
> to native support mid-term.

The biggest question I have is how such a transition would happen.
Let's say we have a full set of masking intrinsics. Now we want to take
one IR-level operation, say fadd, and add mask support to it. How do we
do that? Is it any easier because we have all of the intrinsics, or
does all of the work on masking intrinsics get thrown away at some
point?

I'm reminded of this now decade-old thread on gather/scatter and masking
from Dan Gohman, which I also mentioned in an SVE thread earlier this
year:

https://lists.llvm.org/pipermail/llvm-dev/2008-August/016284.html

The applymask idea got worked through a bit, and IIRC at some later point
someone found issues with it that needed to be addressed, but it's an
interesting idea to consider. I wasn't too hot on it at the time, but it
may be a way forward.

In that thread, Tim Foley posted a summary of options for mask support,
one of which was adding intrinsics:

https://lists.llvm.org/pipermail/llvm-dev/2008-August/016371.html

-David
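To make the pattern-transfer rule above concrete, here is a sketch of how an InstCombine-style fold could carry over to masked intrinsics. The `llvm.masked.sub`/`llvm.masked.add` names are hypothetical, following the thread's `masked.fadd` naming convention; they are not existing LLVM intrinsics.

```llvm
; Unmasked fold: (x - y) + y ==> x
%t = sub <8 x i32> %x, %y
%r = add <8 x i32> %t, %y   ; InstCombine folds this to %x

; Masked variant (hypothetical intrinsics): the same fold applies only
; because both calls carry the same mask %m; the folded result would
; again be associated with mask %m.
%t2 = call <8 x i32> @llvm.masked.sub.v8i32(<8 x i32> %x, <8 x i32> %y, <8 x i1> %m)
%r2 = call <8 x i32> @llvm.masked.add.v8i32(<8 x i32> %t2, <8 x i32> %y, <8 x i1> %m)
```

If the two calls took different masks, the fold would not fire, which is exactly the conservatism that makes the intrinsic route safe by default.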
On 12/20/18 4:43 PM, David Greene wrote:
> Simon Moll <moll at cs.uni-saarland.de> writes:
>
>>> How will existing passes be taught about the new intrinsics? For
>>> example, what would have to be done to instcombine to teach it about
>>> these intrinsics? Let's suppose every existing operation had an
>>> equivalent masked intrinsic. Would it be easier to teach all of the
>>> passes about them or would it be easier to teach the passes about a mask
>>> operand on the existing Instructions? Would it be easier to teach isel
>>> about all the intrinsics or would it be easier to teach isel about a
>>> mask operand?
>> Consider that overnight we introduce optional mask parameters to
>> vector instructions. Then, since you cannot safely ignore the mask,
>> every transformation and analysis that is somehow concerned with
>> vector instructions is potentially broken and needs to be fixed.
> True, but is there a way we could do this incrementally? Even if we
> start with intrinsics and then migrate to first-class support, at some
> point passes are going to be broken with respect to masks on
> Instructions.

Here is an idea for an incremental transition:

a) Create a new, distinct type. Let's say it's called the "predicated
vector type", written "{W x double}".

b) Make the predicated vector type a legal operand type for all binary
operators and add an optional predicate parameter to them. Now, here is
the catch: the predicate parameter is only legal if the data type of the
operation is the predicated vector type. That is, "fadd <8 x double>"
will forever be unpredicated. However, "fadd {8 x double} %a, %b" may
have an optional predicate argument. Semantically, these two operations
would be identical:

   fadd <8 x double> %a, %b

   fadd {8 x double} %a, %b, predicate(<8 x i1> <1, 1, 1, 1, 1, 1, 1, 1>)

In terms of the LLVM type hierarchy, PredicatedVectorType would be
distinct from VectorType, and so no transformation can break it.
While you are in the transition (from unpredicated to predicated IR),
you may see code like this:

   %aP = bitcast <8 x double> %a to {8 x double}
   %bP = bitcast <8 x double> %b to {8 x double}
   %cP = fdiv %aP, %bP, mask(11101110)  ; predicated fdiv
   %c  = bitcast {8 x double} %cP to <8 x double>
   %d  = fadd <8 x double> %a, %c       ; no predicated fadd yet

Eventually, when all optimizations/instructions/analyses have been
migrated to run well with the new type:

1. deprecate the old vector type,
2. promote it to PredicatedVectorType when parsing BC,

and, after a grace period, rename {8 x double} to <8 x double>.

>> If you go with masking intrinsics, and set the attributes right, it is
>> clear that transformations won't break your code, and you will need to
>> teach InstCombine, DAGCombiner, etc. that a `masked.fadd` is just an
>> `fadd` with a mask. However, this gives you the opportunity to
>> "re-enable" one optimization at a time, each time making sure that the
>> mask is handled correctly. In the case of InstCombine, the vector
>> instruction patterns transfer to masked intrinsics: if all masked
>> intrinsics in the pattern have the same mask parameter, you can apply
>> the transformation; the resulting masked intrinsics will again take
>> the same mask parameter.
> Right.
>
>> Also, this need not be a hard transition from vector instructions to
>> masking intrinsics: you can add new types of masking intrinsics in
>> batches along with the required transformations. Masking intrinsics
>> and vector instructions can live side by side (as they do today,
>> anyway).
> Of course.
>
>>> I honestly don't know the answers to these questions. But I think they
>>> are important to consider, especially if intrinsics are seen as a bridge
>>> to first-class IR support for masking.
>> I think it's sensible to use masking intrinsics (or EVL,
>> https://reviews.llvm.org/D53613) at the IR level and masked SD nodes in
>> the backend. However, I agree that intrinsics should just be a bridge
>> to native support mid-term.
> The biggest question I have is how such a transition would happen.
> Let's say we have a full set of masking intrinsics. Now we want to take
> one IR-level operation, say fadd, and add mask support to it. How do we
> do that? Is it any easier because we have all of the intrinsics, or
> does all of the work on masking intrinsics get thrown away at some
> point?

The masking intrinsics are just a transitional thing. E.g., we could add
them now and let them mature. Once the intrinsics are stable and proven,
start migrating to core IR support (e.g. as sketched above).

> I'm reminded of this now decade-old thread on gather/scatter and masking
> from Dan Gohman, which I also mentioned in an SVE thread earlier this
> year:
>
> https://lists.llvm.org/pipermail/llvm-dev/2008-August/016284.html
>
> The applymask idea got worked through a bit, and IIRC at some later point
> someone found issues with it that needed to be addressed, but it's an
> interesting idea to consider. I wasn't too hot on it at the time, but it
> may be a way forward.
>
> In that thread, Tim Foley posted a summary of options for mask support,
> one of which was adding intrinsics:
>
> https://lists.llvm.org/pipermail/llvm-dev/2008-August/016371.html
>
> -David

Thank you for the pointer! Is this documented somewhere (say, in a wiki
or some proposal doc)? Otherwise, we are bound to go through these
discussions again and again until a consensus is reached. Btw, unlike
then, we are also talking about an active vector length now (hence EVL).

AFAIU, apply_mask was proposed to have fewer (redundant) predicate
arguments. Unless the apply_mask breaks a chain in a matcher pattern,
the approach should be prone to the issue of transformations breaking
code as well.

Has something like the PredicatedVectorType approach above been proposed
before?

- Simon

--

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065 : http://compilers.cs.uni-saarland.de/people/moll
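One way to pin down the semantics sketched above: assuming masked-off lanes produce undef (one plausible convention; the proposal leaves inactive-lane semantics open), the predicated form could be modeled in today's IR as a plain vector op followed by a select. This is a semantic model for illustration, not a proposed lowering:

```llvm
; Hypothetical predicated form:
;   %r = fadd {8 x double} %a, %b, predicate(%m)
; modeled in current IR, assuming inactive lanes yield undef:
%full = fadd <8 x double> %a, %b
%r    = select <8 x i1> %m, <8 x double> %full, <8 x double> undef
```

With an all-true %m, the select is a no-op and the model collapses to the plain fadd, matching the equivalence claimed for the all-ones predicate.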
Simon Moll via llvm-dev <llvm-dev at lists.llvm.org> writes:

>> True, but is there a way we could do this incrementally? Even if we
>> start with intrinsics and then migrate to first-class support, at some
>> point passes are going to be broken with respect to masks on
>> Instructions.
>
> Here is an idea for an incremental transition:
>
> a) Create a new, distinct type. Let's say it's called the "predicated
> vector type", written "{W x double}".
>
> b) Make the predicated vector type a legal operand type for all binary
> operators and add an optional predicate parameter to them. Now, here
> is the catch: the predicate parameter is only legal if the data type
> of the operation is the predicated vector type. That is, "fadd <8 x
> double>" will forever be unpredicated. However, "fadd {8 x double}
> %a, %b" may have an optional predicate argument. Semantically, these
> two operations would be identical:
>
>    fadd <8 x double> %a, %b
>
>    fadd {8 x double} %a, %b, predicate(<8 x i1> <1, 1, 1, 1, 1, 1, 1, 1>)
>
> In terms of the LLVM type hierarchy, PredicatedVectorType would be
> distinct from VectorType, and so no transformation can break it. While
> you are in the transition (from unpredicated to predicated IR), you
> may see code like this:
>
>    %aP = bitcast <8 x double> %a to {8 x double}
>    %bP = bitcast <8 x double> %b to {8 x double}
>    %cP = fdiv %aP, %bP, mask(11101110)  ; predicated fdiv
>    %c  = bitcast {8 x double} %cP to <8 x double>
>    %d  = fadd <8 x double> %a, %c       ; no predicated fadd yet
>
> Eventually, when all optimizations/instructions/analyses have been
> migrated to run well with the new type, 1. deprecate the old vector
> type, 2. promote it to PredicatedVectorType when parsing BC, and,
> after a grace period, rename {8 x double} to <8 x double>.

That's an interesting idea. It strikes me that if we went this route,
we wouldn't need intermediate intrinsics at all. I'm trying to avoid
throw-away work, which it seems the intrinsics would be if core IR
support is the ultimate goal.

I'm not sure the renaming could ever happen. What are the rules for
supporting older BC/textual IR files?

> The masking intrinsics are just a transitional thing. E.g., we could
> add them now and let them mature. Once the intrinsics are stable and
> proven, start migrating to core IR support (e.g. as sketched above).

What are the tradeoffs between adding these intrinsics and adding
PredicatedVectorType from the get-go? The latter is probably harder to
get past review, but if, say, an RFC were developed to gain consensus,
then maybe it would be easier? I recognize that there are probably time
constraints in getting some kind of better masking support in soon, and
intrinsics seem like the easiest way to do that.

> Thank you for the pointer! Is this documented somewhere (say, in a
> wiki or some proposal doc)? Otherwise, we are bound to go through
> these discussions again and again until a consensus is reached. Btw,
> unlike then, we are also talking about an active vector length now
> (hence EVL).

It's not documented anywhere else AFAIK.

> AFAIU, apply_mask was proposed to have fewer (redundant) predicate
> arguments. Unless the apply_mask breaks a chain in a matcher pattern,
> the approach should be prone to the issue of transformations breaking
> code as well.

The idea is that applymask of a single value against two different mask
values would result in two new SSA values, so existing pattern matches
would not match if the masks did not match.

> Has something like the PredicatedVectorType approach above been
> proposed before?

I believe Tim's #5 is the closest:

https://lists.llvm.org/pipermail/llvm-dev/2008-August/016371.html

I don't think it was ever fleshed out. Perhaps we should start on an
RFC to explore the design space and get community input. In the
meantime, if we urgently need intrinsics, we should probably go ahead
and start adding them.

-David
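The applymask point can be illustrated with a sketch. `@llvm.applymask` is the hypothetical intrinsic from the 2008 thread, not an existing LLVM intrinsic:

```llvm
; The same value %x wrapped with two different masks yields two
; distinct SSA values:
%xm1 = call <8 x float> @llvm.applymask.v8f32(<8 x float> %x, <8 x i1> %m1)
%xm2 = call <8 x float> @llvm.applymask.v8f32(<8 x float> %x, <8 x i1> %m2)

; A pattern that requires the "same" operand twice (e.g. x + x ==> 2*x)
; sees %xm1 and %xm2 as unrelated values, so it does not fire across
; different masks:
%sum = fadd <8 x float> %xm1, %xm2   ; no fold
```

This is how SSA identity, rather than an explicit legality check in every transform, keeps mask-mixing folds from firing.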
On Thu, Dec 20, 2018 at 7:40 PM Simon Moll via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> On 12/20/18 4:43 PM, David Greene wrote:
> > Simon Moll <moll at cs.uni-saarland.de> writes:
> >
> >>> How will existing passes be taught about the new intrinsics? For
> >>> example, what would have to be done to instcombine to teach it about
> >>> these intrinsics? Let's suppose every existing operation had an
> >>> equivalent masked intrinsic. Would it be easier to teach all of the
> >>> passes about them or would it be easier to teach the passes about a mask
> >>> operand on the existing Instructions? Would it be easier to teach isel
> >>> about all the intrinsics or would it be easier to teach isel about a
> >>> mask operand?
> >> Consider that overnight we introduce optional mask parameters to
> >> vector instructions. Then, since you cannot safely ignore the mask,
> >> every transformation and analysis that is somehow concerned with
> >> vector instructions is potentially broken and needs to be fixed.
> > True, but is there a way we could do this incrementally? Even if we
> > start with intrinsics and then migrate to first-class support, at some
> > point passes are going to be broken with respect to masks on
> > Instructions.
>
> Here is an idea for an incremental transition:
>
> a) Create a new, distinct type. Let's say it's called the "predicated
> vector type", written "{W x double}".
>
> b) Make the predicated vector type a legal operand type for all binary
> operators and add an optional predicate parameter to them. Now, here is
> the catch: the predicate parameter is only legal if the data type of the
> operation is the predicated vector type. That is, "fadd <8 x double>"
> will forever be unpredicated. However, "fadd {8 x double} %a, %b" may
> have an optional predicate argument. Semantically, these two operations
> would be identical:
>
>    fadd <8 x double> %a, %b
>
>    fadd {8 x double} %a, %b, predicate(<8 x i1> <1, 1, 1, 1, 1, 1, 1, 1>)
>
> In terms of the LLVM type hierarchy, PredicatedVectorType would be
> distinct from VectorType, and so no transformation can break it. While
> you are in the transition (from unpredicated to predicated IR), you may
> see code like this:
>
>    %aP = bitcast <8 x double> %a to {8 x double}
>    %bP = bitcast <8 x double> %b to {8 x double}
>    %cP = fdiv %aP, %bP, mask(11101110)  ; predicated fdiv
>    %c  = bitcast {8 x double} %cP to <8 x double>
>    %d  = fadd <8 x double> %a, %c       ; no predicated fadd yet
>
> Eventually, when all optimizations/instructions/analyses have been
> migrated to run well with the new type, 1. deprecate the old vector
> type, 2. promote it to PredicatedVectorType when parsing BC, and, after
> a grace period, rename {8 x double} to <8 x double>.

I'm likely missing things, but I strongly suspect that the amount of
effort needed is underestimated. Vector support works because, with
some exceptions, a vector is simply interpreted as several scalars
concatenated. Much as with native fixed-point type support, a whole new
incompatible type is suggested to be added here. I *suspect* *every*
single transform in instcombine/instsimplify will need to be
*duplicated*. That is a lot. Intrinsics sound like the less intrusive
solution, in both cases.

> >> If you go with masking intrinsics, and set the attributes right, it is
> >> clear that transformations won't break your code, and you will need to
> >> teach InstCombine, DAGCombiner, etc. that a `masked.fadd` is just an
> >> `fadd` with a mask. However, this gives you the opportunity to
> >> "re-enable" one optimization at a time, each time making sure that the
> >> mask is handled correctly. In the case of InstCombine, the vector
> >> instruction patterns transfer to masked intrinsics: if all masked
> >> intrinsics in the pattern have the same mask parameter, you can apply
> >> the transformation; the resulting masked intrinsics will again take the
> >> same mask parameter.
> > Right.
> >
> >> Also, this need not be a hard transition from vector instructions to
> >> masking intrinsics: you can add new types of masking intrinsics in
> >> batches along with the required transformations. Masking intrinsics
> >> and vector instructions can live side by side (as they do today,
> >> anyway).
> > Of course.
> >
> >>> I honestly don't know the answers to these questions. But I think they
> >>> are important to consider, especially if intrinsics are seen as a bridge
> >>> to first-class IR support for masking.
> >> I think it's sensible to use masking intrinsics (or EVL,
> >> https://reviews.llvm.org/D53613) at the IR level and masked SD nodes in
> >> the backend. However, I agree that intrinsics should just be a bridge
> >> to native support mid-term.
> > The biggest question I have is how such a transition would happen.
> > Let's say we have a full set of masking intrinsics. Now we want to take
> > one IR-level operation, say fadd, and add mask support to it. How do we
> > do that? Is it any easier because we have all of the intrinsics, or
> > does all of the work on masking intrinsics get thrown away at some
> > point?
> The masking intrinsics are just a transitional thing. E.g., we could add
> them now and let them mature. Once the intrinsics are stable and proven,
> start migrating to core IR support (e.g. as sketched above).
> >
> > I'm reminded of this now decade-old thread on gather/scatter and masking
> > from Dan Gohman, which I also mentioned in an SVE thread earlier this
> > year:
> >
> > https://lists.llvm.org/pipermail/llvm-dev/2008-August/016284.html
> >
> > The applymask idea got worked through a bit, and IIRC at some later point
> > someone found issues with it that needed to be addressed, but it's an
> > interesting idea to consider. I wasn't too hot on it at the time, but it
> > may be a way forward.
> >
> > In that thread, Tim Foley posted a summary of options for mask support,
> > one of which was adding intrinsics:
> >
> > https://lists.llvm.org/pipermail/llvm-dev/2008-August/016371.html
> >
> > -David
>
> Thank you for the pointer! Is this documented somewhere (say, in a wiki
> or some proposal doc)? Otherwise, we are bound to go through these
> discussions again and again until a consensus is reached. Btw, unlike
> then, we are also talking about an active vector length now (hence EVL).
>
> AFAIU, apply_mask was proposed to have fewer (redundant) predicate
> arguments. Unless the apply_mask breaks a chain in a matcher pattern,
> the approach should be prone to the issue of transformations breaking
> code as well.
>
> Has something like the PredicatedVectorType approach above been proposed
> before?
>
> - Simon
>
> --
>
> Simon Moll
> Researcher / PhD Student
>
> Compiler Design Lab (Prof. Hack)
> Saarland University, Computer Science
> Building E1.3, Room 4.31
>
> Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
> Fax. +49 (0)681 302-3065 : http://compilers.cs.uni-saarland.de/people/moll

Roman.

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev