Demikhovsky, Elena via llvm-dev
2017-Feb-01 11:59 UTC
[llvm-dev] RFC: Generic IR reductions
> My proposal was to have a reduction intrinsic that can infer the type by the predecessors.
> For example:
> @llvm.reduce(ext <N x double> ( add <N x float> %a, %b))

And if we don't have %b? We just want to sum all elements of %a? Something like:

  @llvm.reduce(ext <N x double> ( add <N x float> %a, zeroinitializer))

Don't we have a problem with constant propagation in this approach?

I proposed a "generic" intrinsic approach at the BoF (Nov 2016), like:

  %scalar = @llvm.reduce(OPCODE, %vector_input)

OPCODE may be a string, an integer or metadata.

- Elena

-----Original Message-----
From: Renato Golin [mailto:renato.golin at linaro.org]
Sent: Wednesday, February 01, 2017 12:54
To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
Cc: Amara Emerson <amara.emerson at gmail.com>; Amara Emerson <Amara.Emerson at arm.com>; llvm-dev at lists.llvm.org; nd <nd at arm.com>; Simon Pilgrim <llvm-dev at redking.me.uk>
Subject: Re: [llvm-dev] RFC: Generic IR reductions

On 1 February 2017 at 10:30, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>> If you mean "patterns may not be matched, and reduction instructions will not be generated, making the code worse", then this is just a matter of making the patterns obvious and the back-ends robust enough to cope with it, no?
>
> The back-end should be as robust as possible, I agree. The problem that I see is in adding another kind of complexity to the optimizer that works between the vectorizer and the back-end. It should be able to recognize all the "obvious" patterns in order to preserve them.

Right! Also, I may have been my own enemy again and muddled the question. Let me try again... :)

I'm not against a reduction intrinsic. I'm against one reduction intrinsic for {every kind} x {ordered, unordered}. At least until further evidence comes to light.

My proposal was to have a reduction intrinsic that can infer the type by the predecessors.
For example:

  @llvm.reduce(ext <N x double> ( add <N x float> %a, %b))

would generate a widening unordered reduction (fast-math).

> Now we look at a reduction phi, and if the FP mode requires the "ordered" reduction, which is not supported, the whole loop remains scalar.

Right, but this is orthogonal to having separate intrinsics or not:

  %fast = @llvm.reduce(ext <N x double> ( add <N x float> %a, %b))
  %order = @llvm.reduce(ext <N x double> ( add <N x float> %a, %b), double %acc)

If the IR contains %order, then this will *have* to be scalar if the target doesn't support a native ordered reduction or a fast path to it. And this is up to the cost model.

> If we were to leave this decision to the cost model, it would provide a cost of scalarization. And in the end we may decide to scalarize the reduction operation inside the vector loop. Now, once the decision taken is to vectorize, inserting an intrinsic or generating plain IR should be a target decision.

Hum, I'm beginning to see your point, I think.

I agree this is again a target decision, but it's also a larger, compiler-wide decision, too. The target's decision can be: the IR doesn't have the required semantics, so I *must* use intrinsics. It can't be: I'd rather have intrinsics because they're easier to match in the back-end. The first is a requirement; the second is a personal choice, and one that can impact the generic instruction selection between IR and target-specific selection.

cheers,
--renato

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
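[The distinction between %fast and %order above matters because floating-point addition is not associative: a vectorized (unordered, fast-math) reduction reassociates the sum, which can change the result relative to the scalar loop. A minimal Python sketch of the two evaluation orders, purely illustrative and not LLVM's implementation:]

```python
def ordered_reduce_fadd(acc, vec):
    """Strict left-to-right reduction: ((acc + v0) + v1) + ...
    This is what the scalar loop computes, and what an 'ordered'
    reduction must preserve."""
    for x in vec:
        acc = acc + x
    return acc

def tree_reduce_fadd(vec):
    """Unordered (fast-math) reduction: pairwise halving tree, the
    shape a vector unit produces. Reassociates the sum."""
    while len(vec) > 1:
        half = len(vec) // 2
        vec = [vec[i] + vec[i + half] for i in range(half)]
    return vec[0]

# 1.0 is absorbed when added to 1e16 first (below one ulp of 1e16),
# but survives when the large terms cancel first.
v = [1e16, 1.0, -1e16, 0.0]
print(ordered_reduce_fadd(0.0, v))  # 0.0
print(tree_reduce_fadd(v))          # 1.0
```

[The two results differ, which is why an ordered reduction cannot simply be lowered to the tree form unless fast-math permits reassociation.]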
On 1 February 2017 at 11:59, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> > @llvm.reduce(ext <N x double> ( add <N x float> %a, %b))
>
> And if we don't have %b? We just want to sum all elements of %a? Something like @llvm.reduce(ext <N x double> ( add <N x float> %a, zeroinitializer))

Hum, that's a good point. My examples were actually wrong, as they weren't related to simple reductions. Your zeroinit is the thing I was looking for.

> Don't we have a problem with constant propagation in this approach?

I'm not sure. Can you expand on this?

> I proposed a "generic" intrinsic approach at the BoF (Nov 2016), like
> %scalar = @llvm.reduce(OPCODE, %vector_input)
> OPCODE may be a string, an integer or metadata.

I wouldn't use metadata. An integer would be cumbersome and lead to eventual ABI breakages, and "text" would be the same as:

  %scalar = @llvm.reduce.add(%vector)

which is the same thing Amara proposed. I'm not saying it is wrong; I'm just worried that, by mandating the encoding of the reduction into an intrinsic, we'll force the middle-end to convert high-level code patterns to the intrinsic, or the target will ignore it completely.

There is a pattern already for reductions, and the back-ends already match it. This should not change unless there is a serious flaw in it, for the targets that *already* support it. This is an orthogonal discussion.

SVE has more restrictions; for instance, one cannot know how many shuffles to do because the vector size is unknown, so the current representation is insufficient, in which case we need the intrinsic. But replacing everything else with intrinsics just because one target can't cope with it doesn't work.

One thing that does happen is that code optimisations expose patterns that would otherwise not be apparent. This includes potential reduction or fusion patterns and can lead to massively smaller code, or even eliding the whole block.
If you convert a block to an intrinsic too early, you may lose the ability to merge it back again later, as we're doing today.

These are all hypothetical wrt SVE, but they did happen in NEON in the past and were the reason why we only have a handful of NEON intrinsics. Everything else is encoded with sequences of instructions.

cheers,
--renato
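[The "pattern already there for reductions" that Renato refers to is a statically unrolled log2(N)-step sequence of shuffles and lane-wise adds, which only exists when N is a compile-time constant; this is precisely what a scalable vector length breaks. A Python emulation of that unrolled sequence for a <4 x i32> vector, with hypothetical helper names modelling shufflevector/add lane semantics:]

```python
def shufflevector(a, b, mask):
    """Emulate LLVM's shufflevector: select lanes from the concatenation
    of a and b; a -1 mask entry is an undef lane, modeled as None."""
    src = a + b
    return [None if i == -1 else src[i] for i in mask]

def vadd(a, b):
    # Lane-wise add; any lane touching undef stays undef.
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]

# Unrolled reduction of <4 x i32> %v, step by step as it appears in IR:
v = [3, 1, 4, 1]
undef = [None] * 4
hi = shufflevector(v, undef, [2, 3, -1, -1])    # move upper half down
s1 = vadd(v, hi)                                 # partial sums in lanes 0..1
hi2 = shufflevector(s1, undef, [1, -1, -1, -1])  # move lane 1 down
s2 = vadd(s1, hi2)                               # full sum in lane 0
result = s2[0]                                   # extractelement %s2, i32 0
print(result)  # 9
```

[The two halving steps are written out explicitly, so the pattern is only expressible when the element count (and hence log2 of it) is known; with an unknown vector length there is no fixed number of shuffles to emit, which is the SVE case above.]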
Demikhovsky, Elena via llvm-dev
2017-Feb-01 13:06 UTC
[llvm-dev] RFC: Generic IR reductions
Constant propagation:

  %sum = add <N x float> %a, %b
  @llvm.reduce(ext <N x double> %sum)

If %a and %b are vectors of constants, %sum also becomes a vector of constants. At this point you have @llvm.reduce(ext <N x double> %sum) and don't know what kind of reduction you need.

- Elena