Haidl, Michael via llvm-dev
2017-Aug-04 13:55 UTC
[llvm-dev] Status of llvm.experimental.vector.reduce.* intrinsics
I am currently working on a transformation pass that transforms masked.load
and masked.store intrinsics to (hopefully) increase performance on targets
where masked.load and masked.store are not legal. To check whether the loads
and stores are necessary at all, I take the mask of the masked operations and
want to reduce it to a single value. vector.reduce.or seemed very handy to do
the job.

I will take a look at the function you suggested. Maybe I can come up with
something that drives the development of these intrinsics forward.

Cheers,
Michael

On 04.08.2017 at 15:25, Amara Emerson wrote:
> Can you tell us what you're looking to do with the intrinsics?
>
> On all non-AArch64 targets the ExpandReductions pass will convert them
> to the shuffle pattern you're seeing. That pass was written in order to
> allow experimentation with the effects of using reduction intrinsics at
> the IR level only, hence we convert into the shuffles very late in the
> pass pipeline.
>
> Since we haven't seen any adverse effects of representing the reductions
> as intrinsics at the IR level, I think in that respect the intrinsics
> have probably proven themselves to be stable. However, the error you're
> seeing is because the AArch64 backend still expects to deal only with
> intrinsics it can *natively* support, and i1 is not a natively supported
> type for reductions. See the code in
> AArch64TargetTransformInfo.cpp:useReductionIntrinsic() for where we
> decide which reduction types we can support.
>
> For these cases, we need to implement more generic legalization support
> in order to either promote to a legal type or, in cases where the target
> cannot support it as a native operation at all, to expand it to a
> shuffle pattern as a fallback. Once we have all that in place, I think
> we're in a strong position to move to the intrinsic form as the
> canonical representation.
>
> FYI, one of the motivating reasons for introducing these was to allow
> non-power-of-2 vector architectures like SVE to express reduction
> operations.
>
> Amara
>
> On 4 August 2017 at 13:36, Haidl, Michael <michael.haidl at uni-muenster.de> wrote:
> >
> > Hi Renato,
> >
> > just to make it clear, I didn't implement reductions on x86_64; they
> > just worked when I tried to lower an
> > llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A shuffle
> > pattern is generated for the intrinsic:
> >
> >     vpshufd $78, %xmm0, %xmm1   # xmm1 = xmm0[2,3,0,1]
> >     vpor    %xmm1, %xmm0, %xmm0
> >     vpshufd $229, %xmm0, %xmm1  # xmm1 = xmm0[1,1,2,3]
> >     vpor    %xmm1, %xmm0, %xmm0
> >     vpsrld  $16, %xmm0, %xmm1
> >     vpor    %xmm1, %xmm0, %xmm0
> >     vpextrb $0, %xmm0, %eax
> >
> > However, on AArch64 I encountered an unreachable where codegen does
> > not know how to promote the i1 type. Since I am more familiar with the
> > midlevel, I have to start digging into codegen. Any hints on where to
> > start would be awesome.
> >
> > Cheers,
> > Michael
> >
> > On 04.08.2017 at 08:18, Renato Golin wrote:
> > > On 3 August 2017 at 19:48, Haidl, Michael via llvm-dev
> > > <llvm-dev at lists.llvm.org> wrote:
> > >> thank you for the clarification. I tested the intrinsics on x86_64
> > >> and they seemed to work pretty well. Looking forward to trying
> > >> these intrinsics with the AArch64 backend. Maybe I'll find the time
> > >> to look into codegen to get these intrinsics out of the
> > >> experimental stage. They seem pretty useful.
> > >
> > > In addition to Amara's point, it'd be good to have it working and on
> > > by default for other architectures before we can move out of
> > > experimental, if we indeed intend to make it non-arch-specific
> > > (which we do).
> > >
> > > So, if you could share your code for the x86 port, that'd be great.
> > > But if you could help with the final touches on the code-gen part,
> > > that'd be awesome.
> > >
> > > cheers,
> > > --renato
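As an illustration of the pattern described in this message -- reduce the i1
mask to a single bit and skip the masked operation when no lane is active --
here is a minimal LLVM IR sketch. The function and value names are
hypothetical, and the typed-pointer llvm.masked.load signature is the one in
use around this time; the actual pass would of course look different.

    declare i1 @llvm.experimental.vector.reduce.or.i1.v8i1(<8 x i1>)
    declare <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>*, i32, <8 x i1>, <8 x i32>)

    define <8 x i32> @guarded_masked_load(<8 x i32>* %p, <8 x i1> %mask) {
    entry:
      ; Reduce the mask to a single bit: is any lane active at all?
      %any = call i1 @llvm.experimental.vector.reduce.or.i1.v8i1(<8 x i1> %mask)
      br i1 %any, label %do.load, label %done

    do.load:
      ; Only perform the (potentially expensive, emulated) masked load when needed.
      %v = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* %p, i32 4, <8 x i1> %mask, <8 x i32> zeroinitializer)
      br label %done

    done:
      %res = phi <8 x i32> [ %v, %do.load ], [ zeroinitializer, %entry ]
      ret <8 x i32> %res
    }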
Amara Emerson via llvm-dev
2017-Aug-04 13:57 UTC
[llvm-dev] Status of llvm.experimental.vector.reduce.* intrinsics
Actually, for mask vectors of i1 values you don't need to use reductions at
all (although for SVE this is what we'll do). You can instead bitcast the
vector value to an i8/i16/whatever and then compare against zero.

Amara

On 4 August 2017 at 14:55, Haidl, Michael via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> I am currently working on a transformation pass that transforms
> masked.load and masked.store intrinsics to (hopefully) increase
> performance on targets where masked.load and masked.store are not legal.
> To check whether the loads and stores are necessary at all, I take the
> mask of the masked operations and want to reduce it to a single value.
> vector.reduce.or seemed very handy to do the job.
>
> I will take a look at the function you suggested. Maybe I can come up
> with something that drives the development of these intrinsics forward.
>
> Cheers,
> Michael
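A minimal sketch of the bitcast-and-compare alternative suggested here,
assuming the same hypothetical <8 x i1> mask and block names as in the
earlier sketch: since <8 x i1> and i8 have the same bit width, the mask can
be reinterpreted as an integer and compared against zero directly.

    ; Reinterpret the 8 mask bits as a single i8 (same bit width, so the
    ; bitcast is legal IR).
    %bits = bitcast <8 x i1> %mask to i8
    ; Any active lane makes the integer non-zero.
    %any  = icmp ne i8 %bits, 0
    br i1 %any, label %do.load, label %done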
Haidl, Michael via llvm-dev
2017-Aug-04 14:03 UTC
[llvm-dev] Status of llvm.experimental.vector.reduce.* intrinsics
I assume smaller types like <4 x i1> are getting zero-extended to, e.g., i8?

On 04.08.2017 at 15:58, Amara Emerson wrote:
> Actually, for mask vectors of i1 values you don't need to use reductions
> at all (although for SVE this is what we'll do). You can instead bitcast
> the vector value to an i8/i16/whatever and then compare against zero.
>
> Amara
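The thread doesn't answer this question directly, but for a narrower mask the
same bitcast trick would presumably go through the matching-width integer
first, leaving it to type legalization to widen the compare. A hypothetical
sketch for <4 x i1>:

    ; <4 x i1> occupies 4 bits, so it bitcasts to i4; how the backend widens
    ; the i4 compare (e.g. by promoting it to i8) is exactly the open question.
    %bits = bitcast <4 x i1> %mask to i4
    %any  = icmp ne i4 %bits, 0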