thr3ads.net - llvm dev - [llvm-dev] Status of llvm.experimental.vector.reduce.* intrinsics [Aug 2017]

Haidl, Michael via llvm-dev

2017-Aug-04 14:03 UTC

[llvm-dev] Status of llvm.experimental.vector.reduce.* intrinsics

I assume smaller types like <4 x i1> are getting zero-extended to, e.g., i8?

On 04.08.2017 at 15:58, Amara Emerson wrote:
> Actually, for mask vectors of i1 values you don't need to use reductions
> at all (although for SVE this is what we'll do). You can instead bitcast
> the vector value to an i8/i16/whatever and then compare against zero.
> 
> Amara
> 
> On 4 August 2017 at 14:55, Haidl, Michael via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> 
>     I am currently working on a transformation pass that transforms
>     masked.load and masked.store intrinsics to (hopefully) increase
>     performance on targets where masked.load and masked.store are not
>     legal. To check whether the loads and stores are necessary at all, I
>     take the mask of the masked operations and want to reduce it to a
>     single value. vector.reduce.or seemed very handy for the job.
> 
>     I will take a look at the function you suggested. Maybe I can come up
>     with something that moves the development of these intrinsics forward.
> 
>     Cheers,
>     Michael
> 
>     On 04.08.2017 at 15:25, Amara Emerson wrote:
>      > Can you tell us what you're looking to do with the intrinsics?
>      >
>      > On all non-AArch64 targets the ExpandReductions pass will convert
>      > them to the shuffle pattern you're seeing. That pass was written in
>      > order to allow experimentation with the effects of using reduction
>      > intrinsics at the IR level only, hence we convert into the shuffles
>      > very late in the pass pipeline.
>      >
>      > Since we haven't seen any adverse effects of representing the
>      > reductions as intrinsics at the IR level, I think in that respect the
>      > intrinsics have probably proven themselves to be stable. However, the
>      > error you're seeing is because the AArch64 backend still expects to
>      > deal only with intrinsics it can *natively* support, and i1 is not a
>      > natively supported type for reductions. See the code in
>      > AArch64TargetTransformInfo.cpp:useReductionIntrinsic() for where we
>      > decide which reduction types we can support.
>      >
>      > For these cases, we need to implement more generic legalization
>      > support, in order to either promote to a legal type or, where the
>      > target cannot support it as a native operation at all, expand it to
>      > a shuffle pattern as a fallback. Once we have all that in place, I
>      > think we're in a strong position to move to the intrinsic form as
>      > the canonical representation.
>      >
>      > FYI, one of the motivating reasons for introducing these was to
>      > allow non-power-of-2 vector architectures like SVE to express
>      > reduction operations.
>      >
>      > Amara
>      >
>      > On 4 August 2017 at 13:36, Haidl, Michael
>      > <michael.haidl at uni-muenster.de> wrote:
>      >
>      >     Hi Renato,
>      >
>      >     just to make it clear, I didn't implement reductions on x86_64;
>      >     they just worked when I tried to lower an
>      >     llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A shuffle
>      >     pattern is generated for the intrinsic:
>      >
>      >              vpshufd $78, %xmm0, %xmm1       # xmm1 = xmm0[2,3,0,1]
>      >              vpor    %xmm1, %xmm0, %xmm0
>      >              vpshufd $229, %xmm0, %xmm1      # xmm1 = xmm0[1,1,2,3]
>      >              vpor    %xmm1, %xmm0, %xmm0
>      >              vpsrld  $16, %xmm0, %xmm1
>      >              vpor    %xmm1, %xmm0, %xmm0
>      >              vpextrb $0, %xmm0, %eax
>      >
>      >
>      >     However, on AArch64 I hit an unreachable where codegen does not
>      >     know how to promote the i1 type. Since I am more familiar with
>      >     the mid-level, I have to start digging into codegen. Any hints on
>      >     where to start would be awesome.
>      >
>      >     Cheers,
>      >     Michael
>      >
>      >     On 04.08.2017 at 08:18, Renato Golin wrote:
>      >      > On 3 August 2017 at 19:48, Haidl, Michael via llvm-dev
>      >      > <llvm-dev at lists.llvm.org> wrote:
>      >      >> Thank you for the clarification. I tested the intrinsics on
>      >      >> x86_64 and they seemed to work pretty well. Looking forward to
>      >      >> trying these intrinsics with the AArch64 backend. Maybe I will
>      >      >> find the time to look into codegen to get these intrinsics out
>      >      >> of the experimental stage. They seem pretty useful.
>      >      >
>      >      > In addition to Amara's point, it'd be good to have it working,
>      >      > and as the default, for other architectures before we can move
>      >      > out of experimental, if we indeed intend to make it
>      >      > non-arch-specific (which we do).
>      >      >
>      >      > So, if you could share your code for the x86 port, that'd be
>      >      > great. But if you could help with the final touches on the
>      >      > code-gen part, that'd be awesome.
>      >      >
>      >      > cheers,
>      >      > --renato
>      >      >
>      >
>      >
> 
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
>
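The x86_64 output quoted above (vpshufd/vpor pairs, a vpsrld, then vpextrb) has the shape of a halving shuffle reduction on promoted lanes. As a rough, non-authoritative model of the two legalization strategies Amara describes (promote i1 to a legal lane type, then expand to a shuffle pattern), the C sketch below zero-extends eight i1 lanes to 16-bit lanes and ORs lane i with lane i + half for half = 4, 2, 1. The 16-bit lane width is an inference from the final `vpsrld $16` step, and all function names are hypothetical; this is not the ExpandReductions or SelectionDAG code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* <8 x i1>, as in the quoted v8i1 example */

/* Promote each i1 lane to a 16-bit lane by zero-extension (assumed width). */
static void promote_mask(const uint8_t mask[LANES], uint16_t out[LANES]) {
    for (size_t i = 0; i < LANES; ++i)
        out[i] = (uint16_t)(mask[i] & 1u);
}

/* OR-reduce with a halving shuffle tree: at each step lane i receives
 * lane i + half (the "shuffle"), and the two vectors are OR'ed together.
 * Eight lanes need log2(8) = 3 steps; the result ends up in lane 0. */
static uint16_t shuffle_reduce_or(uint16_t v[LANES]) {
    for (size_t half = LANES / 2; half >= 1; half /= 2)
        for (size_t i = 0; i < half; ++i)
            v[i] = (uint16_t)(v[i] | v[i + half]);
    return v[0];
}

/* Promote, reduce, then truncate back to i1 (hypothetical helper). */
static int reduce_or_v8i1(const uint8_t mask[LANES]) {
    uint16_t lanes[LANES];
    assert(mask != NULL);
    promote_mask(mask, lanes);
    return shuffle_reduce_or(lanes) & 1;
}
```

The lane mapping in the three steps mirrors the quoted assembly: `vpshufd $78` brings lanes 4..7 down to 0..3, `vpshufd $229` brings lanes 2..3 down to 0..1, and `vpsrld $16` brings lane 1 down to lane 0.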

Renato Golin via llvm-dev

2017-Aug-04 14:11 UTC

It may not be related, but there was a talk at EuroLLVM about i1 types in
x86 vector expansion that covered some pitfalls. I recommend having a look.

Is the AArch64 error an assert/internal one? If so, we may need better
error handling...


Amara Emerson via llvm-dev

2017-Aug-04 14:20 UTC

Bitcasting is only valid between types of the same size, so there is no
zero-extension: you can bitcast <4 x i1> to i4 and then compare directly,
e.g. icmp ne i4 %castval, 0, etc.

Amara
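To make the size rule concrete: `bitcast <4 x i1> %m to i4` yields a 4-bit integer, not an i8, so the any-lane test is `icmp ne i4 %bits, 0`. The C sketch below models that by packing N one-bit lanes into the low N bits of a `uint32_t`. The all-lanes (AND) variant, comparing against an all-ones iN value, is an extension not discussed in the thread; it is included because it shows why the exact width matters (for <4 x i1> all-ones is 0xF, not 0xFF). Function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models `%bits = bitcast <N x i1> %m to iN`: lane i becomes bit i of an
 * integer that is exactly N bits wide (no zero-extension to i8). iN is
 * emulated with the low N bits of a uint32_t, so 1 <= N <= 32 here.
 * Bit order does not affect either comparison below. */
static uint32_t pack_mask(const bool *lanes, size_t n) {
    assert(lanes != NULL && n >= 1 && n <= 32);
    uint32_t bits = 0;
    for (size_t i = 0; i < n; ++i)
        if (lanes[i])
            bits |= (uint32_t)1 << i;
    return bits;
}

/* OR reduction == `icmp ne iN %bits, 0` */
static bool any_lane_set(const bool *lanes, size_t n) {
    return pack_mask(lanes, n) != 0;
}

/* AND reduction == `icmp eq iN %bits, -1`, where -1 in iN has only the
 * low N bits set (0xF for i4). */
static bool all_lanes_set(const bool *lanes, size_t n) {
    uint32_t all_ones = (n == 32) ? UINT32_MAX : (((uint32_t)1 << n) - 1u);
    return pack_mask(lanes, n) == all_ones;
}
```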


Haidl, Michael via llvm-dev

2017-Aug-04 14:28 UTC

Thanks, I already found that out the hard way ;) Now it works and looks
nice and shiny.

Michael
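For the use case that started the thread, here is a hypothetical scalar model of the check: pack the mask as in the bitcast trick and, when it is zero, return the pass-through values without touching memory. The function name, the fixed width of 4 lanes and the per-lane fallback are assumptions for illustration, not code from the pass discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VLEN 4 /* assumed vector width: <4 x i32> data, <4 x i1> mask */

/* Hypothetical scalar model of llvm.masked.load with an early-out check:
 * lanes whose mask bit is false take the pass-through value. */
static void emulated_masked_load(const int32_t *ptr, const bool mask[VLEN],
                                 const int32_t passthru[VLEN],
                                 int32_t out[VLEN]) {
    assert(mask != NULL && passthru != NULL && out != NULL);

    /* models: %bits = bitcast <4 x i1> %mask to i4 */
    uint32_t bits = 0;
    for (size_t i = 0; i < VLEN; ++i)
        if (mask[i])
            bits |= (uint32_t)1 << i;

    /* models: icmp eq i4 %bits, 0 -> no lane is active, skip the load */
    if (bits == 0) {
        for (size_t i = 0; i < VLEN; ++i)
            out[i] = passthru[i];
        return; /* ptr is never dereferenced on this path */
    }

    /* otherwise fall back to per-lane conditional loads */
    for (size_t i = 0; i < VLEN; ++i)
        out[i] = mask[i] ? ptr[i] : passthru[i];
}
```

Passing a null pointer together with an all-false mask is safe in this model because the early-out path never reads `ptr`.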

