thr3ads.net - llvm dev - [llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions? [Feb 2019]

Nikita Popov via llvm-dev

2019-Feb-09 17:37 UTC

[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?

On Sat, Feb 9, 2019 at 6:25 PM Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> The add/sub (+mul) overflow intrinsics are being updated to support
> vectors to match the related add/sub saturation intrinsics. We haven't
> updated the docs yet as legalization, vectorization and various minor bits
> of plumbing still need to be finished before it can be officially supported
> (Nikita Popov has been looking at the legalization recently).
>
> Regarding the reduction functions - I think the integer intrinsics at
> least are relatively stable and we can probably investigate dropping the
> experimental tag before the next release (assuming someone has the time to
> take on the work) - it'd be nice to have the SLP vectorizer emit reduction
> intrinsics directly for these.
>
> The floating point intrinsics are trickier as they (may) have stricter
> ordering constraints that are still causing issues and may need tweaking
> (e.g. see PR36734).

The vector reduction intrinsics still need quite a lot of work. Apart from
SplitVecOp, all legalizations are currently missing. This is only
noticeable on AArch64 right now, because all other targets expand vector
reductions prior to codegen.

Nikita

> On 09/02/2019 16:17, Sanjay Patel wrote:
>
> The IR update to allow vector types was here:
> https://reviews.llvm.org/D57090
> ...we didn't update the docs at that time because it was not clear what
> the backend would do with that, but that might've changed with some of the
> more recent patches.
>
> On Sat, Feb 9, 2019 at 1:42 AM Craig Topper via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I don't think I understand your pseudocode using
>> llvm.experimental.vector.reduce.umax. All of the types you showed are
>> scalar, but that intrinsic doesn't work on scalars so I'm having a hard
>> time understanding what you're trying to do with it.
>> llvm.experimental.vector.reduce.umax takes a vector input and returns a
>> scalar result. Are you wanting to find if any of the additions overflowed
>> or a mask of which addition overflowed?
>>
>> The sadd.with.overflow intrinsics are in the process of gaining vector
>> support if not already complete. Simon Pilgrim made some commits recently.
>> I know the documentation in the LangRef hasn't been updated. It will return
>> a <X x i1> vector for overflow instead of i1 when vectors are used.
>>
>> ~Craig
>>
>>
>> On Fri, Feb 8, 2019 at 11:03 PM Andrew Kelley via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> I'm interested in using @llvm.experimental.vector.reduce.smax/umax to
>>> implement runtime overflow checking for vectors. Here's an example
>>> checked addition, without vectors, and then I'll follow the example with
>>> what I would do for checked addition with vectors.
>>>
>>> Frontend code (zig):
>>>
>>> export fn entry() void {
>>>     var a: i32 = 1;
>>>     var b: i32 = 2;
>>>     var x = a + b;
>>> }
>>>
>>> LLVM IR code:
>>>
>>> define void @entry() #2 !dbg !41 {
>>> Entry:
>>>   %a = alloca i32, align 4
>>>   %b = alloca i32, align 4
>>>   %x = alloca i32, align 4
>>>   store i32 1, i32* %a, align 4, !dbg !52
>>>   call void @llvm.dbg.declare(metadata i32* %a, metadata !45, metadata !DIExpression()), !dbg !52
>>>   store i32 2, i32* %b, align 4, !dbg !53
>>>   call void @llvm.dbg.declare(metadata i32* %b, metadata !48, metadata !DIExpression()), !dbg !53
>>>   %0 = load i32, i32* %a, align 4, !dbg !54
>>>   %1 = load i32, i32* %b, align 4, !dbg !55
>>>   %2 = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %0, i32 %1), !dbg !56
>>>   %3 = extractvalue { i32, i1 } %2, 0, !dbg !56
>>>   %4 = extractvalue { i32, i1 } %2, 1, !dbg !56
>>>   br i1 %4, label %OverflowFail, label %OverflowOk, !dbg !56
>>>
>>> OverflowFail:                                     ; preds = %Entry
>>>   tail call fastcc void @panic(%"[]u8"* @2, %StackTrace* null), !dbg !56
>>>   unreachable, !dbg !56
>>>
>>> OverflowOk:                                       ; preds = %Entry
>>>   store i32 %3, i32* %x, align 4, !dbg !57
>>>   call void @llvm.dbg.declare(metadata i32* %x, metadata !50, metadata !DIExpression()), !dbg !57
>>>   ret void, !dbg !58
>>> }
>>>
>>> You can see this takes advantage of @llvm.sadd.with.overflow, which is
>>> not available with vectors. So here is a different approach (pseudocode):
>>>
>>> %a_zext = zext %a to i33 # 1 more bit
>>> %b_zext = zext %b to i33 # 1 more bit
>>> %result_zext = add %a_zext, %b_zext
>>> %max_result = @llvm.experimental.vector.reduce.umax(%result_zext)
>>> %overflow = icmp %max_result > @max_i32_value
>>> %result = trunc %result_zext to i32
>>>
>>> You can imagine how this would work for signed integers, replacing zext
>>> with sext and umax with smax.
>>>
>>> This depends on an "experimental" API. Can anyone advise on depending on
>>> this API? Is it a bad idea? Is it about to be promoted to
>>> non-experimental soon? Can anyone advise on how to best achieve my goal?
>>>
>>> Kind regards,
>>> Andrew
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>

Andrew Kelley via llvm-dev

2019-Feb-09 18:05 UTC

[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?

>>     On Sat, Feb 9, 2019 at 1:42 AM Craig Topper via llvm-dev
>>     <llvm-dev at lists.llvm.org> wrote:
>>
>>         I don't think I understand your pseudocode using
>>         llvm.experimental.vector.reduce.umax. All of the types you
>>         showed are scalar, but that intrinsic doesn't work on scalars
>>         so I'm having a hard time understanding what you're trying to
>>         do with it. llvm.experimental.vector.reduce.umax takes a
>>         vector input and returns a scalar result. Are you wanting to
>>         find if any of the additions overflowed or a mask of which
>>         addition overflowed?

Apologies for the confusion - let me try to clarify. Here is frontend
code that works now:

export fn entry() void {
    var a: @Vector(4, i32) = []i32{ 1, 2, 3, 4 };
    var b: @Vector(4, i32) = []i32{ 5, 6, 7, 8 };
    var x = a +% b;
}

This generates the following LLVM IR code:

define void @entry() #2 !dbg !41 {
Entry:
  %a = alloca <4 x i32>, align 16
  %b = alloca <4 x i32>, align 16
  %x = alloca <4 x i32>, align 16
  store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
  call void @llvm.dbg.declare(metadata <4 x i32>* %a, metadata !45, metadata !DIExpression()), !dbg !55
  store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
  call void @llvm.dbg.declare(metadata <4 x i32>* %b, metadata !51, metadata !DIExpression()), !dbg !56
  %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
  %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
  %2 = add <4 x i32> %0, %1, !dbg !59
  store <4 x i32> %2, <4 x i32>* %x, align 16, !dbg !60
  call void @llvm.dbg.declare(metadata <4 x i32>* %x, metadata !53, metadata !DIExpression()), !dbg !60
  ret void, !dbg !61
}

However I used the +% operator, which in Zig is wrapping addition. Now I
want to implement the + operator for vectors, which Zig defines to panic
if any of the elements overflowed. Here is how the IR could look for this:

define void @entry() #2 !dbg !41 {
Entry:
  %a = alloca <4 x i32>, align 16
  %b = alloca <4 x i32>, align 16
  %x = alloca <4 x i32>, align 16
  store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
  store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
  %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
  %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
  %2 = call { <4 x i32>, <4 x i1> } @llvm.sadd.with.overflow.v4i32(<4 x i32> %0, <4 x i32> %1)
  %3 = extractvalue { <4 x i32>, <4 x i1> } %2, 0, !dbg !56
  %4 = extractvalue { <4 x i32>, <4 x i1> } %2, 1, !dbg !56
  %5 = call i1 @llvm.experimental.vector.reduce.umax.i1.v4i1(<4 x i1> %4)
  br i1 %5, label %OverflowFail, label %OverflowOk, !dbg !56

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic(%"[]u8"* @2, %StackTrace* null), !dbg !56
  unreachable, !dbg !56

OverflowOk:                                       ; preds = %Entry
  store <4 x i32> %3, <4 x i32>* %x, align 16, !dbg !60
  ret void, !dbg !61
}

You can see that it depends on @llvm.sadd.with.overflow working on
vector types, and it relies on @llvm.experimental.vector.reduce.umax. I
will note that my strategy with sign extension and icmp would be a
semantically equivalent alternative to @llvm.sadd.with.overflow.
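
As a minimal sketch of that equivalence, the widening strategy can be modelled lane by lane in Python. This is an illustrative model under stated assumptions, not LLVM's implementation: plain Python integers stand in for <4 x i32> lanes, and the helper names (to_signed, widened_checked_add, checked_vector_add) are hypothetical. Note that for signed lanes both bounds need checking, so a single smax reduction over the widened sums would catch only positive overflow.

```python
I32_MIN, I32_MAX = -2**31, 2**31 - 1


def to_signed(bits, width):
    """Interpret the low `width` bits of `bits` as a two's-complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def widened_checked_add(a, b):
    """One lane: sext to i33, add, range-check against i32, trunc back to i32."""
    wide = to_signed(a + b, 33)  # an i33 add cannot wrap for i32 inputs
    overflow = wide > I32_MAX or wide < I32_MIN
    return to_signed(wide, 32), overflow  # trunc keeps the wrapped i32 value


def checked_vector_add(va, vb):
    """Return (wrapped lanes, any_overflow) for two equal-length lane lists."""
    lanes = [widened_checked_add(a, b) for a, b in zip(va, vb)]
    sums = [s for s, _ in lanes]
    flags = [int(o) for _, o in lanes]
    # An unsigned max over i1 lanes is 1 exactly when at least one lane is 1,
    # so it behaves like an "any lane overflowed" reduction.
    return sums, max(flags) == 1
```

With the vectors from the example above, checked_vector_add([1, 2, 3, 4], [5, 6, 7, 8]) reports no overflow, while a lane holding the maximum i32 value plus 1 sets the flag.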

On 2/9/19 12:37 PM, Nikita Popov wrote:
> On Sat, Feb 9, 2019 at 6:25 PM Simon Pilgrim <llvm-dev at redking.me.uk>
> wrote:
>     Regarding the reduction functions - I think the integer intrinsics
>     at least are relatively stable and we can probably investigate
>     dropping the experimental tag before the next release (assuming
>     someone has the time to take on the work) - it'd be nice to have the
>     SLP vectorizer emit reduction intrinsics directly for these.
>
> The vector reduction intrinsics still need quite a lot of work. Apart
> from SplitVecOp, all legalizations are currently missing. This is only
> noticeable on AArch64 right now, because all other targets expand vector
> reductions prior to codegen.

My follow-up question, then, is this:

What do you recommend, in terms of LLVM IR, in order to obtain the %5
value above?

Thanks for the help,
Andrew



Craig Topper via llvm-dev

2019-Feb-09 19:05 UTC

[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?

Something like this should work I think.

; ModuleID = 'test.ll'
source_filename = "test.ll"

define void @entry(<4 x i32>* %a, <4 x i32>* %b, <4 x i32>* %x) {
Entry:
  %tmp = load <4 x i32>, <4 x i32>* %a, align 16
  %tmp1 = load <4 x i32>, <4 x i32>* %b, align 16
  %tmp2 = add <4 x i32> %tmp, %tmp1
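  ; Per-lane sign bits of both operands and of the wrapped sum.
  %tmpsign = icmp slt <4 x i32> %tmp, zeroinitializer
  %tmp1sign = icmp slt <4 x i32> %tmp1, zeroinitializer
  %sumsign = icmp slt <4 x i32> %tmp2, zeroinitializer
  ; A lane overflowed if the operands share a sign and the sum's sign differs.
  %signsequal = icmp eq <4 x i1> %tmpsign, %tmp1sign
  %summismatch = icmp ne <4 x i1> %sumsign, %tmpsign
  %overflow = and <4 x i1> %signsequal, %summismatch
  ; Collapse the <4 x i1> mask into an i4; any set bit means some lane overflowed.
  %tmp5 = bitcast <4 x i1> %overflow to i4
  %tmp6 = icmp ne i4 %tmp5, 0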
  br i1 %tmp6, label %OverflowFail, label %OverflowOk

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic()
  unreachable

OverflowOk:                                       ; preds = %Entry
  store <4 x i32> %tmp2, <4 x i32>* %x, align 16
  ret void
}

declare fastcc void @panic()
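
The sign-based test in the IR above can be checked against ordinary arithmetic with a small Python model. This is a sketch under assumptions, not part of the original message: Python integers stand in for i32 lanes, and the helper names (to_i32, lane_overflow_mask, any_overflow) are hypothetical.

```python
def to_i32(value):
    """Wrap an integer to the two's-complement i32 range (models a wrapping add)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def lane_overflow_mask(va, vb):
    """Per-lane overflow bits, following the icmp/and sequence in the IR."""
    mask = []
    for a, b in zip(va, vb):
        total = to_i32(a + b)                 # add <4 x i32>
        a_neg, b_neg, sum_neg = a < 0, b < 0, total < 0  # icmp slt ..., 0
        signs_equal = a_neg == b_neg          # icmp eq
        sum_mismatch = sum_neg != a_neg       # icmp ne
        mask.append(signs_equal and sum_mismatch)  # and
    return mask


def any_overflow(mask):
    """Pack the i1 lanes into one integer and compare it with zero."""
    packed = 0
    for index, bit in enumerate(mask):
        # Lane-to-bit order does not matter for a "not equal to zero" test.
        packed |= int(bit) << index
    return packed != 0
```

Adding operands of opposite sign can never overflow, which is why the mask requires the operand signs to match before it looks at the sign of the sum.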


~Craig


