Nikita Popov via llvm-dev
2019-Feb-09 17:37 UTC
[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?
On Sat, Feb 9, 2019 at 6:25 PM Simon Pilgrim <llvm-dev at redking.me.uk> wrote:

> The add/sub (+mul) overflow intrinsics are being updated to support
> vectors to match the related add/sub saturation intrinsics. We haven't
> updated the docs yet, as legalization, vectorization and various minor
> bits of plumbing still need to be finished before it can be officially
> supported (Nikita Popov has been looking at the legalization recently).
>
> Regarding the reduction functions - I think the integer intrinsics at
> least are relatively stable, and we can probably investigate dropping the
> experimental tag before the next release (assuming someone has the time
> to take on the work) - it'd be nice to have the SLP vectorizer emit
> reduction intrinsics directly for these.
>
> The floating point intrinsics are trickier, as they (may) have stricter
> ordering constraints that are still causing issues and may need tweaking
> (e.g. see PR36734).

The vector reduction intrinsics still need quite a lot of work. Apart
from SplitVecOp, all legalizations are currently missing. This is only
noticeable on AArch64 right now, because all other targets expand vector
reductions prior to codegen.

Nikita

> On 09/02/2019 16:17, Sanjay Patel wrote:
> > The IR update to allow vector types was here:
> > https://reviews.llvm.org/D57090
> > ...we didn't update the docs at that time because it was not clear what
> > the backend would do with that, but that might've changed with some of
> > the more recent patches.
> >
> > On Sat, Feb 9, 2019 at 1:42 AM Craig Topper via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >
>> I don't think I understand your pseudocode using
>> llvm.experimental.vector.reduce.umax. All of the types you showed are
>> scalar, but that intrinsic doesn't work on scalars, so I'm having a hard
>> time understanding what you're trying to do with it.
>> llvm.experimental.vector.reduce.umax takes a vector input and returns a
>> scalar result. Are you trying to find whether any of the additions
>> overflowed, or a mask of which additions overflowed?
>>
>> The sadd.with.overflow intrinsics are in the process of gaining vector
>> support, if not already complete. Simon Pilgrim made some commits
>> recently. I know the documentation in the LangRef hasn't been updated.
>> It will return a <X x i1> vector for overflow instead of i1 when vectors
>> are used.
>>
>> ~Craig
>>
>> On Fri, Feb 8, 2019 at 11:03 PM Andrew Kelley via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>>> I'm interested in using @llvm.experimental.vector.reduce.smax/umax to
>>> implement runtime overflow checking for vectors. Here's an example of
>>> checked addition, without vectors, and then I'll follow the example
>>> with what I would do for checked addition with vectors.
>>>
>>> Frontend code (zig):
>>>
>>> export fn entry() void {
>>>     var a: i32 = 1;
>>>     var b: i32 = 2;
>>>     var x = a + b;
>>> }
>>>
>>> LLVM IR code:
>>>
>>> define void @entry() #2 !dbg !41 {
>>> Entry:
>>>   %a = alloca i32, align 4
>>>   %b = alloca i32, align 4
>>>   %x = alloca i32, align 4
>>>   store i32 1, i32* %a, align 4, !dbg !52
>>>   call void @llvm.dbg.declare(metadata i32* %a, metadata !45, metadata !DIExpression()), !dbg !52
>>>   store i32 2, i32* %b, align 4, !dbg !53
>>>   call void @llvm.dbg.declare(metadata i32* %b, metadata !48, metadata !DIExpression()), !dbg !53
>>>   %0 = load i32, i32* %a, align 4, !dbg !54
>>>   %1 = load i32, i32* %b, align 4, !dbg !55
>>>   %2 = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %0, i32 %1), !dbg !56
>>>   %3 = extractvalue { i32, i1 } %2, 0, !dbg !56
>>>   %4 = extractvalue { i32, i1 } %2, 1, !dbg !56
>>>   br i1 %4, label %OverflowFail, label %OverflowOk, !dbg !56
>>>
>>> OverflowFail:                                     ; preds = %Entry
>>>   tail call fastcc void @panic(%"[]u8"* @2, %StackTrace* null), !dbg !56
>>>   unreachable, !dbg !56
>>>
>>> OverflowOk:                                       ; preds = %Entry
>>>   store i32 %3, i32* %x, align 4, !dbg !57
>>>   call void @llvm.dbg.declare(metadata i32* %x, metadata !50, metadata !DIExpression()), !dbg !57
>>>   ret void, !dbg !58
>>> }
>>>
>>> You can see this takes advantage of @llvm.sadd.with.overflow, which is
>>> not available with vectors. So here is a different approach (pseudocode):
>>>
>>>   %a_zext = zext %a to i33            # 1 more bit
>>>   %b_zext = zext %b to i33            # 1 more bit
>>>   %result_zext = add %a_zext, %b_zext
>>>   %max_result = @llvm.experimental.vector.reduce.umax(%result_zext)
>>>   %overflow = icmp %max_result > @max_i32_value
>>>   %result = trunc %result_zext to i32
>>>
>>> You can imagine how this would work for signed integers, replacing zext
>>> with sext and umax with smax.
>>>
>>> This depends on an "experimental" API. Can anyone advise on depending on
>>> this API? Is it a bad idea? Is it about to be promoted to
>>> non-experimental soon? Can anyone advise on how to best achieve my goal?
>>>
>>> Kind regards,
>>> Andrew
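
For concreteness, the widened-add idea from that pseudocode, written out
with explicit vector types, might look roughly like the sketch below, where
%a and %b are the loaded <4 x i32> operands. The <4 x i33> element width and
the .i33.v4i33 intrinsic mangling are guesses that follow the naming pattern
used elsewhere in this thread; they have not been verified against any
backend:

  ; widen each lane by one bit so the add cannot wrap
  %a_wide = zext <4 x i32> %a to <4 x i33>
  %b_wide = zext <4 x i32> %b to <4 x i33>
  %sum_wide = add <4 x i33> %a_wide, %b_wide
  ; largest lane of the widened sum (intrinsic mangling is an assumption)
  %max_lane = call i33 @llvm.experimental.vector.reduce.umax.i33.v4i33(<4 x i33> %sum_wide)
  ; overflow iff any lane exceeds the largest u32 value
  %overflow = icmp ugt i33 %max_lane, 4294967295
  %sum = trunc <4 x i33> %sum_wide to <4 x i32>
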
Andrew Kelley via llvm-dev
2019-Feb-09 18:05 UTC
[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?
>> On Sat, Feb 9, 2019 at 1:42 AM Craig Topper via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> I don't think I understand your pseudocode using
>> llvm.experimental.vector.reduce.umax. All of the types you showed are
>> scalar, but that intrinsic doesn't work on scalars, so I'm having a hard
>> time understanding what you're trying to do with it.
>> llvm.experimental.vector.reduce.umax takes a vector input and returns a
>> scalar result. Are you trying to find whether any of the additions
>> overflowed, or a mask of which additions overflowed?

Apologies for the confusion - let me try to clarify. Here is frontend
code that works now:

export fn entry() void {
    var a: @Vector(4, i32) = []i32{ 1, 2, 3, 4 };
    var b: @Vector(4, i32) = []i32{ 5, 6, 7, 8 };
    var x = a +% b;
}

This generates the following LLVM IR code:

define void @entry() #2 !dbg !41 {
Entry:
  %a = alloca <4 x i32>, align 16
  %b = alloca <4 x i32>, align 16
  %x = alloca <4 x i32>, align 16
  store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
  call void @llvm.dbg.declare(metadata <4 x i32>* %a, metadata !45, metadata !DIExpression()), !dbg !55
  store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
  call void @llvm.dbg.declare(metadata <4 x i32>* %b, metadata !51, metadata !DIExpression()), !dbg !56
  %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
  %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
  %2 = add <4 x i32> %0, %1, !dbg !59
  store <4 x i32> %2, <4 x i32>* %x, align 16, !dbg !60
  call void @llvm.dbg.declare(metadata <4 x i32>* %x, metadata !53, metadata !DIExpression()), !dbg !60
  ret void, !dbg !61
}

However, I used the +% operator, which in Zig is wrapping addition. Now I
want to implement the + operator for vectors, which Zig defines to panic
if any of the elements overflowed. Here is how the IR could look for this:

define void @entry() #2 !dbg !41 {
Entry:
  %a = alloca <4 x i32>, align 16
  %b = alloca <4 x i32>, align 16
  %x = alloca <4 x i32>, align 16
  store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
  store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
  %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
  %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
  %2 = call { <4 x i32>, <4 x i1> } @llvm.sadd.with.overflow.v4i32(<4 x i32> %0, <4 x i32> %1)
  %3 = extractvalue { <4 x i32>, <4 x i1> } %2, 0, !dbg !56
  %4 = extractvalue { <4 x i32>, <4 x i1> } %2, 1, !dbg !56
  %5 = call i1 @llvm.experimental.vector.reduce.umax.i1.v4i1(<4 x i1> %4)
  br i1 %5, label %OverflowFail, label %OverflowOk, !dbg !56

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic(%"[]u8"* @2, %StackTrace* null), !dbg !56
  unreachable, !dbg !56

OverflowOk:                                       ; preds = %Entry
  store <4 x i32> %3, <4 x i32>* %x, align 16, !dbg !60
  ret void, !dbg !61
}

You can see that it depends on @llvm.sadd.with.overflow working on
vector types, and it relies on @llvm.experimental.vector.reduce.umax. I
will note that my strategy with sign extension and icmp would be a
semantically equivalent alternative to @llvm.sadd.with.overflow.

On 2/9/19 12:37 PM, Nikita Popov wrote:
> On Sat, Feb 9, 2019 at 6:25 PM Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
>> Regarding the reduction functions - I think the integer intrinsics at
>> least are relatively stable, and we can probably investigate dropping
>> the experimental tag before the next release (assuming someone has the
>> time to take on the work) - it'd be nice to have the SLP vectorizer
>> emit reduction intrinsics directly for these.
>
> The vector reduction intrinsics still need quite a lot of work. Apart
> from SplitVecOp, all legalizations are currently missing. This is only
> noticeable on AArch64 right now, because all other targets expand vector
> reductions prior to codegen.

My follow-up question, then, is this:

What do you recommend, in terms of LLVM IR, in order to obtain the %5
value above?

Thanks for the help,
Andrew
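
One way to obtain a "did any lane overflow" bit like %5 directly from the
<4 x i1> mask, assuming the experimental reduce intrinsics are usable at all
given the legalization gaps Nikita describes, would be an or-reduction over
the mask. The .i1.v4i1 mangling here is an assumption modelled on the other
reduce calls in this thread:

  %any_overflow = call i1 @llvm.experimental.vector.reduce.or.i1.v4i1(<4 x i1> %4)
  br i1 %any_overflow, label %OverflowFail, label %OverflowOk

An or-reduction over an i1 mask is equivalent to the umax reduction above:
both are true exactly when at least one lane overflowed.
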
Craig Topper via llvm-dev
2019-Feb-09 19:05 UTC
[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?
Something like this should work, I think:

; ModuleID = 'test.ll'
source_filename = "test.ll"

define void @entry(<4 x i32>* %a, <4 x i32>* %b, <4 x i32>* %x) {
Entry:
  %tmp = load <4 x i32>, <4 x i32>* %a, align 16
  %tmp1 = load <4 x i32>, <4 x i32>* %b, align 16
  %tmp2 = add <4 x i32> %tmp, %tmp1
  %tmpsign = icmp slt <4 x i32> %tmp, zeroinitializer
  %tmp1sign = icmp slt <4 x i32> %tmp1, zeroinitializer
  %sumsign = icmp slt <4 x i32> %tmp2, zeroinitializer
  %signsequal = icmp eq <4 x i1> %tmpsign, %tmp1sign
  %summismatch = icmp ne <4 x i1> %sumsign, %tmpsign
  %overflow = and <4 x i1> %signsequal, %summismatch
  %tmp5 = bitcast <4 x i1> %overflow to i4
  %tmp6 = icmp ne i4 %tmp5, 0
  br i1 %tmp6, label %OverflowFail, label %OverflowOk

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic()
  unreachable

OverflowOk:                                       ; preds = %Entry
  store <4 x i32> %tmp2, <4 x i32>* %x, align 16
  ret void
}

declare fastcc void @panic()

~Craig


On Sat, Feb 9, 2019 at 10:05 AM Andrew Kelley <andrew at ziglang.org> wrote:

> [...]
>
> My follow-up question, then, is this:
>
> What do you recommend, in terms of LLVM IR, in order to obtain the %5
> value above?
>
> Thanks for the help,
> Andrew
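
Once the vector form of the overflow intrinsic that Simon describes has
landed, the same check could presumably be expressed with it directly,
reusing %tmp and %tmp1 from Craig's example. The .v4i32 mangling and the
{ <4 x i32>, <4 x i1> } result type below are assumptions based on this
thread, not output verified against a released LLVM:

  ; assumed vector overload of the overflow intrinsic
  %res = call { <4 x i32>, <4 x i1> } @llvm.sadd.with.overflow.v4i32(<4 x i32> %tmp, <4 x i32> %tmp1)
  %sum = extractvalue { <4 x i32>, <4 x i1> } %res, 0
  %ov = extractvalue { <4 x i32>, <4 x i1> } %res, 1
  ; any set lane in the overflow mask takes the panic path
  %ovbits = bitcast <4 x i1> %ov to i4
  %anyov = icmp ne i4 %ovbits, 0
  br i1 %anyov, label %OverflowFail, label %OverflowOk

This keeps Craig's bitcast-and-compare trick for the "any lane" test, so it
does not depend on the experimental reduce intrinsics at all.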