Hi all, The patches linked below introduce a new family of intrinsics, for integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed). Quoting the added documentation: %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n) is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself implementable as the following IR: %min_sint_n = i32 ... ; the min. signed integer of bitwidth n, -2^(n-1) %max_sint_n = i32 ... ; the max. signed integer of bitwidth n, 2^(n-1)-1 %0 = icmp slt i32 %x, %min_sint_n %1 = select i1 %0, i32 %min_sint_n, i32 %x %2 = icmp sgt i32 %1, %max_sint_n %r = select i1 %2, i32 %max_sint_n, i32 %1 As a starting point, here are two patches: - http://reviews.llvm.org/D6976 Add Integer Saturation Intrinsics. - http://reviews.llvm.org/D6977 [CodeGen] Add legalization for Integer Saturation Intrinsics.>From there, we can generate several new instructions, more efficientthan their expanded counterpart. Locally, I have worked on: - ARM: the SSAT/USAT instructions (scalar) - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar saturating arithmetic) - X86: PACK SS/US (vector, saturate+truncate) - X86: PADD/SUB S/US (vector, saturating arithmetic) Anyway, let's first agree on the intrinsics, so that further development is done on trunk. Thanks! -Ahmed
At a very high level, why do we need these intrinsics? What is the use case? What are typical values for N? Have you looked at just generating the conditional IR and then pattern matching late? What's the performance trade off involved? My default position is that we shouldn't add these unless there is a compelling use case. We *still* haven't gotten the *.with.overflow intrinsics piped through everywhere in the optimizer. I'm reluctant to keep adding new ones. Philip On 01/14/2015 02:08 PM, Ahmed Bougacha wrote:> Hi all, > > The patches linked below introduce a new family of intrinsics, for > integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed). > Quoting the added documentation: > > %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n) > > is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself > implementable as the following IR: > > %min_sint_n = i32 ... ; the min. signed integer of bitwidth n, -2^(n-1) > %max_sint_n = i32 ... ; the max. signed integer of bitwidth n, 2^(n-1)-1 > %0 = icmp slt i32 %x, %min_sint_n > %1 = select i1 %0, i32 %min_sint_n, i32 %x > %2 = icmp sgt i32 %1, %max_sint_n > %r = select i1 %2, i32 %max_sint_n, i32 %1 > > > As a starting point, here are two patches: > - http://reviews.llvm.org/D6976 Add Integer Saturation Intrinsics. > - http://reviews.llvm.org/D6977 [CodeGen] Add legalization for > Integer Saturation Intrinsics. > > From there, we can generate several new instructions, more efficient > than their expanded counterpart. Locally, I have worked on: > - ARM: the SSAT/USAT instructions (scalar) > - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar > saturating arithmetic) > - X86: PACK SS/US (vector, saturate+truncate) > - X86: PADD/SUB S/US (vector, saturating arithmetic) > > Anyway, let's first agree on the intrinsics, so that further > development is done on trunk. > > Thanks! > -Ahmed > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Thu, Jan 15, 2015 at 12:42 AM, Philip Reames <listmail at philipreames.com> wrote:> At a very high level, why do we need these intrinsics?In short, to catch sequences you can't catch in the SelectionDAG.> What is the use case? What are typical values for N?Typically, you get this from (a little overlapping) compression, DSP, or pixel-handling code. Off the top of my head, this occurs in paq8p in the test-suite, as well as a few other tests. You'd have something like: a = x + y; if (a < -128) a = -128; if (a > 127) a = 127;> Have you looked at just generating the conditional IR and then pattern > matching late? What's the performance trade off involved?That's a valid concern. The original problem is, we can't catch this kind of thing in the SelectionDAG, because we're limited by a single basic block. I guess we could (and I gather that's the alternative you're presenting?) canonicalize the control flow to the 2icmp+2select sequence, but I wasn't sure that was "workable". Truth be told, I didn't investigate this very thoroughly, as I didn't expect reluctance on adding intrinsics! I'll look into it some more: avoid adding the intrinsic, keep the codegen additions as is, match the pattern in CGP instead of InstCombines.> My default position is that we shouldn't add these unless there is a > compelling use case. We *still* haven't gotten the *.with.overflow > intrinsics piped through everywhere in the optimizer. I'm reluctant to keep > adding new ones.I get your point, and agree with the general idea, but I think saturation is easier to reason about so wouldn't be as big a problem. At least it was pretty fast to implement the usual suspects: vectorizability, known bits, etc.. Do you have a specific part in mind? Anyway, thanks for the feedback! -Ahmed> Philip > > > On 01/14/2015 02:08 PM, Ahmed Bougacha wrote: >> >> Hi all, >> >> The patches linked below introduce a new family of intrinsics, for >> integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed). >> Quoting the added documentation: >> >> %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n) >> >> is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself >> implementable as the following IR: >> >> %min_sint_n = i32 ... ; the min. signed integer of bitwidth n, >> -2^(n-1) >> %max_sint_n = i32 ... ; the max. signed integer of bitwidth n, >> 2^(n-1)-1 >> %0 = icmp slt i32 %x, %min_sint_n >> %1 = select i1 %0, i32 %min_sint_n, i32 %x >> %2 = icmp sgt i32 %1, %max_sint_n >> %r = select i1 %2, i32 %max_sint_n, i32 %1 >> >> >> As a starting point, here are two patches: >> - http://reviews.llvm.org/D6976 Add Integer Saturation Intrinsics. >> - http://reviews.llvm.org/D6977 [CodeGen] Add legalization for >> Integer Saturation Intrinsics. >> >> From there, we can generate several new instructions, more efficient >> than their expanded counterpart. Locally, I have worked on: >> - ARM: the SSAT/USAT instructions (scalar) >> - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar >> saturating arithmetic) >> - X86: PACK SS/US (vector, saturate+truncate) >> - X86: PADD/SUB S/US (vector, saturating arithmetic) >> >> Anyway, let's first agree on the intrinsics, so that further >> development is done on trunk. >> >> Thanks! >> -Ahmed >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >
A couple of questions: 1) Should this really be an intrinsic and not a flag on add? The add instruction already allows overflow to be either undefined or defined to wrap. Making it defined to saturate seems a natural extension. 2) How do you imagine this being used and what are the guarantees for sequences of operations with respect to optimisation? If I do a+b-c (or +c where c is negative), and a+b would saturate, but a+(b-c) would not, then is it allowed for an optimiser to generate the second rather than the first? If it's an intrinsic that's opaque to optimisers, then that's not a problem for correctness, but then you'll miss some potentially beneficial optimisations. David> On 14 Jan 2015, at 22:08, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: > > Hi all, > > The patches linked below introduce a new family of intrinsics, for > integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed). > Quoting the added documentation: > > %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n) > > is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself > implementable as the following IR: > > %min_sint_n = i32 ... ; the min. signed integer of bitwidth n, -2^(n-1) > %max_sint_n = i32 ... ; the max. signed integer of bitwidth n, 2^(n-1)-1 > %0 = icmp slt i32 %x, %min_sint_n > %1 = select i1 %0, i32 %min_sint_n, i32 %x > %2 = icmp sgt i32 %1, %max_sint_n > %r = select i1 %2, i32 %max_sint_n, i32 %1 > > > As a starting point, here are two patches: > - http://reviews.llvm.org/D6976 Add Integer Saturation Intrinsics. > - http://reviews.llvm.org/D6977 [CodeGen] Add legalization for > Integer Saturation Intrinsics. > > From there, we can generate several new instructions, more efficient > than their expanded counterpart. Locally, I have worked on: > - ARM: the SSAT/USAT instructions (scalar) > - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar > saturating arithmetic) > - X86: PACK SS/US (vector, saturate+truncate) > - X86: PADD/SUB S/US (vector, saturating arithmetic) > > Anyway, let's first agree on the intrinsics, so that further > development is done on trunk. > > Thanks! > -Ahmed > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Thu, Jan 15, 2015 at 2:33 AM, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:> A couple of questions: > > 1) Should this really be an intrinsic and not a flag on add? The add > instruction already allows overflow to be either undefined or defined to > wrap. Making it defined to saturate seems a natural extension. >I don't think this should be a flag on add. Flags are designed such that the middle-end may be ignorant of them and nothing bad might happen, it is always safe to ignore or drop flags when doing so is convenient (for a concrete example, take a look at reassociate). In this case, the saturating nature of the operation does not seem like something that can be safely ignored.> > 2) How do you imagine this being used and what are the guarantees for > sequences of operations with respect to optimisation? If I do a+b-c (or +c > where c is negative), and a+b would saturate, but a+(b-c) would not, then > is it allowed for an optimiser to generate the second rather than the > first? If it's an intrinsic that's opaque to optimisers, then that's not a > problem for correctness, but then you'll miss some potentially beneficial > optimisations. > > David > > > On 14 Jan 2015, at 22:08, Ahmed Bougacha <ahmed.bougacha at gmail.com> > wrote: > > > > Hi all, > > > > The patches linked below introduce a new family of intrinsics, for > > integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed). > > Quoting the added documentation: > > > > %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n) > > > > is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself > > implementable as the following IR: > > > > %min_sint_n = i32 ... ; the min. signed integer of bitwidth n, > -2^(n-1) > > %max_sint_n = i32 ... ; the max. signed integer of bitwidth n, > 2^(n-1)-1 > > %0 = icmp slt i32 %x, %min_sint_n > > %1 = select i1 %0, i32 %min_sint_n, i32 %x > > %2 = icmp sgt i32 %1, %max_sint_n > > %r = select i1 %2, i32 %max_sint_n, i32 %1 > > > > > > As a starting point, here are two patches: > > - http://reviews.llvm.org/D6976 Add Integer Saturation Intrinsics. > > - http://reviews.llvm.org/D6977 [CodeGen] Add legalization for > > Integer Saturation Intrinsics. > > > > From there, we can generate several new instructions, more efficient > > than their expanded counterpart. Locally, I have worked on: > > - ARM: the SSAT/USAT instructions (scalar) > > - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar > > saturating arithmetic) > > - X86: PACK SS/US (vector, saturate+truncate) > > - X86: PADD/SUB S/US (vector, saturating arithmetic) > > > > Anyway, let's first agree on the intrinsics, so that further > > development is done on trunk. > > > > Thanks! > > -Ahmed > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150115/e29c2894/attachment.html>
I would recommend that such optimizations not be done. At least where I have seen saturation arithmetic done before is with thing like digital signal processing. In that cse, the programmers will generally be very careful to craft exactly what they want... efficiently. On 1/15/15 5:33 AM, David Chisnall wrote:> 2) How do you imagine this being used and what are the guarantees for sequences of operations with respect to optimisation? If I do a+b-c (or +c where c is negative), and a+b would saturate, but a+(b-c) would not, then is it allowed for an optimiser to generate the second rather than the first? If it's an intrinsic that's opaque to optimisers, then that's not a problem for correctness, but then you'll miss some potentially beneficial optimisations. > > David > >> On 14 Jan 2015, at 22:08, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote: >> >> Hi all, >> >> The patches linked below introduce a new family of intrinsics, for >> integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed). >> Quoting the added documentation: >> >>