thr3ads.net - llvm dev - [LLVMdev] [RFC] Integer Saturation Intrinsics [Jan 2015]

If this information is useful, please help other people find it:
Share via:

Ahmed Bougacha

2015-Jan-14 22:08 UTC

[LLVMdev] [RFC] Integer Saturation Intrinsics

Hi all,

The patches linked below introduce a new family of intrinsics, for
integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed).
Quoting the added documentation:

      %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n)

is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself
implementable as the following IR:

      %min_sint_n = i32 ... ; the min. signed integer of bitwidth n, -2^(n-1)
      %max_sint_n = i32 ... ; the max. signed integer of bitwidth n, 2^(n-1)-1
      %0 = icmp slt i32 %x, %min_sint_n
      %1 = select i1 %0, i32 %min_sint_n, i32 %x
      %2 = icmp sgt i32 %1, %max_sint_n
      %r = select i1 %2, i32 %max_sint_n, i32 %1


As a starting point, here are two patches:
- http://reviews.llvm.org/D6976  Add Integer Saturation Intrinsics.
- http://reviews.llvm.org/D6977  [CodeGen] Add legalization for
Integer Saturation Intrinsics.
>From there, we can generate several new instructions, more efficientthan their expanded counterpart.  Locally, I have worked on:
- ARM: the SSAT/USAT instructions (scalar)
- AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar
saturating arithmetic)
- X86: PACK SS/US (vector, saturate+truncate)
- X86: PADD/SUB S/US (vector, saturating arithmetic)

Anyway, let's first agree on the intrinsics, so that further
development is done on trunk.

Thanks!
-Ahmed

Philip Reames

2015-Jan-14 23:42 UTC

head link

[LLVMdev] [RFC] Integer Saturation Intrinsics

At a very high level, why do we need these intrinsics?

What is the use case?  What are typical values for N?

Have you looked at just generating the conditional IR and then pattern 
matching late?  What's the performance trade off involved?

My default position is that we shouldn't add these unless there is a 
compelling use case.  We *still* haven't gotten the *.with.overflow 
intrinsics piped through everywhere in the optimizer.  I'm reluctant to 
keep adding new ones.

Philip

On 01/14/2015 02:08 PM, Ahmed Bougacha wrote:> Hi all,
>
> The patches linked below introduce a new family of intrinsics, for
> integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed).
> Quoting the added documentation:
>
>        %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n)
>
> is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself
> implementable as the following IR:
>
>        %min_sint_n = i32 ... ; the min. signed integer of bitwidth n,
-2^(n-1)
>        %max_sint_n = i32 ... ; the max. signed integer of bitwidth n,
2^(n-1)-1
>        %0 = icmp slt i32 %x, %min_sint_n
>        %1 = select i1 %0, i32 %min_sint_n, i32 %x
>        %2 = icmp sgt i32 %1, %max_sint_n
>        %r = select i1 %2, i32 %max_sint_n, i32 %1
>
>
> As a starting point, here are two patches:
> - http://reviews.llvm.org/D6976  Add Integer Saturation Intrinsics.
> - http://reviews.llvm.org/D6977  [CodeGen] Add legalization for
> Integer Saturation Intrinsics.
>
>  From there, we can generate several new instructions, more efficient
> than their expanded counterpart.  Locally, I have worked on:
> - ARM: the SSAT/USAT instructions (scalar)
> - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar
> saturating arithmetic)
> - X86: PACK SS/US (vector, saturate+truncate)
> - X86: PADD/SUB S/US (vector, saturating arithmetic)
>
> Anyway, let's first agree on the intrinsics, so that further
> development is done on trunk.
>
> Thanks!
> -Ahmed
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Ahmed Bougacha

2015-Jan-15 00:16 UTC

head link

[LLVMdev] [RFC] Integer Saturation Intrinsics

On Thu, Jan 15, 2015 at 12:42 AM, Philip Reames
<listmail at philipreames.com> wrote:> At a very high level, why do we need these intrinsics?
In short, to catch sequences you can't catch in the SelectionDAG.
> What is the use case?  What are typical values for N?
Typically, you get this from (a little overlapping) compression, DSP,
or pixel-handling code.
Off the top of my head, this occurs in paq8p in the test-suite, as
well as a few other tests.

You'd have something like:
    a = x + y;
    if (a < -128)
      a = -128;
    if (a > 127)
      a = 127;
> Have you looked at just generating the conditional IR and then pattern
> matching late?  What's the performance trade off involved?
That's a valid concern.  The original problem is, we can't catch this
kind of thing in the SelectionDAG, because we're limited by a single
basic block.  I guess we could (and I gather that's the alternative
you're presenting?) canonicalize the control flow to the 2icmp+2select
sequence, but I wasn't sure that was "workable".  Truth be told, I
didn't investigate this very thoroughly, as I didn't expect reluctance
on adding intrinsics!  I'll look into it some more: avoid adding the
intrinsic, keep the codegen additions as is, match the pattern in CGP
instead of InstCombines.
> My default position is that we shouldn't add these unless there is a
> compelling use case.  We *still* haven't gotten the *.with.overflow
> intrinsics piped through everywhere in the optimizer.  I'm reluctant to
keep
> adding new ones.
I get your point, and agree with the general idea, but I think
saturation is easier to reason about so wouldn't be as big a problem.
At least it was pretty fast to implement the usual suspects:
vectorizability, known bits, etc..  Do you have a specific part in
mind?

Anyway, thanks for the feedback!

-Ahmed
> Philip
>
>
> On 01/14/2015 02:08 PM, Ahmed Bougacha wrote:
>>
>> Hi all,
>>
>> The patches linked below introduce a new family of intrinsics, for
>> integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed).
>> Quoting the added documentation:
>>
>>        %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n)
>>
>> is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1),
itself
>> implementable as the following IR:
>>
>>        %min_sint_n = i32 ... ; the min. signed integer of bitwidth n,
>> -2^(n-1)
>>        %max_sint_n = i32 ... ; the max. signed integer of bitwidth n,
>> 2^(n-1)-1
>>        %0 = icmp slt i32 %x, %min_sint_n
>>        %1 = select i1 %0, i32 %min_sint_n, i32 %x
>>        %2 = icmp sgt i32 %1, %max_sint_n
>>        %r = select i1 %2, i32 %max_sint_n, i32 %1
>>
>>
>> As a starting point, here are two patches:
>> - http://reviews.llvm.org/D6976  Add Integer Saturation Intrinsics.
>> - http://reviews.llvm.org/D6977  [CodeGen] Add legalization for
>> Integer Saturation Intrinsics.
>>
>>  From there, we can generate several new instructions, more efficient
>> than their expanded counterpart.  Locally, I have worked on:
>> - ARM: the SSAT/USAT instructions (scalar)
>> - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar
>> saturating arithmetic)
>> - X86: PACK SS/US (vector, saturate+truncate)
>> - X86: PADD/SUB S/US (vector, saturating arithmetic)
>>
>> Anyway, let's first agree on the intrinsics, so that further
>> development is done on trunk.
>>
>> Thanks!
>> -Ahmed
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

David Chisnall

2015-Jan-15 10:33 UTC

head link

[LLVMdev] [RFC] Integer Saturation Intrinsics

A couple of questions:

1) Should this really be an intrinsic and not a flag on add?  The add
instruction already allows overflow to be either undefined or defined to wrap. 
Making it defined to saturate seems a natural extension.

2) How do you imagine this being used and what are the guarantees for sequences
of operations with respect to optimisation?  If I do a+b-c (or +c where c is
negative), and a+b would saturate, but a+(b-c) would not, then is it allowed for
an optimiser to generate the second rather than the first?  If it's an
intrinsic that's opaque to optimisers, then that's not a problem for
correctness, but then you'll miss some potentially beneficial optimisations.

David
> On 14 Jan 2015, at 22:08, Ahmed Bougacha <ahmed.bougacha at
gmail.com> wrote:
> 
> Hi all,
> 
> The patches linked below introduce a new family of intrinsics, for
> integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed).
> Quoting the added documentation:
> 
>      %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n)
> 
> is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1), itself
> implementable as the following IR:
> 
>      %min_sint_n = i32 ... ; the min. signed integer of bitwidth n,
-2^(n-1)
>      %max_sint_n = i32 ... ; the max. signed integer of bitwidth n,
2^(n-1)-1
>      %0 = icmp slt i32 %x, %min_sint_n
>      %1 = select i1 %0, i32 %min_sint_n, i32 %x
>      %2 = icmp sgt i32 %1, %max_sint_n
>      %r = select i1 %2, i32 %max_sint_n, i32 %1
> 
> 
> As a starting point, here are two patches:
> - http://reviews.llvm.org/D6976  Add Integer Saturation Intrinsics.
> - http://reviews.llvm.org/D6977  [CodeGen] Add legalization for
> Integer Saturation Intrinsics.
> 
> From there, we can generate several new instructions, more efficient
> than their expanded counterpart.  Locally, I have worked on:
> - ARM: the SSAT/USAT instructions (scalar)
> - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar
> saturating arithmetic)
> - X86: PACK SS/US (vector, saturate+truncate)
> - X86: PADD/SUB S/US (vector, saturating arithmetic)
> 
> Anyway, let's first agree on the intrinsics, so that further
> development is done on trunk.
> 
> Thanks!
> -Ahmed
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

David Majnemer

2015-Jan-15 10:51 UTC

head link

[LLVMdev] [RFC] Integer Saturation Intrinsics

On Thu, Jan 15, 2015 at 2:33 AM, David Chisnall <David.Chisnall at
cl.cam.ac.uk> wrote:
> A couple of questions:
>
> 1) Should this really be an intrinsic and not a flag on add?  The add
> instruction already allows overflow to be either undefined or defined to
> wrap.  Making it defined to saturate seems a natural extension.
>
I don't think this should be a flag on add.  Flags are designed such that
the middle-end may be ignorant of them and nothing bad might happen, it is
always safe to ignore or drop flags when doing so is convenient (for a
concrete example, take a look at reassociate).

In this case, the saturating nature of the operation does not seem like
something that can be safely ignored.

>
> 2) How do you imagine this being used and what are the guarantees for
> sequences of operations with respect to optimisation?  If I do a+b-c (or +c
> where c is negative), and a+b would saturate, but a+(b-c) would not, then
> is it allowed for an optimiser to generate the second rather than the
> first?  If it's an intrinsic that's opaque to optimisers, then
that's not a
> problem for correctness, but then you'll miss some potentially
beneficial
> optimisations.
>
> David
>
> > On 14 Jan 2015, at 22:08, Ahmed Bougacha <ahmed.bougacha at
gmail.com>
> wrote:
> >
> > Hi all,
> >
> > The patches linked below introduce a new family of intrinsics, for
> > integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed).
> > Quoting the added documentation:
> >
> >      %r = call i32 @llvm.ssat.i32(i32 %x, i32 %n)
> >
> > is equivalent to the expression min(max(x, -2^(n-1)), 2^(n-1)-1),
itself
> > implementable as the following IR:
> >
> >      %min_sint_n = i32 ... ; the min. signed integer of bitwidth n,
> -2^(n-1)
> >      %max_sint_n = i32 ... ; the max. signed integer of bitwidth n,
> 2^(n-1)-1
> >      %0 = icmp slt i32 %x, %min_sint_n
> >      %1 = select i1 %0, i32 %min_sint_n, i32 %x
> >      %2 = icmp sgt i32 %1, %max_sint_n
> >      %r = select i1 %2, i32 %max_sint_n, i32 %1
> >
> >
> > As a starting point, here are two patches:
> > - http://reviews.llvm.org/D6976  Add Integer Saturation Intrinsics.
> > - http://reviews.llvm.org/D6977  [CodeGen] Add legalization for
> > Integer Saturation Intrinsics.
> >
> > From there, we can generate several new instructions, more efficient
> > than their expanded counterpart.  Locally, I have worked on:
> > - ARM: the SSAT/USAT instructions (scalar)
> > - AArch64: the SQ/UQ ADD/SUB AArch64 instructions (vector/scalar
> > saturating arithmetic)
> > - X86: PACK SS/US (vector, saturate+truncate)
> > - X86: PADD/SUB S/US (vector, saturating arithmetic)
> >
> > Anyway, let's first agree on the intrinsics, so that further
> > development is done on trunk.
> >
> > Thanks!
> > -Ahmed
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150115/e29c2894/attachment.html>

Herbie Robinson

2015-Jan-15 17:39 UTC

head link

[LLVMdev] [RFC] Integer Saturation Intrinsics

I would recommend that such optimizations not be done.  At least where I 
have seen saturation arithmetic done before is with thing like digital 
signal processing.  In that cse, the programmers will generally be very 
careful to craft exactly what they want... efficiently.

On 1/15/15 5:33 AM, David Chisnall wrote:> 2) How do you imagine this being used and what are the guarantees for
sequences of operations with respect to optimisation?  If I do a+b-c (or +c
where c is negative), and a+b would saturate, but a+(b-c) would not, then is it
allowed for an optimiser to generate the second rather than the first?  If
it's an intrinsic that's opaque to optimisers, then that's not a
problem for correctness, but then you'll miss some potentially beneficial
optimisations.
>
> David
>
>> On 14 Jan 2015, at 22:08, Ahmed Bougacha <ahmed.bougacha at
gmail.com> wrote:
>>
>> Hi all,
>>
>> The patches linked below introduce a new family of intrinsics, for
>> integer saturation: @llvm.usat, and @llvm.ssat (unsigned/signed).
>> Quoting the added documentation:
>>
>>

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Jan 2015 - [LLVMdev] [RFC] Integer Saturation Intrinsics

[LLVMdev] [RFC] Integer Saturation Intrinsics

[LLVMdev] [RFC] Integer Saturation Intrinsics

[LLVMdev] [RFC] Integer Saturation Intrinsics

[LLVMdev] [RFC] Integer Saturation Intrinsics

[LLVMdev] [RFC] Integer Saturation Intrinsics

[LLVMdev] [RFC] Integer Saturation Intrinsics

Reasonably Related Threads