Hi all, I'm proposing integer saturation intrinsics. def int_ssat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>; def int_usat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>; The first operand is the integer value being saturated, and second is the saturation bit position. For scalar integer types, the semantics are: int_ssat: x < -(1 << (bit-1)) ? -(1 << (bit-1)) : (x > (1 << (bit-1))-1 ? (1 << (bit-1))-1 : x) int_usat: x < 0 ? 0 : (x > (1 << bit)-1 : (1 << bit)-1 : x) e.g. If bit is 8, the range is -128 to 127 for int_ssat; 0 to 255 for int_usat. This is useful for i16 / i32 / i64 to i8 satuation code like the following: (x < -256) ? -256 : (x > 255 ? 255 : x) (x < 0) ? 0 : (x > 255 ? 255 : x) If the source type is an integer vector type, then each element of the vector is being saturated. For ARM, these intrinsics will map to usat / ssat instructions. The semantics matches exactly. X86 doesn't have saturation instructions. However, SSE does have packed add, packed sub, and pack with saturation. So it's possible to instruction select patterns such as (int_{s|u}sat ({add|sub} x, y), c). The plan is to form calls to these intrinsics in InstCombine. Legalizer can expand these intrinsics if they are not legal. The expansion should be fairly straight forward and produce code that is at least as good as what LLVM is currently generating for these code sequence. Comments? Evan
On Fri, Jun 17, 2011 at 3:08 PM, Evan Cheng <evan.cheng at apple.com> wrote:> Hi all, > > I'm proposing integer saturation intrinsics. > > def int_ssat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>; > def int_usat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>; > > The first operand is the integer value being saturated, and second is the saturation bit position. > > For scalar integer types, the semantics are: > > int_ssat: x < -(1 << (bit-1)) ? -(1 << (bit-1)) : (x > (1 << (bit-1))-1 ? (1 << (bit-1))-1 : x) > int_usat: x < 0 ? 0 : (x > (1 << bit)-1 : (1 << bit)-1 : x) > > e.g. If bit is 8, the range is -128 to 127 for int_ssat; 0 to 255 for int_usat. This is useful for i16 / i32 / i64 to i8 satuation code like the following: > (x < -256) ? -256 : (x > 255 ? 255 : x) > (x < 0) ? 0 : (x > 255 ? 255 : x) > > If the source type is an integer vector type, then each element of the vector is being saturated. > > For ARM, these intrinsics will map to usat / ssat instructions. The semantics matches exactly. X86 doesn't have saturation instructions. However, SSE does have packed add, packed sub, and pack with saturation. So it's possible to instruction select patterns such as (int_{s|u}sat ({add|sub} x, y), c).The stated pattern simply doesn't work. A portable saturating add/subtract intrinsic might be nice given that most vector instruction sets have such an instruction, but this seems completely orthogonal.> The plan is to form calls to these intrinsics in InstCombine. Legalizer can expand these intrinsics if they are not legal. The expansion should be fairly straight forward and produce code that is at least as good as what LLVM is currently generating for these code sequence. > > Comments?Is there some reason why pattern-matching this in an ARM-specific DAGCombine doesn't work? -Eli
On Jun 17, 2011, at 3:08 PM, Evan Cheng wrote:> Hi all, > > I'm proposing integer saturation intrinsics. > > def int_ssat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>; > def int_usat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>;These will replace the existing int_arm_ssat and int_arm_usat intrinsics, right?
On Jun 17, 2011, at 3:42 PM, Eli Friedman wrote:> On Fri, Jun 17, 2011 at 3:08 PM, Evan Cheng <evan.cheng at apple.com> wrote: >> Hi all, >> >> I'm proposing integer saturation intrinsics. >> >> def int_ssat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>; >> def int_usat : Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, llvm_i32_ty]>; >> >> The first operand is the integer value being saturated, and second is the saturation bit position. >> >> For scalar integer types, the semantics are: >> >> int_ssat: x < -(1 << (bit-1)) ? -(1 << (bit-1)) : (x > (1 << (bit-1))-1 ? (1 << (bit-1))-1 : x) >> int_usat: x < 0 ? 0 : (x > (1 << bit)-1 : (1 << bit)-1 : x) >> >> e.g. If bit is 8, the range is -128 to 127 for int_ssat; 0 to 255 for int_usat. This is useful for i16 / i32 / i64 to i8 satuation code like the following: >> (x < -256) ? -256 : (x > 255 ? 255 : x) >> (x < 0) ? 0 : (x > 255 ? 255 : x) >> >> If the source type is an integer vector type, then each element of the vector is being saturated. >> >> For ARM, these intrinsics will map to usat / ssat instructions. The semantics matches exactly. X86 doesn't have saturation instructions. However, SSE does have packed add, packed sub, and pack with saturation. So it's possible to instruction select patterns such as (int_{s|u}sat ({add|sub} x, y), c). > > The stated pattern simply doesn't work. A portable saturating > add/subtract intrinsic might be nice given that most vector > instruction sets have such an instruction, but this seems completely > orthogonal.Can you explain why you think the pattern (which?) would not work?> >> The plan is to form calls to these intrinsics in InstCombine. Legalizer can expand these intrinsics if they are not legal. The expansion should be fairly straight forward and produce code that is at least as good as what LLVM is currently generating for these code sequence. >> >> Comments? > > Is there some reason why pattern-matching this in an ARM-specific > DAGCombine doesn't work?It's not possible to look beyond a single BB at isel time. Evan> > -Eli
> The plan is to form calls to these intrinsics in InstCombine. Legalizer > can expand these intrinsics if they are not legal. The expansion should > be fairly straight forward and produce code that is at least as good as > what LLVM is currently generating for these code sequence.SSAT/USAT may set the Q (saturation) flag, which might cause problems for anyone relying on explicitly using saturating operations (QADD etc.) and testing the Q flag. So there are several possibilities: - you could do liveness analysis on Q and only introduce SSAT/USAT if Q is not live - you could fall back to not introducing SSAT/USAT if you could tell there was no test of Q in the function (the ARM ABI says Q is not defined across ABI-compliant function boundaries) - you could just go ahead and do it anyway (might be appropriate for a "fast and loose" optimization mode) If you can easily factor Q into LLVM's liveness analysis, that might be the best solution. (It also allows you to discard the instruction if neither its result nor the Q flag is tested - you can't normally drop Q-flag-affecting instructions as they may be executed for side effects). There are similar issues with some other ARM instructions, e.g. SMLABB. Q is a sticky bit, like the floating-point cumulative exception flags, so the issues are a bit similar (the same issue would apply if e.g. you wanted to exploit floating-point hardware for integer division in a mode where the inexact exception was valid). You have the same issues of modelling the use of machine status flags, and understanding what optimizations are/aren't legal at any program point. Al -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
CPSR liveness is not a concern. The instructions will be modeled as such and isel and scheduling will do the right thing. Evan On Jun 18, 2011, at 1:48 AM, Alasdair Grant wrote:>> The plan is to form calls to these intrinsics in InstCombine. Legalizer >> can expand these intrinsics if they are not legal. The expansion should >> be fairly straight forward and produce code that is at least as good as >> what LLVM is currently generating for these code sequence. > > SSAT/USAT may set the Q (saturation) flag, which might cause problems > for anyone relying on explicitly using saturating operations (QADD etc.) > and testing the Q flag. So there are several possibilities: > > - you could do liveness analysis on Q and only introduce SSAT/USAT if > Q is not live > > - you could fall back to not introducing SSAT/USAT if you could tell there > was no test of Q in the function (the ARM ABI says Q is not defined across > ABI-compliant function boundaries) > > - you could just go ahead and do it anyway (might be appropriate for a > "fast and loose" optimization mode) > > If you can easily factor Q into LLVM's liveness analysis, that might be > the best solution. (It also allows you to discard the instruction if > neither its result nor the Q flag is tested - you can't normally drop > Q-flag-affecting instructions as they may be executed for side effects). > > There are similar issues with some other ARM instructions, e.g. SMLABB. > > Q is a sticky bit, like the floating-point cumulative exception flags, > so the issues are a bit similar (the same issue would apply if e.g. you > wanted to exploit floating-point hardware for integer division in a mode > where the inexact exception was valid). You have the same issues of > modelling the use of machine status flags, and understanding what > optimizations are/aren't legal at any program point. > > Al > > -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. >