I have been working on a patch to add support for max/min reductions in LoopVectorize. One of the comments that came up in review is that the implementation could be simplified (and less fragile) if max and min intrinsics were recognized rather than looking for compare-select sequences.

The suggestion was to change compare-selects into max and min intrinsic calls during instcombine.

The intrinsics to add are:

declare iN llvm.{smin,smax}.iN(iN %a, iN %b)
declare iN llvm.{umin,umax}.iN(iN %a, iN %b)
declare fN llvm.{fmin,fmax}.fN(fN %a, fN %b)

What does the community think?

Paul
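For readers who want the idea in concrete terms, here is a minimal C sketch of what the proposed integer intrinsics would compute for N = 32. The function names are hypothetical (they are not part of the patch), and each body is simply the compare-select idiom that instcombine would be asked to fold. The reduction loop is an illustrative assumption about the kind of code LoopVectorize would need to recognize.

```c
#include <stdint.h>

/* Hypothetical scalar models of the proposed i32 intrinsics.
   Each body is the compare-select idiom the intrinsic would replace. */
int32_t smax_i32(int32_t a, int32_t b) { return a > b ? a : b; }    /* icmp sgt + select */
int32_t smin_i32(int32_t a, int32_t b) { return a < b ? a : b; }    /* icmp slt + select */
uint32_t umax_i32(uint32_t a, uint32_t b) { return a > b ? a : b; } /* icmp ugt + select */
uint32_t umin_i32(uint32_t a, uint32_t b) { return a < b ? a : b; } /* icmp ult + select */

/* An illustrative signed max reduction over n elements. */
int32_t reduce_smax(const int32_t *v, int n) {
    int32_t m = INT32_MIN;
    for (int i = 0; i < n; ++i)
        m = smax_i32(m, v[i]);
    return m;
}
```

The signed and unsigned variants have to be distinct because the same bit pattern orders differently: all-ones is -1 as a signed value but the largest value as an unsigned one.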
Redmond, Paul wrote:
> I have been working on a patch to add support for max/min reductions in LoopVectorize. One of the comments that came up in review is that the implementation could be simplified (and less fragile) if max and min intrinsics were recognized rather than looking for compare-select sequences.
>
> The suggestion was to change compare-selects into max and min intrinsic calls during instcombine.

+1

Sebastian
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Hi Paul,

On 05/12/12 17:26, Redmond, Paul wrote:
> I have been working on a patch to add support for max/min reductions in LoopVectorize. One of the comments that came up in review is that the implementation could be simplified (and less fragile) if max and min intrinsics were recognized rather than looking for compare-select sequences.
>
> The suggestion was to change compare-selects into max and min intrinsic calls during instcombine.
>
> The intrinsics to add are:
> declare iN llvm.{smin,smax}.iN(iN %a, iN %b)
> declare iN llvm.{umin,umax}.iN(iN %a, iN %b)
> declare fN llvm.{fmin,fmax}.fN(fN %a, fN %b)
>
> What does the community think?

it seems reasonable to me.

Ciao, Duncan.
Min/max certainly makes loop nest optimization (including innermost-loop vectorization) a lot easier. However, I would like them to be "lowered" in the lower-level scalar optimizer (sopt). I feel that "raw" instructions are a bit easier to optimize in sopt than integrated instructions, and that the "raw" instructions can expose more opportunities.

For example, in the following snippet, if we break max() into "raw" instructions, the cost of the comparison is reduced thanks to CSE, and it also reveals that, more often than not, z holds the value min_v + 2. max(), however, obscures this information.

-----------------------------------------------------------------------------
if (min_v > max_v) { // the branch is highly biased.
    stuff...
}
t = max(min_v, max_v);
z = t + 2;
-----------------------------------------------------------------------------

Similar arguments apply to FMA formation, saturating add/sub recognition, etc.

IMHO, if some passes need to recognize these patterns, it is better for them to proactively call functions that recognize them, and to lower them back to the "raw" form right after those passes if the downstream passes don't like these integrated instructions.

On 12/5/12 8:26 AM, Redmond, Paul wrote:
> I have been working on a patch to add support for max/min reductions in LoopVectorize. One of the comments that came up in review is that the implementation could be simplified (and less fragile) if max and min intrinsics were recognized rather than looking for compare-select sequences.
>
> The suggestion was to change compare-selects into max and min intrinsic calls during instcombine.
>
> The intrinsics to add are:
> declare iN llvm.{smin,smax}.iN(iN %a, iN %b)
> declare iN llvm.{umin,umax}.iN(iN %a, iN %b)
> declare fN llvm.{fmin,fmax}.fN(fN %a, fN %b)
>
> What does the community think?
>
> Paul
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
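To make the CSE argument above concrete, here is a rough C sketch of the two forms. All function names are hypothetical, the `side` counter merely stands in for "stuff...", and the sketch assumes the max is computed after the branch rather than inside it. Form A keeps max as an opaque operation; form B is what the "raw" compare-select looks like once the comparison is shared with the branch.

```c
/* Form A: max kept as an integrated operation. */
int max_int(int a, int b) { return a > b ? a : b; }

int z_with_max_call(int min_v, int max_v, int *side) {
    if (min_v > max_v) {   /* highly biased branch */
        *side += 1;        /* stands in for "stuff..." */
    }
    int t = max_int(min_v, max_v);   /* hides a second min_v > max_v test */
    return t + 2;
}

/* Form B: max lowered to compare + select, with the comparison shared. */
int z_lowered(int min_v, int max_v, int *side) {
    int c = min_v > max_v;           /* single comparison, reused below */
    if (c) {
        *side += 1;
    }
    int t = c ? min_v : max_v;       /* select on the shared condition */
    return t + 2;
}
```

In form B it is visible that whenever the branch is taken, t equals min_v, so z is min_v + 2 on the hot path. Both functions return the same value for the same inputs.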
On Dec 5, 2012, at 8:26 AM, "Redmond, Paul" <paul.redmond at intel.com> wrote:
> I have been working on a patch to add support for max/min reductions in LoopVectorize. One of the comments that came up in review is that the implementation could be simplified (and less fragile) if max and min intrinsics were recognized rather than looking for compare-select sequences.
>
> The suggestion was to change compare-selects into max and min intrinsic calls during instcombine.
>
> The intrinsics to add are:
> declare iN llvm.{smin,smax}.iN(iN %a, iN %b)
> declare iN llvm.{umin,umax}.iN(iN %a, iN %b)
> declare fN llvm.{fmin,fmax}.fN(fN %a, fN %b)
>
> What does the community think?

It seems inevitable. For the floating point version, please make it very clear what the behavior of max(-0,+0) and related cases are.

This also means stuff that matches compare/select idioms (e.g. llvm/Support/PatternMatch.h) will need to be updated.

-Chris
> It seems inevitable. For the floating point version, please make it very
> clear what the behavior of max(-0,+0) and related cases are.

Along these lines, AArch64 has an instruction "FMAXNM". It returns the maximum if neither value is NaN, but returns the number if just one value is NaN. This is in addition to an "FMAX" which propagates NaNs.

I suspect you'll just want to consider this as an "oh yes, make sure that the result is NaN if either input is" advisory notice, but I haven't actually thought through the details of implementation yet.

Tim.
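The two NaN policies described above can be sketched in C as follows. The function names are hypothetical, and the sketch only models quiet-NaN handling; it makes no claim about how the actual instructions treat signaling NaNs or signed zeros.

```c
#include <math.h>

/* Models the "FMAXNM" policy: a lone NaN is ignored and the number is returned. */
float fmaxnm_model(float a, float b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a > b ? a : b;
}

/* Models the "FMAX" policy: a NaN in either input propagates to the result. */
float fmax_model(float a, float b) {
    if (isnan(a)) return a;
    if (isnan(b)) return b;
    return a > b ? a : b;
}
```

The two functions agree whenever neither input is NaN, so the choice between them only matters for how a max reduction should behave when the data contains NaNs.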
On Wednesday, December 05, 2012 at 2:48 PM, Chris Lattner wrote:
> > What does the community think?
>
> It seems inevitable. For the floating point version, please make it very clear
> what the behavior of max(-0,+0) and related cases are.

The following is our current proposal for llvm.fmax/fmin.*:

[1] If exactly one argument is a NaN, the intrinsic returns the other argument.
[2] If both arguments are NaN, the intrinsic returns a NaN.
[3] An SNaN may behave as a QNaN.
[4] If the arguments compare equal, the intrinsic returns a value that compares equal to both arguments.
[5] Otherwise, the intrinsic returns the greater/lesser of the two arguments.

Rationale and notes:

Points [1] and [2] match the C/POSIX library functions' specs.

Point [3] matches the OpenCL library functions, and may permit some implementations to test for NaNs less expensively.

Point [4] accounts for fmax(-0,+0) in IEEE 754 arithmetic, and any similar cases that might exist in other systems (LLVM needs a VAX backend). IEEE specifies that comparisons ignore the sign of zero, so requiring fmax to order ±0 would be expensive on many systems, and is not necessary to support common library functions.

The intrinsics can replace calls to the C and OpenCL library functions.

The intrinsics can be implemented as calls to the C or OpenCL library functions. They can also be implemented by IEEE 754 maxNum()/minNum() operations (but not vice versa).

The intrinsics are not equivalent to an fcmp/select sequence.

--
Kevin Schoedel, Software Developer, Intel of Canada <kevin.p.schoedel at intel.com> +1 (519) 772-2580
Disclaimer: the above just might possibly contain a statement that is not an official opinion of Intel.
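As an illustration only, rules [1] to [5] for the max case can be written as a small C reference model, next to the plain compare-select idiom it is said not to be equivalent to. Both function names are hypothetical. For rule [4] the model returns the first argument, which is just one of the permitted choices, and rule [3] is not modeled because the sketch only uses quiet NaNs.

```c
#include <math.h>

/* Hypothetical reference model of the proposed llvm.fmax for double. */
double proposed_fmax(double a, double b) {
    if (isnan(a)) return b;   /* [1] lone NaN in a: return b; [2] both NaN: b is a NaN */
    if (isnan(b)) return a;   /* [1] lone NaN in b: return a */
    if (a == b) return a;     /* [4] either argument is acceptable, e.g. for -0 and +0 */
    return a > b ? a : b;     /* [5] the greater of the two */
}

/* The compare-select idiom: select (fcmp ogt a, b), a, b. */
double select_max(double a, double b) {
    return a > b ? a : b;
}
```

The difference shows up when the second operand is the NaN: an ordered comparison with a NaN is false, so select_max(1.0, NaN) yields the NaN, while proposed_fmax(1.0, NaN) yields 1.0 as rule [1] requires.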