Sanjay Patel via llvm-dev
2016-Nov-08 00:13 UTC
[llvm-dev] should we have IR intrinsics for integer min/max?
Thanks, Hal and Matt for the feedback. As usual, my instincts about canonicalization were probably wrong. :) I thought that @max1 vs. @max3 would be viewed as an unknowable trade-off between reducing the dependency chain and the pseudo-canonical min/max form, so we'd add intrinsics, and defer that decision to the backend. I'll wait to see if there are any other arguments presented. @max2 vs. @max3 is a straightforward commute that we should have been doing anyway, so I can start there. Assuming we go with @max3, we need to add something to DAGCombine to turn that back into @max1 (PPC w/ isel and AArch64 do better with @max1; x86 is the same). On Mon, Nov 7, 2016 at 2:47 PM, Hal Finkel <hfinkel at anl.gov> wrote:> > ------------------------------ > > *From: *"Sanjay Patel via llvm-dev" <llvm-dev at lists.llvm.org> > *To: *"llvm-dev" <llvm-dev at lists.llvm.org> > *Sent: *Monday, November 7, 2016 1:01:27 PM > *Subject: *[llvm-dev] should we have IR intrinsics for integer min/max? > > Hi - > > The answer to this question may help to resolve larger questions about > intrinsics and vectorization that were discussed at the dev mtg last week, > but let's start with the basics: > > Which, if any, of these is the canonical IR? > > ; ret = x < y ? 0 : x-y > define i32 @max1(i32 %x, i32 %y) { > %sub = sub nsw i32 %x, %y > %cmp = icmp slt i32 %x, %y ; cmp is independent of sub > %sel = select i1 %cmp, i32 0, i32 %sub > ret i32 %sel > } > > ; ret = (x-y) < 0 ? 0 : x-y > define i32 @max2(i32 %x, i32 %y) { > %sub = sub nsw i32 %x, %y > %cmp = icmp slt i32 %sub, 0 ; cmp depends on sub, but this looks more > like a max? > %sel = select i1 %cmp, i32 0, i32 %sub > ret i32 %sel > } > > ; ret = (x-y) > 0 ? x-y : 0 > define i32 @max3(i32 %x, i32 %y) { > %sub = sub nsw i32 %x, %y > %cmp = icmp sgt i32 %sub, 0 ; canonicalize cmp+sel - looks even more > like a max? > %sel = select i1 %cmp, i32 %sub, i32 0 > ret i32 %sel > } > > Noting that all of the above use the same number of IR instructions, I > prefer this third option: > > 1. It uses fewer values in the icmp/select, so the live range of the x > and y, individually, is shorter. This seems like a reasonable metric for > simplicity. > 2. Using a comparison of (x-y) against zero likely makes it easier for > computing known bits to simply the answer (you only need to compute the > sign bit). > 3. The constant of the select, 0, is the second argument (which seems to > reflect our general canonical choice). > > > define i32 @max4(i32 %x, i32 %y) { > %sub = sub nsw i32 %x, %y > %max = llvm.smax.i32(i32 %sub, i32 0) ; this intrinsic doesn't exist > today > ret i32 %max > } > > I don't currently see the need for a new intrinsic. > > > FWIW, InstCombine doesn't canonicalize any of the first 3 options > currently. Codegen suffers because of that (depending on the target machine > and data types). Regardless of the IR choice, some backend fixes are needed. > > Another possible consideration is the structure/accuracy of the cost > models used by the vectorizers and other passes. I don't think they ever > special-case the cmp+sel pair as a possibly unified (and therefore cheaper > than the sum of the parts) operation. > > We don't have a facility currently for the target to provide a cost for > combined operations. We should, but there's design work to be done. > > -Hal > > > Note that we added FP variants for min/max ops with: > https://reviews.llvm.org/rL220341 > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > -- > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leadership Computing Facility > Argonne National Laboratory >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161107/3b735331/attachment.html>
Zaks, Ayal via llvm-dev
2016-Nov-14 11:44 UTC
[llvm-dev] should we have IR intrinsics for integer min/max?
Hal> Noting that all of the above use the same number of IR instructions, I prefer this third option: as does RecurrenceDescriptor::isMinMaxSelectCmpPattern(), which looks for “Select(ICmp(X, Y), X, Y)”. Sanjay> Another possible consideration is the structure/accuracy of the cost models used by the vectorizers and other passes. I don't think they ever special-case the cmp+sel pair as a possibly unified (and therefore cheaper than the sum of the parts) operation. They call the above to identify min/max reductions; and use TTI::getCmpSelInstrCost(). Special-casing may be redundant if the costs of scalar-vs-vector min/max correspond to the costs of scalar-vs-vector cmp&sel. Ayal. From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Sanjay Patel via llvm-dev Sent: Tuesday, November 08, 2016 02:13 To: Hal Finkel <hfinkel at anl.gov> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] should we have IR intrinsics for integer min/max? Thanks, Hal and Matt for the feedback. As usual, my instincts about canonicalization were probably wrong. :) I thought that @max1 vs. @max3 would be viewed as an unknowable trade-off between reducing the dependency chain and the pseudo-canonical min/max form, so we'd add intrinsics, and defer that decision to the backend. I'll wait to see if there are any other arguments presented. @max2 vs. @max3 is a straightforward commute that we should have been doing anyway, so I can start there. Assuming we go with @max3, we need to add something to DAGCombine to turn that back into @max1 (PPC w/ isel and AArch64 do better with @max1; x86 is the same). On Mon, Nov 7, 2016 at 2:47 PM, Hal Finkel <hfinkel at anl.gov<mailto:hfinkel at anl.gov>> wrote: ________________________________ From: "Sanjay Patel via llvm-dev" <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> To: "llvm-dev" <llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> Sent: Monday, November 7, 2016 1:01:27 PM Subject: [llvm-dev] should we have IR intrinsics for integer min/max? Hi - The answer to this question may help to resolve larger questions about intrinsics and vectorization that were discussed at the dev mtg last week, but let's start with the basics: Which, if any, of these is the canonical IR? ; ret = x < y ? 0 : x-y define i32 @max1(i32 %x, i32 %y) { %sub = sub nsw i32 %x, %y %cmp = icmp slt i32 %x, %y ; cmp is independent of sub %sel = select i1 %cmp, i32 0, i32 %sub ret i32 %sel } ; ret = (x-y) < 0 ? 0 : x-y define i32 @max2(i32 %x, i32 %y) { %sub = sub nsw i32 %x, %y %cmp = icmp slt i32 %sub, 0 ; cmp depends on sub, but this looks more like a max? %sel = select i1 %cmp, i32 0, i32 %sub ret i32 %sel } ; ret = (x-y) > 0 ? x-y : 0 define i32 @max3(i32 %x, i32 %y) { %sub = sub nsw i32 %x, %y %cmp = icmp sgt i32 %sub, 0 ; canonicalize cmp+sel - looks even more like a max? %sel = select i1 %cmp, i32 %sub, i32 0 ret i32 %sel } Noting that all of the above use the same number of IR instructions, I prefer this third option: 1. It uses fewer values in the icmp/select, so the live range of the x and y, individually, is shorter. This seems like a reasonable metric for simplicity. 2. Using a comparison of (x-y) against zero likely makes it easier for computing known bits to simply the answer (you only need to compute the sign bit). 3. The constant of the select, 0, is the second argument (which seems to reflect our general canonical choice). define i32 @max4(i32 %x, i32 %y) { %sub = sub nsw i32 %x, %y %max = llvm.smax.i32(i32 %sub, i32 0) ; this intrinsic doesn't exist today ret i32 %max } I don't currently see the need for a new intrinsic. FWIW, InstCombine doesn't canonicalize any of the first 3 options currently. Codegen suffers because of that (depending on the target machine and data types). Regardless of the IR choice, some backend fixes are needed. Another possible consideration is the structure/accuracy of the cost models used by the vectorizers and other passes. I don't think they ever special-case the cmp+sel pair as a possibly unified (and therefore cheaper than the sum of the parts) operation. We don't have a facility currently for the target to provide a cost for combined operations. We should, but there's design work to be done. -Hal Note that we added FP variants for min/max ops with: https://reviews.llvm.org/rL220341 _______________________________________________ LLVM Developers mailing list llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161114/e8584092/attachment.html>
Sanjay Patel via llvm-dev
2016-Nov-14 14:54 UTC
[llvm-dev] should we have IR intrinsics for integer min/max?
Thanks, Ayal. I had not seen that API until now. So...it's a bit of a mess right now. We have: 1. RecurrenceDescriptor::isMinMaxSelectCmpPattern() 2. ValueTracking's llvm::matchSelectPattern() 3. At least 3 places in InstCombine that check for select(icmp(x,y) x, y) OR select(icmp(x, y), y, x); grep for: "instruction is used exclusively by a select as" ... // part of a minimum or maximum operation. If so, refrain from doing // any other folding. This helps out other analyses which understand // non-obfuscated minimum and maximum idioms, such as ScalarEvolution // and CodeGen. I posted a patch for a first step of improving canonicalization here: https://reviews.llvm.org/D26525 And I've enhanced ValueTracking in these commits: https://reviews.llvm.org/rL285499 https://reviews.llvm.org/rL286318 https://reviews.llvm.org/rL286776 Should everyone be using the ValueTracking function as the point-of-truth about whether something is min/max? Even with that, the InstCombine bail-out logic seems shaky to me - we probably need to refine where exactly we want to *not* do a transform in order to prevent breaking the min/max idiom. Does this change anyone's opinion about whether we need min/max intrinsics? On Mon, Nov 14, 2016 at 4:44 AM, Zaks, Ayal <ayal.zaks at intel.com> wrote:> Hal> Noting that all of the above use the same number of IR instructions, > I prefer this third option: > > > > as does RecurrenceDescriptor::isMinMaxSelectCmpPattern(), which looks for > “Select(ICmp(X, Y), X, Y)”. > > > > > > Sanjay> Another possible consideration is the structure/accuracy of the > cost models used by the vectorizers and other passes. I don't think they > ever special-case the cmp+sel pair as a possibly unified (and therefore > cheaper than the sum of the parts) operation. > > > > They call the above to identify min/max reductions; and use > TTI::getCmpSelInstrCost(). Special-casing may be redundant if the costs of > scalar-vs-vector min/max correspond to the costs of scalar-vs-vector > cmp&sel. > > > > Ayal. > > > > *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Sanjay > Patel via llvm-dev > *Sent:* Tuesday, November 08, 2016 02:13 > *To:* Hal Finkel <hfinkel at anl.gov> > *Cc:* llvm-dev <llvm-dev at lists.llvm.org> > *Subject:* Re: [llvm-dev] should we have IR intrinsics for integer > min/max? > > > > Thanks, Hal and Matt for the feedback. As usual, my instincts about > canonicalization were probably wrong. :) > > > I thought that @max1 vs. @max3 would be viewed as an unknowable trade-off > between reducing the dependency chain and the pseudo-canonical min/max > form, so we'd add intrinsics, and defer that decision to the backend. > > I'll wait to see if there are any other arguments presented. > > > > @max2 vs. @max3 is a straightforward commute that we should have been > doing anyway, so I can start there. Assuming we go with @max3, we need to > add something to DAGCombine to turn that back into @max1 (PPC w/ isel and > AArch64 do better with @max1; x86 is the same). > > > > > > On Mon, Nov 7, 2016 at 2:47 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > > ------------------------------ > > *From: *"Sanjay Patel via llvm-dev" <llvm-dev at lists.llvm.org> > *To: *"llvm-dev" <llvm-dev at lists.llvm.org> > *Sent: *Monday, November 7, 2016 1:01:27 PM > *Subject: *[llvm-dev] should we have IR intrinsics for integer min/max? > > Hi - > > The answer to this question may help to resolve larger questions about > intrinsics and vectorization that were discussed at the dev mtg last week, > but let's start with the basics: > > Which, if any, of these is the canonical IR? > > > ; ret = x < y ? 0 : x-y > > define i32 @max1(i32 %x, i32 %y) { > %sub = sub nsw i32 %x, %y > %cmp = icmp slt i32 %x, %y ; cmp is independent of sub > %sel = select i1 %cmp, i32 0, i32 %sub > ret i32 %sel > } > > ; ret = (x-y) < 0 ? 0 : x-y > > define i32 @max2(i32 %x, i32 %y) { > %sub = sub nsw i32 %x, %y > %cmp = icmp slt i32 %sub, 0 ; cmp depends on sub, but this looks more > like a max? > %sel = select i1 %cmp, i32 0, i32 %sub > ret i32 %sel > } > > ; ret = (x-y) > 0 ? x-y : 0 > > define i32 @max3(i32 %x, i32 %y) { > %sub = sub nsw i32 %x, %y > %cmp = icmp sgt i32 %sub, 0 ; canonicalize cmp+sel - looks even more > like a max? > %sel = select i1 %cmp, i32 %sub, i32 0 > ret i32 %sel > } > > Noting that all of the above use the same number of IR instructions, I > prefer this third option: > > 1. It uses fewer values in the icmp/select, so the live range of the x > and y, individually, is shorter. This seems like a reasonable metric for > simplicity. > 2. Using a comparison of (x-y) against zero likely makes it easier for > computing known bits to simply the answer (you only need to compute the > sign bit). > 3. The constant of the select, 0, is the second argument (which seems to > reflect our general canonical choice). > > > define i32 @max4(i32 %x, i32 %y) { > %sub = sub nsw i32 %x, %y > > %max = llvm.smax.i32(i32 %sub, i32 0) ; this intrinsic doesn't exist > today > > ret i32 %max > } > > I don't currently see the need for a new intrinsic. > > > > FWIW, InstCombine doesn't canonicalize any of the first 3 options > currently. Codegen suffers because of that (depending on the target machine > and data types). Regardless of the IR choice, some backend fixes are needed. > > > Another possible consideration is the structure/accuracy of the cost > models used by the vectorizers and other passes. I don't think they ever > special-case the cmp+sel pair as a possibly unified (and therefore cheaper > than the sum of the parts) operation. > > We don't have a facility currently for the target to provide a cost for > combined operations. We should, but there's design work to be done. > > -Hal > > > Note that we added FP variants for min/max ops with: > https://reviews.llvm.org/rL220341 > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > > -- > > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leadership Computing Facility > Argonne National Laboratory > > > > --------------------------------------------------------------------- > Intel Israel (74) Limited > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161114/537228de/attachment.html>