thr3ads.net - llvm dev - [llvm-dev] should we have IR intrinsics for integer min/max? [Nov 2016]

Sanjay Patel via llvm-dev

2016-Nov-07 19:01 UTC

[llvm-dev] should we have IR intrinsics for integer min/max?

Hi -

The answer to this question may help to resolve larger questions about
intrinsics and vectorization that were discussed at the dev mtg last week,
but let's start with the basics:

Which, if any, of these is the canonical IR?

; ret = x < y ? 0 : x-y
define i32 @max1(i32 %x, i32 %y) {
  %sub = sub nsw i32 %x, %y
  %cmp = icmp slt i32 %x, %y ; cmp is independent of sub
  %sel = select i1 %cmp, i32 0, i32 %sub
  ret i32 %sel
}

; ret = (x-y) < 0 ? 0 : x-y
define i32 @max2(i32 %x, i32 %y) {
  %sub = sub nsw i32 %x, %y
  %cmp = icmp slt i32 %sub, 0 ; cmp depends on sub, but this looks more like a max?
  %sel = select i1 %cmp, i32 0, i32 %sub
  ret i32 %sel
}

; ret = (x-y) > 0 ? x-y : 0
define i32 @max3(i32 %x, i32 %y) {
  %sub = sub nsw i32 %x, %y
  %cmp = icmp sgt i32 %sub, 0 ; canonicalize cmp+sel - looks even more like a max?
  %sel = select i1 %cmp, i32 %sub, i32 0
  ret i32 %sel
}

define i32 @max4(i32 %x, i32 %y) {
  %sub = sub nsw i32 %x, %y
  %max = call i32 @llvm.smax.i32(i32 %sub, i32 0) ; this intrinsic doesn't exist today
  ret i32 %max
}
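
A minimal sketch of why the first three forms are interchangeable, assuming the `nsw` flag holds (the subtraction never overflows). It models i32 arithmetic in Python; `wrap_i32` and `sub_overflows` are illustrative helpers, not LLVM APIs. The last line shows that if the subtraction were allowed to wrap, @max1 and @max2 would disagree, so the equivalence leans on `nsw`.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def wrap_i32(v):
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    return (v + 2**31) % 2**32 - 2**31

def max1(x, y):
    sub = wrap_i32(x - y)
    return 0 if x < y else sub      # compare the original operands

def max2(x, y):
    sub = wrap_i32(x - y)
    return 0 if sub < 0 else sub    # compare the difference against zero

def max3(x, y):
    sub = wrap_i32(x - y)
    return sub if sub > 0 else 0    # commuted form of max2

def sub_overflows(x, y):
    """True when x - y does not fit in i32 (the case nsw rules out)."""
    return not (INT_MIN <= x - y <= INT_MAX)

samples = [INT_MIN, -7, -1, 0, 1, 7, INT_MAX]
for x in samples:
    for y in samples:
        if not sub_overflows(x, y):
            assert max1(x, y) == max2(x, y) == max3(x, y) == max(x - y, 0)

# Without nsw the forms can differ: INT_MIN - 1 wraps to INT_MAX.
print(max1(INT_MIN, 1), max2(INT_MIN, 1))  # prints: 0 2147483647
```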


FWIW, InstCombine doesn't canonicalize any of the first 3 options
currently. Codegen suffers because of that (depending on the target machine
and data types). Regardless of the IR choice, some backend fixes are needed.

Another possible consideration is the structure/accuracy of the cost models
used by the vectorizers and other passes. I don't think they ever
special-case the cmp+sel pair as a possibly unified (and therefore cheaper
than the sum of the parts) operation.

Note that we added FP variants for min/max ops with:
https://reviews.llvm.org/rL220341

Matt Arsenault via llvm-dev

2016-Nov-07 19:06 UTC

On 11/07/2016 11:01 AM, Sanjay Patel via llvm-dev wrote:
> FWIW, InstCombine doesn't canonicalize any of the first 3 options 
> currently. Codegen suffers because of that (depending on the target 
> machine and data types). Regardless of the IR choice, some backend 
> fixes are needed.
>
> Another possible consideration is the structure/accuracy of the cost 
> models used by the vectorizers and other passes. I don't think they 
> ever special-case the cmp+sel pair as a possibly unified (and 
> therefore cheaper than the sum of the parts) operation.
>
> Note that we added FP variants for min/max ops with:
> https://reviews.llvm.org/rL220341

FP min/max is different and more complicated due to the special NaN
handling behavior. Integer min/max is representable with only a compare
and select, so I think it would be preferable to just canonicalize to
using those two instructions.

-Matt
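
To illustrate the NaN point above, here is a small sketch using a hypothetical `select_max` helper that mirrors a plain compare+select. For integers the result is the same whichever way the operands are ordered. For floating point, every ordered comparison with NaN is false, so the result depends on which operand is the NaN. This only demonstrates the asymmetry; it does not reproduce the semantics of any particular LLVM FP min/max operation.

```python
import math

def select_max(a, b):
    """Compare + select, in the spirit of `select (a > b), a, b`."""
    return a if a > b else b

nan = float("nan")

# Integers: operand order does not matter.
assert select_max(3, 5) == select_max(5, 3) == 5

# Floating point: `x > nan` and `nan > x` are both false,
# so the select always falls through to the second operand.
r1 = select_max(nan, 1.0)   # yields 1.0
r2 = select_max(1.0, nan)   # yields nan
print(r1, math.isnan(r2))
```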

Manuel Jacob via llvm-dev

2016-Nov-07 20:20 UTC

On 2016-11-07 20:01, Sanjay Patel via llvm-dev wrote:
> FWIW, InstCombine doesn't canonicalize any of the first 3 options
> currently. Codegen suffers because of that (depending on the target
> machine and data types). Regardless of the IR choice, some backend fixes
> are needed.

I'm missing context here.  Can you describe in more detail how the IR 
choice affects the code generation?  In case the target has special 
integer min / max instructions, why is matching all three variants 
difficult?

-Manuel

Sanjay Patel via llvm-dev

2016-Nov-07 20:33 UTC

Codegen is not the primary motivation here, so maybe I shouldn't have even
mentioned that. However, you can find more context in:
https://reviews.llvm.org/D26091
https://reviews.llvm.org/D26096 (note how the optimizer can regress codegen)

The main concern is that we should choose a canonical form for IR that is
easiest to reason about, and then we should transform all IR to that form.
The backend shouldn't have to pattern match all of these variants - that's
what IR is for.
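
As a rough sketch of what "transform all IR to that form" could mean for the @max2 pattern, the snippet below rewrites one shape into the other on a toy tuple representation. The representation and the `canonicalize` function are hypothetical and are not how InstCombine is implemented. The rewrite is safe because the two forms differ only when the value is exactly 0, where both arms yield 0.

```python
def canonicalize(expr):
    """Rewrite `select (icmp slt a, 0), 0, a` to `select (icmp sgt a, 0), a, 0`.

    Expressions are toy tuples: ("select", cond, true_val, false_val) and
    ("icmp", predicate, lhs, rhs). Anything else is returned unchanged.
    """
    if isinstance(expr, tuple) and len(expr) == 4 and expr[0] == "select":
        _, cond, t, f = expr
        if (isinstance(cond, tuple) and len(cond) == 4
                and cond[0] == "icmp" and cond[1] == "slt"
                and cond[3] == 0 and t == 0 and f == cond[2]):
            return ("select", ("icmp", "sgt", f, 0), f, 0)
    return expr

max2_form = ("select", ("icmp", "slt", "sub", 0), 0, "sub")
max3_form = ("select", ("icmp", "sgt", "sub", 0), "sub", 0)

assert canonicalize(max2_form) == max3_form
assert canonicalize(max3_form) == max3_form   # already canonical
print(canonicalize(max2_form))
```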


Hal Finkel via llvm-dev

2016-Nov-07 21:47 UTC

----- Original Message -----
> From: "Sanjay Patel via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Monday, November 7, 2016 1:01:27 PM
> Subject: [llvm-dev] should we have IR intrinsics for integer min/max?
> Hi -
> The answer to this question may help to resolve larger questions
> about intrinsics and vectorization that were discussed at the dev
> mtg last week, but let's start with the basics:
> Which, if any, of these is the canonical IR?
> ; ret = x < y ? 0 : x-y
> define i32 @max1(i32 %x, i32 %y) {
> %sub = sub nsw i32 %x, %y
> %cmp = icmp slt i32 %x, %y ; cmp is independent of sub
> %sel = select i1 %cmp, i32 0, i32 %sub
> ret i32 %sel
> }
> ; ret = (x-y) < 0 ? 0 : x-y
> define i32 @max2(i32 %x, i32 %y) {
> %sub = sub nsw i32 %x, %y
> %cmp = icmp slt i32 %sub, 0 ; cmp depends on sub, but this looks more like a max?
> %sel = select i1 %cmp, i32 0, i32 %sub
> ret i32 %sel
> }
> ; ret = (x-y) > 0 ? x-y : 0
> define i32 @max3(i32 %x, i32 %y) {
> %sub = sub nsw i32 %x, %y
> %cmp = icmp sgt i32 %sub, 0 ; canonicalize cmp+sel - looks even more like a max?
> %sel = select i1 %cmp, i32 %sub, i32 0
> ret i32 %sel
> }

Noting that all of the above use the same number of IR instructions, I prefer
this third option:

1. It uses fewer values in the icmp/select, so the live ranges of x and y,
individually, are shorter. This seems like a reasonable metric for simplicity.
2. Using a comparison of (x-y) against zero likely makes it easier for computing
known bits to simplify the answer (you only need to compute the sign bit).
3. The constant of the select, 0, is the second argument (which seems to reflect
our general canonical choice).
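
To make point 2 concrete, a small sketch (hypothetical helper names, values assumed to already lie in the i32 range): the outcome of comparing a value against zero is determined by its sign bit, and max(v, 0) can be computed from that bit alone. This is only an illustration of the sign-bit observation, not a proposed lowering.

```python
def sign_bit_i32(v):
    """Bit 31 of a value that is already within the signed i32 range."""
    return (v >> 31) & 1

def smax_with_zero(v):
    """max(v, 0) derived from the sign of v alone.

    Python's >> is an arithmetic shift, so v >> 31 is -1 for negative v
    and 0 otherwise; inverting it gives an all-zeros or all-ones mask.
    """
    mask = ~(v >> 31)
    return v & mask

for v in (-2**31, -5, -1, 0, 1, 7, 2**31 - 1):
    assert (v < 0) == (sign_bit_i32(v) == 1)
    assert smax_with_zero(v) == max(v, 0)
```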
> define i32 @max4(i32 %x, i32 %y) {
> %sub = sub nsw i32 %x, %y
> %max = call i32 @llvm.smax.i32(i32 %sub, i32 0) ; this intrinsic doesn't exist today
> ret i32 %max
> }
I don't currently see the need for a new intrinsic. 
> FWIW, InstCombine doesn't canonicalize any of the first 3 options
> currently. Codegen suffers because of that (depending on the target
> machine and data types). Regardless of the IR choice, some backend
> fixes are needed.
> Another possible consideration is the structure/accuracy of the cost
> models used by the vectorizers and other passes. I don't think they
> ever special-case the cmp+sel pair as a possibly unified (and
> therefore cheaper than the sum of the parts) operation.
We don't have a facility currently for the target to provide a cost for
combined operations. We should, but there's design work to be done.
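
A toy sketch of what costing a combined operation might look like. Everything here is hypothetical: the cost numbers, the instruction tuples `(name, opcode, operands)`, and the `fused_cost` function are invented for illustration and do not correspond to any existing LLVM cost-model interface. The icmp is folded into the select only when it has a single use and the select arms are exactly the compare operands.

```python
UNIT_COST = {"sub": 1, "icmp": 1, "select": 1}   # assumed per-instruction costs
FUSED_MINMAX_COST = 1                            # assumed cost of one min/max op

def naive_cost(insts):
    """Sum of the parts: every instruction is charged separately."""
    return sum(UNIT_COST[op] for _, op, _ in insts)

def fused_cost(insts):
    """Charge a single-use icmp feeding a matching select as one operation."""
    defs = {name: (op, operands) for name, op, operands in insts}
    use_count = {}
    for _, _, operands in insts:
        for o in operands:
            use_count[o] = use_count.get(o, 0) + 1

    def fusable_cmp(select_operands):
        cond, arms = select_operands[0], select_operands[1:]
        op, cmp_operands = defs.get(cond, (None, []))
        if (op == "icmp" and use_count[cond] == 1
                and sorted(cmp_operands) == sorted(arms)):
            return cond
        return None

    folded = {fusable_cmp(ops) for _, op, ops in insts if op == "select"}
    total = 0
    for name, op, operands in insts:
        if name in folded:
            continue                      # already paid for by its select
        if op == "select" and fusable_cmp(operands) is not None:
            total += FUSED_MINMAX_COST
        else:
            total += UNIT_COST[op]
    return total

max1 = [("sub", "sub", ["x", "y"]),
        ("cmp", "icmp", ["x", "y"]),
        ("sel", "select", ["cmp", "0", "sub"])]
max3 = [("sub", "sub", ["x", "y"]),
        ("cmp", "icmp", ["sub", "0"]),
        ("sel", "select", ["cmp", "sub", "0"])]

print(naive_cost(max3), fused_cost(max3), fused_cost(max1))  # prints: 3 2 3
```

Under these assumptions only the @max3 shape is recognized as a fused min/max; the @max1 shape is still charged as three separate instructions because its compare operands do not match the select arms.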

-Hal 
> Note that we added FP variants for min/max ops with:
> https://reviews.llvm.org/rL220341
-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 

Sanjay Patel via llvm-dev

2016-Nov-08 00:13 UTC

Thanks, Hal and Matt for the feedback. As usual, my instincts about
canonicalization were probably wrong. :)

I thought that @max1 vs. @max3 would be viewed as an unknowable trade-off
between reducing the dependency chain and the pseudo-canonical min/max
form, so we'd add intrinsics, and defer that decision to the backend.

I'll wait to see if there are any other arguments presented.

@max2 vs. @max3 is a straightforward commute that we should have been doing
anyway, so I can start there. Assuming we go with @max3, we need to add
something to DAGCombine to turn that back into @max1 (PPC w/ isel and
AArch64 do better with @max1; x86 is the same).
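
The dependency-chain trade-off between @max1 and @max3 can be sketched by measuring the longest chain of dependent instructions in each form. The dictionaries below are a hand-written, hypothetical encoding of the IR above (instruction name mapped to the values it uses); function arguments and constants are treated as free.

```python
def critical_path(deps):
    """Length, in instructions, of the longest dependency chain."""
    memo = {}

    def depth(n):
        if n not in deps:          # arguments and constants cost nothing
            return 0
        if n not in memo:
            memo[n] = 1 + max((depth(d) for d in deps[n]), default=0)
        return memo[n]

    return max(depth(n) for n in deps)

# @max1: the compare uses x and y directly, so it can run alongside the sub.
max1 = {"sub": ["x", "y"], "cmp": ["x", "y"], "sel": ["cmp", "sub"]}
# @max3: the compare must wait for the sub.
max3 = {"sub": ["x", "y"], "cmp": ["sub"], "sel": ["cmp", "sub"]}

print(critical_path(max1), critical_path(max3))  # prints: 2 3
```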


