thr3ads.net - llvm dev - [llvm-dev] [RFC] Canonicalization of unsigned subtraction with saturation [May 2017]


Friedman, Eli via llvm-dev

2017-May-16 18:18 UTC

[llvm-dev] [RFC] Canonicalization of unsigned subtraction with saturation

On 5/16/2017 6:30 AM, Sanjay Patel wrote:
> Thanks for posting this question, Julia.
>
> I had a similar question about a signed min/max variant here:
> http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html
>
> The 2nd version in each case contains a canonical max/min 
> representation in IR, and this could enable more IR analysis.
> A secondary advantage is that the backend recognizes the max/min in 
> the second IR form when creating DAG nodes,
> and this directly affects isel for many targets.
This seems important.  And pattern-matching max(x,y)-y to a saturating 
subtract seems easy in the backend.
> A possibly important difference between the earlier example and the 
> current unsigned case:
> is a select with a zero constant operand easier to reason about in IR 
> than the canonical min/max?
It might be in some cases... maybe?  I mean, it might be easier to 
analyze in ComputeMaskedBits or something, but we don't really do much 
to optimize selects involving zero.

-Eli
> On Tue, May 16, 2017 at 5:30 AM, Koval, Julia <julia.koval at intel.com> wrote:
>
>     (1.16)
>     %cmp = icmp ugt i16 %x, %y
>     %sub2 = sub i16 %y, %x
>     %res = select i1 %cmp, i16 0, i16 %sub2
>
>     or
>
>     (2.16)
>     %cmp = icmp ugt i16 %x, %y
>     %sel = select i1 %cmp, i16 %x, i16 %y
>     %sub = sub i16 %sel, %x
>
>     Which of these versions is canonical? I think the first version is
>     better, because it can be converted to an unsigned saturating
>     subtract instruction (e.g. PSUBUS) using existing backend code.
>
>
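
For readers following along, here is a minimal C sketch of the two 16-bit forms quoted above. It assumes only the semantics visible in the IR. The function names are hypothetical and are not LLVM APIs. It illustrates that the select-with-zero form (1.16) and the max-then-subtract form (2.16) compute the same value, namely y - x clamped at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Form (1.16): compare, subtract, then select a zero constant. */
static uint16_t usubsat_select(uint16_t x, uint16_t y) {
    uint16_t sub2 = (uint16_t)(y - x);   /* wraps when x > y, like IR sub */
    return (x > y) ? 0 : sub2;
}

/* Form (2.16): unsigned max of the operands, then subtract. */
static uint16_t usubsat_max(uint16_t x, uint16_t y) {
    uint16_t sel = (x > y) ? x : y;      /* umax(x, y) */
    return (uint16_t)(sel - x);
}

/* Compare both forms on a grid of 16-bit inputs sampled every `step`
 * values (a step of 0 is treated as 1). Returns the mismatch count. */
static unsigned long count_mismatches(uint32_t step) {
    unsigned long bad = 0;
    if (step == 0)
        step = 1;
    for (uint32_t x = 0; x <= 0xFFFFu; x += step)
        for (uint32_t y = 0; y <= 0xFFFFu; y += step)
            if (usubsat_select((uint16_t)x, (uint16_t)y) !=
                usubsat_max((uint16_t)x, (uint16_t)y))
                bad++;
    return bad;
}

/* Small self-check on boundary values. */
static void self_check(void) {
    assert(usubsat_select(0xFFFF, 0) == 0);
    assert(usubsat_max(0xFFFF, 0) == 0);
    assert(usubsat_select(0, 0xFFFF) == 0xFFFF);
    assert(usubsat_max(0, 0xFFFF) == 0xFFFF);
}
```

Calling `count_mismatches(1)` walks every pair of 16-bit inputs. A larger step samples the space more cheaply. Neither form says anything about which one a backend matches to PSUBUS; that is the open question in this thread.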
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project


Sanjay Patel via llvm-dev

2017-May-16 19:49 UTC


[llvm-dev] [RFC] Canonicalization of unsigned subtraction with saturation

On Tue, May 16, 2017 at 12:18 PM, Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 5/16/2017 6:30 AM, Sanjay Patel wrote:
>
> Thanks for posting this question, Julia.
>
> I had a similar question about a signed min/max variant here:
> http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html
>
> The 2nd version in each case contains a canonical max/min representation
> in IR, and this could enable more IR analysis.
> A secondary advantage is that the backend recognizes the max/min in the
> second IR form when creating DAG nodes,
> and this directly affects isel for many targets.
>
>
> This seems important.  And pattern-matching max(x,y)-y to a saturating
> subtract seems easy in the backend.
>
> A possibly important difference between the earlier example and the
> current unsigned case:
> is a select with a zero constant operand easier to reason about in IR than
> the canonical min/max?
>
>
> It might be in some cases... maybe?  I mean, it might be easier to analyze
> in ComputeMaskedBits or something, but we don't really do much to optimize
> selects involving zero.
>
Because of my CPU upbringing, I always see this:
select A, B, 0

as:
and (sext A), B

Any chance of control-flow is bad!
...but now I know that's wrong for IR. :)
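
To make that equivalence concrete, here is a small C sketch of the two views. It assumes a 16-bit value B and a boolean condition A. The function names are hypothetical. It only shows that the branch-free mask form and the select form agree on their result; as noted above, the select is the preferred shape in IR.

```c
#include <assert.h>
#include <stdint.h>

/* "select A, B, 0": pick B when the condition holds, else zero. */
static uint16_t select_or_zero(int a, uint16_t b) {
    return a ? b : 0;
}

/* "and (sext A), B": sign-extend the 1-bit condition into an
 * all-ones or all-zeros mask, then AND it with B. */
static uint16_t mask_and(int a, uint16_t b) {
    uint16_t mask = (uint16_t)(0u - (unsigned)(a != 0)); /* 0xFFFF or 0 */
    return (uint16_t)(mask & b);
}

/* Check that both views agree for a given B and both condition values. */
static void check_agree(uint16_t b) {
    assert(select_or_zero(0, b) == mask_and(0, b));
    assert(select_or_zero(1, b) == mask_and(1, b));
}
```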

So forming the min/max sounds like the right answer to me.

Note that we don't actually canonicalize the signed min/max cases from the
earlier thread yet. We detect those in value tracking, and that was good
enough to produce the ideal backend results, but I haven't gotten back to
doing the transform in IR.






Koval, Julia via llvm-dev

2017-May-26 09:46 UTC


[llvm-dev] [RFC] Canonicalization of unsigned subtraction with saturation

So, can we declare the second version canonical?
The second version also has another advantage: if the operands of the select
have different types, it still works as a max. In the first case, the select is
separated from the subtraction by a trunc, so it is not converted to a max
instruction in the end. Here is an example for 32 and 16 bits:

(1)
void foo(unsigned short *p, int max, int n) {
  int i;
  unsigned m;
  for (i = 0; i < n; i++) {
    m = *--p;
    *p = (unsigned short)(m >= max ? m-max : 0);
  }
}

(2)
void goo(unsigned short *p, int max, int n) {
  int i;
  unsigned m;
  for (i = 0; i < n; i++) {
    m = *--p;
    unsigned umax = m > max ? m : max;
    *p = (unsigned short)(umax - max);
  }
}

(1)
%cmp = icmp ugt i32 %x, %y
%sub2 = sub i32 %y, %x
%tr = trunc i32 %sub2 to i16
%res = select i1 %cmp, i16 0, i16 %tr

 or 

(2)
%cmp = icmp ugt i32 %x, %y
%sel = select i1 %cmp, i32 %x, i32 %y
%sub = sub i32 %sel, %x
%res = trunc i32 %sub to i16
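
As a rough illustration, the two mixed-width IR forms above can be written as plain C functions. This is a sketch that assumes %x and %y are the 32-bit inputs. The names `form1` and `form2` are hypothetical. In form (1) the trunc sits between the sub and the select, while in form (2) the select is a plain 32-bit unsigned max that is subtracted and truncated afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Form (1): sub, trunc, then select against a 16-bit zero. */
static uint16_t form1(uint32_t x, uint32_t y) {
    uint32_t sub2 = y - x;            /* wraps when x > y */
    uint16_t tr = (uint16_t)sub2;     /* trunc i32 -> i16 */
    return (x > y) ? 0 : tr;
}

/* Form (2): 32-bit umax, sub, then trunc. */
static uint16_t form2(uint32_t x, uint32_t y) {
    uint32_t sel = (x > y) ? x : y;   /* umax(x, y) */
    uint32_t sub = sel - x;
    return (uint16_t)sub;             /* trunc i32 -> i16 */
}

/* Check that the two forms agree for one input pair. */
static void check_same(uint32_t x, uint32_t y) {
    assert(form1(x, y) == form2(x, y));
}
```

Both functions return the low 16 bits of the clamped difference. The only difference is where the truncation happens relative to the select.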
 

-Julia