thr3ads.net - llvm dev - [llvm-dev] IR canonicalization: shufflevector or vector trunc? [Jan 2017]

Sanjay Patel via llvm-dev

2017-Jan-17 16:38 UTC

[llvm-dev] IR canonicalization: shufflevector or vector trunc?

We use InstCombiner::ShouldChangeType() to prevent transforms to illegal
integer types, but I'm not sure how that would apply to vector types.

I.e., let's say v256 is a legal type in your example. DataLayout doesn't
appear to specify what configurations of a 256-bit vector are legal, so I
don't think we can currently use that to say v2i128 should be treated
differently than v16i16.

Is this a valid argument to not canonicalize the IR?
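
To make the type question concrete, here is a minimal Python sketch (not LLVM code) that computes which wide vector type the trunc form would need for a given strided shuffle mask. The helper name trunc_form is hypothetical, and the sketch assumes a little-endian target and a mask of the form <0, s, 2s, ...> that spans the whole source vector.

```python
def trunc_form(num_elts, elt_bits, mask):
    """Return (bitcast type, result type) for the trunc form of a strided
    shuffle, or None if the mask is not <0, s, 2s, ...> with s >= 2.

    Hypothetical helper for illustration; little-endian only.
    """
    if not mask or num_elts % len(mask) != 0:
        return None
    stride = num_elts // len(mask)
    if stride < 2 or mask != [i * stride for i in range(len(mask))]:
        return None
    wide = f"<{len(mask)} x i{elt_bits * stride}>"    # type produced by the bitcast
    narrow = f"<{len(mask)} x i{elt_bits}>"           # type produced by the trunc
    return wide, narrow


print(trunc_form(16, 16, [0, 4, 8, 12]))  # ('<4 x i64>', '<4 x i16>')
print(trunc_form(16, 16, [0, 8]))         # ('<2 x i128>', '<2 x i16>')
```

The element width of the intermediate type grows with the stride, which is why the <0, 8> mask ends up needing i128 elements.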

On Mon, Jan 16, 2017 at 10:16 AM, Rackover, Zvi <zvi.rackover at intel.com> wrote:
> Suppose we prefer the ‘trunc’ form, then what about cases such as:
>
> define <2 x i16> @shuffle(<16 x i16> %x) {
>   %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <2 x i32> <i32 0, i32 8>
>   ret <2 x i16> %shuf
> }
>
> Will the ‘shufflevector’ be canonicalized to a ‘trunc’ of a vector of i128?
>
> define <2 x i16> @trunc(<16 x i16> %x) {
>   %bc = bitcast <16 x i16> %x to <2 x i128>
>   %tr = trunc <2 x i128> %bc to <2 x i16>
>   ret <2 x i16> %tr
> }
>
> This may challenge the Legalizer downstream.
>
>
>
> --Zvi
>
>
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Friday, January 13, 2017 18:19
> *To:* Rackover, Zvi <zvi.rackover at intel.com>
> *Cc:* Friedman, Eli <efriedma at codeaurora.org>; llvm-dev <
> llvm-dev at lists.llvm.org>
>
> *Subject:* Re: [llvm-dev] IR canonicalization: shufflevector or vector
> trunc?
>
>
>
> Right - I think that case looks like this for little endian:
>
> define <2 x i32> @zextshuffle(<2 x i16> %x) {
>   %zext_shuffle = shufflevector <2 x i16> %x, <2 x i16> zeroinitializer, <4 x i32> <i32 0, i32 2, i32 1, i32 2>
>   %bc = bitcast <4 x i16> %zext_shuffle to <2 x i32>
>   ret <2 x i32> %bc
> }
>
> define <2 x i32> @zextvec(<2 x i16> %x) {
>   %zext = zext <2 x i16> %x to <2 x i32>
>   ret <2 x i32> %zext
> }
>
> IMO, the fact that we have to take endianness into account with the
> shuffles makes the trunc/zext forms the better choice. That way, we limit
> the endian dependency to one place in InstCombine, and other transforms
> don't have to worry about it. We also have lots of existing folds for
> trunc/zext and hardly any for shuffles.
>
>
>
>
>
> On Thu, Jan 12, 2017 at 1:14 PM, Rackover, Zvi <zvi.rackover at intel.com> wrote:
>
> Just to add, there is also the ‘zext’ – ‘shuffle with zero’ duality which
> can broaden the discussion.
>
>
>
> --Zvi
>
>
>
> *From:* Sanjay Patel [mailto:spatel at rotateright.com]
> *Sent:* Thursday, January 12, 2017 20:19
> *To:* Friedman, Eli <efriedma at codeaurora.org>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; Rackover, Zvi <
> zvi.rackover at intel.com>
> *Subject:* Re: [llvm-dev] IR canonicalization: shufflevector or vector
> trunc?
>
>
>
>
>
>
>
> On Thu, Jan 12, 2017 at 11:06 AM, Friedman, Eli <efriedma at codeaurora.org> wrote:
>
> On 1/12/2017 9:04 AM, Sanjay Patel via llvm-dev wrote:
>
> It's time for another round of "What is the canonical IR?"
>
> Credit for this episode to Zvi and PR31551. :)
> https://llvm.org/bugs/show_bug.cgi?id=31551
>
> define <4 x i16> @shuffle(<16 x i16> %x) {
>   %shuf = shufflevector <16 x i16> %x, <16 x i16> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
>   ret <4 x i16> %shuf
> }
>
> define <4 x i16> @trunc(<16 x i16> %x) {
>   %bc = bitcast <16 x i16> %x to <4 x i64>
>   %tr = trunc <4 x i64> %bc to <4 x i16>
>   ret <4 x i16> %tr
> }
>
>
>
> Potential reasons to prefer one or the other:
> 1. Shuffle is the most compact.
> 2. Trunc is easier to read.
> 3. One of these is easier for value tracking.
> 4. Compatibility with existing IR transforms (e.g., InterleavedAccess
> recognizes the shuffle form).
> 5. We don't create arbitrary shuffle masks in IR because that's bad for a
> lot of targets (but maybe this mask pattern should always be recognized as
> special?).
>
>
>
> Hmm... not sure what the right answer is, but a couple more observations:
> 1. If we're going to canonicalize, we should probably canonicalize the
> same way independent of the original argument type (so we would introduce
> bitcasts either way).
>
>
>
> Ah, right - kill #1 in my list.
>
>
>
> 2. Those two functions are only equivalent on little-endian platforms.
>
>
>
> I was wondering about that. So yes, if we do want to canonicalize (until
> the recent compile-time complaints, I always thought this was the objective
> of InstCombine...maybe it still is), then the masks we're matching or
> generating will differ based on endianness.
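
As a rough illustration of the endianness point, the Python sketch below models a vector bitcast as a reinterpretation of the vector's in-memory bytes, then compares bitcast + trunc against a shuffle. The function names bitcast_trunc and equivalent_mask are made up for this example, and the model covers only <N x i16> sources; it is not how InstCombine implements anything.

```python
import struct


def bitcast_trunc(elts16, ratio, little_endian):
    """Model bitcast of <N x i16> to wider elements, then trunc back to i16."""
    byteorder = "little" if little_endian else "big"
    fmt = ("<" if little_endian else ">") + f"{len(elts16)}H"
    raw = struct.pack(fmt, *elts16)          # memory image of the source vector
    step = 2 * ratio                         # bytes per wide element
    return [int.from_bytes(raw[i:i + step], byteorder) & 0xFFFF
            for i in range(0, len(raw), step)]


def equivalent_mask(num_elts, ratio, little_endian):
    """Shuffle mask picking the narrow element that holds each wide element's low bits."""
    offset = 0 if little_endian else ratio - 1
    return [base + offset for base in range(0, num_elts, ratio)]


x = list(range(100, 116))                    # 16 distinct i16 values
for le in (True, False):
    mask = equivalent_mask(16, 4, le)
    assert [x[i] for i in mask] == bitcast_trunc(x, 4, le)
    print("little" if le else "big", mask)
```

Under this model the little-endian mask is <0, 4, 8, 12>, while a big-endian target would need <3, 7, 11, 15> to match the same bitcast + trunc sequence.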
>
>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.

Rackover, Zvi via llvm-dev

2017-Jan-19 16:17 UTC

[llvm-dev] IR canonicalization: shufflevector or vector trunc?

Hi Sanjay,

I agree we should also discuss *if* this canonicalization is beneficial.
For starters, do we have a concrete case where we would benefit from
canonicalizing shuffles <-> truncates in LLVM IR?
IMO, we should not count benefits for codegen because that alone does not
justify transforming the IR; we could always do this on the SelectionDAG.

--Zvi


Sanjay Patel via llvm-dev

2017-Jan-21 19:30 UTC

[llvm-dev] IR canonicalization: shufflevector or vector trunc?

On Thu, Jan 19, 2017 at 9:17 AM, Rackover, Zvi <zvi.rackover at intel.com>
wrote:
> Hi Sanjay,
>
> I agree we should also discuss *if* this canonicalization is beneficial.
> For starters, do we have a concrete case where we would benefit from
> canonicalizing shuffles <-> truncates in LLVM IR?
> IMO, we should not count benefits for codegen because that alone does not
> justify transforming the IR; we could always do this on the SelectionDAG.

Agreed. If we're just talking about IR benefits, then it's easy to
demonstrate a win for trunc/zext based on value tracking:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" ; little-endian

define <4 x i32> @shuffle(<4 x i64> %x) {
  %y = shl <4 x i64> %x, <i64 32, i64 32, i64 32, i64 32> ; low half of each elt is zero
  %bc = bitcast <4 x i64> %y to <8 x i32> ; even index elements are all zero
  %trunc = shufflevector <8 x i32> %bc, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  ret <4 x i32> %trunc
}

define <4 x i32> @trunc(<4 x i64> %x) {
  %y = shl <4 x i64> %x, <i64 32, i64 32, i64 32, i64 32> ; low half of each elt is zero
  %trunc = trunc <4 x i64> %y to <4 x i32> ; so this must be zero...
  ret <4 x i32> %trunc
}


$ ./opt -instsimplify 31551.ll -S
...
define <4 x i32> @shuffle(<4 x i64> %x) {
  %y = shl <4 x i64> %x, <i64 32, i64 32, i64 32, i64 32>
  %bc = bitcast <4 x i64> %y to <8 x i32>
  %trunc = shufflevector <8 x i32> %bc, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  ret <4 x i32> %trunc
}

define <4 x i32> @trunc(<4 x i64> %x) {
  ret <4 x i32> zeroinitializer
}

Of course, this is something I invented as an example, but AFAIK we have
better value tracking for trunc/zext than shuffle, so we'll have an easier
time folding the IR if that is possible.
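
The reasoning behind the trunc fold can be sketched in a few lines of Python. This is a toy known-zero-bits model, assuming only the general idea of value tracking; the helper names are hypothetical and it is not LLVM's actual implementation.

```python
import random

BITS = 64


def known_zero_after_shl(known_zero, shift, bits=BITS):
    """Known-zero bits after shl: old facts move up, the low `shift` bits become zero."""
    return ((known_zero << shift) | ((1 << shift) - 1)) & ((1 << bits) - 1)


def known_zero_after_trunc(known_zero, dst_bits):
    """trunc keeps only the low `dst_bits` bits, along with what is known about them."""
    return known_zero & ((1 << dst_bits) - 1)


kz = known_zero_after_shl(0, 32)            # start with nothing known about %x
kz = known_zero_after_trunc(kz, 32)
all_zero = kz == (1 << 32) - 1              # every bit of the i32 result is known zero

# Spot-check the same claim on concrete 64-bit values.
for _ in range(1000):
    x = random.getrandbits(64)
    y = (x << 32) & ((1 << BITS) - 1)       # shl i64 %x, 32
    assert (y & 0xFFFFFFFF) == 0            # trunc i64 %y to i32

print(all_zero)  # True
```

Doing the same for the shuffle form would require tracking known bits through the bitcast and then per shuffled element, which is the extra work the trunc form avoids.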


