thr3ads.net - llvm dev - [llvm-dev] InstructionSimplify: adding a hook for shufflevector instructions [Mar 2017]

Rackover, Zvi via llvm-dev

2017-Mar-30 02:37 UTC

[llvm-dev] InstructionSimplify: adding a hook for shufflevector instructions

As Sanjay noted in D31426 <https://reviews.llvm.org/D31426#712701>,
InstructionSimplify is missing the following simplification:

This function:
define <4 x i32> @splat_operand(<4 x i32> %x) {
  %splat = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> zeroinitializer
  %shuf = shufflevector <4 x i32> %splat, <4 x i32> undef, <4 x i32> <i32 0, i32 3, i32 2, i32 1>
  ret <4 x i32> %shuf
}

can be simplified to:
define <4 x i32> @splat_operand(<4 x i32> %x) {
  %shuf = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> zeroinitializer
  ret <4 x i32> %shuf
}

InstCombine covers this case inefficiently.
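
To see why the rewrite above is sound, here is a minimal Python sketch (not LLVM code) that models shufflevector as a lane-wise selection. The helper name `shufflevector` and the string lane labels are assumptions made purely for illustration. The check enumerates every 4-lane mask that reads only the first operand and confirms that shuffling a splat gives back the splat.

```python
from itertools import product

def shufflevector(a, b, mask):
    """Lane-wise model: index i < len(a) reads a[i], otherwise b[i - len(a)]."""
    lanes = list(a) + list(b)
    return [lanes[i] for i in mask]

# Symbolic lanes of %x and of an undef second operand (labels are illustrative).
x = ["x0", "x1", "x2", "x3"]
undef = ["undef"] * 4

# %splat = shufflevector %x, undef, zeroinitializer
splat = shufflevector(x, undef, [0, 0, 0, 0])

# %shuf = shufflevector %splat, undef, <0, 3, 2, 1>
shuf = shufflevector(splat, undef, [0, 3, 2, 1])

# Every mask that reads only lanes of the splat operand yields the splat again.
all_equal = all(
    shufflevector(splat, undef, list(mask)) == splat
    for mask in product(range(4), repeat=4)
)
```

In this model the outer shuffle is redundant for any such mask, which is why the simplified function can return the existing splat value without creating a new mask.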

I noticed that InstructionSimplify does not perform any simplifications for
shufflevector instructions other than constant folding. Before I start sending
patches, I want to be sure there is no compelling reason for this. I assume it
is unrelated to our conservative approach of refraining from creating new
shuffle masks that may hurt some targets.

Here are some more opportunities that can be added to InstructionSimplify, all
of which are covered by InstCombine:

define <4 x i32> @undef_mask(<4 x i32> %x) {
  %shuf = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> undef
  ret <4 x i32> %shuf
}
-->
define <4 x i32> @undef_mask(<4 x i32> %x) {
  ret <4 x i32> undef
}

define <4 x i32> @identity_mask_0(<4 x i32> %x) {
  %shuf = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x i32> %shuf
}
-->
define <4 x i32> @identity_mask_0(<4 x i32> %x) {
  ret <4 x i32> %x
}

define <4 x i32> @identity_mask_1(<4 x i32> %x) {
  %shuf = shufflevector <4 x i32> undef, <4 x i32> %x, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  ret <4 x i32> %shuf
}
-->
define <4 x i32> @identity_mask_1(<4 x i32> %x) {
  ret <4 x i32> %x
}

define <4 x i32> @pseudo_identity_mask(<4 x i32> %x) {
  %shuf = shufflevector <4 x i32> %x, <4 x i32> %x, <4 x i32> <i32 0, i32 1, i32 2, i32 7>
  ret <4 x i32> %shuf
}
-->
define <4 x i32> @pseudo_identity_mask(<4 x i32> %x) {
  ret <4 x i32> %x
}

define <4 x i32> @const_operand(<4 x i32> %x) {
  %shuf = shufflevector <4 x i32> <i32 42, i32 43, i32 44, i32 45>, <4 x i32> %x, <4 x i32> <i32 0, i32 3, i32 2, i32 1>
  ret <4 x i32> %shuf
}
-->
define <4 x i32> @const_operand(<4 x i32> %x) {
  ret <4 x i32> <i32 42, i32 45, i32 44, i32 43>
}

define <4 x i32> @merge(<4 x i32> %x) {
  %lower = shufflevector <4 x i32> %x, <4 x i32> undef, <2 x i32> <i32 1, i32 0>
  %upper = shufflevector <4 x i32> %x, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
  %merged = shufflevector <2 x i32> %upper, <2 x i32> %lower, <4 x i32> <i32 3, i32 2, i32 0, i32 1>
  ret <4 x i32> %merged
}
-->
define <4 x i32> @merge(<4 x i32> %x) {
  ret <4 x i32> %x
}
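
The rewrites listed above can be checked with a small lane-tracking model. The following Python sketch is an illustration only, not the InstructionSimplify implementation: `value`, `shufflevector`, and `simplify` are hypothetical helpers, `None` stands in for undef, and the model assumes every named value has as many lanes as the final result.

```python
UNDEF = None  # models an undef lane or an undef mask index

def value(name, n):
    """Symbolic n-lane value: lane i is the pair (name, i)."""
    return [(name, i) for i in range(n)]

def shufflevector(a, b, mask):
    """Lane-wise model: index i < len(a) reads a[i], otherwise b[i - len(a)]."""
    lanes = list(a) + list(b)
    return [UNDEF if i is UNDEF else lanes[i] for i in mask]

def simplify(lanes):
    """Classify a shuffle result; return None when no rule in this sketch applies."""
    if all(lane is UNDEF for lane in lanes):
        return "undef"                      # every lane is undef
    if all(isinstance(lane, int) for lane in lanes):
        return lanes                        # folds to a constant vector
    names = {lane[0] for lane in lanes if isinstance(lane, tuple)}
    if len(names) == 1:
        name = names.pop()
        if lanes == value(name, len(lanes)):
            return name                     # reproduces an existing value
    return None

x = value("x", 4)
U = [UNDEF] * 4

undef_mask = simplify(shufflevector(x, U, [UNDEF] * 4))
identity_mask_0 = simplify(shufflevector(x, U, [0, 1, 2, 3]))
identity_mask_1 = simplify(shufflevector(U, x, [4, 5, 6, 7]))
pseudo_identity_mask = simplify(shufflevector(x, x, [0, 1, 2, 7]))
const_operand = simplify(shufflevector([42, 43, 44, 45], x, [0, 3, 2, 1]))

# @merge: track lanes through the two narrowing shuffles, then the widening one.
lower = shufflevector(x, U, [1, 0])
upper = shufflevector(x, U, [2, 3])
merge = simplify(shufflevector(upper, lower, [3, 2, 0, 1]))

# A genuine permutation is left alone by this sketch.
permute = simplify(shufflevector(x, U, [0, 3, 2, 1]))
```

Each result is either an existing value, a constant, or undef, so none of these cases requires building a new shuffle mask.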

Would appreciate your comments and feedback.

Thanks, Zvi

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Sanjay Patel via llvm-dev

2017-Mar-30 15:30 UTC


My grasp of LLVM history isn't great, but I think these are missing because
there wasn't much need for vector optimization in IR: there just weren't that
many vector opportunities in IR. I.e., the vectorizers are relatively new, and
hand-written vector code (e.g., SSE intrinsics in source) generally went
straight to the backend as target-specific IR intrinsics.

Now that we're vectorizing more aggressively (and plan to do even more) and
we're converting target-specific vector source to generic vector IR
whenever possible, it makes sense to add these kinds of optimizations.

One frequently visible sign of scalar privilege in InstCombine is the use
of "m_ConstantInt". In many cases, this can be converted to "m_APInt"
without much effort, and the transform will then automatically apply to splat
vector constants too.
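
The difference between the two matchers can be pictured with a toy model. The Python sketch below is only an analogy, not LLVM's C++ PatternMatch API: `match_constant_int` and `match_apint` are hypothetical functions, a scalar constant is modeled as an `int`, and a vector constant as a list of `int`s.

```python
def match_constant_int(c):
    """Toy analogue of m_ConstantInt: accepts a scalar integer constant only."""
    return c if isinstance(c, int) else None

def match_apint(c):
    """Toy analogue of m_APInt: also accepts a vector whose lanes all hold the same integer."""
    if isinstance(c, int):
        return c
    if isinstance(c, list) and c and all(
        isinstance(lane, int) and lane == c[0] for lane in c
    ):
        return c[0]  # splat vector: report the repeated scalar
    return None

scalar_hit = match_constant_int(8)              # scalar constant matches
vector_miss = match_constant_int([8, 8, 8, 8])  # splat vector is not matched
splat_hit = match_apint([8, 8, 8, 8])           # splat vector yields its scalar
non_splat = match_apint([8, 9, 8, 8])           # non-splat vector is rejected
```

A transform written against the second matcher sees the same scalar for `i32 8` and for a splat of `i32 8`, which is why switching matchers extends it to vectors with little extra work.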




Rackover, Zvi via llvm-dev

2017-Mar-30 16:28 UTC


Thanks, Sanjay, that makes sense. The opportunity to improve InstCombine's
handling of splats sounds promising.

Another question about shuffle simplification. This is a testcase from
test/Transforms/InstCombine/vec_shuffle.ll:

define <4 x i32> @test10(<4 x i32> %tmp5) nounwind {
  %tmp6 = shufflevector <4 x i32> %tmp5, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %tmp7 = shufflevector <4 x i32> %tmp6, <4 x i32> undef, <4 x i32> zeroinitializer
  ret <4 x i32> %tmp7
}

opt -instcombine will combine this to:
define <4 x i32> @test10(<4 x i32> %tmp5) nounwind {
  %tmp7 = shufflevector <4 x i32> %tmp5, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
  ret <4 x i32> %tmp7
}

Would it be ok to simplify the original function to the following?
define <4 x i32> @test10(<4 x i32> %tmp5) nounwind {
  %tmp7 = shufflevector <4 x i32> %tmp5, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  ret <4 x i32> %tmp7
}

If the function is required to return a splat value, then I believe the answer
is no, because the undef indices allow returning a value that is not a splat.
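
The argument can be made concrete by enumerating what each version may return. The Python sketch below is a simplified model under stated assumptions: `None` stands in for undef, each undef lane may independently take any value from a small illustrative set, and `shufflevector` and `possible_results` are hypothetical helpers rather than LLVM functions.

```python
from itertools import product

UNDEF = None  # models an undef lane or an undef mask index

def shufflevector(a, b, mask):
    """Lane-wise model: index i < len(a) reads a[i], otherwise b[i - len(a)]."""
    lanes = list(a) + list(b)
    return [UNDEF if i is UNDEF else lanes[i] for i in mask]

def possible_results(vec, choices):
    """All concrete vectors obtained by giving each undef lane a value from choices."""
    options = [choices if lane is UNDEF else [lane] for lane in vec]
    return {tuple(p) for p in product(*options)}

x = [10, 11, 12, 13]      # concrete stand-in for %tmp5
U = [UNDEF] * 4

# Original @test10: extract lane 1 into lane 0, then splat lane 0.
tmp6 = shufflevector(x, U, [1, UNDEF, UNDEF, UNDEF])
tmp7 = shufflevector(tmp6, U, [0, 0, 0, 0])
original = possible_results(tmp7, [0, 1])

# Proposed replacement: a single shuffle with mask <1, undef, undef, undef>.
proposed = possible_results(shufflevector(x, U, [1, UNDEF, UNDEF, UNDEF]), [0, 1])

# A rewrite is only safe if it cannot produce a result the original could not.
is_refinement = proposed <= original
```

In this model the original can only return the splat of lane 1, while the proposed form may also return vectors such as (11, 0, 1, 0), so the replacement is not a refinement of the original.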

Thanks, Zvi
