thr3ads.net - llvm dev - [llvm-dev] Vector Shuffle chain lowering to X86 instructions simplification inconsistencies [Oct 2016]


Charith Mendis via llvm-dev

2016-Oct-28 21:25 UTC

[llvm-dev] Vector Shuffle chain lowering to X86 instructions simplification inconsistencies

Hi all,

Attached herewith is a fairly simple LLVM file (shuffle.ll) with lots of
vector shuffles.

When I run llc with -O3 -mcpu=core-avx2, the first shuffle sequence, which
uses 128-bit-wide types, gets reduced to a single shuffle, whereas the
second shuffle sequence, which uses 256-bit-wide types, doesn't get reduced
to a single shuffle instruction in the resulting X86 code (shuffle.s attached).

The second sequence is identical to the first, except that it has been
rewidened to the larger vector length.

Can this be explained, and where in the machine lowering passes does this
simplification happen?
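
To make the expected simplification concrete, here is a minimal Python sketch of why a chain of single-input shuffles can in principle be folded into one shuffle, and of one property that can differ once a mask is widened. It is an illustration only, not LLVM's implementation. The helper names, the example masks, the use of -1 for an undefined element, and the assumption of 32-bit elements (4 per 128-bit lane, 8 per 256-bit vector) are all hypothetical and are not taken from shuffle.ll.

```python
def apply_mask(vec, mask):
    """Single-input shuffle: result[i] = vec[mask[i]]; -1 means undefined."""
    return [None if i < 0 else vec[i] for i in mask]


def compose_masks(first, second):
    """Mask equivalent to shuffling with `first`, then with `second`.

    The second shuffle reads element second[i] of the intermediate vector,
    which itself came from element first[second[i]] of the original input.
    """
    return [-1 if j < 0 else first[j] for j in second]


def widen_mask(mask):
    """Assumed form of 'rewidening': repeat the pattern on the upper half."""
    n = len(mask)
    return list(mask) + [-1 if i < 0 else i + n for i in mask]


def crosses_lanes(mask, lane_elems):
    """True if any result element is read from a different lane."""
    return any(i >= 0 and i // lane_elems != pos // lane_elems
               for pos, i in enumerate(mask))


# Hypothetical 4-element chain: reverse, then swap adjacent pairs.
first, second = [3, 2, 1, 0], [1, 0, 3, 2]
folded = compose_masks(first, second)

vec = ["a", "b", "c", "d"]
chained = apply_mask(apply_mask(vec, first), second)
assert chained == apply_mask(vec, folded)

# The widened mask stays inside each 4-element lane; a half swap does not.
wide = widen_mask(folded)
half_swap = [4, 5, 6, 7, 0, 1, 2, 3]
print(folded, wide, crosses_lanes(wide, 4), crosses_lanes(half_swap, 4))
```

The lane check is included because many 256-bit AVX2 shuffle instructions operate on each 128-bit lane independently, so whether a folded mask crosses lanes affects which single instruction could implement it. Whether that is relevant to the attached file is not something this sketch can establish.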

Thanks

-- 
Kind regards,
Charith Mendis

Graduate Student,
CSAIL,
Massachusetts Institute of Technology
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shuffle.ll
Type: application/octet-stream
Size: 12448 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161028/fb2f054e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shuffle.s
Type: application/octet-stream
Size: 8733 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161028/fb2f054e/attachment-0001.obj>

Rackover, Zvi via llvm-dev

2016-Oct-29 07:17 UTC

[llvm-dev] Vector Shuffle chain lowering to X86 instructions simplification inconsistencies

Hi Charith,

After taking a quick look, it seems we could do better for the 256-bit shuffles.
Can you please open a bug report (https://llvm.org/bugs, product=libraries,
component=backend: X86) for this? It would be helpful if you minimized
shuffle.ll to, say, two functions: one performing the 128-bit shuffles and
the other performing the 256-bit shuffles.

Thanks, Zvi

