thr3ads.net - llvm dev - [llvm-dev] enabling interleaved access loop vectorization [May 2016]

Sanjay Patel via llvm-dev

2016-May-26 18:12 UTC

[llvm-dev] enabling interleaved access loop vectorization

Is there a compile-time and/or potential runtime cost that makes
enableInterleavedAccessVectorization() default to 'false'?

I notice that this is set to true for ARM, AArch64, and PPC.

In particular, I'm wondering if there's a reason it's not enabled for x86
in relation to PR27881:
https://llvm.org/bugs/show_bug.cgi?id=27881

Renato Golin via llvm-dev

2016-May-26 18:25 UTC

On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Is there a compile-time and/or potential runtime cost that makes
> enableInterleavedAccessVectorization() default to 'false'?
>
> I notice that this is set to true for ARM, AArch64, and PPC.
>
> In particular, I'm wondering if there's a reason it's not enabled for x86 in
> relation to PR27881:
> https://llvm.org/bugs/show_bug.cgi?id=27881

Hi Sanjay,

The feature was originally developed for ARM's VLDn/VSTn instructions
and then extended to AArch64 and PPC, but not x86/64 yet.

I believe Elena was working on that, but needed to get the
scatter/gather intrinsics working first. I just copied her in case I'm
wrong. :)

cheers,
--renato

Demikhovsky, Elena via llvm-dev

2016-May-26 19:35 UTC

Interleaved access vectorization is not enabled on X86 yet.
We looked at this feature and came to the conclusion that interleaving (as loads +
shuffles) is not always profitable on X86. We need to provide an accurate cost,
which depends on the number of shuffles, and the number of shuffles in turn depends
on the permutation (the shuffle mask). Even if we estimate the number of shuffles,
the shuffles are not generated in place: the vectorizer produces a long sequence of
"extracts" and "inserts" that will hopefully be combined into shuffles by a later
instcombine pass.

-  Elena
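
[Editor's illustration, not part of the original message.] To make the "loads + shuffles" form concrete, here is a minimal scalar sketch in C++ of a stride-2 interleaved access. It is not LLVM's output and not the vectorizer's implementation. The names `shuffle`, `pairSumsScalar`, `pairSumsInterleaved`, `kEvenMask` and `kOddMask`, the vector factor of 4, and the interleave factor of 2 are all assumptions chosen for illustration. Each block of four results is modelled as one contiguous wide load followed by two shuffles, one per mask, which is the quantity a cost model would have to price.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumed vector factor for this sketch: 4 results per "vector" iteration.
constexpr std::size_t VF = 4;

// Shuffle masks for an interleave factor of 2 (hypothetical names).
constexpr std::array<std::size_t, VF> kEvenMask{0, 2, 4, 6};
constexpr std::array<std::size_t, VF> kOddMask{1, 3, 5, 7};

// Scalar model of a vector shuffle: out[i] = wide[mask[i]].
template <std::size_t N, std::size_t M>
std::array<int, N> shuffle(const std::array<int, M>& wide,
                           const std::array<std::size_t, N>& mask) {
    std::array<int, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = wide[mask[i]];
    return out;
}

// Reference loop with an interleaved (stride-2) access pattern.
std::vector<int> pairSumsScalar(const std::vector<int>& data) {
    std::vector<int> out(data.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = data[2 * i] + data[2 * i + 1];
    return out;
}

// Same computation, modelled as one wide load plus two shuffles per block.
std::vector<int> pairSumsInterleaved(const std::vector<int>& data) {
    std::vector<int> out(data.size() / 2);
    std::size_t i = 0;
    for (; i + VF <= out.size(); i += VF) {
        // One contiguous load of 2 * VF elements.
        std::array<int, 2 * VF> wide{};
        for (std::size_t k = 0; k < 2 * VF; ++k) wide[k] = data[2 * i + k];
        // De-interleave into even and odd lanes: two shuffles.
        const auto even = shuffle(wide, kEvenMask);
        const auto odd = shuffle(wide, kOddMask);
        for (std::size_t k = 0; k < VF; ++k) out[i + k] = even[k] + odd[k];
    }
    // Scalar remainder for results that do not fill a whole block.
    for (; i < out.size(); ++i) out[i] = data[2 * i] + data[2 * i + 1];
    return out;
}
```

Calling `pairSumsInterleaved` and `pairSumsScalar` on the same input should give identical results. In this model the shuffle count per block equals the interleave factor, but on a real target a single mask may need more than one shuffle instruction, which is why the cost depends on the mask.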

   >-----Original Message-----
   >From: Renato Golin [mailto:renato.golin at linaro.org]
   >Sent: Thursday, May 26, 2016 21:25
   >To: Sanjay Patel <spatel at rotateright.com>; Demikhovsky, Elena
   ><elena.demikhovsky at intel.com>
   >Cc: llvm-dev <llvm-dev at lists.llvm.org>
   >Subject: Re: [llvm-dev] enabling interleaved access loop vectorization
   >
   >On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-
   >dev at lists.llvm.org> wrote:
   >> Is there a compile-time and/or potential runtime cost that makes
   >> enableInterleavedAccessVectorization() default to 'false'?
   >>
   >> I notice that this is set to true for ARM, AArch64, and PPC.
   >>
   >> In particular, I'm wondering if there's a reason it's not enabled for
   >> x86 in relation to PR27881:
   >> https://llvm.org/bugs/show_bug.cgi?id=27881
   >
   >Hi Sanjay,
   >
   >The feature was originally developed for ARM's VLDn/VSTn instructions
   >and then extended to AArch64 and PPC, but not x86/64 yet.
   >
   >I believe Elena was working on that, but needed to get the scatter/gather
   >intrinsics working first. I just copied her in case I'm wrong. :)
   >
   >cheers,
   >--renato