Sanjay Patel via llvm-dev
2016-May-26 18:12 UTC
[llvm-dev] enabling interleaved access loop vectorization
Is there a compile-time and/or potential runtime cost that makes enableInterleavedAccessVectorization() default to 'false'? I notice that this is set to true for ARM, AArch64, and PPC. In particular, I'm wondering if there's a reason it's not enabled for x86 in relation to PR27881: https://llvm.org/bugs/show_bug.cgi?id=27881
Renato Golin via llvm-dev
2016-May-26 18:25 UTC
[llvm-dev] enabling interleaved access loop vectorization
On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Is there a compile-time and/or potential runtime cost that makes
> enableInterleavedAccessVectorization() default to 'false'?
>
> I notice that this is set to true for ARM, AArch64, and PPC.
>
> In particular, I'm wondering if there's a reason it's not enabled for x86 in
> relation to PR27881:
> https://llvm.org/bugs/show_bug.cgi?id=27881

Hi Sanjay,

The feature was originally developed for ARM's VLDn/VSTn instructions and then extended to AArch64 and PPC, but not x86/64 yet.

I believe Elena was working on that, but needed to get the scatter/gather intrinsics working first. I just copied her in case I'm wrong. :)

cheers,
--renato
Demikhovsky, Elena via llvm-dev
2016-May-26 19:35 UTC
[llvm-dev] enabling interleaved access loop vectorization
Interleaved access is not enabled on X86 yet. We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86.

We should provide the right cost, which depends on the number of shuffles. The number of shuffles depends on the permutation (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in place: the vectorizer produces a long sequence of "extracts" and "inserts" that will hopefully be combined into shuffles by a later instcombine pass.

- Elena
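
For readers who have not seen the "loads + shuffles" form, here is a rough C++ sketch of a stride-2 interleaved access. It is only a scalar emulation for illustration, not what the LLVM vectorizer emits, and every name in it (Sums, shuffle, kEvenMask, kOddMask, sumInterleavedScalar, sumInterleavedShuffled) is hypothetical. It assumes a vector factor of 4, so one wide load of 8 contiguous elements is split by two shuffles with masks <0,2,4,6> and <1,3,5,7>. The two shuffles per wide load are the extra work that a cost model would have to price.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this emulates the idea in scalar code.
struct Sums { int even; int odd; };

constexpr std::size_t VF = 4;                 // assumed vector factor
using Vec  = std::array<int, VF>;             // one "vector register"
using Wide = std::array<int, 2 * VF>;         // one wide contiguous load
using Mask = std::array<std::size_t, VF>;

constexpr Mask kEvenMask{0, 2, 4, 6};         // picks a[i], a[i+2], ...
constexpr Mask kOddMask{1, 3, 5, 7};          // picks a[i+1], a[i+3], ...

// Reference loop: two interleaved streams read with stride 2.
Sums sumInterleavedScalar(const std::vector<int>& a) {
    Sums s{0, 0};
    for (std::size_t i = 0; i + 1 < a.size(); i += 2) {
        s.even += a[i];
        s.odd  += a[i + 1];
    }
    return s;
}

// Emulates a single shuffle: select VF lanes of the wide value by mask.
Vec shuffle(const Wide& w, const Mask& mask) {
    Vec r{};
    for (std::size_t lane = 0; lane < VF; ++lane) {
        assert(mask[lane] < w.size());
        r[lane] = w[mask[lane]];
    }
    return r;
}

// Same computation as one wide load plus two shuffles per iteration.
Sums sumInterleavedShuffled(const std::vector<int>& a) {
    Vec evenAcc{}, oddAcc{};
    std::size_t i = 0;
    for (; i + 2 * VF <= a.size(); i += 2 * VF) {
        Wide w{};
        for (std::size_t j = 0; j < w.size(); ++j) w[j] = a[i + j];
        const Vec e = shuffle(w, kEvenMask);  // de-interleave even lanes
        const Vec o = shuffle(w, kOddMask);   // de-interleave odd lanes
        for (std::size_t lane = 0; lane < VF; ++lane) {
            evenAcc[lane] += e[lane];
            oddAcc[lane]  += o[lane];
        }
    }
    Sums s{0, 0};
    for (std::size_t lane = 0; lane < VF; ++lane) {
        s.even += evenAcc[lane];
        s.odd  += oddAcc[lane];
    }
    // Scalar tail for the elements that do not fill a wide load.
    for (; i + 1 < a.size(); i += 2) {
        s.even += a[i];
        s.odd  += a[i + 1];
    }
    return s;
}
```

With a stride of 3 or 4, or with masks the target cannot match in a single instruction, the number of shuffles per wide load grows, which is why the profitability depends on the mask and not only on the stride.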