thr3ads.net - search: "pextrq"

Displaying 4 results from an estimated 4 matches for "pextrq".

enabling interleaved access loop vectorization

2016 Aug 05

...# =>This Inner Loop Header: Depth=1
    movdqu (%rdi,%rax,4), %xmm3
    movd %xmm0, %rcx
    movdqu 4(%rdi,%rcx,4), %xmm4
    paddd %xmm3, %xmm4
    movdqu 8(%rdi,%rcx,4), %xmm3
    paddd %xmm4, %xmm3
    movdqa %xmm1, %xmm4
    paddq %xmm4, %xmm4
    movdqa %xmm0, %xmm5
    paddq %xmm5, %xmm5
    movd %xmm5, %rcx
    pextrq $1, %xmm5, %rdx
    movd %xmm4, %r8
    pextrq $1, %xmm4, %r9
    movd (%rdi,%rcx,4), %xmm4    # xmm4 = mem[0],zero,zero,zero
    pinsrd $1, (%rdi,%rdx,4), %xmm4
    pinsrd $2, (%rdi,%r8,4), %xmm4
    pinsrd $3, (%rdi,%r9,4), %xmm4
    paddd %xmm3, %xmm4
    movdqu %xmm4, (%rsi,%rax,4)
    addq $4, %rax
    paddq %xmm2, %xmm0
    paddq %xmm2...
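The movd/pextrq/pinsrd run in this snippet looks like a scalarized gather: four 64-bit indices are taken out of two XMM registers one at a time, each drives a single 32-bit load, and each loaded value is placed into one lane of the result. A minimal C model of that pattern follows. The function name `gather4_i32` and its parameters are hypothetical and chosen only for illustration; the original source loop is not shown in the snippet, so this is a sketch of the instruction pattern, not a reconstruction of the author's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the pextrq/pinsrd sequence (hypothetical helper).
   idx_lo and idx_hi stand for two 128-bit registers that each hold
   two 64-bit indices; out stands for the four 32-bit result lanes. */
static void gather4_i32(int32_t out[4], const int32_t *base,
                        const uint64_t idx_lo[2], const uint64_t idx_hi[2])
{
    assert(out != NULL && base != NULL);
    out[0] = base[idx_lo[0]]; /* movd %xmm5, %rcx;       movd (%rdi,%rcx,4)  */
    out[1] = base[idx_lo[1]]; /* pextrq $1, %xmm5, %rdx; pinsrd $1, ...      */
    out[2] = base[idx_hi[0]]; /* movd %xmm4, %r8;        pinsrd $2, ...      */
    out[3] = base[idx_hi[1]]; /* pextrq $1, %xmm4, %r9;  pinsrd $3, ...      */
}
```

Each lane costs one index extraction plus one scalar load and insert, which is why this form tends to be more expensive than a single contiguous vector load.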

enabling interleaved access loop vectorization

2016 May 26

Interleaved access is not enabled on X86 yet. We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We need a cost that accounts for the number of shuffles, and the number of shuffles depends on the permutation (the shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in place. Vectorizer...
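To make "loads + shuffles" concrete, here is a small portable C model of a stride-2 de-interleave: two contiguous 4-lane loads followed by one two-source shuffle per output vector. The names `shuffle2`, `deinterleave2`, and `LANES` are hypothetical, and this is not LLVM's cost model or its generated code; it only illustrates how the shuffle mask determines which lanes are combined. The real instruction count for a given mask depends on the target and element width.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };

/* Model of a two-source shuffle: lane i of out takes element mask[i]
   from the concatenation of a (indices 0..3) and b (indices 4..7). */
static void shuffle2(int32_t out[LANES], const int32_t a[LANES],
                     const int32_t b[LANES], const int mask[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        assert(mask[i] >= 0 && mask[i] < 2 * LANES);
        out[i] = mask[i] < LANES ? a[mask[i]] : b[mask[i] - LANES];
    }
}

/* Stride-2 de-interleave of 8 contiguous elements: the two halves of
   src model two wide loads, and each output needs one shuffle here. */
static void deinterleave2(int32_t even[LANES], int32_t odd[LANES],
                          const int32_t src[2 * LANES])
{
    static const int even_mask[LANES] = {0, 2, 4, 6};
    static const int odd_mask[LANES]  = {1, 3, 5, 7};
    shuffle2(even, src, src + LANES, even_mask);
    shuffle2(odd,  src, src + LANES, odd_mask);
}
```

In this model a stride of 2 needs two shuffles for two wide loads. Larger strides or irregular masks generally need more, which is the quantity a cost estimate would have to capture.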

enabling interleaved access loop vectorization

2016 Aug 05

...m0, %rcx
> > movdqu 4(%rdi,%rcx,4), %xmm4
> > paddd %xmm3, %xmm4
> > movdqu 8(%rdi,%rcx,4), %xmm3
> > paddd %xmm4, %xmm3
> > movdqa %xmm1, %xmm4
> > paddq %xmm4, %xmm4
> > movdqa %xmm0, %xmm5
> > paddq %xmm5, %xmm5
> > movd %xmm5, %rcx
> > pextrq $1, %xmm5, %rdx
> > movd %xmm4, %r8
> > pextrq $1, %xmm4, %r9
> > movd (%rdi,%rcx,4), %xmm4    # xmm4 = mem[0],zero,zero,zero
> > pinsrd $1, (%rdi,%rdx,4), %xmm4
> > pinsrd $2, (%rdi,%r8,4), %xmm4
> > pinsrd $3, (%rdi,%r9,4), %xmm4
> > paddd %xmm3, %xmm4
>...

enabling interleaved access loop vectorization

2016 May 26

On 26 May 2016 at 19:12, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Is there a compile-time and/or potential runtime cost that makes
> enableInterleavedAccessVectorization() default to 'false'?
>
> I notice that this is set to true for ARM, AArch64, and PPC.
>
> In particular, I'm wondering if there's a reason it's not enabled for