Hi, I'm reading "http://llvm.org/docs/Vectorizers.html" and have few question. Hope someone has answers on it. The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. ( http://llvm.org/docs/Vectorizers.html#scatter-gather) int foo(int *A, int *B, int n, int k) { for (int i = 0; i < n; ++i) A[i*7] += B[i*k]; } I replaced "int *A"/"int *B" into "double *A"/"double *B" and then compiled the sample with $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive and loop body looks like: .LBB1_2: # %for.body # =>This Inner Loop Header: Depth=1 cltq vmovsd (%rsi,%rax,8), %xmm0 movq %r9, %r10 sarq $32, %r10 vaddsd (%rdi,%r10,8), %xmm0, %xmm0 vmovsd %xmm0, (%rdi,%r10,8) addq %r8, %r9 addl %ecx, %eax decl %edx jne .LBB1_2 so vector instructions for scalars (vaddsd, vmovsd) were used in the loop and no real gather/scatter emitted. The question is why this loop was not vectorized? Typo in docs? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140312/d8bec5b2/attachment.html>
Hi Zinovy, The loop vectorizer probably decided that it was not profitable to vectorize the function. You can force the vectorization of the function by setting a low threshold. Thanks, Nadav On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote:> Hi, > > I'm reading "http://llvm.org/docs/Vectorizers.html" and have few question. Hope someone has answers on it. > > > The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. (http://llvm.org/docs/Vectorizers.html#scatter-gather) > > int foo(int *A, int *B, int n, int k) { > for (int i = 0; i < n; ++i) > A[i*7] += B[i*k]; > } > > I replaced "int *A"/"int *B" into "double *A"/"double *B" and then compiled the sample with > > $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive > > and loop body looks like: > > .LBB1_2: # %for.body > # =>This Inner Loop Header: Depth=1 > cltq > vmovsd (%rsi,%rax,8), %xmm0 > movq %r9, %r10 > sarq $32, %r10 > vaddsd (%rdi,%r10,8), %xmm0, %xmm0 > vmovsd %xmm0, (%rdi,%r10,8) > addq %r8, %r9 > addl %ecx, %eax > decl %edx > jne .LBB1_2 > > so vector instructions for scalars (vaddsd, vmovsd) were used in the loop and no real gather/scatter emitted. > > The question is why this loop was not vectorized? Typo in docs? > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140312/69e0a8d8/attachment.html>
In order to vectorize code like this LLVM needs to prove that “A[i*7]” does not wrap in the address space. It fails to do so and so LLVM doesn’t vectorize this loop even if we try to force it. The following loop will be vectorized if we force it: int foo(int * A, int * B, int n, int k) { for (int i = 0; i < 1024; ++i) A[i] += B[i*k]; } So will this loop: int foo(int * restrict A, int * restrict B, int n, int k) { for (int i = 0; i < n; ++i) A[i] += B[i*k]; } I will update the example. Thanks, Arnold On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote:> Hi Zinovy, > > The loop vectorizer probably decided that it was not profitable to vectorize the function. You can force the vectorization of the function by setting a low threshold. > > Thanks, > Nadav > > On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote: > >> Hi, >> >> I'm reading "http://llvm.org/docs/Vectorizers.html" and have few question. Hope someone has answers on it. >> >> >> The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. (http://llvm.org/docs/Vectorizers.html#scatter-gather) >> >> int foo(int *A, int *B, int n, int k) { >> for (int i = 0; i < n; ++i) >> A[i*7] += B[i*k]; >> } >> >> I replaced "int *A"/"int *B" into "double *A"/"double *B" and then compiled the sample with >> >> $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive >> >> and loop body looks like: >> >> .LBB1_2: # %for.body >> # =>This Inner Loop Header: Depth=1 >> cltq >> vmovsd (%rsi,%rax,8), %xmm0 >> movq %r9, %r10 >> sarq $32, %r10 >> vaddsd (%rdi,%r10,8), %xmm0, %xmm0 >> vmovsd %xmm0, (%rdi,%r10,8) >> addq %r8, %r9 >> addl %ecx, %eax >> decl %edx >> jne .LBB1_2 >> >> so vector instructions for scalars (vaddsd, vmovsd) were used in the loop and no real gather/scatter emitted. >> >> The question is why this loop was not vectorized? Typo in docs? >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev