In order to vectorize code like this LLVM needs to prove that “A[i*7]” does not wrap in the address space. It fails to do so and so LLVM doesn’t vectorize this loop even if we try to force it. The following loop will be vectorized if we force it: int foo(int * A, int * B, int n, int k) { for (int i = 0; i < 1024; ++i) A[i] += B[i*k]; } So will this loop: int foo(int * restrict A, int * restrict B, int n, int k) { for (int i = 0; i < n; ++i) A[i] += B[i*k]; } I will update the example. Thanks, Arnold On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote:> Hi Zinovy, > > The loop vectorizer probably decided that it was not profitable to vectorize the function. You can force the vectorization of the function by setting a low threshold. > > Thanks, > Nadav > > On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote: > >> Hi, >> >> I'm reading "http://llvm.org/docs/Vectorizers.html" and have few question. Hope someone has answers on it. >> >> >> The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. (http://llvm.org/docs/Vectorizers.html#scatter-gather) >> >> int foo(int *A, int *B, int n, int k) { >> for (int i = 0; i < n; ++i) >> A[i*7] += B[i*k]; >> } >> >> I replaced "int *A"/"int *B" into "double *A"/"double *B" and then compiled the sample with >> >> $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive >> >> and loop body looks like: >> >> .LBB1_2: # %for.body >> # =>This Inner Loop Header: Depth=1 >> cltq >> vmovsd (%rsi,%rax,8), %xmm0 >> movq %r9, %r10 >> sarq $32, %r10 >> vaddsd (%rdi,%r10,8), %xmm0, %xmm0 >> vmovsd %xmm0, (%rdi,%r10,8) >> addq %r8, %r9 >> addl %ecx, %eax >> decl %edx >> jne .LBB1_2 >> >> so vector instructions for scalars (vaddsd, vmovsd) were used in the loop and no real gather/scatter emitted. >> >> The question is why this loop was not vectorized? Typo in docs? >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Wed, Mar 12, 2014 at 3:50 PM, Arnold Schwaighofer < aschwaighofer at apple.com> wrote:> In order to vectorize code like this LLVM needs to prove that “A[i*7]” > does not wrap in the address space. It fails to do soBut, why? I'm moderately sure that neither C nor C++ allow wrapping around the end of the address space. If they do, we will fix C++ at least to disallow this. 'i' is a signed integer, so we can't wrap in the index space either. So why can't LLVM prove this? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140312/1dd9a485/attachment.html>
On Mar 12, 2014, at 4:05 PM, Chandler Carruth <chandlerc at google.com> wrote:> > On Wed, Mar 12, 2014 at 3:50 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > In order to vectorize code like this LLVM needs to prove that “A[i*7]” does not wrap in the address space. It fails to do so > > But, why? > > I'm moderately sure that neither C nor C++ allow wrapping around the end of the address space. If they do, we will fix C++ at least to disallow this. 'i' is a signed integer, so we can't wrap in the index space either. So why can't LLVM prove this?The loop vectorizer relies on scev’s nowrap flags. We need to improve SCEV for this. %conv = sext i32 %k to i64 --> (sext i32 %k to i64) %i.06 = phi i64 [ 0, %entry ], [ %inc, %for.body ] --> {0,+,1}<nuw><nsw><%for.body> Exits: 1023 %mul1 = mul nsw i64 %i.06, 7 --> {0,+,7}<%for.body> Exits: 7161 %arrayidx2 = getelementptr inbounds i32* %A, i64 %mul1 --> {%A,+,28}<%for.body> <== we want to see a nw flag here. Scev sometimes drops new flags for safety (cannonicalization can make them invalid if the same expression is used in different contexts) . See past discussions on this. We are thinking about doing something like described here: http://permalink.gmane.org/gmane.comp.compilers.llvm.devel/67476 or in this thread:(http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20131007/190703.html.
Zinovy, to clarify: the code is vectorizable. But LLVM currently fails to prove it is. On Mar 12, 2014, at 3:50 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:> In order to vectorize code like this LLVM needs to prove that “A[i*7]” does not wrap in the address space. It fails to do so and so LLVM doesn’t vectorize this loop even if we try to force it. > > The following loop will be vectorized if we force it: > > int foo(int * A, int * B, int n, int k) { > for (int i = 0; i < 1024; ++i) > A[i] += B[i*k]; > } > > So will this loop: > > int foo(int * restrict A, int * restrict B, int n, int k) { > for (int i = 0; i < n; ++i) > A[i] += B[i*k]; > } > > I will update the example. > > Thanks, > Arnold > > On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote: > >> Hi Zinovy, >> >> The loop vectorizer probably decided that it was not profitable to vectorize the function. You can force the vectorization of the function by setting a low threshold. >> >> Thanks, >> Nadav >> >> On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote: >> >>> Hi, >>> >>> I'm reading "http://llvm.org/docs/Vectorizers.html" and have few question. Hope someone has answers on it. >>> >>> >>> The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. (http://llvm.org/docs/Vectorizers.html#scatter-gather) >>> >>> int foo(int *A, int *B, int n, int k) { >>> for (int i = 0; i < n; ++i) >>> A[i*7] += B[i*k]; >>> } >>> >>> I replaced "int *A"/"int *B" into "double *A"/"double *B" and then compiled the sample with >>> >>> $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive >>> >>> and loop body looks like: >>> >>> .LBB1_2: # %for.body >>> # =>This Inner Loop Header: Depth=1 >>> cltq >>> vmovsd (%rsi,%rax,8), %xmm0 >>> movq %r9, %r10 >>> sarq $32, %r10 >>> vaddsd (%rdi,%r10,8), %xmm0, %xmm0 >>> vmovsd %xmm0, (%rdi,%r10,8) >>> addq %r8, %r9 >>> addl %ecx, %eax >>> decl %edx >>> jne .LBB1_2 >>> >>> so vector instructions for scalars (vaddsd, vmovsd) were used in the loop and no real gather/scatter emitted. >>> >>> The question is why this loop was not vectorized? Typo in docs? >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
Even without wrapping around the end of the address space, without restrict you still have to worry about A and B overlapping on interesting ways. Will LLVM do some runtime dependence checks to discount such potential overlap? Just curious.... On Wed, Mar 12, 2014 at 5:01 PM, Arnold Schwaighofer < aschwaighofer at apple.com> wrote:> Zinovy, > > to clarify: the code is vectorizable. But LLVM currently fails to prove it > is. > > On Mar 12, 2014, at 3:50 PM, Arnold Schwaighofer <aschwaighofer at apple.com> > wrote: > > > In order to vectorize code like this LLVM needs to prove that “A[i*7]” > does not wrap in the address space. It fails to do so and so LLVM doesn’t > vectorize this loop even if we try to force it. > > > > The following loop will be vectorized if we force it: > > > > int foo(int * A, int * B, int n, int k) { > > for (int i = 0; i < 1024; ++i) > > A[i] += B[i*k]; > > } > > > > So will this loop: > > > > int foo(int * restrict A, int * restrict B, int n, int k) { > > for (int i = 0; i < n; ++i) > > A[i] += B[i*k]; > > } > > > > I will update the example. > > > > Thanks, > > Arnold > > > > On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote: > > > >> Hi Zinovy, > >> > >> The loop vectorizer probably decided that it was not profitable to > vectorize the function. You can force the vectorization of the function by > setting a low threshold. > >> > >> Thanks, > >> Nadav > >> > >> On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote: > >> > >>> Hi, > >>> > >>> I'm reading "http://llvm.org/docs/Vectorizers.html" and have few > question. Hope someone has answers on it. > >>> > >>> > >>> The Loop Vectorizer can vectorize code that becomes a sequence of > scalar instructions that scatter/gathers memory. ( > http://llvm.org/docs/Vectorizers.html#scatter-gather) > >>> > >>> int foo(int *A, int *B, int n, int k) { > >>> for (int i = 0; i < n; ++i) > >>> A[i*7] += B[i*k]; > >>> } > >>> > >>> I replaced "int *A"/"int *B" into "double *A"/"double *B" and then > compiled the sample with > >>> > >>> $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o > bb.S -fslp-vectorize-aggressive > >>> > >>> and loop body looks like: > >>> > >>> .LBB1_2: # %for.body > >>> # =>This Inner Loop Header: > Depth=1 > >>> cltq > >>> vmovsd (%rsi,%rax,8), %xmm0 > >>> movq %r9, %r10 > >>> sarq $32, %r10 > >>> vaddsd (%rdi,%r10,8), %xmm0, %xmm0 > >>> vmovsd %xmm0, (%rdi,%r10,8) > >>> addq %r8, %r9 > >>> addl %ecx, %eax > >>> decl %edx > >>> jne .LBB1_2 > >>> > >>> so vector instructions for scalars (vaddsd, vmovsd) were used in the > loop and no real gather/scatter emitted. > >>> > >>> The question is why this loop was not vectorized? Typo in docs? > >>> > >>> _______________________________________________ > >>> LLVM Developers mailing list > >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Raúl E. Silvera -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140312/86527c55/attachment.html>