thr3ads.net - llvm dev - [LLVMdev] Autovectorization questions [Mar 2014]

If this information is useful, please help other people find it:
Share via:

Arnold Schwaighofer

2014-Mar-12 22:50 UTC

[LLVMdev] Autovectorization questions

In order to vectorize code like this LLVM needs to prove that “A[i*7]” does not
wrap in the address space. It fails to do so and so LLVM doesn’t vectorize this
loop even if we try to force it.

The following loop will be vectorized if we force it:

int foo(int * A, int * B, int n, int k) {
  for (int i = 0; i < 1024; ++i)
    A[i] += B[i*k];
}

So will this loop:

int foo(int * restrict A, int * restrict B, int n, int k) {
  for (int i = 0; i < n; ++i)
    A[i] += B[i*k];
}

I will update the example.

Thanks,
Arnold

On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Zinovy, 
> 
> The loop vectorizer probably decided that it was not profitable to
vectorize the function. You can force the vectorization of the function by
setting a low threshold.
> 
> Thanks,
> Nadav
> 
> On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com>
wrote:
> 
>> Hi,
>> 
>> I'm reading "http://llvm.org/docs/Vectorizers.html" and
have few question. Hope someone has answers on it.
>> 
>> 
>> The Loop Vectorizer can vectorize code that becomes a sequence of
scalar instructions that scatter/gathers memory.
(http://llvm.org/docs/Vectorizers.html#scatter-gather)
>> 
>> int foo(int *A, int *B, int n, int k) {
>>   for (int i = 0; i < n; ++i)
>>     A[i*7] += B[i*k];
>> }
>> 
>> I replaced "int *A"/"int *B" into "double
*A"/"double *B" and then compiled the sample with
>> 
>> $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o
bb.S  -fslp-vectorize-aggressive
>> 
>> and loop body looks like:
>> 
>> .LBB1_2:                                # %for.body
>>                                         # =>This Inner Loop Header:
Depth=1
>>         cltq
>>         vmovsd  (%rsi,%rax,8), %xmm0
>>         movq    %r9, %r10
>>         sarq    $32, %r10
>>         vaddsd  (%rdi,%r10,8), %xmm0, %xmm0
>>         vmovsd  %xmm0, (%rdi,%r10,8)
>>         addq    %r8, %r9
>>         addl    %ecx, %eax
>>         decl    %edx
>>         jne     .LBB1_2
>> 
>> so vector instructions for scalars (vaddsd, vmovsd) were used in the
loop and no real gather/scatter emitted.
>> 
>> The question is why this loop was not vectorized? Typo in docs?
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Chandler Carruth

2014-Mar-12 23:05 UTC

head link

[LLVMdev] Autovectorization questions

On Wed, Mar 12, 2014 at 3:50 PM, Arnold Schwaighofer <
aschwaighofer at apple.com> wrote:
> In order to vectorize code like this LLVM needs to prove that “A[i*7]”
> does not wrap in the address space. It fails to do so

But, why?

I'm moderately sure that neither C nor C++ allow wrapping around the end of
the address space. If they do, we will fix C++ at least to disallow this.
'i' is a signed integer, so we can't wrap in the index space either.
So why
can't LLVM prove this?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140312/1dd9a485/attachment.html>

Arnold Schwaighofer

2014-Mar-12 23:45 UTC

head link

[LLVMdev] Autovectorization questions

On Mar 12, 2014, at 4:05 PM, Chandler Carruth <chandlerc at google.com>
wrote:
> 
> On Wed, Mar 12, 2014 at 3:50 PM, Arnold Schwaighofer <aschwaighofer at
apple.com> wrote:
> In order to vectorize code like this LLVM needs to prove that “A[i*7]” does
not wrap in the address space. It fails to do so
> 
> But, why?
> 
> I'm moderately sure that neither C nor C++ allow wrapping around the
end of the address space. If they do, we will fix C++ at least to disallow this.
'i' is a signed integer, so we can't wrap in the index space either.
So why can't LLVM prove this?
The loop vectorizer relies on scev’s nowrap flags. We need to improve SCEV for
this.

  %conv = sext i32 %k to i64
  -->  (sext i32 %k to i64)
  %i.06 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
  -->  {0,+,1}<nuw><nsw><%for.body>             Exits: 1023
  %mul1 = mul nsw i64 %i.06, 7
  -->  {0,+,7}<%for.body>               Exits: 7161
  %arrayidx2 = getelementptr inbounds i32* %A, i64 %mul1
  -->  {%A,+,28}<%for.body>           <== we want to see a nw flag
here.


Scev sometimes drops new flags for safety (cannonicalization can make them
invalid if the same expression is used in different contexts) . See past
discussions on this.

We are thinking about doing something like described here:
http://permalink.gmane.org/gmane.comp.compilers.llvm.devel/67476 or in this
thread:(http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20131007/190703.html.

Arnold Schwaighofer

2014-Mar-13 00:01 UTC

head link

[LLVMdev] Autovectorization questions

Zinovy,

to clarify: the code is vectorizable. But LLVM currently fails to prove it is.

On Mar 12, 2014, at 3:50 PM, Arnold Schwaighofer <aschwaighofer at
apple.com> wrote:
> In order to vectorize code like this LLVM needs to prove that “A[i*7]” does
not wrap in the address space. It fails to do so and so LLVM doesn’t vectorize
this loop even if we try to force it.
> 
> The following loop will be vectorized if we force it:
> 
> int foo(int * A, int * B, int n, int k) {
>  for (int i = 0; i < 1024; ++i)
>    A[i] += B[i*k];
> }
> 
> So will this loop:
> 
> int foo(int * restrict A, int * restrict B, int n, int k) {
>  for (int i = 0; i < n; ++i)
>    A[i] += B[i*k];
> }
> 
> I will update the example.
> 
> Thanks,
> Arnold
> 
> On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote:
> 
>> Hi Zinovy, 
>> 
>> The loop vectorizer probably decided that it was not profitable to
vectorize the function. You can force the vectorization of the function by
setting a low threshold.
>> 
>> Thanks,
>> Nadav
>> 
>> On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at gmail.com>
wrote:
>> 
>>> Hi,
>>> 
>>> I'm reading "http://llvm.org/docs/Vectorizers.html"
and have few question. Hope someone has answers on it.
>>> 
>>> 
>>> The Loop Vectorizer can vectorize code that becomes a sequence of
scalar instructions that scatter/gathers memory.
(http://llvm.org/docs/Vectorizers.html#scatter-gather)
>>> 
>>> int foo(int *A, int *B, int n, int k) {
>>>  for (int i = 0; i < n; ++i)
>>>    A[i*7] += B[i*k];
>>> }
>>> 
>>> I replaced "int *A"/"int *B" into "double
*A"/"double *B" and then compiled the sample with
>>> 
>>> $> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2
-S -o bb.S  -fslp-vectorize-aggressive
>>> 
>>> and loop body looks like:
>>> 
>>> .LBB1_2:                                # %for.body
>>>                                        # =>This Inner Loop
Header: Depth=1
>>>        cltq
>>>        vmovsd  (%rsi,%rax,8), %xmm0
>>>        movq    %r9, %r10
>>>        sarq    $32, %r10
>>>        vaddsd  (%rdi,%r10,8), %xmm0, %xmm0
>>>        vmovsd  %xmm0, (%rdi,%r10,8)
>>>        addq    %r8, %r9
>>>        addl    %ecx, %eax
>>>        decl    %edx
>>>        jne     .LBB1_2
>>> 
>>> so vector instructions for scalars (vaddsd, vmovsd) were used in
the loop and no real gather/scatter emitted.
>>> 
>>> The question is why this loop was not vectorized? Typo in docs?
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Raul Silvera

2014-Mar-13 00:26 UTC

head link

[LLVMdev] Autovectorization questions

Even without wrapping around the end of the address space, without restrict
you still have to worry about A and B overlapping on interesting ways. Will
LLVM do some runtime dependence checks to discount such potential overlap?
Just curious....


On Wed, Mar 12, 2014 at 5:01 PM, Arnold Schwaighofer <
aschwaighofer at apple.com> wrote:
> Zinovy,
>
> to clarify: the code is vectorizable. But LLVM currently fails to prove it
> is.
>
> On Mar 12, 2014, at 3:50 PM, Arnold Schwaighofer <aschwaighofer at
apple.com>
> wrote:
>
> > In order to vectorize code like this LLVM needs to prove that “A[i*7]”
> does not wrap in the address space. It fails to do so and so LLVM doesn’t
> vectorize this loop even if we try to force it.
> >
> > The following loop will be vectorized if we force it:
> >
> > int foo(int * A, int * B, int n, int k) {
> >  for (int i = 0; i < 1024; ++i)
> >    A[i] += B[i*k];
> > }
> >
> > So will this loop:
> >
> > int foo(int * restrict A, int * restrict B, int n, int k) {
> >  for (int i = 0; i < n; ++i)
> >    A[i] += B[i*k];
> > }
> >
> > I will update the example.
> >
> > Thanks,
> > Arnold
> >
> > On Mar 12, 2014, at 1:54 PM, Nadav Rotem <nrotem at apple.com>
wrote:
> >
> >> Hi Zinovy,
> >>
> >> The loop vectorizer probably decided that it was not profitable to
> vectorize the function. You can force the vectorization of the function by
> setting a low threshold.
> >>
> >> Thanks,
> >> Nadav
> >>
> >> On Mar 12, 2014, at 3:34 AM, Zinovy Nis <zinovy.nis at
gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm reading
"http://llvm.org/docs/Vectorizers.html" and have few
> question. Hope someone has answers on it.
> >>>
> >>>
> >>> The Loop Vectorizer can vectorize code that becomes a sequence
of
> scalar instructions that scatter/gathers memory. (
> http://llvm.org/docs/Vectorizers.html#scatter-gather)
> >>>
> >>> int foo(int *A, int *B, int n, int k) {
> >>>  for (int i = 0; i < n; ++i)
> >>>    A[i*7] += B[i*k];
> >>> }
> >>>
> >>> I replaced "int *A"/"int *B" into
"double *A"/"double *B" and then
> compiled the sample with
> >>>
> >>> $> ./clang -Ofast -ffast-math test.c -std=c99
-march=core-avx2 -S -o
> bb.S  -fslp-vectorize-aggressive
> >>>
> >>> and loop body looks like:
> >>>
> >>> .LBB1_2:                                # %for.body
> >>>                                        # =>This Inner Loop
Header:
> Depth=1
> >>>        cltq
> >>>        vmovsd  (%rsi,%rax,8), %xmm0
> >>>        movq    %r9, %r10
> >>>        sarq    $32, %r10
> >>>        vaddsd  (%rdi,%r10,8), %xmm0, %xmm0
> >>>        vmovsd  %xmm0, (%rdi,%r10,8)
> >>>        addq    %r8, %r9
> >>>        addl    %ecx, %eax
> >>>        decl    %edx
> >>>        jne     .LBB1_2
> >>>
> >>> so vector instructions for scalars (vaddsd, vmovsd) were used
in the
> loop and no real gather/scatter emitted.
> >>>
> >>> The question is why this loop was not vectorized? Typo in
docs?
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>


-- 
 Raúl E. Silvera
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140312/86527c55/attachment.html>

llvm dev - Mar 2014 - [LLVMdev] Autovectorization questions

[LLVMdev] Autovectorization questions

[LLVMdev] Autovectorization questions

[LLVMdev] Autovectorization questions

[LLVMdev] Autovectorization questions

[LLVMdev] Autovectorization questions