thr3ads.net - llvm dev - [LLVMdev] loop vectorizer [Oct 2013]

Renato Golin

2013-Oct-30 17:28 UTC

[LLVMdev] loop vectorizer

On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote:
> The access pattern to arrays a and b is non-linear.  Unrolled loops are
> usually handled by the SLP-vectorizer.  Are ir0 and ir1 consecutive for all
> values for i ?
>
Based on his list of values, it seems that the induction stride is linear
within each block of 4 iterations, but it's not a clear relationship.

As you say, it should be possible to spot that once the loop is unrolled,
and get the SLP to vectorize if the relationship becomes clear.

Maybe I'm wrong, but this looks like a problem of missed opportunities, not
technically hard to implement.
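
To make the point about unrolling concrete, here is a minimal C++ sketch. It is not the original poster's source: the function names are hypothetical and the index expressions are reconstructed from the IR posted later in this thread, assuming that a, b and c do not alias. It shows that four consecutive iterations, each touching ir0 and ir0 + 4, together cover one contiguous run of 8 floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar loop, reconstructed (assumption) from the IR in this thread:
// each iteration touches index ir0 and index ir1 = ir0 + 4.
void add_scalar(std::size_t start, std::size_t end,
                const float* a, const float* b, float* c) {
    for (std::size_t i = start; i < end; ++i) {
        std::size_t ir0 = (i / 4) * 8 + i % 4;
        std::size_t ir1 = ir0 + 4;
        c[ir0] = a[ir0] + b[ir0];
        c[ir1] = a[ir1] + b[ir1];
    }
}

// The same work grouped per block of 4 iterations. This assumes start and
// end are multiples of 4. Block k covers the consecutive elements
// 8k .. 8k+7, which is the linear pattern a vectorizer could exploit.
void add_blocked(std::size_t start, std::size_t end,
                 const float* a, const float* b, float* c) {
    assert(start % 4 == 0 && end % 4 == 0);
    for (std::size_t i = start; i < end; i += 4) {
        std::size_t base = (i / 4) * 8;
        for (std::size_t j = 0; j < 8; ++j) {
            c[base + j] = a[base + j] + b[base + j];
        }
    }
}
```

The blocked form stores elements in a different order than the scalar loop, so the two are only interchangeable under the no-aliasing assumption stated above.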

--renato

Frank Winter

2013-Oct-30 17:40 UTC

[LLVMdev] loop vectorizer

I ran the BB vectorizer as I guess this is the SLP vectorizer.

BBV: using target information
BBV: fusing loop #1 for for.body in _Z3barmmPfS_S_...
BBV: found 2 instructions with candidate pairs
BBV: found 0 pair connections.
BBV: done!

However, this was run on the unrolled loop (I guess).

Here is the IR printed by 'opt':

entry:
   %cmp9 = icmp ult i64 %start, %end
   br i1 %cmp9, label %for.body, label %for.end

for.body:                                         ; preds = %entry, %for.body
   %storemerge10 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
   %div = lshr i64 %storemerge10, 2
   %mul1 = shl i64 %div, 3
   %rem = and i64 %storemerge10, 3
   %add2 = or i64 %mul1, %rem
   %0 = lshr i64 %storemerge10, 1
   %add51 = shl i64 %0, 2
   %mul6 = or i64 %rem, %add51
   %add8 = or i64 %mul6, 4
   %arrayidx = getelementptr inbounds float* %a, i64 %add2
   %1 = load float* %arrayidx, align 4
   %arrayidx9 = getelementptr inbounds float* %b, i64 %add2
   %2 = load float* %arrayidx9, align 4
   %add10 = fadd float %1, %2
   %arrayidx11 = getelementptr inbounds float* %c, i64 %add2
   store float %add10, float* %arrayidx11, align 4
   %arrayidx12 = getelementptr inbounds float* %a, i64 %add8
   %3 = load float* %arrayidx12, align 4
   %arrayidx13 = getelementptr inbounds float* %b, i64 %add8
   %4 = load float* %arrayidx13, align 4
   %add14 = fadd float %3, %4
   %arrayidx15 = getelementptr inbounds float* %c, i64 %add8
   store float %add14, float* %arrayidx15, align 4
   %inc = add i64 %storemerge10, 1
   %exitcond = icmp eq i64 %inc, %end
   br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body, %entry
   ret void
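
The index arithmetic in the IR above can be hard to read, so here is a hedged C++ transliteration of it. The function names are hypothetical; each local mirrors the IR value named in its comment. Although %add8 is computed through a different shift sequence than %add2, it always comes out as %add2 plus 4, because the final `or ... 4` overwrites the only bit in which the two expressions could differ.

```cpp
#include <cassert>
#include <cstdint>

// Index of the first load/store group (%add2 in the IR).
// (i >> 2) << 3 equals (i / 4) * 8, and i & 3 equals i % 4.
std::uint64_t index_add2(std::uint64_t i) {
    std::uint64_t div  = i >> 2;    // %div
    std::uint64_t mul1 = div << 3;  // %mul1
    std::uint64_t rem  = i & 3;     // %rem
    return mul1 | rem;              // %add2
}

// Index of the second load/store group (%add8 in the IR).
std::uint64_t index_add8(std::uint64_t i) {
    std::uint64_t rem   = i & 3;        // %rem
    std::uint64_t half  = i >> 1;       // %0
    std::uint64_t add51 = half << 2;    // %add51
    std::uint64_t mul6  = rem | add51;  // %mul6
    return mul6 | 4;                    // %add8
}

// Asserts, for every i below `limit`, that the second index is the first
// index plus 4 and that the first index is (i / 4) * 8 + i % 4.
void check_index_relationship(std::uint64_t limit) {
    for (std::uint64_t i = 0; i < limit; ++i) {
        assert(index_add8(i) == index_add2(i) + 4);
        assert(index_add2(i) == (i / 4) * 8 + i % 4);
    }
}
```

Calling `check_index_relationship(1024)` exercises both identities over the first 1024 values of the induction variable.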


Are you saying that I should first unroll the loop by a given factor and
then run SLP again? How would I do that, say, for a factor of 2?

Frank


Hal Finkel

2013-Oct-30 17:50 UTC

[LLVMdev] loop vectorizer

----- Original Message -----
>
> I ran the BB vectorizer as I guess this is the SLP vectorizer.
No, while the BB vectorizer is doing a form of SLP vectorization, there is a
separate SLP vectorization pass which uses a different algorithm. You can pass
-vectorize-slp to opt.

 -Hal
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel

2013-Oct-30 17:52 UTC

[LLVMdev] loop vectorizer

----- Original Message -----
>
> On 30 October 2013 09:25, Nadav Rotem < nrotem at apple.com > wrote:
> 
> 
> The access pattern to arrays a and b is non-linear. Unrolled loops
> are usually handled by the SLP-vectorizer. Are ir0 and ir1
> consecutive for all values for i ?
One problem that you might run into currently is that the loop vectorizer, which
is the only pass that will do partial unrolling by default, runs after SLP
vectorization. The regular unroller can do partial unrolling, but unless the
target overrides the relevant TTI interface and changes the default, the
-unroll-allow-partial or -unroll-runtime flags would need to be specified.

 -Hal
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
