Displaying 8 results from an estimated 8 matches for "add13".
2017 Jan 24
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> I started looking at the log files that you attached, and I'm confused.
> The code that is supposedly causing the perf regression is created by the
> loop vectorizer, right? Except the bad code is not in the "vector.body", so
> is there something peculiar about
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
...ign 4, !tbaa !1
> %add9 = add nsw i32 %add7, %5
> %arrayidx10 = getelementptr inbounds i32* %a, i32 6
> %6 = load i32* %arrayidx10, align 4, !tbaa !1
> %add11 = add nsw i32 %add9, %6
> %arrayidx12 = getelementptr inbounds i32* %a, i32 7
> %7 = load i32* %arrayidx12, align 4, !tbaa !1
> %add13 = add nsw i32 %add11, %7
> %arrayidx14 = getelementptr inbounds i32* %a, i32 8
> %8 = load i32* %arrayidx14, align 4, !tbaa !1
> %add15 = add nsw i32 %add13, %8
> %arrayidx16 = getelementptr inbounds i32* %a, i32 9
> %9 = load i32* %arrayidx16, align 4, !tbaa !1
> %add17 = add nsw i32 %...
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
...rrayidx8, align 4, !tbaa !1
%add9 = add nsw i32 %add7, %5
%arrayidx10 = getelementptr inbounds i32* %a, i32 6
%6 = load i32* %arrayidx10, align 4, !tbaa !1
%add11 = add nsw i32 %add9, %6
%arrayidx12 = getelementptr inbounds i32* %a, i32 7
%7 = load i32* %arrayidx12, align 4, !tbaa !1
%add13 = add nsw i32 %add11, %7
%arrayidx14 = getelementptr inbounds i32* %a, i32 8
%8 = load i32* %arrayidx14, align 4, !tbaa !1
%add15 = add nsw i32 %add13, %8
%arrayidx16 = getelementptr inbounds i32* %a, i32 9
%9 = load i32* %arrayidx16, align 4, !tbaa !1
%add17 = add nsw i32 %add15, %9...
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
...i32* %arrayidx8, align 4, !tbaa !1
%add9 = add nsw i32 %add7, %5
%arrayidx10 = getelementptr inbounds i32* %a, i32 6
%6 = load i32* %arrayidx10, align 4, !tbaa !1
%add11 = add nsw i32 %add9, %6
%arrayidx12 = getelementptr inbounds i32* %a, i32 7
%7 = load i32* %arrayidx12, align 4, !tbaa !1
%add13 = add nsw i32 %add11, %7
%arrayidx14 = getelementptr inbounds i32* %a, i32 8
%8 = load i32* %arrayidx14, align 4, !tbaa !1
%add15 = add nsw i32 %add13, %8
%arrayidx16 = getelementptr inbounds i32* %a, i32 9
%9 = load i32* %arrayidx16, align 4, !tbaa !1
%add17 = add nsw i32 %add15, %9
%arrayidx1...
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
...3 = load float* %arrayidx9, align 4
%arrayidx10 = getelementptr inbounds [4 x float]* %sum, i32 0, i64 1
%14 = load float* %arrayidx10, align 4
%add11 = fadd float %13, %14
%arrayidx12 = getelementptr inbounds [4 x float]* %sum, i32 0, i64 2
%15 = load float* %arrayidx12, align 4
%add13 = fadd float %add11, %15
%arrayidx14 = getelementptr inbounds [4 x float]* %sum, i32 0, i64 3
%16 = load float* %arrayidx14, align 4
%add15 = fadd float %add13, %16
ret float %add15
}
Thus, the inner loop is not unrolled.
opt -basicaa -loop-vectorize -debug-only=loop-vectorize
-vec...
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
...getelementptr inbounds i32* %a, i32 6
> > > %6 = load i32* %arrayidx10, align 4, !tbaa !1
> > > %add11 = add nsw i32 %add9, %6
> > > %arrayidx12 = getelementptr inbounds i32* %a, i32 7
> > > %7 = load i32* %arrayidx12, align 4, !tbaa !1
> > > %add13 = add nsw i32 %add11, %7
> > > %arrayidx14 = getelementptr inbounds i32* %a, i32 8
> > > %8 = load i32* %arrayidx14, align 4, !tbaa !1
> > > %add15 = add nsw i32 %add13, %8
> > > %arrayidx16 = getelementptr inbounds i32* %a, i32 9
> > > %9 = lo...
2013 Nov 08
0
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote:
> LV: We don't allow storing to uniform addresses
>
This triggers because the vectorizer did not recognize sum[q] as a reduction
variable in canVectorizeInstrs(), but did recognize that sum[q] is loop
invariant in canVectorizeMemory().
I'm guessing the nested loop was unrolled because of the low trip count, and
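To make the distinction in this reply concrete, here is a minimal sketch in C. The first function is the kernel posted in the thread, which accumulates into the array elements sum[q]. The second, foo_scalar, is a hypothetical rewrite that is not from the thread: it keeps the four partial sums in scalar locals, so each one is updated once per iteration in the usual shape of a reduction variable rather than through a store to a fixed address. Whether a particular LLVM version then vectorizes it is an assumption the thread does not confirm, and floating-point reductions may additionally need relaxed math flags.

```c
/* Kernel as posted in the thread: the partial sums live in an array,
   so every inner iteration stores to sum[q]. */
float foo(int start, int end, float *A)
{
    float sum[4] = {0.f, 0.f, 0.f, 0.f};
    for (int i = start; i < end; ++i) {
        for (int q = 0; q < 4; ++q)
            sum[q] += A[i * 4 + q];
    }
    return sum[0] + sum[1] + sum[2] + sum[3];
}

/* Hypothetical variant (an assumption for illustration, not from the
   thread): the same four partial sums held in scalar locals, with the
   inner loop written out by hand. */
float foo_scalar(int start, int end, float *A)
{
    float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
    for (int i = start; i < end; ++i) {
        s0 += A[i * 4 + 0];
        s1 += A[i * 4 + 1];
        s2 += A[i * 4 + 2];
        s3 += A[i * 4 + 3];
    }
    /* Same summation order as foo, so both return the same value. */
    return s0 + s1 + s2 + s3;
}
```

Both functions add the same values in the same order, so foo_scalar can be substituted for foo and the vectorizer's debug output compared between the two.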
2013 Nov 08
3
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel:
float foo( int start , int end , float * A )
{
  float sum[4] = {0.,0.,0.,0.};
  for (int i = start ; i < end ; ++i ) {
    for (int q = 0 ; q < 4 ; ++q )
      sum[q] += A[i*4+q];
  }
  return sum[0]+sum[1]+sum[2]+sum[3];
}
LV: Checking a loop in "foo"
LV: Found a loop: for.cond1
LV: Found an induction variable.
LV: We