search for: add13

Displaying 8 results from an estimated 8 matches for "add13".

Did you mean: add1
2017 Jan 24
3
[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > I started looking at the log files that you attached, and I'm confused. > The code that is supposedly causing the perf regression is created by the > loop vectorizer, right? Except the bad code is not in the "vector.body", so > is there something peculiar about
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
...ign 4, !tbaa !1 %add9 = add nsw i32 > %add7, %5 %arrayidx10 = getelementptr inbounds i32* %a, i32 6 %6 = load > i32* %arrayidx10, align 4, !tbaa !1 %add11 = add nsw i32 %add9, %6 > %arrayidx12 = getelementptr inbounds i32* %a, i32 7 %7 = load i32* > %arrayidx12, align 4, !tbaa !1 %add13 = add nsw i32 %add11, %7 > %arrayidx14 = getelementptr inbounds i32* %a, i32 8 %8 = load i32* > %arrayidx14, align 4, !tbaa !1 %add15 = add nsw i32 %add13, %8 > %arrayidx16 = getelementptr inbounds i32* %a, i32 9 %9 = load i32* > %arrayidx16, align 4, !tbaa !1 %add17 = add nsw i32 %...
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
...rrayidx8, align 4, !tbaa !1 %add9 = add nsw i32 %add7, %5 %arrayidx10 = getelementptr inbounds i32* %a, i32 6 %6 = load i32* %arrayidx10, align 4, !tbaa !1 %add11 = add nsw i32 %add9, %6 %arrayidx12 = getelementptr inbounds i32* %a, i32 7 %7 = load i32* %arrayidx12, align 4, !tbaa !1 %add13 = add nsw i32 %add11, %7 %arrayidx14 = getelementptr inbounds i32* %a, i32 8 %8 = load i32* %arrayidx14, align 4, !tbaa !1 %add15 = add nsw i32 %add13, %8 %arrayidx16 = getelementptr inbounds i32* %a, i32 9 %9 = load i32* %arrayidx16, align 4, !tbaa !1 %add17 = add nsw i32 %add15, %9...
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
...i32* %arrayidx8, align 4, !tbaa !1 %add9 = add nsw i32 %add7, %5 %arrayidx10 = getelementptr inbounds i32* %a, i32 6 %6 = load i32* %arrayidx10, align 4, !tbaa !1 %add11 = add nsw i32 %add9, %6 %arrayidx12 = getelementptr inbounds i32* %a, i32 7 %7 = load i32* %arrayidx12, align 4, !tbaa !1 %add13 = add nsw i32 %add11, %7 %arrayidx14 = getelementptr inbounds i32* %a, i32 8 %8 = load i32* %arrayidx14, align 4, !tbaa !1 %add15 = add nsw i32 %add13, %8 %arrayidx16 = getelementptr inbounds i32* %a, i32 9 %9 = load i32* %arrayidx16, align 4, !tbaa !1 %add17 = add nsw i32 %add15, %9 %arrayidx1...
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
...3 = load float* %arrayidx9, align 4 %arrayidx10 = getelementptr inbounds [4 x float]* %sum, i32 0, i64 1 %14 = load float* %arrayidx10, align 4 %add11 = fadd float %13, %14 %arrayidx12 = getelementptr inbounds [4 x float]* %sum, i32 0, i64 2 %15 = load float* %arrayidx12, align 4 %add13 = fadd float %add11, %15 %arrayidx14 = getelementptr inbounds [4 x float]* %sum, i32 0, i64 3 %16 = load float* %arrayidx14, align 4 %add15 = fadd float %add13, %16 ret float %add15 } Thus, the inner loop is not unrolled. opt -basicaa -loop-vectorize -debug-only=loop-vectorize -vec...
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
...getelementptr inbounds i32* %a, i32 6 > > > %6 = load i32* %arrayidx10, align 4, !tbaa !1 > > > %add11 = add nsw i32 %add9, %6 > > > %arrayidx12 = getelementptr inbounds i32* %a, i32 7 > > > %7 = load i32* %arrayidx12, align 4, !tbaa !1 > > > %add13 = add nsw i32 %add11, %7 > > > %arrayidx14 = getelementptr inbounds i32* %a, i32 8 > > > %8 = load i32* %arrayidx14, align 4, !tbaa !1 > > > %add15 = add nsw i32 %add13, %8 > > > %arrayidx16 = getelementptr inbounds i32* %a, i32 9 > > > %9 = lo...
2013 Nov 08
0
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and
2013 Nov 08
3
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We