thr3ads.net - search: "arrayidx11"

Displaying 19 results from an estimated 19 matches for "arrayidx11".

Did you mean: arrayidx1

2013 Oct 30

[LLVMdev] loop vectorizer

...i%4; c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; } } LV: Found a loop: for.body LV: Found an induction variable. LV: We need to do 0 pointer comparisons. LV: Checking memory dependencies LV: Bad stride - Not an AddRecExpr pointer %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 SCEV: ((4 * %add2)<nsw> + %c)<nsw> LV: Bad stride - Not an AddRecExpr pointer %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 SCEV: ((4 * %add8)<nsw> + %c)<nsw> LV: Src Scev: ((4 * %add2)<nsw> + %c)<nsw>...

Loop vectorization with the loop containing bitcast

2016 Aug 17

Loop vectorization with the loop containing bitcast

...liasSet[0x9508b80, 1] must alias, No access Pointers: (<4 x i64>* %1, 18446744073709551615) AliasSet[0x95f8a70, 2] must alias, No access Pointers: (<4 x double>* %2, 18446744073709551615), (<4 x i64>* %0, 18446744073709551615) LAA: Accesses(3): %1 = bitcast double* %arrayIdx11 to <4 x i64>* (write) %2 = bitcast double* %arrayIdx to <4 x double>* (write) %0 = bitcast double* %arrayIdx to <4 x i64>* (read-only) Underlying objects for pointer %1 = bitcast double* %arrayIdx11 to <4 x i64>* @b = common local_unnamed_addr global...

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

...mul6 = or i64 %rem, %add51 > %add8 = or i64 %mul6, 4 > %arrayidx = getelementptr inbounds float* %a, i64 %add2 > %1 = load float* %arrayidx, align 4 > %arrayidx9 = getelementptr inbounds float* %b, i64 %add2 > %2 = load float* %arrayidx9, align 4 > %add10 = fadd float %1, %2 > %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 > store float %add10, float* %arrayidx11, align 4 > %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 > %3 = load float* %arrayidx12, align 4 > %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 > %4 = load float* %arrayidx...

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

...4 %i.015, 3 %add2 = or i64 %mul, %rem %add8 = or i64 %add2, 4 %arrayidx = getelementptr inbounds float* %a, i64 %add2 %0 = load float* %arrayidx, align 4 %arrayidx9 = getelementptr inbounds float* %b, i64 %add2 %1 = load float* %arrayidx9, align 4 %add10 = fadd float %0, %1 %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 store float %add10, float* %arrayidx11, align 4 %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 %2 = load float* %arrayidx12, align 4 %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 %3 = load float* %arrayidx13, align...

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

...+ b[ ir0 ]; > c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; > } > } > > LV: Found a loop: for.body > LV: Found an induction variable. > LV: We need to do 0 pointer comparisons. > LV: Checking memory dependencies > LV: Bad stride - Not an AddRecExpr pointer %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 SCEV: ((4 * %add2)<nsw> + %c)<nsw> > LV: Bad stride - Not an AddRecExpr pointer %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 SCEV: ((4 * %add8)<nsw> + %c)<nsw> > LV: Src Scev: ((4 * %add2)<nsw> + %c)&lt...

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

...= or i64 %mul, %rem > %add8 = or i64 %add2, 4 > %arrayidx = getelementptr inbounds float* %a, i64 %add2 > %0 = load float* %arrayidx, align 4 > %arrayidx9 = getelementptr inbounds float* %b, i64 %add2 > %1 = load float* %arrayidx9, align 4 > %add10 = fadd float %0, %1 > %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 > store float %add10, float* %arrayidx11, align 4 > %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 > %2 = load float* %arrayidx12, align 4 > %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 > %3 = load float* %arr...

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

...i64 %0, 2 %mul6 = or i64 %rem, %add51 %add8 = or i64 %mul6, 4 %arrayidx = getelementptr inbounds float* %a, i64 %add2 %1 = load float* %arrayidx, align 4 %arrayidx9 = getelementptr inbounds float* %b, i64 %add2 %2 = load float* %arrayidx9, align 4 %add10 = fadd float %1, %2 %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 store float %add10, float* %arrayidx11, align 4 %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 %3 = load float* %arrayidx12, align 4 %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 %4 = load float* %arrayidx13, align...

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote: > The access pattern to arrays a and b is non-linear. Unrolled loops are > usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all > values for i ? > Based on his list of values, it seems that the induction stride is linear within each block of 4 iterations, but it's not a clear

[LLVMdev] loop vectorizer

2013 Oct 30

[LLVMdev] loop vectorizer

...8 = or i64 %add2, 4 >> %arrayidx = getelementptr inbounds float* %a, i64 %add2 >> %0 = load float* %arrayidx, align 4 >> %arrayidx9 = getelementptr inbounds float* %b, i64 %add2 >> %1 = load float* %arrayidx9, align 4 >> %add10 = fadd float %0, %1 >> %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 >> store float %add10, float* %arrayidx11, align 4 >> %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 >> %2 = load float* %arrayidx12, align 4 >> %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 >&gt...

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

...body4: ; preds = %for.body4.preheader, %for.body4 %storemerge10 = phi i64 [ %inc19, %for.body4 ], [ %div, %for.body4.preheader ] %mul5 = shl i64 %storemerge10, 3 %add82 = or i64 %mul5, 4 %arrayidx = getelementptr inbounds float* %a, i64 %mul5 %arrayidx11 = getelementptr inbounds float* %b, i64 %mul5 %arrayidx13 = getelementptr inbounds float* %c, i64 %mul5 %arrayidx14 = getelementptr inbounds float* %a, i64 %add82 %arrayidx15 = getelementptr inbounds float* %b, i64 %add82 %arrayidx17 = getelementptr inbounds float* %c, i64 %add82 %0...

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

2017 Jan 24

[InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > I started looking at the log files that you attached, and I'm confused. > The code that is supposedly causing the perf regression is created by the > loop vectorizer, right? Except the bad code is not in the "vector.body", so > is there something peculiar about

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

...; preds = > %for.body4.preheader, %for.body4 > %storemerge10 = phi i64 [ %inc19, %for.body4 ], [ %div, > %for.body4.preheader ] > %mul5 = shl i64 %storemerge10, 3 > %add82 = or i64 %mul5, 4 > %arrayidx = getelementptr inbounds float* %a, i64 %mul5 > %arrayidx11 = getelementptr inbounds float* %b, i64 %mul5 > %arrayidx13 = getelementptr inbounds float* %c, i64 %mul5 > %arrayidx14 = getelementptr inbounds float* %a, i64 %add82 > %arrayidx15 = getelementptr inbounds float* %b, i64 %add82 > %arrayidx17 = getelementptr inbounds float* %c, i...

LIBCLC with LLVM 3.9 Trunk

2016 Apr 08

LIBCLC with LLVM 3.9 Trunk

It's not clear what is actually wrong from your original message, I think you need to give some more information as to what you are doing: Example source, what target GPU, compiler error messages or other evidence of "it's wrong" (llvm IR, disassembly, etc) ... -- Mats On 8 April 2016 at 09:55, Liu Xin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > I built it

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 17

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Hi, On Fri, Dec 30, 2011 at 3:09 AM, Tobias Grosser <tobias at grosser.es> wrote: > As it seems my intuition is wrong, I am very eager to see and understand > an example where a search limit of 4000 is really needed. > To make the ball roll again, I attached a testcase that can be tuned to understand the impact on compile time for different sizes of a basic block. One can also

[LLVMdev] How can I remove these redundant copy between registers?

2015 May 21

[LLVMdev] How can I remove these redundant copy between registers?

Hi, I've been working on a Blackfin backend (llvm-3.6.0) based on the previous one that was removed in llvm-3.1. llc generates codes like this: 29 p1 = r2; 30 r5 = [p1]; 31 p1 = r2; 32 r6 = [p1 + 4]; 33 r5 = r6 + r5; 34 r6 = [p0 + -4]; 35 r5 *= r6; 36 p1 = r2; 37 r6 = [p1 + 8]; 38 p1 = r2; p1 and r2 are in different register classes. A p*

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2011 Dec 30

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On 12/29/2011 06:32 PM, Hal Finkel wrote: > On Thu, 2011-12-29 at 15:00 +0100, Tobias Grosser wrote: >> On 12/14/2011 01:25 AM, Hal Finkel wrote: >> One thing that I would still like to have is a test case where >> bb-vectorize-search-limit is needed to avoid exponential compile time >> growth and another test case that is not optimized, if >>

Handling native i16 types in clang and opt

2017 May 21

Handling native i16 types in clang and opt

Hello. My target architecture supports natively 16 bit integers (i16). Whenever I write in C programs using only short types, clang compiles the program to LLVM and converts the i16 data to i32 to perform arithmetic operations and then truncates the results to i16. Then, the InstructionCombining (INSTCOMBINE or IC) pass removes these conversions back and forth from i16, except for

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 24

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

...a !0 %add21237 = add i32 %5, %6 %7 = load i32* %arrayidx21239, align 4, !tbaa !0 %add21240 = add i32 %add21237, %7 ret i32 %add21240 do.body9: ; preds = %for.body, %do.body9 %indvars.iv = phi i64 [ %indvars.iv.next, %do.body9 ], [ 0, %for.body ] %arrayidx11 = getelementptr inbounds [100 x i32]* %B, i64 0, i64 %indvars.iv %8 = load i32* %arrayidx11, align 4, !tbaa !0 %arrayidx13 = getelementptr inbounds [100 x i32]* %C, i64 0, i64 %indvars.iv %9 = load i32* %arrayidx13, align 4, !tbaa !0 %add14 = add nsw i32 %9, %8 %arrayidx16 = getelementptr...

[LLVMdev] LiveIntervals analysis problem

2013 Feb 14

[LLVMdev] LiveIntervals analysis problem

...s = %if.then104 %cmp111 = icmp eq i32 %lost.addr.1, 0 br i1 %cmp111, label %if.then113, label %if.else124 if.then113: ; preds = %if.then110 %re114 = getelementptr inbounds %struct.LDPARMS* %ldp, i32 0, i32 3 %16 = load i32* %re114, align 4, !tbaa !3 %arrayidx115 = getelementptr inbounds i16* %s, i32 %16 %17 = load i16* %arrayidx115, align 2, !tbaa !5 %rebit117 = getelementptr inbounds %struct.LDPARMS* %ldp, i32 0, i32 7 %18 = load i16* %rebit117, align 2, !tbaa !5 %and119267 = and i16 %18, %17 %cmp120 = icmp eq i16 %and119267, 0 br i1 %cmp120,...

search for: arrayidx11