thr3ads.net - search: "arrayidx13"

2013 Oct 30

3

[LLVMdev] loop vectorizer

...t; %2 = load float* %arrayidx9, align 4 > %add10 = fadd float %1, %2 > %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 > store float %add10, float* %arrayidx11, align 4 > %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 > %3 = load float* %arrayidx12, align 4 > %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 > %4 = load float* %arrayidx13, align 4 > %add14 = fadd float %3, %4 > %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 > store float %add14, float* %arrayidx15, align 4 > %inc = add i64 %storemerge10, 1 > %exitcond = icmp eq...

[LLVMdev] loop vectorizer

2013 Oct 30

0

[LLVMdev] loop vectorizer

..., i64 %add2 %1 = load float* %arrayidx9, align 4 %add10 = fadd float %0, %1 %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 store float %add10, float* %arrayidx11, align 4 %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 %2 = load float* %arrayidx12, align 4 %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 %3 = load float* %arrayidx13, align 4 %add14 = fadd float %2, %3 %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 store float %add14, float* %arrayidx15, align 4 %inc = add i64 %i.015, 1 %exitcond = icmp eq i64 %inc, %end b...

[LLVMdev] loop vectorizer

2013 Oct 30

2

[LLVMdev] loop vectorizer

...= load float* %arrayidx9, align 4 > %add10 = fadd float %0, %1 > %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 > store float %add10, float* %arrayidx11, align 4 > %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 > %2 = load float* %arrayidx12, align 4 > %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 > %3 = load float* %arrayidx13, align 4 > %add14 = fadd float %2, %3 > %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 > store float %add14, float* %arrayidx15, align 4 > %inc = add i64 %i.015, 1 > %exitcond = icmp eq i...

[LLVMdev] loop vectorizer

2013 Oct 30

0

[LLVMdev] loop vectorizer

..., i64 %add2 %2 = load float* %arrayidx9, align 4 %add10 = fadd float %1, %2 %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 store float %add10, float* %arrayidx11, align 4 %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 %3 = load float* %arrayidx12, align 4 %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 %4 = load float* %arrayidx13, align 4 %add14 = fadd float %3, %4 %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 store float %add14, float* %arrayidx15, align 4 %inc = add i64 %storemerge10, 1 %exitcond = icmp eq i64 %inc, %e...

[LLVMdev] loop vectorizer

2013 Oct 30

3

[LLVMdev] loop vectorizer

On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote: > The access pattern to arrays a and b is non-linear. Unrolled loops are > usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all > values for i ? > Based on his list of values, it seems that the induction stride is linear within each block of 4 iterations, but it's not a clear

[LLVMdev] loop vectorizer

2013 Oct 30

2

[LLVMdev] loop vectorizer

..., i64 %add2 %1 = load float* %arrayidx9, align 4 %add10 = fadd float %0, %1 %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 store float %add10, float* %arrayidx11, align 4 %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 %2 = load float* %arrayidx12, align 4 %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 %3 = load float* %arrayidx13, align 4 %add14 = fadd float %2, %3 %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 store float %add14, float* %arrayidx15, align 4 %inc = add i64 %i.015, 1 %exitcond = icmp eq i64 %inc, %end b...

[LLVMdev] loop vectorizer

2013 Oct 30

0

[LLVMdev] loop vectorizer

...n 4 >> %add10 = fadd float %0, %1 >> %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 >> store float %add10, float* %arrayidx11, align 4 >> %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 >> %2 = load float* %arrayidx12, align 4 >> %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 >> %3 = load float* %arrayidx13, align 4 >> %add14 = fadd float %2, %3 >> %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 >> store float %add14, float* %arrayidx15, align 4 >> %inc = add i64 %i.015, 1 &g...

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

2

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

....body4.preheader, %for.body4 %storemerge10 = phi i64 [ %inc19, %for.body4 ], [ %div, %for.body4.preheader ] %mul5 = shl i64 %storemerge10, 3 %add82 = or i64 %mul5, 4 %arrayidx = getelementptr inbounds float* %a, i64 %mul5 %arrayidx11 = getelementptr inbounds float* %b, i64 %mul5 %arrayidx13 = getelementptr inbounds float* %c, i64 %mul5 %arrayidx14 = getelementptr inbounds float* %a, i64 %add82 %arrayidx15 = getelementptr inbounds float* %b, i64 %add82 %arrayidx17 = getelementptr inbounds float* %c, i64 %add82 %0 = bitcast float* %arrayidx to <4 x float>* %1 = load...

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

0

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

...> %storemerge10 = phi i64 [ %inc19, %for.body4 ], [ %div, > %for.body4.preheader ] > %mul5 = shl i64 %storemerge10, 3 > %add82 = or i64 %mul5, 4 > %arrayidx = getelementptr inbounds float* %a, i64 %mul5 > %arrayidx11 = getelementptr inbounds float* %b, i64 %mul5 > %arrayidx13 = getelementptr inbounds float* %c, i64 %mul5 > %arrayidx14 = getelementptr inbounds float* %a, i64 %add82 > %arrayidx15 = getelementptr inbounds float* %b, i64 %add82 > %arrayidx17 = getelementptr inbounds float* %c, i64 %add82 > %0 = bitcast float* %arrayidx to <4 x float&g...

[LLVMdev] loop vectorizer

2013 Oct 30

0

[LLVMdev] loop vectorizer

...= load float* %arrayidx9, align 4 > %add10 = fadd float %0, %1 > %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 > store float %add10, float* %arrayidx11, align 4 > %arrayidx12 = getelementptr inbounds float* %a, i64 %add8 > %2 = load float* %arrayidx12, align 4 > %arrayidx13 = getelementptr inbounds float* %b, i64 %add8 > %3 = load float* %arrayidx13, align 4 > %add14 = fadd float %2, %3 > %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 > store float %add14, float* %arrayidx15, align 4 > %inc = add i64 %i.015, 1 > %exitcond = icmp eq i...

LIBCLC with LLVM 3.9 Trunk

2016 Apr 08

2

LIBCLC with LLVM 3.9 Trunk

It's not clear what is actually wrong from your original message, I think you need to give some more information as to what you are doing: Example source, what target GPU, compiler error messages or other evidence of "it's wrong" (llvm IR, disassembly, etc) ... -- Mats On 8 April 2016 at 09:55, Liu Xin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > I built it

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 17

0

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Hi, On Fri, Dec 30, 2011 at 3:09 AM, Tobias Grosser <tobias at grosser.es> wrote: > As it seems my intuition is wrong, I am very eager to see and understand > an example where a search limit of 4000 is really needed. > To make the ball roll again, I attached a testcase that can be tuned to understand the impact on compile time for different sizes of a basic block. One can also

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2011 Dec 30

3

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On 12/29/2011 06:32 PM, Hal Finkel wrote: > On Thu, 2011-12-29 at 15:00 +0100, Tobias Grosser wrote: >> On 12/14/2011 01:25 AM, Hal Finkel wrote: >> One thing that I would still like to have is a test case where >> bb-vectorize-search-limit is needed to avoid exponential compile time >> growth and another test case that is not optimized, if >>

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2012 Jan 24

4

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

...i32 %add21240 do.body9: ; preds = %for.body, %do.body9 %indvars.iv = phi i64 [ %indvars.iv.next, %do.body9 ], [ 0, %for.body ] %arrayidx11 = getelementptr inbounds [100 x i32]* %B, i64 0, i64 %indvars.iv %8 = load i32* %arrayidx11, align 4, !tbaa !0 %arrayidx13 = getelementptr inbounds [100 x i32]* %C, i64 0, i64 %indvars.iv %9 = load i32* %arrayidx13, align 4, !tbaa !0 %add14 = add nsw i32 %9, %8 %arrayidx16 = getelementptr inbounds [100 x i32]* %A, i64 0, i64 %indvars.iv %mul21 = mul nsw i32 %add14, %9 %sub = sub nsw i32 %add14, %mul21 %mul4...

search for: arrayidx13