Displaying 14 results from an estimated 14 matches for "arrayidx13".
2013 Oct 30 | 3 | [LLVMdev] loop vectorizer
...> %2 = load float* %arrayidx9, align 4
> %add10 = fadd float %1, %2
> %arrayidx11 = getelementptr inbounds float* %c, i64 %add2
> store float %add10, float* %arrayidx11, align 4
> %arrayidx12 = getelementptr inbounds float* %a, i64 %add8
> %3 = load float* %arrayidx12, align 4
> %arrayidx13 = getelementptr inbounds float* %b, i64 %add8
> %4 = load float* %arrayidx13, align 4
> %add14 = fadd float %3, %4
> %arrayidx15 = getelementptr inbounds float* %c, i64 %add8
> store float %add14, float* %arrayidx15, align 4
> %inc = add i64 %storemerge10, 1
> %exitcond = icmp eq...
2013 Oct 30 | 0 | [LLVMdev] loop vectorizer
..., i64 %add2
%1 = load float* %arrayidx9, align 4
%add10 = fadd float %0, %1
%arrayidx11 = getelementptr inbounds float* %c, i64 %add2
store float %add10, float* %arrayidx11, align 4
%arrayidx12 = getelementptr inbounds float* %a, i64 %add8
%2 = load float* %arrayidx12, align 4
%arrayidx13 = getelementptr inbounds float* %b, i64 %add8
%3 = load float* %arrayidx13, align 4
%add14 = fadd float %2, %3
%arrayidx15 = getelementptr inbounds float* %c, i64 %add8
store float %add14, float* %arrayidx15, align 4
%inc = add i64 %i.015, 1
%exitcond = icmp eq i64 %inc, %end
b...
2013 Oct 30 | 2 | [LLVMdev] loop vectorizer
...= load float* %arrayidx9, align 4
> %add10 = fadd float %0, %1
> %arrayidx11 = getelementptr inbounds float* %c, i64 %add2
> store float %add10, float* %arrayidx11, align 4
> %arrayidx12 = getelementptr inbounds float* %a, i64 %add8
> %2 = load float* %arrayidx12, align 4
> %arrayidx13 = getelementptr inbounds float* %b, i64 %add8
> %3 = load float* %arrayidx13, align 4
> %add14 = fadd float %2, %3
> %arrayidx15 = getelementptr inbounds float* %c, i64 %add8
> store float %add14, float* %arrayidx15, align 4
> %inc = add i64 %i.015, 1
> %exitcond = icmp eq i...
2013 Oct 30 | 0 | [LLVMdev] loop vectorizer
..., i64 %add2
%2 = load float* %arrayidx9, align 4
%add10 = fadd float %1, %2
%arrayidx11 = getelementptr inbounds float* %c, i64 %add2
store float %add10, float* %arrayidx11, align 4
%arrayidx12 = getelementptr inbounds float* %a, i64 %add8
%3 = load float* %arrayidx12, align 4
%arrayidx13 = getelementptr inbounds float* %b, i64 %add8
%4 = load float* %arrayidx13, align 4
%add14 = fadd float %3, %4
%arrayidx15 = getelementptr inbounds float* %c, i64 %add8
store float %add14, float* %arrayidx15, align 4
%inc = add i64 %storemerge10, 1
%exitcond = icmp eq i64 %inc, %e...
2013 Oct 30 | 3 | [LLVMdev] loop vectorizer
On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote:
> The access pattern to arrays a and b is non-linear. Unrolled loops are
> usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all
> values for i ?
>
Based on his list of values, it seems that the induction stride is linear
within each block of 4 iterations, but it's not a clear
2013 Oct 30 | 2 | [LLVMdev] loop vectorizer
..., i64 %add2
%1 = load float* %arrayidx9, align 4
%add10 = fadd float %0, %1
%arrayidx11 = getelementptr inbounds float* %c, i64 %add2
store float %add10, float* %arrayidx11, align 4
%arrayidx12 = getelementptr inbounds float* %a, i64 %add8
%2 = load float* %arrayidx12, align 4
%arrayidx13 = getelementptr inbounds float* %b, i64 %add8
%3 = load float* %arrayidx13, align 4
%add14 = fadd float %2, %3
%arrayidx15 = getelementptr inbounds float* %c, i64 %add8
store float %add14, float* %arrayidx15, align 4
%inc = add i64 %i.015, 1
%exitcond = icmp eq i64 %inc, %end
b...
2013 Oct 30 | 0 | [LLVMdev] loop vectorizer
...n 4
>> %add10 = fadd float %0, %1
>> %arrayidx11 = getelementptr inbounds float* %c, i64 %add2
>> store float %add10, float* %arrayidx11, align 4
>> %arrayidx12 = getelementptr inbounds float* %a, i64 %add8
>> %2 = load float* %arrayidx12, align 4
>> %arrayidx13 = getelementptr inbounds float* %b, i64 %add8
>> %3 = load float* %arrayidx13, align 4
>> %add14 = fadd float %2, %3
>> %arrayidx15 = getelementptr inbounds float* %c, i64 %add8
>> store float %add14, float* %arrayidx15, align 4
>> %inc = add i64 %i.015, 1
>...
2013 Nov 01 | 2 | [LLVMdev] loop vectorizer: this loop is not worth vectorizing
....body4.preheader, %for.body4
%storemerge10 = phi i64 [ %inc19, %for.body4 ], [ %div, %for.body4.preheader ]
%mul5 = shl i64 %storemerge10, 3
%add82 = or i64 %mul5, 4
%arrayidx = getelementptr inbounds float* %a, i64 %mul5
%arrayidx11 = getelementptr inbounds float* %b, i64 %mul5
%arrayidx13 = getelementptr inbounds float* %c, i64 %mul5
%arrayidx14 = getelementptr inbounds float* %a, i64 %add82
%arrayidx15 = getelementptr inbounds float* %b, i64 %add82
%arrayidx17 = getelementptr inbounds float* %c, i64 %add82
%0 = bitcast float* %arrayidx to <4 x float>*
%1 = load...
2013 Nov 01 | 0 | [LLVMdev] loop vectorizer: this loop is not worth vectorizing
...> %storemerge10 = phi i64 [ %inc19, %for.body4 ], [ %div, %for.body4.preheader ]
> %mul5 = shl i64 %storemerge10, 3
> %add82 = or i64 %mul5, 4
> %arrayidx = getelementptr inbounds float* %a, i64 %mul5
> %arrayidx11 = getelementptr inbounds float* %b, i64 %mul5
> %arrayidx13 = getelementptr inbounds float* %c, i64 %mul5
> %arrayidx14 = getelementptr inbounds float* %a, i64 %add82
> %arrayidx15 = getelementptr inbounds float* %b, i64 %add82
> %arrayidx17 = getelementptr inbounds float* %c, i64 %add82
> %0 = bitcast float* %arrayidx to <4 x float>...
2013 Oct 30 | 0 | [LLVMdev] loop vectorizer
...= load float* %arrayidx9, align 4
> %add10 = fadd float %0, %1
> %arrayidx11 = getelementptr inbounds float* %c, i64 %add2
> store float %add10, float* %arrayidx11, align 4
> %arrayidx12 = getelementptr inbounds float* %a, i64 %add8
> %2 = load float* %arrayidx12, align 4
> %arrayidx13 = getelementptr inbounds float* %b, i64 %add8
> %3 = load float* %arrayidx13, align 4
> %add14 = fadd float %2, %3
> %arrayidx15 = getelementptr inbounds float* %c, i64 %add8
> store float %add14, float* %arrayidx15, align 4
> %inc = add i64 %i.015, 1
> %exitcond = icmp eq i...
2016 Apr 08 | 2 | LIBCLC with LLVM 3.9 Trunk
It's not clear from your original message what is actually wrong; I think
you need to give more information about what you are doing: example
source, the target GPU, compiler error messages, or other evidence of "it's
wrong" (LLVM IR, disassembly, etc.) ...
--
Mats
On 8 April 2016 at 09:55, Liu Xin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I built it
2012 Jan 17 | 0 | [LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
Hi,
On Fri, Dec 30, 2011 at 3:09 AM, Tobias Grosser <tobias at grosser.es> wrote:
> As it seems my intuition is wrong, I am very eager to see and understand
> an example where a search limit of 4000 is really needed.
>
To get the ball rolling again, I attached a testcase that can be tuned
to understand the impact on compile time for different sizes of a
basic block. One can also
2011 Dec 30 | 3 | [LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On 12/29/2011 06:32 PM, Hal Finkel wrote:
> On Thu, 2011-12-29 at 15:00 +0100, Tobias Grosser wrote:
>> On 12/14/2011 01:25 AM, Hal Finkel wrote:
>> One thing that I would still like to have is a test case where
>> bb-vectorize-search-limit is needed to avoid exponential compile time
>> growth and another test case that is not optimized, if
>>
2012 Jan 24 | 4 | [LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
...i32 %add21240
do.body9: ; preds = %for.body, %do.body9
%indvars.iv = phi i64 [ %indvars.iv.next, %do.body9 ], [ 0, %for.body ]
%arrayidx11 = getelementptr inbounds [100 x i32]* %B, i64 0, i64 %indvars.iv
%8 = load i32* %arrayidx11, align 4, !tbaa !0
%arrayidx13 = getelementptr inbounds [100 x i32]* %C, i64 0, i64 %indvars.iv
%9 = load i32* %arrayidx13, align 4, !tbaa !0
%add14 = add nsw i32 %9, %8
%arrayidx16 = getelementptr inbounds [100 x i32]* %A, i64 0, i64 %indvars.iv
%mul21 = mul nsw i32 %add14, %9
%sub = sub nsw i32 %add14, %mul21
%mul4...