search for: next_loopiv

Displaying 7 results from an estimated 7 matches for "next_loopiv".

2013 Mar 01
2
[LLVMdev] Interesting post increment situation in DAG combiner
...s.
p.loop_body.us65: ; preds = %p.loop_body.lr.ph.us78, %p.loop_body.us65
  %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep, %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65 ]
  %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [ %p.next_loopiv.us67, %p.loop_body.us65 ]
  %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
  %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4 <<<<<<<<<<<<<<<<<< IV
  %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, ali...
2013 Mar 01
0
[LLVMdev] Interesting post increment situation in DAG combiner
...; preds =
> %p.loop_body.lr.ph.us78, %p.loop_body.us65
> %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
> %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65
> ]
> %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
> %p.next_loopiv.us67, %p.loop_body.us65 ]
> %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
> %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4
> <<<<<<<<<<<<<<<<<<
> IV
> %_p_vec_full.us71 = load <4 x i16...
2013 Mar 01
1
[LLVMdev] Interesting post increment situation in DAG combiner
...reds =
> > %p.loop_body.lr.ph.us78, %p.loop_body.us65
> > %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
> > %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65
> ]
> > %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
> > %p.next_loopiv.us67, %p.loop_body.us65 ]
> > %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
> > %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4
> > <<<<<<<<<<<<<<<<<<
> > IV
> > %_p_vec_ful...
2013 Mar 11
0
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
..." ret void
polly.loop_body: ; preds = %polly.loop_body, %CUDA.LoopHeader.x.preheader
  %_p_scalar_ = phi float [ 0.000000e+00, %CUDA.LoopHeader.x.preheader ], [ %p_8, %polly.loop_body ]
  %polly.loopiv10 = phi i64 [ 0, %CUDA.LoopHeader.x.preheader ], [ %polly.next_loopiv, %polly.loop_body ]
  %polly.next_loopiv = add i64 %polly.loopiv10, 1
  %p_ = add i64 %polly.loopiv10, %p_.moved.to.4.cloned
  %p_newGEPInst9.cloned = getelementptr float* inttoptr (i64 47246749696 to float*), i64 %p_
  %p_newGEPInst12.cloned = getelementptr float* inttoptr (i64 47380971520 to floa...
2013 Mar 11
2
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all, Attached notunrolled.ll is a module containing a reduction kernel. What I'm trying to do is to unroll it in such a way that the partial reduction over the unrolled iterations is performed in a register and then stored to memory only once. Currently LLVM's unroller together with all standard optimizations produces code that stores the value to memory after every unrolled iteration, which is
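For illustration only, the transformation this post asks about can be sketched at the source level in C. This is not the poster's kernel (notunrolled.ll is not shown here). The function names, the unroll factor of 4, and the array-sum workload are assumptions chosen to show the difference between storing the running value on every iteration and keeping it in a local accumulator with a single final store.

```c
#include <stddef.h>

/* Hypothetical baseline: the running sum lives in memory (*out), so each
 * iteration reads and writes it. Because `in` and `out` may alias, a
 * compiler may be unable to keep the sum in a register on its own. */
void reduce_store_each(const float *in, size_t n, float *out) {
    *out = 0.0f;
    for (size_t i = 0; i < n; ++i)
        *out += in[i];
}

/* Hypothetical target shape: unrolled by 4 (an arbitrary factor), with the
 * partial reduction held in a local variable and stored exactly once.
 * The additions happen in the same order as in the baseline. */
void reduce_register_acc(const float *in, size_t n, float *out) {
    float acc = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc += in[i];
        acc += in[i + 1];
        acc += in[i + 2];
        acc += in[i + 3];
    }
    for (; i < n; ++i)  /* remainder iterations when n is not a multiple of 4 */
        acc += in[i];
    *out = acc;         /* single store to memory */
}
```

Both functions compute the same sum. Whether an optimizer can derive the second form from the first generally depends on what it can prove about aliasing between the input and output pointers.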
2013 Mar 01
0
[LLVMdev] parallel loop metadata simplification
----- Original Message -----
> From: "Paul Redmond" <paul.redmond at intel.com>
> To: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Thursday, February 28, 2013 1:30:57 PM
> Subject: [LLVMdev] parallel loop metadata simplification
>
> Hi,
>
> I've been working on clang codegen for #pragma ivdep and creating the
>
2013 Feb 28
5
[LLVMdev] parallel loop metadata simplification
Hi, I've been working on clang codegen for #pragma ivdep and creating the llvm.mem.parallel_loop_access metadata seems quite difficult. The main problem is that there are so many places where loads and stores are created and all of them need to be changed when emitting a parallel loop. Note that creating llvm.loop.parallel is not a problem. One option is to modify IRBuilder to enable