thr3ads.net - search: "afterloop"

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

0

...ll ptx_device i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() %PositionOfBlockInGrid.x = shl i32 %ctaid.x, 9 %BlockLB.Add.ThreadPosInBlock.x = add i32 %PositionOfBlockInGrid.x, %tid.x %isThreadLBgtLoopUB.x = icmp sgt i32 %BlockLB.Add.ThreadPosInBlock.x, 65535 br i1 %isThreadLBgtLoopUB.x, label %CUDA.AfterLoop.x, label %CUDA.LoopHeader.x.preheader CUDA.LoopHeader.x.preheader: ; preds = %"Loop Function Root" %1 = sext i32 %BlockLB.Add.ThreadPosInBlock.x to i64 store float 0.000000e+00, float* inttoptr (i64 47380979712 to float*), align 8192, !tbaa !0 %p_.moved.to.4.cl...

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

2

Dear all, Attached notunrolled.ll is a module containing a reduction kernel. What I'm trying to do is to unroll it in such a way that the partial reduction over the unrolled iterations is performed in a register and then stored to memory only once. Currently LLVM's unroller, together with all standard optimizations, produces code which stores the value to memory after every unrolled iteration, which is...
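
To make the goal in this snippet concrete, here is a minimal C sketch of the two code shapes being contrasted. It is not taken from the attached notunrolled.ll; the function names, the unroll factor of 4, and the assumption that `out` does not alias `in` are all hypothetical choices made for illustration.

```c
#include <stddef.h>

/* Baseline shape: the running sum lives in memory, so every
   iteration loads from and stores to *out. */
void reduce_store_each_iter(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        *out += in[i];
    }
}

/* Desired shape (hand-unrolled by 4, an arbitrary factor):
   the partial sum is kept in a local variable, which a compiler
   can hold in a register, and memory is written exactly once.
   Assumes `out` does not point into `in`. */
void reduce_register_acc(const float *in, float *out, size_t n) {
    float acc = *out;          /* single load of the accumulator */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc += in[i];
        acc += in[i + 1];
        acc += in[i + 2];
        acc += in[i + 3];
    }
    for (; i < n; ++i) {       /* remainder iterations */
        acc += in[i];
    }
    *out = acc;                /* single store of the result */
}
```

Both functions add the elements in the same order, so they return the same value; the difference is only in how often memory is touched. The equivalence holds only under the no-aliasing assumption stated in the comment.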

[LLVMdev] Combining Branch Statements - Missing Optimization Pass?

2010 May 28

4

...hen11 %storemerge26 = phi double [ %phitmp, %then11 ], [ 1.000000e+01, %endif.endif15_crit_edge ] ; <double> [#uses=1] %lsr.iv.next = add i32 %lsr.iv, 1 ; <i32> [#uses=2] %exitcond = icmp eq i32 %lsr.iv.next, 10001 ; <i1> [#uses=1] br i1 %exitcond, label %afterloop, label %loop afterloop: ; preds = %endif15 %r_fadd19 = fadd double %anotherFloat.1, %storemerge26 ; <double> [#uses=1] %phitmp32 = fcmp ogt double %r_fadd19, 3.535200e+03 ; <i1> [#uses=1] %storemerge27 = zext i1 %phitmp32 to i32 ; ...

[LLVMdev] Combining Branch Statements - Missing Optimization Pass?

2010 May 28

0

...erge26 = phi double [ %phitmp, %then11 ], [ 1.000000e+01, %endif.endif15_crit_edge ] ; <double> [#uses=1] > %lsr.iv.next = add i32 %lsr.iv, 1 ; <i32> [#uses=2] > %exitcond = icmp eq i32 %lsr.iv.next, 10001 ; <i1> [#uses=1] > br i1 %exitcond, label %afterloop, label %loop > > afterloop: ; preds = %endif15 > %r_fadd19 = fadd double %anotherFloat.1, %storemerge26 ; <double> [#uses=1] > %phitmp32 = fcmp ogt double %r_fadd19, 3.535200e+03 ; <i1> [#uses=1] > %storemerge27 = zext i1 %phi...

[RFC][PIR] Parallel LLVM IR -- Stage 0 -- IR extension

2017 Jan 28

3

...fork label %task, label %latch task: %aptr = getelementptr i32, i32* %A, i32 0, i32 %i %aval = load i32* %aptr %cptr = getelementptr i32, i32* %C, i32 0, i32 %i store i32 %aval, i32* %aptr halt label %latch latch: %inc = add i32, i32 %i, i32 1 br label %header exit: join label %afterloop afterloop: ... (2) Reasoning: The proposed approach is crafted such that the semantics of the parallel program is represented correctly in almost native, low-level IR right after front-end and preserved at any point till the final lowering to sequential IR or parallel runtime library calls. To...

[LLVMdev] Emitting LLVM IR for control flow

2010 Nov 22

0

...adly get 3.000000 instead of 2.000000. This happens, I believe, because the instruction %faddtmp = fadd double %x1.0, 1.000000e+000 ; <double> [#uses=2] is being generated before %ltcmptmp = fcmp ult double %c.0, 2.000000e+000 ; <i1> [#uses=1] br i1 %ltcmptmp, label %loop, label %afterloop and therefore the loop body is emitted first and only afterwards do we determine whether the loop should exit. I was wondering if this is the intended behaviour, since the fibi(x) example in chapter 7 uses this extra loop to return correct values for Fibonacci numbers, or perhaps a known bug in Kaleid...
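
The effect described in this snippet, where the body runs before the exit test, can be sketched in C as the difference between a do-while loop and a while loop. This is only an illustration under assumed names (`run_body_first`, `run_test_first`) and an assumed body that adds 1.0 to both values; it does not reproduce the poster's program.

```c
/* Body-first shape: the body runs, then the exit condition is tested.
   This mirrors IR in which the fadd precedes the fcmp/br pair, so the
   body always executes at least once. */
double run_body_first(double x, double c, double limit) {
    do {
        x = x + 1.0;   /* loop body */
        c = c + 1.0;   /* step */
    } while (c < limit);
    return x;
}

/* Test-first shape: the exit condition is checked before every
   iteration, so the body may execute zero times. */
double run_test_first(double x, double c, double limit) {
    while (c < limit) {
        x = x + 1.0;
        c = c + 1.0;
    }
    return x;
}
```

When the condition is already false on entry, the body-first version still performs one iteration and the results differ by one step; when the condition holds on entry, both versions agree.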

(no subject)

2017 Mar 08

5

...= load i32* %aptr > > %cptr = getelementptr i32, i32* %C, i32 0, i32 %i > > store i32 %aval, i32* %aptr > > halt label %latch > > > > latch: > > %inc = add i32, i32 %i, i32 1 > > br label %header > > > > exit: > > join label %afterloop > > > > afterloop: > > ... > > > > > > > > (2) Reasoning: > > The proposed approach is crafted such that the semantics of the parallel > > program is represented correctly in almost native, low-level IR right > > after front-end and pr...

(no subject)

2017 Mar 08

3

...%cptr = getelementptr i32, i32* %C, i32 0, i32 %i >>> store i32 %aval, i32* %aptr >>> halt label %latch >>> >>> latch: >>> %inc = add i32, i32 %i, i32 1 >>> br label %header >>> >>> exit: >>> join label %afterloop >>> >>> afterloop: >>> ... >>> >>> >>> >>> (2) Reasoning: >>> The proposed approach is crafted such that the semantics of the >>> parallel program is represented correctly in almost native, >>> low-level IR...

(no subject)

2017 Mar 08

3

...32, i32* %C, i32 0, i32 %i >>>> store i32 %aval, i32* %aptr >>>> halt label %latch >>>> >>>> latch: >>>> %inc = add i32, i32 %i, i32 1 >>>> br label %header >>>> >>>> exit: >>>> join label %afterloop >>>> >>>> afterloop: >>>> ... >>>> >>>> >>>> >>>> (2) Reasoning: >>>> The proposed approach is crafted such that the semantics of the parallel >>>> program is represented correctly in alm...

(no subject)

2017 Mar 08

4

...2 %i > >>> store i32 %aval, i32* %aptr > >>> halt label %latch > >>> > >>> latch: > >>> %inc = add i32, i32 %i, i32 1 > >>> br label %header > >>> > >>> exit: > >>> join label %afterloop > >>> > >>> afterloop: > >>> ... > >>> > >>> > >>> > >>> (2) Reasoning: > >>> The proposed approach is crafted such that the semantics of the > >>> parallel program is represented correctly...

(no subject)

2017 Mar 08

2

...>>>>> halt label %latch >>>>>> >>>>>> latch: >>>>>> %inc = add i32, i32 %i, i32 1 >>>>>> br label %header >>>>>> >>>>>> exit: >>>>>> join label %afterloop >>>>>> >>>>>> afterloop: >>>>>> ... >>>>>> >>>>>> >>>>>> >>>>>> (2) Reasoning: >>>>>> The proposed approach is crafted such that the semantics of the >...

(no subject)

2017 Mar 08

2

...l, i32* %aptr > > >>> halt label %latch > > >>> > > >>> latch: > > >>> %inc = add i32, i32 %i, i32 1 > > >>> br label %header > > >>> > > >>> exit: > > >>> join label %afterloop > > >>> > > >>> afterloop: > > >>> ... > > >>> > > >>> > > >>> > > >>> (2) Reasoning: > > >>> The proposed approach is crafted such that the semantics of the > > >>...

[RFC][PIR] Parallel LLVM IR -- Stage 0 --

2017 Mar 08

3

...>>>>> halt label %latch >>>>>> >>>>>> latch: >>>>>> %inc = add i32, i32 %i, i32 1 >>>>>> br label %header >>>>>> >>>>>> exit: >>>>>> join label %afterloop >>>>>> >>>>>> afterloop: >>>>>> ... >>>>>> >>>>>> >>>>>> >>>>>> (2) Reasoning: >>>>>> The proposed approach is crafted such that the semantics of the >...

[RFC][PIR] Parallel LLVM IR -- Stage 0 --

2017 Mar 08

2

...>>>> >>>>>>>> latch: >>>>>>>> %inc = add i32, i32 %i, i32 1 >>>>>>>> br label %header >>>>>>>> >>>>>>>> exit: >>>>>>>> join label %afterloop >>>>>>>> >>>>>>>> afterloop: >>>>>>>> ... >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> (2) Reasoning: >>>>>>>> T...
