2013 Mar 11
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
...;
define ptx_kernel void @__kernelgen_matvec_loop_7(i32* nocapture) #0 {
"Loop Function Root":
  %tid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %ctaid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
  %PositionOfBlockInGrid.x = shl i32 %ctaid.x, 9
  %BlockLB.Add.ThreadPosInBlock.x = add i32 %PositionOfBlockInGrid.x, %tid.x
  %isThreadLBgtLoopUB.x = icmp sgt i32 %BlockLB.Add.ThreadPosInBlock.x, 65535
  br i1 %isThreadLBgtLoopUB.x, label %CUDA.AfterLoop.x, label %CUDA.LoopHeader.x.preheader
CUDA.LoopHeader.x.preheader: ; preds = %"...
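For readers less used to NVVM intrinsics, the prologue above can be modelled in plain C. This is a hypothetical sketch: the function names are illustrative, and the block size of 512 threads is an assumption inferred from the `shl ..., 9` (a shift by 9 multiplies by 512). It shows how each thread computes its global index and skips the loop when that index exceeds the loop upper bound 65535.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of the IR prologue.
   Assumption: 512 threads per block along x, inferred from "shl i32 %ctaid.x, 9". */
static int32_t global_thread_index(int32_t ctaid_x, int32_t tid_x) {
    assert(tid_x >= 0 && tid_x < 512);                /* assumed block size */
    int32_t position_of_block_in_grid = ctaid_x << 9; /* ctaid.x * 512 */
    return position_of_block_in_grid + tid_x;         /* BlockLB + ThreadPosInBlock */
}

/* Mirrors "icmp sgt ..., 65535" followed by the branch to CUDA.AfterLoop.x. */
static bool thread_skips_loop(int32_t ctaid_x, int32_t tid_x) {
    return global_thread_index(ctaid_x, tid_x) > 65535;
}
```

For example, block 127 / thread 511 maps to index 65535 and still enters the loop, while block 128 / thread 0 maps to 65536 and branches straight to the exit.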
2013 Mar 11
[LLVMdev] How to unroll reduction loop with caching accumulator on register?
Dear all,
The attached notunrolled.ll is a module containing a reduction kernel. What I'm
trying to do is unroll it in such a way that the partial reduction over the
unrolled iterations is performed in a register, and the result is stored to
memory only once. Currently, LLVM's unroller, together with all the standard
optimizations, produces code that stores the value to memory after every
unrolled iteration, which is
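The difference being asked about can be illustrated at the source level. The C sketch below is hypothetical: it assumes the kernel computes a dot-product style reduction (suggested by the name `matvec`, but not confirmed by the snippet), uses an unroll factor of 4 chosen purely for illustration, and is hand-unrolled rather than produced by LLVM. The first function writes the partial sum back to memory after every unrolled iteration; the second keeps it in a local accumulator and stores once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "store every iteration" form: the partial sum lives in *out. */
static void dot_store_each(const float *a, const float *x, size_t n, float *out) {
    assert(a != NULL && x != NULL && out != NULL);
    size_t i = 0;
    *out = 0.0f;
    for (; i + 4 <= n; i += 4) {        /* unrolled by 4 (illustrative factor) */
        *out += a[i] * x[i];
        *out += a[i + 1] * x[i + 1];
        *out += a[i + 2] * x[i + 2];
        *out += a[i + 3] * x[i + 3];
    }
    for (; i < n; ++i)                  /* remainder iterations */
        *out += a[i] * x[i];
}

/* Desired form: accumulate in a local (register after optimization), store once. */
static void dot_register_acc(const float *a, const float *x, size_t n, float *out) {
    assert(a != NULL && x != NULL && out != NULL);
    float acc = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc += a[i] * x[i];
        acc += a[i + 1] * x[i + 1];
        acc += a[i + 2] * x[i + 2];
        acc += a[i + 3] * x[i + 3];
    }
    for (; i < n; ++i)
        acc += a[i] * x[i];
    *out = acc;                         /* single store to memory */
}
```

Both functions perform the additions in the same order, so they compute the same value. In the first form, a compiler may only keep `*out` in a register if it can prove that `out` does not alias `a` or `x`; this is one plausible reason per-iteration stores survive unrolling, though the snippet alone does not show whether it is the cause here.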