thr3ads.net - search: "nvvm"

Displaying 20 results from an estimated 70 matches for "nvvm".

[LLVMdev] Emitting IR in older formats (for NVVM)

2015 Jan 12

This question is specifically motivated by the practical constraints of NVVM, but I don't know anywhere better to ask (hopefully, e.g., @jholewinski is still following), and I believe it concerns general LLVM issues: NVIDIA's libNVVM is built on LLVM 3.2. This means its bitcode and LL text parsers are from that generation. Its interface calls for adding module...

Missing TargetPrefix for NVVM intrinsics

2016 Jul 01

Justins: I noticed that the intrinsics in IntrinsicsNVVM don't specify a TargetPrefix. This seems like a simple omission, so I was going to simply throw a `let TargetPrefix = "nvvm"` block around them, but this doesn't quite work. There seem to be three prefixes that are used in this file. About 900 are int_nvvm_*, 30 are int_ptx_*, a...
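To get rough per-prefix counts like the ones above for a `.td` file, a small Python sketch could scan for literal `def int_<prefix>_...` records. The helper name `count_intrinsic_prefixes` and the regex are assumptions, not part of LLVM. Records produced by multiclasses or `foreach` loops are not expanded, so the result only approximates what llvm-tblgen would generate, and the sample records below are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical helper: matches literally spelled `def int_<prefix>_<rest>` records.
# <prefix> is the run of letters/digits before the first underscore after "int_".
# Records generated via multiclasses or foreach are NOT expanded here.
DEF_RE = re.compile(r"^\s*def\s+int_([A-Za-z0-9]+)_\w*", re.MULTILINE)


def count_intrinsic_prefixes(td_text: str) -> Counter:
    """Return a Counter mapping prefix (e.g. 'nvvm', 'ptx') to number of defs."""
    return Counter(m.group(1) for m in DEF_RE.finditer(td_text))


if __name__ == "__main__":
    # Illustrative sample only; not copied from the real IntrinsicsNVVM.td.
    sample = """
    def int_nvvm_barrier0 : Intrinsic<[], [], [IntrNoDuplicate]>;
    def int_nvvm_read_ptx_sreg_tid_x : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>;
    def int_ptx_bar_sync : Intrinsic<[], [llvm_i32_ty], []>;
    """
    print(count_intrinsic_prefixes(sample))
```

On the sample this reports two `nvvm` records and one `ptx` record; running it on the real file would show which prefixes a single `let TargetPrefix` block would have to cover.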

[LLVMdev] Behaviour of NVPTX intrinsic

2014 Sep 30

...arrier intrinsics.
test.ll
-------
; ModuleID = 'test.bc'
define void @test(i16* %I_0, i16* %I_1, i16* %I_2, i16* %I_3, i16* %O_0) {
entry:
  %T_0 = load volatile i16* %I_0
  %T_1 = load volatile i16* %I_1
  %T_2 = load volatile i16* %I_2
  %T_3 = load volatile i16* %I_3
  call void @llvm.nvvm.barrier0()
  %T_5 = add i16 %T_1, %T_3
  call void @llvm.nvvm.barrier0()
  %T_7 = mul i16 %T_0, %T_2
  %T_8 = xor i16 %T_2, %T_0
  %T_9 = mul i16 %T_0, %T_1
  call void @llvm.nvvm.barrier0()
  %T_11 = sub i16 %T_7, %T_5
  %T_12 = add i16 %T_8, %T_9
  %T_13 = add i16 %T_11, %T_12
  store volatile i1...

[LLVMdev] Emitting IR in older formats (for NVVM)

2015 Jan 13

Since SPIR can be (easily) transformed to NVVM IR, at least for me this helps a lot. Thank you Tobias. -MH On January 12, 2015, Tobias Grosser <tgrosser at inf.ethz.ch> wrote: > On 12.01.2015 05:48, Jonathan Ragan-Kelley wrote: > > This question is specifically motivated by the practical constraints of > > NVVM, but I don...

[LLVMdev] Behaviour of NVPTX intrinsic

2014 Sep 30

Is there any guarantee that the NVPTX intrinsic "llvm.nvvm.barrier0" will not be moved around by opt? In other words, can I expect all the instructions above "llvm.nvvm.barrier0" to remain above it, and those below it to remain below, after all the opt passes are run? If that is not the case, is there a way to define such an intrinsic? Th...

instrumenting device code with gpucc

2016 Mar 12

Hey Jingyue, Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse the nvvm-reflect pass? Thanks! yuanfeng On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue a...

[LLVMdev] Emitting IR in older formats (for NVVM)

2015 Jan 13

Thanks, all. I didn’t realize a 7.0 RC was public and changed to 3.4; I will go down that road for now, though I’ll probably also look into integrating variants of the SPIR converter in the future. Another possibility is to skip libnvvm altogether and use LLVM's NVPTX target. This is of course harder since you have to configure the passes yourself instead of just calling a few C functions, but it does give you more control over the optimization pipeline and gives you full visibility into the compiler. Unfortunately, there ar...

instrumenting device code with gpucc

2016 Mar 13

...cmd should I use to link axpy.s with axpy-sm_30.fatbin? I tried to use -cc1as, but the flag '-fcuda-include-gpubinary' was not recognized. Thanks! yuanfeng On Sat, Mar 12, 2016 at 12:05 AM, Jingyue Wu <jingyue at google.com> wrote: > I've no idea. Without instrumentation, nvvm_reflect_anchor doesn't appear > in the final PTX, right? If that's the case, some pass in llc must have > deleted the anchor and you should be able to figure out which one. > > On Fri, Mar 11, 2016 at 4:56 PM, Yuanfeng Peng < > yuanfeng.jack.peng at gmail.com> wrote: >...

instrumenting device code with gpucc

2016 Mar 15

...1as, >> but the flag '-fcuda-include-gpubinary' was not recognized. >> >> Thanks! >> >> yuanfeng >> >> On Sat, Mar 12, 2016 at 12:05 AM, Jingyue Wu <jingyue at google.com> wrote: >> >>> I've no idea. Without instrumentation, nvvm_reflect_anchor doesn't >>> appear in the final PTX, right? If that's the case, some pass in llc must >>> have deleted the anchor and you should be able to figure out which one. >>> >>> On Fri, Mar 11, 2016 at 4:56 PM, Yuanfeng Peng < >>> yuanf...

Information about the number of indices in memory accesses

2020 Sep 23

...So now the operand of every load and store is a GEP instruction. For simple stuff I am getting the right answer, but when the index expression becomes more complex, multiple GEPs are introduced. For instance: *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5; produces: %6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() %7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() %8 = mul i32 %6, %7, %9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() %10 = add i32 %8, %9, %11 = add i32 %10, 1, %12 = mul i32 2, %11, %13 = zext i32 %12 to i64, %14 = getelementptr inbounds i32, i32* %0,...
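The IR above is the index expression evaluated in 32-bit arithmetic and then widened to 64 bits. A minimal Python sketch of that arithmetic, assuming `A` is an `int*` (4-byte elements) and that the trailing `+2+3` become further constant GEP indices (the snippet is cut off before those GEPs), shows that all the pieces fold into a single element offset. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def wrap_i32(x: int) -> int:
    """Keep the low 32 bits, like an i32 mul/add without nsw/nuw flags."""
    return x & MASK32


def element_index(ntid_x: int, ctaid_x: int, tid_x: int) -> int:
    """Element offset from A for *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3)."""
    v8 = wrap_i32(ntid_x * ctaid_x)  # %8  = mul i32 %6, %7
    v10 = wrap_i32(v8 + tid_x)       # %10 = add i32 %8, %9
    v11 = wrap_i32(v10 + 1)          # %11 = add i32 %10, 1
    v12 = wrap_i32(2 * v11)          # %12 = mul i32 2, %11
    v13 = v12                        # %13 = zext i32 %12 to i64 (value unchanged)
    # Assumption: "+2+3" appear as further constant GEP indices; their sum is
    # simply added to the variable index.
    return v13 + 2 + 3


def byte_offset(ntid_x: int, ctaid_x: int, tid_x: int, elem_size: int = 4) -> int:
    """Byte offset from A, assuming 4-byte (i32) elements."""
    return element_index(ntid_x, ctaid_x, tid_x) * elem_size


if __name__ == "__main__":
    print(element_index(256, 3, 17), byte_offset(256, 3, 17))
```

With blockDim.x = 256, blockIdx.x = 3 and threadIdx.x = 17 this gives element 1577, i.e. byte offset 6308, regardless of how many GEPs the constant parts were split across.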

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Jul 11

...", "=f,f,f,f"(float %sub7, float 4.097000e+03, float %3) nounwind %5 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float %3, float 4.097000e+03, float %4) nounwind %sub12 = fsub float %0, %2 %sub14 = fsub float %3, %5 %6 = tail call float @llvm.nvvm.mul.rn.f(float %0, float %3) nounwind %sub18 = fsub float -0.000000e+00, %6 %7 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float %2, float %5, float %sub18) nounwind %8 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float %2...

instrumenting device code with gpucc

2016 Mar 10

....cu" "-mrelocation-model" "static" "-mthread-model" "posix" >> "-mdisable-fp-elim" "-fmath-errno" "-no-integrated-as" "-fcuda-is-device" >> "-mlink-cuda-bitcode" >> "/usr/local/cuda/nvvm/libdevice/libdevice.compute_35.10.bc" >> "-target-feature" "+ptx42" "-target-cpu" "sm_35" "-dwarf-column-info" >> "-debugger-tuning=gdb" "-resource-dir" >> "/usr/local/google/home/jingyue/Work/llvm/ins...

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

2012 May 01

...too much, I'm sure we could change it. :) > > * The register naming seems a little arbitrary as well, using FL prefixes for 64-bit float and da prefixes for 64-bit float arguments for example. Really, any choice is going to be arbitrary. > > * Something I picked up in the NVVM IR spec - it seems to only be possible to > use the bar.sync 0 instruction. Unless this is being removed for PTX 3.0, the > spec (and the PTX backend) support using bar.sync {0..15}. The old PTX > intrinsic also supports a non-zero integer operand. The NVVM intrinsic is there to implement...

[LLVMdev] [llvm-commits] [PATCH][RFC] NVPTX Backend

2012 May 02

...refixes for 64-bit float arguments for example. Really, any choice is going to be arbitrary. > * Something I picked up in the NVVM IR spec - it seems to only be possible to use the bar.sync 0 instruction. Unless this is being removed for PTX 3.0, the spec (and the PTX backend) support using bar.sync {0..15}. The old PTX intrinsic also supports a non-zero integer operand. ...

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

...128:128:128-n16:32:64" target triple = "nvptx64-unknown-unknown" @__kernelgen_version = constant [15 x i8] c"0.2/1654:1675M\00" define ptx_kernel void @__kernelgen_matvec_loop_7(i32* nocapture) #0 { "Loop Function Root": %tid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.tid.x() %ctaid.x = tail call ptx_device i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() %PositionOfBlockInGrid.x = shl i32 %ctaid.x, 9 %BlockLB.Add.ThreadPosInBlock.x = add i32 %PositionOfBlockInGrid.x, %tid.x %isThreadLBgtLoopUB.x = icmp sgt i32 %BlockLB.Add.ThreadPosInBlock.x, 65535...

Information about the number of indices in memory accesses

2020 Oct 03

...le stuff I am getting the right answer, but when the index > > expression becomes more complex, multiple GEPs are introduced. For > > instance: > > > > *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5; > > > > produces: > > > > %6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() > > %7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() > > %8 = mul i32 %6, %7, > > %9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() > > %10 = add i32 %8, %9, > > %11 = add i32 %10, 1, > > %12 = mul i32 2, %11, > > %13 = zex...

[LLVMdev] How to unroll reduction loop with caching accumulator on register?

2013 Mar 11

Dear all, Attached notunrolled.ll is a module containing a reduction kernel. What I'm trying to do is to unroll it in such a way that the partial reduction over the unrolled iterations is performed in a register and then stored to memory only once. Currently, LLVM's unroller together with all standard optimizations produces code that stores the value to memory after every unrolled iteration, which is...
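The difference between the two code shapes can be sketched in Python, with a list standing in for global memory and a local variable standing in for a register. The function names are hypothetical. In LLVM, the register-accumulator form is roughly what LICM's scalar promotion of loop-invariant memory locations produces; one of its conditions is that alias analysis can show nothing else in the loop reads or writes the output location.

```python
from typing import List


def reduce_store_every_iter(mem: List[float], out: int, vals: List[float]) -> int:
    """Naive shape: every (unrolled) iteration loads, adds and stores mem[out].

    Returns the number of stores to memory that were performed.
    """
    stores = 0
    for v in vals:
        mem[out] = mem[out] + v  # load + add + store on each iteration
        stores += 1
    return stores


def reduce_in_register(mem: List[float], out: int, vals: List[float]) -> int:
    """Promoted shape: keep the partial sum in a local, store once at the end.

    Equivalent to the naive version only if no other access in the loop
    touches mem[out]; the additions happen in the same order, so even the
    floating-point result is identical.
    """
    acc = mem[out]  # single load before the loop
    for v in vals:
        acc += v  # partial reduction stays in the "register"
    mem[out] = acc  # single store after the loop
    return 1


if __name__ == "__main__":
    a, b = [0.0], [0.0]
    vals = [1.5, 2.25, 3.0, 4.0]
    print(reduce_store_every_iter(a, 0, vals), a[0])
    print(reduce_in_register(b, 0, vals), b[0])
```

Both calls leave 10.75 in memory; the first performs four stores, the second only one.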

[LLVMdev] [NVPTX] llc -march=nvptx64 -mcpu=sm_20 generates invalid zero align for device function params

2012 Nov 09

...at > %sub7, float 4.097000e+03, float %3) nounwind > %5 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float %3, > float 4.097000e+03, float %4) nounwind > %sub12 = fsub float %0, %2 > %sub14 = fsub float %3, %5 > %6 = tail call float @llvm.nvvm.mul.rn.f(float %0, float %3) nounwind > %sub18 = fsub float -0.000000e+00, %6 > %7 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,f,f,f"(float %2, > float %5, float %sub18) nounwind > %8 = tail call float asm "mad.f32 $0, $1, $2, $3;", "=f,...

[LLVMdev] C++AMP -> OpenCL (NVPTX) prototype

2013 Apr 14

...on amd64 linux with the stable binary drivers. The compilation process currently works like this:
.cpp -> [clang++ -fc++-amp] -> .ll
  - compile non-amp code
.cpp -> [clang++ -fc++-amp -famp-is-kernel] -> .amp.ll
  - compile amp kernels only
.amp.ll -> [opt -amp-to-opencl] -> .nvvm.ll
  - create kernel wrapper to deal with buffer/const inputs
  - add nvvm annotations
.nvvm.ll -> [llc -march=nvptx] -> .ptx
  - compile kernels to NVPTX (unchanged)
.ll + .ptx -> [opt -amp-create-stubs .ptx] -> .opt.ll
  - embed ptx as array data
  - create functions to get kernel info,...

Information about the number of indices in memory accesses

2020 Oct 03

...>> > expression becomes more complex, multiple GEPs are introduced. For >>> > instance: >>> > >>> > *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5; >>> > >>> > produces: >>> > >>> > %6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() >>> > %7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() >>> > %8 = mul i32 %6, %7, >>> > %9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() >>> > %10 = add i32 %8, %9, >>> > %11 = add i32 %10, 1, >>> >...
