search for: fatbin

Displaying 9 results from an estimated 9 matches for "fatbin".

2016 Mar 15
2
instrumenting device code with gpucc
Hi Jingyue, Sorry to ask again, but how exactly could I glue the fatbin with the instrumented host code? Or does it mean we actually cannot instrument both the host & device code at the same time? Thanks! yuanfeng On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote: > Including fatbin into host code should be done in frontend. >...
2017 Jun 09
1
NVPTX Back-end: relocatable device code support for dynamic parallelism
...-gen_device_file_name "/tmp/tmpxft_00007040_00000000-4_cuda_id_test.cudafe1.gpu" --nv_arch "compute_35" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_00007040_00000000-3_cuda_id_test.module_id" --include_file_name "tmpxft_00007040_00000000-2_cuda_id_test.fatbin.c" "/tmp/tmpxft_00007040_00000000-9_cuda_id_test.cpp1.ii" #$ gcc -std=c++11 -E -x c++ -D__CUDACC__ -D__NVCC__ -D__CUDACC_RDC__ "-I/opt/cuda-8.0/bin/..//include" -D"__CUDACC_VER__=80061" -D"__CUDACC_VER_BUILD__=61" -D"__CUDACC_VER_MINOR__=0"...
2017 Sep 27
2
OrcJIT + CUDA Prototype for Cling
...I am currently stuck with a runtime issue, in which my interpreter prototype fails to execute kernels with a CUDA runtime error. === How to use the prototype This application interprets cuda runtime code. The program needs the whole cuda-program (.cu-file) and its pre-compiled device code (as fatbin) as an input:     command: cuda-interpreter [source].cu [kernels].fatbin I also implemented an alternative mode, which generates an object file. The object file can be linked (ld) into an executable. This mode is just implemented to check if the LLVM module generation works as expected. Activat...
2016 Mar 13
2
instrumenting device code with gpucc
...nsive! I finally figured out a way to resolve the issue: all I have to do is to use `-only-needed` when merging the device bitcodes with llvm-link. However, since we actually need to instrument the host code as well, I encountered another issue when I tried to glue the instrumented host code and fatbin together. When I only instrumented the device code, I used the following cmd to do so: "/mnt/wtf/tools/bin/clang-3.9" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-aux-triple" "nvptx64-nvidia-cuda" "-fcuda-target-overloads" "...
2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
...d-a-device.bc after this step; 3) Generate IR files for b.cu: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c b.cu; 4) Link instrumented-a.device.bc with the device code generated for b.cu: llvm-link intrumented-a-device.bc b-cuda-nvptx64-nvidia-cuda-sm_35.bc -o ab-device.bc; 5) Use llc, ptxas & fatbinary on ab-device.bc to get ab-device.ptx, ab-device.o & ab-device.fatbin; 6) Call clang again to generate the host object file ab.o, with ab-device.o & ab-device.fatbin embedded; 7) Link against libraries and get the final binary: a.out. The binary a.out fails with an exception I when run...
2017 Nov 14
1
OrcJIT + CUDA Prototype for Cling
...interpreter prototype fails to execute kernels with a CUDA runtime > error. > > > === How to use the prototype > > This application interprets cuda runtime code. The program needs > the whole cuda-program (.cu-file) and its pre-compiled device code > (as fatbin) as an input: > >     command: cuda-interpreter [source].cu [kernels].fatbin > > I also implemented an alternative mode, which generates an > object file. The object file can be linked (ld) into an executable. > This mode is just implemented to check if the LLVM...
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue, Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse the nvvm-reflect pass? Thanks! yuanfeng On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote: > According to the examples you
2016 Mar 05
2
instrumenting device code with gpucc
...link the modified axpy-sm_20.bc to the final binary, you need several extra steps: 1. Compile axpy-sm_20.bc to PTX assembly using llc: llc axpy-sm_20.bc -o axpy-sm_20.ptx -march=<nvptx or nvptx64> 2. Compile the PTX assembly to SASS using ptxas 3. Make the SASS a fat binary using NVIDIA's fatbinary tool 4. Link the fat binary to the host code using ld. Clang does steps 2-4 by invoking subcommands. Therefore, you can use "clang -###" to dump all the subcommands, and then find the ones for steps 2-4. For example, $ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc -L/us...
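Editor's sketch of steps 1-3 from the snippet above, written as a dry run that only builds the command lines and never executes them. Everything beyond the quoted llc invocation is an assumption: the helper name device_pipeline is hypothetical, the target is fixed to nvptx64, and the ptxas and fatbinary arguments follow the general shape that "clang -###" printed for CUDA compilations of that period. They may not match your toolchain, so compare them against your own "clang -###" output. Step 4 (embedding the fat binary in the host object) is left out on purpose, because the snippet is cut off before it shows those subcommands.

```python
import shlex


def device_pipeline(bitcode, arch="sm_20"):
    """Return argv lists for steps 1-3 (llc, ptxas, fatbinary).

    Hypothetical helper: nothing is executed, and the ptxas/fatbinary
    arguments are assumptions modeled on typical "clang -###" output.
    """
    stem = bitcode.rsplit(".", 1)[0]
    ptx = stem + ".ptx"        # step 1 output: PTX assembly
    sass = stem + ".o"         # step 2 output: SASS object from ptxas
    fatbin = stem + ".fatbin"  # step 3 output: fat binary
    virtual_arch = arch.replace("sm_", "compute_")
    return [
        # Step 1: bitcode -> PTX (64-bit NVPTX target assumed).
        ["llc", bitcode, "-o", ptx, "-march=nvptx64", "-mcpu=" + arch],
        # Step 2: PTX -> SASS.
        ["ptxas", "-m64", "--gpu-name", arch, "--output-file", sass, ptx],
        # Step 3: bundle SASS and PTX into one fat binary.
        ["fatbinary", "--cuda", "-64", "--create", fatbin,
         "--image=profile=%s,file=%s" % (arch, sass),
         "--image=profile=%s,file=%s" % (virtual_arch, ptx)],
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for argv in device_pipeline("axpy-sm_20.bc"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands rather than running them keeps the sketch usable on a machine without the CUDA toolkit; once the lines match what "clang -###" reports, each list could be passed to subprocess.run.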
2016 Mar 10
4
instrumenting device code with gpucc
...binary, you need several >> extra steps: >> 1. Compile axpy-sm_20.bc to PTX assembly using llc: llc axpy-sm_20.bc -o >> axpy-sm_20.ptx -march=<nvptx or nvptx64> >> 2. Compile the PTX assembly to SASS using ptxas >> 3. Make the SASS a fat binary using NVIDIA's fatbinary tool >> 4. Link the fat binary to the host code using ld. >> >> Clang does steps 2-4 by invoking subcommands. Therefore, you can use >> "clang -###" to dump all the subcommands, and then find the ones for steps >> 2-4. For example, >> >> $ clang+...