Displaying 9 results from an estimated 9 matches for "fatbin".
2016 Mar 15
2
instrumenting device code with gpucc
Hi Jingyue,
Sorry to ask again, but how exactly could I glue the fatbin with the
instrumented host code? Or does it mean we actually cannot instrument both
the host & device code at the same time?
Thanks!
yuanfeng
On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote:
> Including the fatbin into the host code should be done in the frontend.
> ...
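A hedged sketch of one way to do the gluing, assuming the usual clang CUDA driver flow in which the host-side cc1 invocation receives the device image through -fcuda-include-gpubinary (the instrumented fatbin name below is a placeholder):

$ clang++ -### -O3 axpy.cu 2>&1 | grep fcuda-include-gpubinary
# re-run the printed host cc1 command, swapping the file passed to
# -fcuda-include-gpubinary for the fatbin built from the instrumented
# device code (e.g. axpy-instrumented.fatbin)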
2017 Jun 09
1
NVPTX Back-end: relocatable device code support for dynamic parallelism
...-gen_device_file_name "/tmp/tmpxft_00007040_00000000-4_cuda_id_test.cudafe1.gpu" --nv_arch "compute_35" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_00007040_00000000-3_cuda_id_test.module_id" --include_file_name "tmpxft_00007040_00000000-2_cuda_id_test.fatbin.c" "/tmp/tmpxft_00007040_00000000-9_cuda_id_test.cpp1.ii"
#$ gcc -std=c++11 -E -x c++ -D__CUDACC__ -D__NVCC__ -D__CUDACC_RDC__ "-I/opt/cuda-8.0/bin/..//include" -D"__CUDACC_VER__=80061" -D"__CUDACC_VER_BUILD__=61" -D"__CUDACC_VER_MINOR__=0"...
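For reference, a trace like the fragment above is what nvcc prints in verbose mode once relocatable device code is enabled; a rough sketch of how to reproduce it (sm_35 matches the compute_35 in the trace):

$ nvcc -arch=sm_35 -rdc=true -v cuda_id_test.cu -o cuda_id_test
$ nvcc -arch=sm_35 -rdc=true --dryrun cuda_id_test.cu   # print the steps without running them

Newer clang releases expose an equivalent mode as -fgpu-rdc.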
2017 Sep 27
2
OrcJIT + CUDA Prototype for Cling
...I am currently a stuck with a runtime issue, on which my
interpreter prototype fails to execute kernels with a CUDA runtime error.
=== How to use the prototype
This application interprets CUDA runtime code. The program needs the
whole CUDA program (.cu file) and its pre-compiled device code (as a
fatbin) as input:
command: cuda-interpreter [source].cu [kernels].fatbin
I also implemented an alternative mode, which generates an object
file. The object file can be linked (with ld) to an executable. This mode is
just implemented to check if the LLVM module generation works as
expected. Activat...
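If you want to try the prototype's interface, the fatbin input can be produced up front with nvcc's fatbin output mode; a sketch, assuming the device code lives in the same .cu file and that sm_35 is just an example architecture:

$ nvcc -arch=sm_35 -fatbin source.cu -o kernels.fatbin
$ cuda-interpreter source.cu kernels.fatbin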
2016 Mar 13
2
instrumenting device code with gpucc
...nsive! I finally figured out a way to resolve the
issue: all I have to do is to use `-only-needed` when merging the device
bitcodes with llvm-link.
However, since we actually need to instrument the host code as well, I
encountered another issue when I tried to glue the instrumented host code
and fatbin together. When I only instrumented the device code, I used the
following cmd to do so:
"/mnt/wtf/tools/bin/clang-3.9" "-cc1" "-triple" "x86_64-unknown-linux-gnu"
"-aux-triple" "nvptx64-nvidia-cuda" "-fcuda-target-overloads"
"...
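For reference, a minimal sketch of the -only-needed merge mentioned at the top of this message, assuming axpy-sm_20.bc is the device bitcode from earlier in this thread and instrument.bc is a hypothetical module holding the instrumentation routines:

$ llvm-link -only-needed axpy-sm_20.bc instrument.bc -o axpy-sm_20-instr.bc

With -only-needed, llvm-link imports from the later modules only the definitions that are actually referenced, which keeps duplicate helpers (such as the nvvm reflect anchor discussed elsewhere in this series) out of the merged module.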
2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
...d-a-device.bc after this step;
3) Generate IR files for b.cu: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c b.cu;
4) Link instrumented-a-device.bc with the device code generated for b.cu: llvm-link instrumented-a-device.bc b-cuda-nvptx64-nvidia-cuda-sm_35.bc -o ab-device.bc;
5) Use llc, ptxas & fatbinary on ab-device.bc to get ab-device.ptx, ab-device.o & ab-device.fatbin;
6) Call clang again to generate the host object file ab.o, with ab-device.o & ab-device.fatbin embedded;
7) Link against libraries and get the final binary: a.out.
The binary a.out fails with an exception when I run...
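One detail worth checking in step 5 for a lost-debug-info symptom: ptxas only carries source-level information into the generated SASS when asked to. A hedged sketch, assuming sm_35 as in the thread (whether this is related to the exception depends on where the information is actually dropped):

$ ptxas -arch=sm_35 -lineinfo ab-device.ptx -o ab-device.o   # keep line info
$ ptxas -arch=sm_35 -g ab-device.ptx -o ab-device.o          # full device debug info (disables optimizations)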
2017 Nov 14
1
OrcJIT + CUDA Prototype for Cling
...interpreter prototype fails to execute kernels with a CUDA runtime
> error.
>
>
> === How to use the prototype
>
> This application interprets CUDA runtime code. The program needs
> the whole CUDA program (.cu file) and its pre-compiled device code
> (as a fatbin) as input:
>
> command: cuda-interpreter [source].cu [kernels].fatbin
>
> I also implemented an alternative mode, which generates an
> object file. The object file can be linked (with ld) to an executable.
> This mode is just implemented to check if the LLVM...
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue,
Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect
anchor didn't go away; ptxas is still complaining about the duplicate
definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse
the nvvm-reflect pass?
Thanks!
yuanfeng
On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote:
> According to the examples you
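A sketch of the sequence under discussion, using the opt syntax from this message (instrument.bc is a hypothetical module containing the instrumentation code):

$ opt -nvvm-reflect axpy-sm_20.bc -o axpy-sm_20-reflect.bc
$ opt -nvvm-reflect instrument.bc -o instrument-reflect.bc
$ llvm-link -only-needed axpy-sm_20-reflect.bc instrument-reflect.bc -o merged.bc

As reported above, -nvvm-reflect alone did not make the anchor disappear; the -only-needed link step, which the 2016 Mar 13 message in these results arrived at, appears to be what keeps the duplicate '_ZL21__nvvm_reflect_anchorv' definition out of the merged module.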
2016 Mar 05
2
instrumenting device code with gpucc
...link the modified axpy-sm_20.bc to the final binary, you need several
extra steps:
1. Compile axpy-sm_20.bc to PTX assembly using llc: llc axpy-sm_20.bc -o
axpy-sm_20.ptx -march=<nvptx or nvptx64>
2. Compile the PTX assembly to SASS using ptxas
3. Make the SASS a fat binary using NVIDIA's fatbinary tool
4. Link the fat binary to the host code using ld.
Clang does steps 2-4 by invoking subcommands. Therefore, you can use "clang
-###" to dump all the subcommands, and then find the ones for step 2-4. For
example,
$ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc
-L/us...
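Concretely, steps 2-4 might look like the following sketch for the sm_20 target used above (the ptxas and fatbinary option spellings are standard NVIDIA tool flags but may vary between CUDA releases; the final host compile and link are easiest to copy from the clang -### output):

$ ptxas -arch=sm_20 axpy-sm_20.ptx -o axpy-sm_20.cubin
$ fatbinary -64 --create=axpy-sm_20.fatbin \
    --image=profile=sm_20,file=axpy-sm_20.cubin \
    --image=profile=compute_20,file=axpy-sm_20.ptx
# then reuse the host cc1 and ld subcommands printed by `clang++ -###`,
# pointing them at axpy-sm_20.fatbin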
2016 Mar 10
4
instrumenting device code with gpucc
...binary, you need several
>> extra steps:
>> 1. Compile axpy-sm_20.bc to PTX assembly using llc: llc axpy-sm_20.bc -o
>> axpy-sm_20.ptx -march=<nvptx or nvptx64>
>> 2. Compile the PTX assembly to SASS using ptxas
>> 3. Make the SASS a fat binary using NVIDIA's fatbinary tool
>> 4. Link the fat binary to the host code using ld.
>>
>> Clang does steps 2-4 by invoking subcommands. Therefore, you can use
>> "clang -###" to dump all the subcommands, and then find the ones for step
>> 2-4. For example,
>>
>> $ clang+...