Displaying 9 results from an estimated 9 matches for "fatbin".
2016 Mar 15
2
instrumenting device code with gpucc
Hi Jingyue,
Sorry to ask again, but how exactly could I glue the fatbin with the
instrumented host code? Or does it mean we actually cannot instrument both
the host & device code at the same time?
Thanks!
yuanfeng
On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote:
> Including the fatbin into the host code should be done in the frontend.
> ...
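A hedged sketch of one way to do the gluing, assuming the usual clang CUDA driver flow in which the host-side cc1 invocation receives the device image through -fcuda-include-gpubinary (the instrumented fatbin name below is a placeholder):

$ clang++ -### -O3 axpy.cu 2>&1 | grep fcuda-include-gpubinary
# re-run the printed host cc1 command, swapping the file passed to
# -fcuda-include-gpubinary for the fatbin built from the instrumented
# device code (e.g. axpy-instrumented.fatbin)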
2017 Jun 09
1
NVPTX Back-end: relocatable device code support for dynamic parallelism
...-gen_device_file_name "/tmp/tmpxft_00007040_00000000-4_cuda_id_test.cudafe1.gpu" --nv_arch "compute_35" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_00007040_00000000-3_cuda_id_test.module_id" --include_file_name "tmpxft_00007040_00000000-2_cuda_id_test.fatbin.c" "/tmp/tmpxft_00007040_00000000-9_cuda_id_test.cpp1.ii"
#$ gcc -std=c++11 -E -x c++ -D__CUDACC__ -D__NVCC__ -D__CUDACC_RDC__ "-I/opt/cuda-8.0/bin/..//include" -D"__CUDACC_VER__=80061" -D"__CUDACC_VER_BUILD__=61" -D"__CUDACC_VER_MINOR__=0"...
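For reference, a trace like the fragment above is what nvcc prints in verbose mode once relocatable device code is enabled; a rough sketch of how to reproduce it (sm_35 matches the compute_35 in the trace):

$ nvcc -arch=sm_35 -rdc=true -v cuda_id_test.cu -o cuda_id_test
$ nvcc -arch=sm_35 -rdc=true --dryrun cuda_id_test.cu   # print the steps without running them

Newer clang releases expose an equivalent mode as -fgpu-rdc.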
2017 Sep 27
2
OrcJIT + CUDA Prototype for Cling
...I am currently a stuck with a runtime issue, on which my
interpreter prototype fails to execute kernels with a CUDA runtime error.
=== How to use the prototype
This application interprets CUDA runtime code. The program needs the
whole CUDA program (.cu file) and its pre-compiled device code (as a
fatbin) as input:
command: cuda-interpreter [source].cu [kernels].fatbin
I also implemented an alternative mode, which generates an object
file. The object file can be linked (with ld) to an executable. This mode is
just implemented to check if the LLVM module generation works as
expected. Activat...
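If you want to try the prototype's interface, the fatbin input can be produced up front with nvcc's fatbin output mode; a sketch, assuming the device code lives in the same .cu file and that sm_35 is just an example architecture:

$ nvcc -arch=sm_35 -fatbin source.cu -o kernels.fatbin
$ cuda-interpreter source.cu kernels.fatbin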
2016 Mar 13
2
instrumenting device code with gpucc
...nsive! I finally figured out a way to resolve the
issue: all I have to do is to use `-only-needed` when merging the device
bitcodes with llvm-link.
However, since we actually need to instrument the host code as well, I
encountered another issue when I tried to glue the instrumented host code
and fatbin together. When I only instrumented the device code, I used the
following cmd to do so:
"/mnt/wtf/tools/bin/clang-3.9" "-cc1" "-triple" "x86_64-unknown-linux-gnu"
"-aux-triple" "nvptx64-nvidia-cuda" "-fcuda-target-overloads"
"...
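For reference, a minimal sketch of the -only-needed merge mentioned at the top of this message, assuming axpy-sm_20.bc is the device bitcode from earlier in this thread and instrument.bc is a hypothetical module holding the instrumentation routines:

$ llvm-link -only-needed axpy-sm_20.bc instrument.bc -o axpy-sm_20-instr.bc

With -only-needed, llvm-link imports from the later modules only the definitions that are actually referenced, which keeps duplicate helpers (such as the nvvm reflect anchor discussed elsewhere in this series) out of the merged module.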
2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
...d-a-device.bc after this step;
3) Generate IR files for b.cu: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c b.cu;
4) Link instrumented-a-device.bc with the device code generated for b.cu: llvm-link instrumented-a-device.bc b-cuda-nvptx64-nvidia-cuda-sm_35.bc -o ab-device.bc;
5) Use llc, ptxas & fatbinary on ab-device.bc to get ab-device.ptx, ab-device.o & ab-device.fatbin;
6) Call clang again to generate the host object file ab.o, with ab-device.o & ab-device.fatbin embedded;
7) Link against libraries and get the final binary: a.out.
The binary a.out fails with an exception when I run...
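One detail worth checking in step 5 for a lost-debug-info symptom: ptxas only carries source-level information into the generated SASS when asked to. A hedged sketch, assuming sm_35 as in the thread (whether this is related to the exception depends on where the information is actually dropped):

$ ptxas -arch=sm_35 -lineinfo ab-device.ptx -o ab-device.o   # keep line info
$ ptxas -arch=sm_35 -g ab-device.ptx -o ab-device.o          # full device debug info (disables optimizations)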
2017 Nov 14
1
OrcJIT + CUDA Prototype for Cling
...interpreter prototype fails to execute kernels with a CUDA runtime
> error.
>
>
> === How to use the prototype
>
> This application interprets CUDA runtime code. The program needs
> the whole CUDA program (.cu file) and its pre-compiled device code
> (as a fatbin) as input:
>
> command: cuda-interpreter [source].cu [kernels].fatbin
>
> I also implemented an alternative mode, which generates an
> object file. The object file can be linked (with ld) to an executable.
> This mode is just implemented to check if the LLVM...
2016 Mar 12
2
instrumenting device code with gpucc
Hey Jingyue,
Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect
anchor didn't go away; ptxas is still complaining about the duplicate
definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse
the nvvm-reflect pass?
Thanks!
yuanfeng
On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote:
> According to the examples you
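A sketch of the sequence under discussion, using the opt syntax from this message (instrument.bc is a hypothetical module containing the instrumentation code):

$ opt -nvvm-reflect axpy-sm_20.bc -o axpy-sm_20-reflect.bc
$ opt -nvvm-reflect instrument.bc -o instrument-reflect.bc
$ llvm-link -only-needed axpy-sm_20-reflect.bc instrument-reflect.bc -o merged.bc

As reported above, -nvvm-reflect alone did not make the anchor disappear; the -only-needed link step, which the 2016 Mar 13 message in these results arrived at, appears to be what keeps the duplicate '_ZL21__nvvm_reflect_anchorv' definition out of the merged module.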
2016 Mar 05
2
instrumenting device code with gpucc
...link the modified axpy-sm_20.bc to the final binary, you need several
extra steps:
1. Compile axpy-sm_20.bc to PTX assembly using llc: llc axpy-sm_20.bc -o
axpy-sm_20.ptx -march=<nvptx or nvptx64>
2. Compile the PTX assembly to SASS using ptxas
3. Make the SASS a fat binary using NVIDIA's fatbinary tool
4. Link the fat binary to the host code using ld.
Clang does steps 2-4 by invoking subcommands. Therefore, you can use "clang
-###" to dump all the subcommands, and then find the ones for step 2-4. For
example,
$ clang++ -### -O3 axpy.cu -I/usr/local/cuda/samples/common/inc
-L/us...
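Concretely, steps 2-4 might look like the following sketch for the sm_20 target used above (the ptxas and fatbinary option spellings are standard NVIDIA tool flags but may vary between CUDA releases; the final host compile and link are easiest to copy from the clang -### output):

$ ptxas -arch=sm_20 axpy-sm_20.ptx -o axpy-sm_20.cubin
$ fatbinary -64 --create=axpy-sm_20.fatbin \
    --image=profile=sm_20,file=axpy-sm_20.cubin \
    --image=profile=compute_20,file=axpy-sm_20.ptx
# then reuse the host cc1 and ld subcommands printed by `clang++ -###`,
# pointing them at axpy-sm_20.fatbin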
2016 Mar 10
4
instrumenting device code with gpucc
...binary, you need several
>> extra steps:
>> 1. Compile axpy-sm_20.bc to PTX assembly using llc: llc axpy-sm_20.bc -o
>> axpy-sm_20.ptx -march=<nvptx or nvptx64>
>> 2. Compile the PTX assembly to SASS using ptxas
>> 3. Make the SASS a fat binary using NVIDIA's fatbinary tool
>> 4. Link the fat binary to the host code using ld.
>>
>> Clang does steps 2-4 by invoking subcommands. Therefore, you can use
>> "clang -###" to dump all the subcommands, and then find the ones for step
>> 2-4. For example,
>>
>> $ clang+...