thr3ads.net - llvm dev - [llvm-dev] instrumenting device code with gpucc [Mar 2016]

Yuanfeng Peng via llvm-dev

2016-Mar-15 17:45 UTC

[llvm-dev] instrumenting device code with gpucc

Hi Jingyue,

Sorry to ask again, but how exactly could I glue the fatbin with the
instrumented host code?  Or does it mean we actually cannot instrument both
the host & device code at the same time?

Thanks!
yuanfeng

On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com>
wrote:
> Including the fatbin in the host code should be done in the frontend.
>
> On Mon, Mar 14, 2016 at 12:13 AM, Yuanfeng Peng <
> yuanfeng.jack.peng at gmail.com> wrote:
>
>> Hey Jingyue,
>>
>> Thanks for being so responsive!  I finally figured out a way to resolve
>> the issue: all I have to do is to use `-only-needed` when merging the
>> device bitcodes with llvm-link.
>>
>> However, since we actually need to instrument the host code as well, I
>> encountered another issue when I tried to glue the instrumented host code
>> and fatbin together.  When I only instrumented the device code, I used the
>> following cmd to do so:
>>
>> "/mnt/wtf/tools/bin/clang-3.9" "-cc1" "-triple"
>> "x86_64-unknown-linux-gnu" "-aux-triple" "nvptx64-nvidia-cuda"
>> "-fcuda-target-overloads" "-fcuda-disable-target-call-checks" "-emit-obj"
>> "-disable-free" "-main-file-name" "axpy.cu" "-mrelocation-model"
>> "static" "-mthread-model" "posix" "-fmath-errno" "-masm-verbose"
>> "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu"
>> "x86-64" "-momit-leaf-frame-pointer" "-dwarf-column-info"
>> "-debugger-tuning=gdb" "-resource-dir"
>> "/mnt/wtf/tools/bin/../lib/clang/3.9.0" "-I"
>> "/usr/local/cuda-7.0/samples/common/inc" "-internal-isystem"
>> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8"
>> "-internal-isystem"
>> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
>> "-internal-isystem"
>> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
>> "-internal-isystem"
>> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward"
>> "-internal-isystem" "/usr/local/include" "-internal-isystem"
>> "/mnt/wtf/tools/bin/../lib/clang/3.9.0/include" "-internal-externc-isystem"
>> "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include"
>> "-internal-externc-isystem" "/usr/include" "-internal-isystem"
>> "/usr/local/cuda/include" "-include" "__clang_cuda_runtime_wrapper.h" "-O3"
>> "-fdeprecated-macro" "-fdebug-compilation-dir"
>> "/mnt/wtf/workspace/cuda/gpu-race-detection" "-ferror-limit" "19"
>> "-fmessage-length" "291" "-pthread" "-fobjc-runtime=gcc" "-fcxx-exceptions"
>> "-fexceptions" "-fdiagnostics-show-option" "-vectorize-loops"
>> "-vectorize-slp" "-o" "axpy-host.o" "-x" "cuda" "tests/axpy.cu"
>> "-fcuda-include-gpubinary" "axpy-sm_30.fatbin"
>>
>> which, from my understanding, compiles the host code in tests/axpy.cu
>> and links it with axpy-sm_30.fatbin.  However, now that I have instrumented
>> the IR of the host code (axpy.bc) and ran `llc axpy.bc -o axpy.s`, which
>> cmd should I use to link axpy.s with axpy-sm_30.fatbin?  I tried to use
>> -cc1as, but the flag '-fcuda-include-gpubinary' was not recognized.
>>
>> Thanks!
>>
>> yuanfeng
>>
>> On Sat, Mar 12, 2016 at 12:05 AM, Jingyue Wu <jingyue at google.com> wrote:
>>
>>> I've no idea. Without instrumentation, nvvm_reflect_anchor doesn't
>>> appear in the final PTX, right? If that's the case, some pass in llc must
>>> have deleted the anchor and you should be able to figure out which one.
>>>
>>> On Fri, Mar 11, 2016 at 4:56 PM, Yuanfeng Peng <
>>> yuanfeng.jack.peng at gmail.com> wrote:
>>>
>>>> Hey Jingyue,
>>>>
>>>> Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect
>>>> anchor didn't go away; ptxas is still complaining about the duplicate
>>>> definition of function '_ZL21__nvvm_reflect_anchorv'.  Did I misuse
>>>> the nvvm-reflect pass?
>>>>
>>>> Thanks!
>>>> yuanfeng
>>>>
>>>> On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com>
>>>> wrote:
>>>>
>>>>> According to the examples you sent, I believe the linking issue was
>>>>> caused by nvvm reflection anchors. I haven't played with that, but I guess
>>>>> running nvvm-reflect on an IR removes the nvvm reflect anchors. After that,
>>>>> you can llvm-link the two bc/ll files.
>>>>>
>>>>> Another potential issue is that your cuda_hooks-sm_30.ll is
>>>>> unoptimized. This could cause the instrumented code to run super slow.
>>>>>
>>>>> On Fri, Mar 11, 2016 at 9:40 AM, Yuanfeng Peng <
>>>>> yuanfeng.jack.peng at gmail.com> wrote:
>>>>>
>>>>>> Hey Jingyue,
>>>>>>
>>>>>> Attached are the .ll files.  Thanks!
>>>>>>
>>>>>> yuanfeng
>>>>>>
>>>>>> On Fri, Mar 11, 2016 at 3:47 AM, Jingyue Wu <jingyue at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Looks like we are getting closer!
>>>>>>>
>>>>>>> On Thu, Mar 10, 2016 at 5:21 PM, Yuanfeng Peng <
>>>>>>> yuanfeng.jack.peng at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Jingyue,
>>>>>>>>
>>>>>>>> Thank you so much for the helpful response!  I didn't know that PTX
>>>>>>>> assembly cannot be linked; that's likely the reason for my issue.
>>>>>>>>
>>>>>>>> So I did the following as you suggested (axpy-sm_30.bc is the
>>>>>>>> instrumented bitcode, and cuda_hooks-sm_30.bc contains the hook functions):
>>>>>>>>
>>>>>>>> llvm-link axpy-sm_30.bc cuda_hooks-sm_30.bc -o inst_axpy-sm_30.bc
>>>>>>>>
>>>>>>>> llc inst_axpy-sm_30.bc -o axpy-sm_30.s
>>>>>>>>
>>>>>>>> "/usr/local/cuda/bin/ptxas" "-m64" "-O3" -c "--gpu-name" "sm_30"
>>>>>>>> "--output-file" axpy-sm_30.o axpy-sm_30.s
>>>>>>>>
>>>>>>>> However, I got the following error from ptxas:
>>>>>>>>
>>>>>>>> ptxas axpy-sm_30.s, line 106; error   : Duplicate definition of function '_ZL21__nvvm_reflect_anchorv'
>>>>>>>>
>>>>>>>> ptxas axpy-sm_30.s, line 106; fatal   : Parsing error near '.2': syntax error
>>>>>>>>
>>>>>>>> ptxas fatal   : Ptx assembly aborted due to errors
>>>>>>>>
>>>>>>>> Looks like some cuda function definitions are in both bitcode files
>>>>>>>> which caused duplicate definition... what am I supposed to do to resolve
>>>>>>>> this issue?
>>>>>>>>
>>>>>>> Can you attach axpy-sm_30.ll and cuda_hooks-sm_30.ll? The
>>>>>>> duplication may be caused by how nvvm reflection works, but I'd like to see
>>>>>>> a concrete example.
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> yuanfeng
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
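
For reference, the device-side steps that the quoted thread converges on can be sketched as a small shell script. This is only a sketch assembled from the commands quoted above, not something posted on the list: the `run` helper, the `DRY_RUN` and `PTXAS` variables, and the function name `device_pipeline` are hypothetical, and the order of the two inputs to llvm-link (instrumented module first, hooks second) is an assumption. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Print each command unless DRY_RUN=0 is set, in which case execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# ptxas location as quoted in the thread; override PTXAS if yours differs.
PTXAS="${PTXAS:-/usr/local/cuda/bin/ptxas}"

device_pipeline() {
    # Merge the instrumented kernel bitcode with the hook functions.
    # -only-needed is the flag the thread reports as avoiding the duplicate
    # '_ZL21__nvvm_reflect_anchorv' definition.
    run llvm-link -only-needed axpy-sm_30.bc cuda_hooks-sm_30.bc \
        -o inst_axpy-sm_30.bc || return 1
    # Lower the merged bitcode to PTX assembly.
    run llc inst_axpy-sm_30.bc -o axpy-sm_30.s || return 1
    # Assemble the PTX for sm_30, with the same options as in the thread.
    run "$PTXAS" -m64 -O3 -c --gpu-name sm_30 \
        --output-file axpy-sm_30.o axpy-sm_30.s
}

device_pipeline
```

The merge happens at the bitcode level because, as noted in the thread, PTX assembly cannot be linked.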

Jingyue Wu via llvm-dev

2016-Mar-15 17:48 UTC

When you generate axpy-host.bc, you should use "clang -cc1 ..." with the
"-fcuda-include-gpubinary" flag. "clang -cc1" invokes the frontend only.
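
A rough sketch of how that advice might translate into commands is given below. It is an illustration under assumptions, not the exact invocation from the thread: the `run` helper, `DRY_RUN`, `CC1_EXTRA`, `host_pipeline`, and the file name axpy-host-inst.bc are hypothetical, only a subset of the -cc1 flags from the original command is shown, and -emit-llvm-bc stands in for the original -emit-obj so that the frontend writes bitcode. The instrumentation step itself is left as a comment because the thread does not show it.

```shell
#!/bin/sh
# Print each command unless DRY_RUN=0 is set, in which case execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

host_pipeline() {
    # Frontend only: emit host bitcode with the device fatbin already included.
    # CC1_EXTRA (hypothetical) carries the remaining -cc1 flags, such as the
    # include paths from the original command; it is left unquoted on purpose
    # so that it splits into separate arguments.
    run clang -cc1 -triple x86_64-unknown-linux-gnu \
        -aux-triple nvptx64-nvidia-cuda \
        -emit-llvm-bc -x cuda tests/axpy.cu \
        -fcuda-include-gpubinary axpy-sm_30.fatbin \
        ${CC1_EXTRA:-} -o axpy-host.bc || return 1
    # Run your own host instrumentation over axpy-host.bc here, writing
    # axpy-host-inst.bc (hypothetical name); that step is not shown.
    # Compile the instrumented host bitcode to an object file.
    run llc -filetype=obj axpy-host-inst.bc -o axpy-host.o
}

host_pipeline
```

Because the fatbin is included while the frontend generates the host module, no later step (llc or -cc1as) needs to know about '-fcuda-include-gpubinary'.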

Yuanfeng Peng via llvm-dev

2016-Mar-15 17:51 UTC

Gotcha.  Thank you sooooo much for all your invaluable help!

yuanfeng
