Yuanfeng Peng via llvm-dev
2016-Mar-15 17:45 UTC
[llvm-dev] instrumenting device code with gpucc
Hi Jingyue,

Sorry to ask again, but how exactly could I glue the fatbin with the
instrumented host code? Or does it mean we actually cannot instrument both
the host & device code at the same time?

Thanks!
yuanfeng

On Tue, Mar 15, 2016 at 10:09 AM, Jingyue Wu <jingyue at google.com> wrote:

> Including the fatbin into the host code should be done in the frontend.
>
> On Mon, Mar 14, 2016 at 12:13 AM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote:
>
>> Hey Jingyue,
>>
>> Thanks for being so responsive! I finally figured out a way to resolve
>> the issue: all I have to do is to use `-only-needed` when merging the
>> device bitcodes with llvm-link.
>>
>> However, since we actually need to instrument the host code as well, I
>> encountered another issue when I tried to glue the instrumented host code
>> and fatbin together. When I only instrumented the device code, I used the
>> following cmd to do so:
>>
>> "/mnt/wtf/tools/bin/clang-3.9" "-cc1" "-triple"
>> "x86_64-unknown-linux-gnu" "-aux-triple" "nvptx64-nvidia-cuda"
>> "-fcuda-target-overloads" "-fcuda-disable-target-call-checks" "-emit-obj"
>> "-disable-free" "-main-file-name" "axpy.cu" "-mrelocation-model"
>> "static" "-mthread-model" "posix" "-fmath-errno" "-masm-verbose"
>> "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu"
>> "x86-64" "-momit-leaf-frame-pointer" "-dwarf-column-info"
>> "-debugger-tuning=gdb" "-resource-dir"
>> "/mnt/wtf/tools/bin/../lib/clang/3.9.0" "-I"
>> "/usr/local/cuda-7.0/samples/common/inc" "-internal-isystem"
>> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8"
>> "-internal-isystem"
>> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
>> "-internal-isystem"
>> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
>> "-internal-isystem"
>> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward"
>> "-internal-isystem" "/usr/local/include" "-internal-isystem"
>> "/mnt/wtf/tools/bin/../lib/clang/3.9.0/include" "-internal-externc-isystem"
>> "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include"
>> "-internal-externc-isystem" "/usr/include" "-internal-isystem"
>> "/usr/local/cuda/include" "-include" "__clang_cuda_runtime_wrapper.h" "-O3"
>> "-fdeprecated-macro" "-fdebug-compilation-dir"
>> "/mnt/wtf/workspace/cuda/gpu-race-detection" "-ferror-limit" "19"
>> "-fmessage-length" "291" "-pthread" "-fobjc-runtime=gcc" "-fcxx-exceptions"
>> "-fexceptions" "-fdiagnostics-show-option" "-vectorize-loops"
>> "-vectorize-slp" "-o" "axpy-host.o" "-x" "cuda" "tests/axpy.cu"
>> "-fcuda-include-gpubinary" "axpy-sm_30.fatbin"
>>
>> which, from my understanding, compiles the host code in tests/axpy.cu
>> and links it with axpy-sm_30.fatbin. However, now that I have instrumented
>> the IR of the host code (axpy.bc) and run `llc axpy.bc -o axpy.s`, which
>> cmd should I use to link axpy.s with axpy-sm_30.fatbin? I tried to use
>> -cc1as, but the flag '-fcuda-include-gpubinary' was not recognized.
>>
>> Thanks!
>>
>> yuanfeng
>>
>> On Sat, Mar 12, 2016 at 12:05 AM, Jingyue Wu <jingyue at google.com> wrote:
>>
>>> I've no idea. Without instrumentation, nvvm_reflect_anchor doesn't
>>> appear in the final PTX, right? If that's the case, some pass in llc must
>>> have deleted the anchor and you should be able to figure out which one.
>>>
>>> On Fri, Mar 11, 2016 at 4:56 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote:
>>>
>>>> Hey Jingyue,
>>>>
>>>> Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect
>>>> anchor didn't go away; ptxas is still complaining about the duplicate
>>>> definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse
>>>> the nvvm-reflect pass?
>>>>
>>>> Thanks!
>>>> yuanfeng
>>>>
>>>> On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote:
>>>>
>>>>> According to the examples you sent, I believe the linking issue was
>>>>> caused by nvvm reflection anchors. I haven't played with that, but I guess
>>>>> running nvvm-reflect on an IR removes the nvvm reflect anchors. After that,
>>>>> you can llvm-link the two bc/ll files.
>>>>>
>>>>> Another potential issue is that your cuda_hooks-sm_30.ll is
>>>>> unoptimized. This could cause the instrumented code to run super slow.
>>>>>
>>>>> On Fri, Mar 11, 2016 at 9:40 AM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote:
>>>>>
>>>>>> Hey Jingyue,
>>>>>>
>>>>>> Attached are the .ll files. Thanks!
>>>>>>
>>>>>> yuanfeng
>>>>>>
>>>>>> On Fri, Mar 11, 2016 at 3:47 AM, Jingyue Wu <jingyue at google.com> wrote:
>>>>>>
>>>>>>> Looks like we are getting closer!
>>>>>>>
>>>>>>> On Thu, Mar 10, 2016 at 5:21 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Jingyue,
>>>>>>>>
>>>>>>>> Thank you so much for the helpful response! I didn't know that PTX
>>>>>>>> assembly cannot be linked; that's likely the reason for my issue.
>>>>>>>>
>>>>>>>> So I did the following as you suggested (axpy-sm_30.bc is the
>>>>>>>> instrumented bitcode, and cuda_hooks-sm_30.bc contains the hook functions):
>>>>>>>>
>>>>>>>> llvm-link axpy-sm_30.bc cuda_hooks-sm_30.bc -o inst_axpy-sm_30.bc
>>>>>>>>
>>>>>>>> llc inst_axpy-sm_30.bc -o axpy-sm_30.s
>>>>>>>>
>>>>>>>> "/usr/local/cuda/bin/ptxas" "-m64" "-O3" -c "--gpu-name" "sm_30"
>>>>>>>> "--output-file" axpy-sm_30.o axpy-sm_30.s
>>>>>>>>
>>>>>>>> However, I got the following error from ptxas:
>>>>>>>>
>>>>>>>> ptxas axpy-sm_30.s, line 106; error : Duplicate definition of
>>>>>>>> function '_ZL21__nvvm_reflect_anchorv'
>>>>>>>>
>>>>>>>> ptxas axpy-sm_30.s, line 106; fatal : Parsing error near '.2':
>>>>>>>> syntax error
>>>>>>>>
>>>>>>>> ptxas fatal : Ptx assembly aborted due to errors
>>>>>>>>
>>>>>>>> Looks like some cuda function definitions are in both bitcode files,
>>>>>>>> which caused the duplicate definition... what am I supposed to do to
>>>>>>>> resolve this issue?
>>>>>>>>
>>>>>>> Can you attach axpy-sm_30.ll and cuda_hooks-sm_30.ll? The
>>>>>>> duplication may be caused by how nvvm reflection works, but I'd like to see
>>>>>>> a concrete example.
>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> yuanfeng
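For readers following along, the device-side steps quoted above can be collected into one small script. This is only a sketch: the three commands and file names are the ones from the thread, with `-only-needed` added to the llvm-link step as Yuanfeng describes. The exact placement of that flag and the order of the two bitcode files are assumptions, since the thread does not show the final command. `DRY_RUN` and the `run` helper are names made up for this example; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the device-side pipeline from the thread (not the authors' script).
# DRY_RUN and run() are hypothetical helpers for this example, not LLVM/CUDA options.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

device_pipeline() {
    # Merge the instrumented kernel bitcode with the hook functions.
    # Assumption: axpy-sm_30.bc comes first, so -only-needed restricts what is
    # pulled in from cuda_hooks-sm_30.bc to the symbols the kernel references.
    run llvm-link -only-needed axpy-sm_30.bc cuda_hooks-sm_30.bc -o inst_axpy-sm_30.bc

    # Lower the merged bitcode to PTX assembly.
    run llc inst_axpy-sm_30.bc -o axpy-sm_30.s

    # Assemble the PTX for sm_30, as in the original ptxas invocation.
    run /usr/local/cuda/bin/ptxas -m64 -O3 -c --gpu-name sm_30 \
        --output-file axpy-sm_30.o axpy-sm_30.s
}

device_pipeline
```

Running it with `DRY_RUN=0` would execute the tools for real, provided llvm-link, llc and ptxas are installed and the two `.bc` inputs exist in the current directory.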
Jingyue Wu via llvm-dev
2016-Mar-15 17:48 UTC
[llvm-dev] instrumenting device code with gpucc
When you generate axpy-host.bc, you should use "clang -cc1 ..." with the
"-fcuda-include-gpubinary" flag. "clang -cc1" invokes the frontend only.
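One way to read this advice is sketched below: embed the fatbin while the frontend emits the host bitcode, instrument that bitcode, and only then lower it with llc. Several things here are assumptions rather than anything the thread confirms: swapping `-emit-obj` for `-emit-llvm-bc` in the cc1 command, using `llc -filetype=obj` for the last step, and the `CC1_FLAGS`, `DRY_RUN` and `run` names, which are made up for this example. `CC1_FLAGS` stands in for the remaining cc1 flags elided as "..." above and is left empty here.

```shell
#!/bin/sh
# Sketch of the host-side flow suggested in the reply (assumptions noted inline).
# CC1_FLAGS, DRY_RUN and run() are hypothetical names for this example.
set -eu

DRY_RUN="${DRY_RUN:-1}"
# Placeholder for the remaining -cc1 flags (triple, include paths, ...).
CC1_FLAGS="${CC1_FLAGS:-}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

host_pipeline() {
    # 1. Frontend only: emit host bitcode with the fatbin embedded.
    #    Assumption: -emit-llvm-bc replaces -emit-obj from the original command.
    #    $CC1_FLAGS is intentionally unquoted so it splits into separate flags.
    run clang -cc1 $CC1_FLAGS -emit-llvm-bc -x cuda tests/axpy.cu \
        -fcuda-include-gpubinary axpy-sm_30.fatbin -o axpy-host.bc

    # 2. Instrument axpy-host.bc here with your own pass (not shown).

    # 3. Lower the instrumented host bitcode to an object file.
    run llc -filetype=obj axpy-host.bc -o axpy-host.o
}

host_pipeline
```

The point of the ordering is that `-fcuda-include-gpubinary` is a frontend option, so the fatbin has to be embedded before the bitcode is instrumented; llc and `-cc1as` never see that flag.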
Yuanfeng Peng via llvm-dev
2016-Mar-15 17:51 UTC
[llvm-dev] instrumenting device code with gpucc
Gotcha. Thank you sooooo much for all your invaluable help!

yuanfeng