Yuanfeng Peng via llvm-dev
2016-Mar-12 00:56 UTC
[llvm-dev] instrumenting device code with gpucc
Hey Jingyue,

Though I tried `opt -nvvm-reflect` on both bc files, the nvvm reflect anchor didn't go away; ptxas is still complaining about the duplicate definition of function '_ZL21__nvvm_reflect_anchorv'. Did I misuse the nvvm-reflect pass?

Thanks!
yuanfeng

On Fri, Mar 11, 2016 at 10:10 AM, Jingyue Wu <jingyue at google.com> wrote:
> According to the examples you sent, I believe the linking issue was caused
> by nvvm reflection anchors. I haven't played with that, but I guess running
> nvvm-reflect on an IR removes the nvvm reflect anchors. After that, you can
> llvm-link the two bc/ll files.
>
> Another potential issue is that your cuda_hooks-sm_30.ll is unoptimized.
> This could cause the instrumented code to run super slow.
>
> On Fri, Mar 11, 2016 at 9:40 AM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote:
>> Hey Jingyue,
>>
>> Attached are the .ll files. Thanks!
>>
>> yuanfeng
>>
>> On Fri, Mar 11, 2016 at 3:47 AM, Jingyue Wu <jingyue at google.com> wrote:
>>> Looks like we are getting closer!
>>>
>>> On Thu, Mar 10, 2016 at 5:21 PM, Yuanfeng Peng <yuanfeng.jack.peng at gmail.com> wrote:
>>>> Hi Jingyue,
>>>>
>>>> Thank you so much for the helpful response! I didn't know that PTX
>>>> assembly cannot be linked; that's likely the reason for my issue.
>>>>
>>>> So I did the following as you suggested (axpy-sm_30.bc is the
>>>> instrumented bitcode, and cuda_hooks-sm_30.bc contains the hook functions):
>>>>
>>>>     llvm-link axpy-sm_30.bc cuda_hooks-sm_30.bc -o inst_axpy-sm_30.bc
>>>>     llc inst_axpy-sm_30.bc -o axpy-sm_30.s
>>>>     "/usr/local/cuda/bin/ptxas" "-m64" "-O3" -c "--gpu-name" "sm_30" "--output-file" axpy-sm_30.o axpy-sm_30.s
>>>>
>>>> However, I got the following error from ptxas:
>>>>
>>>>     ptxas axpy-sm_30.s, line 106; error : Duplicate definition of function '_ZL21__nvvm_reflect_anchorv'
>>>>     ptxas axpy-sm_30.s, line 106; fatal : Parsing error near '.2': syntax error
>>>>     ptxas fatal : Ptx assembly aborted due to errors
>>>>
>>>> It looks like some CUDA function definitions are in both bitcode files,
>>>> which caused the duplicate definition. What am I supposed to do to resolve
>>>> this issue?
>>>>
>>> Can you attach axpy-sm_30.ll and cuda_hooks-sm_30.ll? The duplication
>>> may be caused by how nvvm reflection works, but I'd like to see a concrete
>>> example.
>>>
>>>> Thanks!
>>>>
>>>> yuanfeng
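A quick way to test the hypothesis in the quoted exchange, that each module carries its own definition of the anchor, is to disassemble both bitcode files with `llvm-dis` (which writes `foo.ll` next to `foo.bc` by default) and count the matching `define` lines. This is a minimal sketch: `count_defs` is a hypothetical helper, and it assumes the symbol contains only identifier characters, because the name is spliced into the regular expression unescaped.

```shell
#!/bin/sh
# count_defs (hypothetical helper): print how many `define` lines in a
# textual LLVM IR file define the given symbol.
#   $1 = .ll file, e.g. produced by `llvm-dis axpy-sm_30.bc`
#   $2 = symbol name, e.g. _ZL21__nvvm_reflect_anchorv (identifier chars only)
count_defs() {
  # grep -c prints 0 and exits with status 1 when nothing matches;
  # `|| true` keeps the helper from aborting a `set -e` script in that case.
  grep -c -E "^define .*@$2\(" "$1" || true
}
```

For example, `for f in axpy-sm_30 cuda_hooks-sm_30; do llvm-dis "$f.bc" && echo "$f: $(count_defs "$f.ll" _ZL21__nvvm_reflect_anchorv)"; done`. A count of 1 in both files would be consistent with the duplicate definition ptxas reports after the two modules are linked.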
Jingyue Wu via llvm-dev
2016-Mar-12 08:05 UTC
I've no idea. Without instrumentation, nvvm_reflect_anchor doesn't appear in the final PTX, right? If that's the case, some pass in llc must have deleted the anchor, and you should be able to figure out which one.
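One way to follow this suggestion is to run `llc -print-after-all` on the uninstrumented bitcode, which dumps the IR after every pass to stderr with each dump headed by a banner containing `IR Dump After`, and then list which dumps still mention the anchor. The sketch below assumes that banner text; `passes_mentioning` is a hypothetical helper. Dumps from function passes show only the function just processed, so the anchor's absence from one of them does not by itself mean it was deleted.

```shell
#!/bin/sh
# passes_mentioning (hypothetical helper): read an `llc -print-after-all`
# dump and print, in order, each "IR Dump After ..." banner whose section
# mentions the given symbol. Consecutive sections with the same banner
# (e.g. one function pass run over several functions) are printed once.
#   $1 = file holding llc's stderr output
#   $2 = symbol to track, e.g. _ZL21__nvvm_reflect_anchorv
passes_mentioning() {
  awk -v sym="$2" '
    /IR Dump After/ { banner = $0; next }           # start of a new dump
    banner != "" && index($0, sym) && banner != printed {
      print banner                                  # this dump mentions sym
      printed = banner
    }
  ' "$1"
}
```

For example, `llc -print-after-all axpy-sm_30.bc -o axpy-plain.s 2> dump.txt` followed by `passes_mentioning dump.txt _ZL21__nvvm_reflect_anchorv`; the last banner printed is a candidate for the pass after which the anchor disappears.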
Yuanfeng Peng via llvm-dev
2016-Mar-13 23:13 UTC
Hey Jingyue,

Thanks for being so responsive! I finally figured out a way to resolve the issue: all I have to do is use `-only-needed` when merging the device bitcode files with llvm-link.

However, since we actually need to instrument the host code as well, I ran into another issue when I tried to glue the instrumented host code and the fatbin together. When I only instrumented the device code, I used the following command to do so:

    "/mnt/wtf/tools/bin/clang-3.9" "-cc1" "-triple" "x86_64-unknown-linux-gnu" \
      "-aux-triple" "nvptx64-nvidia-cuda" "-fcuda-target-overloads" \
      "-fcuda-disable-target-call-checks" "-emit-obj" "-disable-free" \
      "-main-file-name" "axpy.cu" "-mrelocation-model" "static" \
      "-mthread-model" "posix" "-fmath-errno" "-masm-verbose" \
      "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" \
      "-target-cpu" "x86-64" "-momit-leaf-frame-pointer" "-dwarf-column-info" \
      "-debugger-tuning=gdb" "-resource-dir" "/mnt/wtf/tools/bin/../lib/clang/3.9.0" \
      "-I" "/usr/local/cuda-7.0/samples/common/inc" \
      "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8" \
      "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8" \
      "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8" \
      "-internal-isystem" "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward" \
      "-internal-isystem" "/usr/local/include" \
      "-internal-isystem" "/mnt/wtf/tools/bin/../lib/clang/3.9.0/include" \
      "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" \
      "-internal-externc-isystem" "/include" \
      "-internal-externc-isystem" "/usr/include" \
      "-internal-isystem" "/usr/local/cuda/include" \
      "-include" "__clang_cuda_runtime_wrapper.h" "-O3" "-fdeprecated-macro" \
      "-fdebug-compilation-dir" "/mnt/wtf/workspace/cuda/gpu-race-detection" \
      "-ferror-limit" "19" "-fmessage-length" "291" "-pthread" "-fobjc-runtime=gcc" \
      "-fcxx-exceptions" "-fexceptions" "-fdiagnostics-show-option" \
      "-vectorize-loops" "-vectorize-slp" "-o" "axpy-host.o" "-x" "cuda" "tests/axpy.cu" \
      "-fcuda-include-gpubinary" "axpy-sm_30.fatbin"

which, from my understanding, compiles the host code in tests/axpy.cu and links it with axpy-sm_30.fatbin. However, now that I have instrumented the IR of the host code (axpy.bc) and run `llc axpy.bc -o axpy.s`, which command should I use to link axpy.s with axpy-sm_30.fatbin? I tried to use -cc1as, but the flag '-fcuda-include-gpubinary' was not recognized.

Thanks!
yuanfeng
>>>>> >>>>> On Thu, Mar 10, 2016 at 5:21 PM, Yuanfeng Peng < >>>>> yuanfeng.jack.peng at gmail.com> wrote: >>>>> >>>>>> Hi Jingyue, >>>>>> >>>>>> Thank you so much for the helpful response! I didn't know that PTX >>>>>> assembly cannot be linked; that's likely the reason for my issue. >>>>>> >>>>>> So I did the following as you suggested(axpy-sm_30.bc is the >>>>>> instrumented bitcode, and cuda_hooks-sm_30.bc contains the hook functions): >>>>>> >>>>>> *llvm-link axpy-sm_30.bc cuda_hooks-sm_30.bc -o inst_axpy-sm_30.bc* >>>>>> >>>>>> *llc inst_axpy-sm_30.bc -o axpy-sm_30.s* >>>>>> >>>>>> *"/usr/local/cuda/bin/ptxas" "-m64" "-O3" -c "--gpu-name" "sm_30" >>>>>> "--output-file" axpy-sm_30.o axpy-sm_30.s* >>>>>> >>>>>> However, I got the following error from ptxas: >>>>>> >>>>>> *ptxas axpy-sm_30.s, line 106; error : Duplicate definition of >>>>>> function '_ZL21__nvvm_reflect_anchorv'* >>>>>> >>>>>> *ptxas axpy-sm_30.s, line 106; fatal : Parsing error near '.2': >>>>>> syntax error* >>>>>> >>>>>> *ptxas fatal : Ptx assembly aborted due to errors* >>>>>> >>>>>> Looks like some cuda function definitions are in both bitcode files >>>>>> which caused duplicate definition... what am I supposed to do to resolve >>>>>> this issue? >>>>>> >>>>> Can you attach axpy-sm_30.ll and cuda_hooks-sm_30.ll? The duplication >>>>> may be caused by how nvvm reflection works, but I'd like to see a concrete >>>>> example. >>>>> >>>>>> >>>>>> Thanks! >>>>>> >>>>>> yuanfeng >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160313/649db42b/attachment.html>