OK, I see the problem. You were right that we weren't picking up libdevice.

CUDA 7.0 ships with only the following libdevice binaries (found in /path/to/cuda/nvvm/libdevice):

  libdevice.compute_20.10.bc
  libdevice.compute_30.10.bc
  libdevice.compute_35.10.bc

If you ask for sm_50 with CUDA 7.0, clang can't find a matching libdevice binary, and it apparently gives up silently and tries to continue compiling your program. That's a bug that we should fix. (If you want the current behavior, you should have to explicitly ask clang not to use libdevice.)

I see that nvcc from CUDA 7.0 works (or at least builds without error). I guess it uses the libdevice for compute_35. We could do the same thing, although I am not sure how to tell whether that's safe in general. I'll look into this as well.

Anyway, if you build with CUDA 7.5 your problem should go away, because CUDA 7.5 has a libdevice binary for compute_50. Just pass --cuda-path=/path/to/cuda-7.5. Alternatively, you could continue building with CUDA 7.0 and pass sm_35 as your GPU arch. clang always embeds PTX in the binaries, so the result should still run on your sm_50 card (although your machine will have to JIT-compile the PTX on startup).

As a third alternative, you could symlink your libdevice.compute_35.10.bc to libdevice.compute_50.10.bc, and... maybe that would work? If you do that, please let me know how it goes; I am curious. :)

Thank you very much for the bug report! If you like, I'll cc you on any relevant changes; just create an account at https://reviews.llvm.org (if necessary; I can't seem to find you) and let me know your username.

Regards,
-Justin

On Sun, Jul 31, 2016 at 10:59 PM, Yuanfeng Peng <yuanfeng at cis.upenn.edu> wrote:
> Hi Justin,
>
> Thanks for your response! The clang & llvm I'm using was built from source.
>
> Below is the output of compiling with -v. Any suggestions would be
> appreciated!
>
> clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/local/bin
> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.4
> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9
> Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.3
> Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
> Candidate multilib: .;@m64
> Candidate multilib: 32;@m32
> Candidate multilib: x32;@mx32
> Selected multilib: .;@m64
> Found CUDA installation: /usr/local/cuda
> "/usr/local/bin/clang-3.9" -cc1 -triple nvptx64-nvidia-cuda -aux-triple
> x86_64-unknown-linux-gnu -S -disable-free -main-file-name scalarProd.cu
> -mrelocation-model static -mthread-model posix -mdisable-fp-elim
> -fmath-errno -no-integrated-as -fcuda-is-device -target-cpu sm_50 -v
> -dwarf-column-info -debugger-tuning=gdb -resource-dir
> /usr/local/bin/../lib/clang/3.9.0 -I ../ -I
> /usr/local/cuda-7.0/samples/common/inc -internal-isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8
> -internal-isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8
> -internal-isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8
> -internal-isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward
> -internal-isystem /usr/local/include -internal-isystem
> /usr/local/bin/../lib/clang/3.9.0/include -internal-externc-isystem /include
> -internal-externc-isystem /usr/include -internal-isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8
> -internal-isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8
> -internal-isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8
> -internal-isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward
> -internal-isystem /usr/local/cuda/include -include
> __clang_cuda_runtime_wrapper.h -fdeprecated-macro -fno-dwarf-directory-asm
> -fdebug-compilation-dir
> /mnt/wtf/workspace/cuda/gpu-race-detection/cuda-compressed-conflict-detection/scalarProd
> -ferror-limit 19 -fmessage-length 144 -fobjc-runtime=gcc -fcxx-exceptions
> -fexceptions -fdiagnostics-show-option -o /tmp/scalarProd-32a530.s -x cuda
> scalarProd.cu
> hooklib.so loading.
> clang -cc1 version 3.9.0 based upon LLVM 3.9.0svn default target
> x86_64-unknown-linux-gnu
> ignoring nonexistent directory "/include"
> ignoring duplicate directory
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
> ignoring duplicate directory
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8"
> ignoring duplicate directory
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
> ignoring duplicate directory
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8"
> ignoring duplicate directory
> "/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward"
> ignoring duplicate directory "/usr/local/include"
> ignoring duplicate directory "/usr/local/bin/../lib/clang/3.9.0/include"
> ignoring duplicate directory "/usr/include"
> #include "..." search starts here:
> #include <...> search starts here:
> ..
> /usr/local/cuda-7.0/samples/common/inc
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/x86_64-linux-gnu/c++/4.8
> /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/backward
> /usr/local/include
> /usr/local/bin/../lib/clang/3.9.0/include
> /usr/include
> /usr/local/cuda/include
> End of search list.
>
> "/usr/local/cuda/bin/ptxas" -m64 -O0 --gpu-name sm_50 --output-file
> /tmp/scalarProd-181f7e.o /tmp/scalarProd-32a530.s
> ptxas fatal : Unresolved extern function '__nv_mul24'
> clang-3.9: error: ptxas command failed with exit code 255 (use -v to see
> invocation)
>
> Thanks!
> Yuanfeng
>
> On Mon, Aug 1, 2016 at 1:04 AM, Justin Lebar <jlebar at google.com> wrote:
>>
>> Hi, Yuanfeng.
>>
>> What version of clang are you using? CUDA is only known to work at
>> tip of head, so you must build clang yourself from source.
>>
>> I suspect that's your problem, but if building from source doesn't fix
>> it, please attach the output of compiling with -v.
>>
>> Regards,
>> -Justin
>>
>> On Sun, Jul 31, 2016 at 9:24 PM, Chandler Carruth <chandlerc at google.com>
>> wrote:
>> > Directly CC-ing some folks who may be able to help.
>> >
>> > On Fri, Jul 29, 2016 at 6:27 AM Yuanfeng Peng via llvm-dev
>> > <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I was trying to compile scalarProd.cu (from CUDA SDK) with the
>> >> following command:
>> >>
>> >> clang++ -I../ -I/usr/local/cuda-7.0/samples/common/inc
>> >> --cuda-gpu-arch=sm_50 scalarProd.cu
>> >>
>> >> but ended up with the following error:
>> >>
>> >> ptxas fatal : Unresolved extern function '__nv_mul24'
>> >>
>> >> Seems to me that libdevice was not automatically linked. I wonder what
>> >> flags I need to pass to clang to have the code linked against
>> >> libdevice?
>> >>
>> >> Thanks!
>> >> Yuanfeng Peng
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
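Justin's diagnosis comes down to a file-name lookup: for --cuda-gpu-arch=sm_50, clang needed a libdevice binary matching sm_50 under nvvm/libdevice, and CUDA 7.0 ships none. The Python sketch below reproduces that check by hand so you can see which variants an installation provides. It assumes the libdevice.compute_NN.VV.bc naming listed above; the function names are hypothetical, and this is not clang's driver code.

```python
import re
from pathlib import Path

# Assumption: libdevice files are named libdevice.compute_NN.VV.bc,
# as in the CUDA 7.0 list above (e.g. libdevice.compute_35.10.bc).
LIBDEVICE_RE = re.compile(r"^libdevice\.compute_(\d+)\.\d+\.bc$")


def libdevice_archs(file_names):
    """Return the sorted NN values of every libdevice.compute_NN.*.bc name."""
    archs = set()
    for name in file_names:
        match = LIBDEVICE_RE.match(name)
        if match:
            archs.add(int(match.group(1)))
    return sorted(archs)


def exact_match(gpu_arch, archs):
    """True if sm_NN has a libdevice built for the same compute_NN."""
    if not gpu_arch.startswith("sm_"):
        raise ValueError(f"expected sm_NN, got {gpu_arch!r}")
    return int(gpu_arch[3:]) in archs


def installed_libdevice_archs(cuda_path):
    """Scan <cuda_path>/nvvm/libdevice; return [] if that directory is missing."""
    libdevice_dir = Path(cuda_path) / "nvvm" / "libdevice"
    if not libdevice_dir.is_dir():
        return []
    return libdevice_archs(entry.name for entry in libdevice_dir.iterdir())
```

With the CUDA 7.0 file list above, exact_match("sm_35", ...) is True and exact_match("sm_50", ...) is False, which is the case where clang carried on without libdevice and ptxas later reported the unresolved __nv_mul24.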
Yuanfeng Peng via llvm-dev
2016-Aug-01 07:38 UTC
[llvm-dev] [GPUCC] link against libdevice
Hi Justin,

Thanks for your help! I passed sm_30 as the target GPU arch and the compilation was successful.

I was also curious about how the symlink solution would work, so I tried it too :p. The compilation succeeded, but the resulting binary crashed with the complaint '(8): illegal libdevice function'.

I would appreciate being kept posted about relevant changes; my username is yuanfeng.peng.

Thanks again!
Yuanfeng

On Mon, Aug 1, 2016 at 2:33 AM, Justin Lebar <jlebar at google.com> wrote:
> OK, I see the problem. You were right that we weren't picking up
> libdevice.
Mueller-Roemer, Johannes Sebastian via llvm-dev
2016-Aug-01 09:06 UTC
[llvm-dev] [GPUCC] link against libdevice
According to http://docs.nvidia.com/cuda/libdevice-users-guide/basic-usage.html#version-selection, compute capabilities > 3.7 should use libdevice.compute_30.XX.bc.

-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Justin Lebar via llvm-dev
Sent: Monday, August 1, 2016 08:33
To: Yuanfeng Peng <yuanfeng at cis.upenn.edu>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [GPUCC] link against libdevice

OK, I see the problem. You were right that we weren't picking up libdevice.
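The pointer to NVIDIA's guide amounts to a small lookup from compute capability to libdevice variant. A minimal Python sketch, assuming the version-selection table as we read it: only the "> 3.7 uses compute_30" row is quoted in this thread, so check the remaining rows against the linked guide before relying on them. The function name is hypothetical.

```python
def libdevice_for_capability(major, minor):
    """Pick the libdevice.compute_NN variant for compute capability major.minor.

    Assumption: rows other than "> 3.7 -> compute_30" are our reading of
    NVIDIA's libdevice version-selection table, not quoted from this thread.
    """
    cc = major * 10 + minor  # e.g. 3.5 -> 35, 5.0 -> 50
    if cc < 20:
        raise ValueError(f"libdevice does not cover compute capability {major}.{minor}")
    if cc < 30:
        return "compute_20"  # 2.0 <= cc < 3.0
    if cc == 30:
        return "compute_30"
    if cc < 35:
        return "compute_20"  # 3.1 <= cc < 3.5, e.g. sm_32
    if cc <= 37:
        return "compute_35"  # 3.5 <= cc <= 3.7
    return "compute_30"      # cc > 3.7, e.g. sm_50 (the row quoted above)
```

Under this rule, an sm_50 build against CUDA 7.0 would link libdevice.compute_30.10.bc rather than the compute_35 file.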
Artem Belevich via llvm-dev
2016-Aug-02 23:27 UTC
[llvm-dev] [GPUCC] link against libdevice
After r277542, clang should fix the problem:
* clang now picks the correct libdevice version;
* clang reports an error if the required libdevice library is not found.

See https://reviews.llvm.org/D23037 for details.

--Artem

On Mon, Aug 1, 2016 at 2:06 AM, Mueller-Roemer, Johannes Sebastian via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> According to
> http://docs.nvidia.com/cuda/libdevice-users-guide/basic-usage.html#version-selection
> compute capabilities > 3.7 should use libdevice.compute_30.XX.bc

-- 
--Artem Belevich
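The two bullet points describe a lookup that either returns the right file or fails loudly instead of compiling on without libdevice. A minimal Python sketch of that behavior, not the code from D23037: the sm_NN-to-variant table is a hypothetical CUDA 7.x subset derived from the NVIDIA selection rules discussed above, and the default ".10.bc" suffix matches the CUDA 7.0 file names.

```python
from pathlib import Path

# Hypothetical sm_NN -> libdevice variant table for CUDA 7.x (assumption
# based on NVIDIA's version-selection rules; sm_5x falls back to compute_30).
LIBDEVICE_VARIANT = {
    "sm_20": "compute_20", "sm_21": "compute_20", "sm_32": "compute_20",
    "sm_30": "compute_30", "sm_50": "compute_30",
    "sm_52": "compute_30", "sm_53": "compute_30",
    "sm_35": "compute_35", "sm_37": "compute_35",
}


class MissingLibdeviceError(Exception):
    """Raised instead of silently compiling without libdevice."""


def resolve_libdevice(libdevice_dir, gpu_arch, version="10"):
    """Return the libdevice file for gpu_arch, or raise MissingLibdeviceError."""
    variant = LIBDEVICE_VARIANT.get(gpu_arch)
    if variant is None:
        raise MissingLibdeviceError(f"no libdevice variant known for {gpu_arch}")
    path = Path(libdevice_dir) / f"libdevice.{variant}.{version}.bc"
    if not path.is_file():
        # Where the old behavior carried on, this sketch stops with an error.
        raise MissingLibdeviceError(
            f"cannot find {path.name} in {libdevice_dir} (needed for {gpu_arch})")
    return path
```

Against a directory holding the three CUDA 7.0 files, resolve_libdevice(d, "sm_50") returns the compute_30 file, while an empty directory or an unknown arch raises the error.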