Clang currently doesn't support CUDA separate compilation and thus extern __device__ functions and variables cannot be used. Could someone give me any pointers where to look or what has to be done to support this? If at all possible, I'd like to see what's missing and possibly try to tackle it. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/453c6b1a/attachment.html>
[+Doru, Carlo] On 08/16/2017 05:28 PM, Jakub Beránek via llvm-dev wrote:> Clang currently doesn't support CUDA separate compilation and thus > extern __device__ functions and variables cannot be used. > > Could someone give me any pointers where to look or what has to be > done to support this? If at all possible, I'd like to see what's > missing and possibly try to tackle it. > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/dae01744/attachment.html>
Indeed, Clang currently treats device-side compilation in each CUDA file as a whole program compilation. I.e. the result of it is a GPU executable. I think clang should be able to compile code with 'extern __device__', and it's ptxas that's unhappy to see unresolved symbols because it expects to see the whole program. Someone/somewhere needs to eventually turn GPU object file into a GPU executable. It should be possible to make it work. First step is to tell ptxas to compile to a GPU object file: * Add separate compilation support in driver. At the very minimum driver should pass appropriate flags to ptxas and warn/error if it's not supported. You may be able to get by with just "-Xcuda-ptxas -c" + external nvlink of CUDA files into a single partially linked .o. As for who does GPU-side linking, we should perhaps consider running nvlink completely outside of clang. I.e. clang will produce GPU object files, if required, but it would be up to user's build system to link them together with nvlink before the final linking of the host executable. If that's acceptable, that's probably all you need. If you want clang to do GPU-side linking, then there are more things to do. * Figure out who's supposed to run nvlink. Driver will need to be augmented to run it. The problem here is that there's no easy way to tell if any of the given .o files given to clang during linking phase contain GPU executables, so it will most likely be controlled by a global flag which would insert another step into the compilation pipeline which will invoke nvlink on object files and would pass .o with partially linked host+GPU executable to the host linker. * Figure out whether the way GPU binaries are embedded in host .o is compatible with nvlink and implement missing bits, if necessary. * Figure out whether .o (or executable) produced by nvlink is something that Clang-generated init code can still work with. Fix it, if broken. This should be about it. --Artem On Wed, Aug 16, 2017 at 3:28 PM, Jakub Beránek via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Clang currently doesn't support CUDA separate compilation and thus extern > __device__ functions and variables cannot be used. > > Could someone give me any pointers where to look or what has to be done to > support this? If at all possible, I'd like to see what's missing and > possibly try to tackle it. > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-- --Artem Belevich -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170828/ea40374b/attachment.html>
Thank you for clarification, I'll check the driver and see if I can make compiling to GPU object files + nvlink work. Kuba Beránek 2017-08-28 23:02 GMT+02:00 Artem Belevich <tra at google.com>:> Indeed, Clang currently treats device-side compilation in each CUDA file > as a whole program compilation. I.e. the result of it is a GPU executable. > > I think clang should be able to compile code with 'extern __device__', and > it's ptxas that's unhappy to see unresolved symbols because it expects to > see the whole program. Someone/somewhere needs to eventually turn GPU > object file into a GPU executable. > > It should be possible to make it work. > First step is to tell ptxas to compile to a GPU object file: > * Add separate compilation support in driver. At the very minimum driver > should pass appropriate flags to ptxas and warn/error if it's not > supported. You may be able to get by with just "-Xcuda-ptxas -c" + external > nvlink of CUDA files into a single partially linked .o. > > As for who does GPU-side linking, we should perhaps consider running > nvlink completely outside of clang. I.e. clang will produce GPU object > files, if required, but it would be up to user's build system to link them > together with nvlink before the final linking of the host executable. If > that's acceptable, that's probably all you need. > > If you want clang to do GPU-side linking, then there are more things to do. > * Figure out who's supposed to run nvlink. Driver will need to be > augmented to run it. The problem here is that there's no easy way to tell > if any of the given .o files given to clang during linking phase contain > GPU executables, so it will most likely be controlled by a global flag > which would insert another step into the compilation pipeline which will > invoke nvlink on object files and would pass .o with partially linked > host+GPU executable to the host linker. > * Figure out whether the way GPU binaries are embedded in host .o is > compatible with nvlink and implement missing bits, if necessary. > * Figure out whether .o (or executable) produced by nvlink is something > that Clang-generated init code can still work with. Fix it, if broken. > > This should be about it. > > --Artem > > > On Wed, Aug 16, 2017 at 3:28 PM, Jakub Beránek via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Clang currently doesn't support CUDA separate compilation and thus extern >> __device__ functions and variables cannot be used. >> >> Could someone give me any pointers where to look or what has to be done >> to support this? If at all possible, I'd like to see what's missing and >> possibly try to tackle it. >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > > > -- > --Artem Belevich >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170829/e8e4c2a4/attachment.html>