thr3ads.net - llvm dev - [llvm-dev] CUDA separate compilation [Aug 2017]

If this information is useful, please help other people find it:
Share via:

Jakub Beránek via llvm-dev

2017-Aug-16 22:28 UTC

[llvm-dev] CUDA separate compilation

Clang currently doesn't support CUDA separate compilation and thus extern
__device__ functions and variables cannot be used.

Could someone give me any pointers where to look or what has to be done to
support this? If at all possible, I'd like to see what's missing and
possibly try to tackle it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/453c6b1a/attachment.html>

Hal Finkel via llvm-dev

2017-Aug-17 05:25 UTC

head link

[llvm-dev] CUDA separate compilation

[+Doru, Carlo]

On 08/16/2017 05:28 PM, Jakub Beránek via llvm-dev
wrote:> Clang currently doesn't support CUDA separate compilation and thus 
> extern __device__ functions and variables cannot be used.
>
> Could someone give me any pointers where to look or what has to be 
> done to support this? If at all possible, I'd like to see what's 
> missing and possibly try to tackle it.
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170817/dae01744/attachment.html>

Artem Belevich via llvm-dev

2017-Aug-28 21:02 UTC

head link

[llvm-dev] CUDA separate compilation

Indeed, Clang currently treats device-side compilation in each CUDA file as
a whole program compilation. I.e. the result of it is a GPU executable.

I think clang should be able to compile code with 'extern __device__',
and
it's ptxas that's unhappy to see unresolved symbols because it expects
to
see the whole program. Someone/somewhere needs to eventually turn GPU
object file into a GPU executable.

It should be possible to make it work.
First step is to tell ptxas to compile to a GPU object file:
* Add separate compilation support in driver. At the very minimum driver
should pass appropriate flags to ptxas and warn/error if it's not
supported. You may be able to get by with just "-Xcuda-ptxas -c" +
external
nvlink of CUDA files into a single partially linked .o.

As for who does GPU-side linking, we should perhaps consider running nvlink
completely outside of clang. I.e. clang will produce GPU object files, if
required, but it would be up to user's build system to link them together
with nvlink before the final linking of the host executable. If that's
acceptable, that's probably all you need.

If you want clang to do GPU-side linking, then there are more things to do.
* Figure out who's supposed to run nvlink. Driver will need to be augmented
to run it. The problem here is that there's no easy way to tell if any of
the given .o files given to clang during linking phase contain GPU
executables, so it will most likely be controlled by a global flag which
would insert another step into the compilation pipeline which will invoke
nvlink on object files and would pass .o with partially linked host+GPU
executable to the host linker.
* Figure out whether the way GPU binaries are embedded in host .o is
compatible with nvlink and implement missing bits, if necessary.
* Figure out whether .o (or executable) produced by nvlink is something
that Clang-generated init code can still work with. Fix it, if broken.

This should be about it.

--Artem

On Wed, Aug 16, 2017 at 3:28 PM, Jakub Beránek via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Clang currently doesn't support CUDA separate compilation and thus
extern
> __device__ functions and variables cannot be used.
>
> Could someone give me any pointers where to look or what has to be done to
> support this? If at all possible, I'd like to see what's missing
and
> possibly try to tackle it.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>

-- 
--Artem Belevich
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170828/ea40374b/attachment.html>

Jakub Beránek via llvm-dev

2017-Aug-29 09:34 UTC

head link

[llvm-dev] CUDA separate compilation

Thank you for clarification, I'll check the driver and see if I can make
compiling to GPU object files + nvlink work.

Kuba Beránek

2017-08-28 23:02 GMT+02:00 Artem Belevich <tra at google.com>:
> Indeed, Clang currently treats device-side compilation in each CUDA file
> as a whole program compilation. I.e. the result of it is a GPU executable.
>
> I think clang should be able to compile code with 'extern
__device__', and
> it's ptxas that's unhappy to see unresolved symbols because it
expects to
> see the whole program. Someone/somewhere needs to eventually turn GPU
> object file into a GPU executable.
>
> It should be possible to make it work.
> First step is to tell ptxas to compile to a GPU object file:
> * Add separate compilation support in driver. At the very minimum driver
> should pass appropriate flags to ptxas and warn/error if it's not
> supported. You may be able to get by with just "-Xcuda-ptxas -c"
+ external
> nvlink of CUDA files into a single partially linked .o.
>
> As for who does GPU-side linking, we should perhaps consider running
> nvlink completely outside of clang. I.e. clang will produce GPU object
> files, if required, but it would be up to user's build system to link
them
> together with nvlink before the final linking of the host executable. If
> that's acceptable, that's probably all you need.
>
> If you want clang to do GPU-side linking, then there are more things to do.
> * Figure out who's supposed to run nvlink. Driver will need to be
> augmented to run it. The problem here is that there's no easy way to
tell
> if any of the given .o files given to clang during linking phase contain
> GPU executables, so it will most likely be controlled by a global flag
> which would insert another step into the compilation pipeline which will
> invoke nvlink on object files and would pass .o with partially linked
> host+GPU executable to the host linker.
> * Figure out whether the way GPU binaries are embedded in host .o is
> compatible with nvlink and implement missing bits, if necessary.
> * Figure out whether .o (or executable) produced by nvlink is something
> that Clang-generated init code can still work with. Fix it, if broken.
>
> This should be about it.
>
> --Artem
>
>
> On Wed, Aug 16, 2017 at 3:28 PM, Jakub Beránek via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Clang currently doesn't support CUDA separate compilation and thus
extern
>> __device__ functions and variables cannot be used.
>>
>> Could someone give me any pointers where to look or what has to be done
>> to support this? If at all possible, I'd like to see what's
missing and
>> possibly try to tackle it.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
>
> --
> --Artem Belevich
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170829/e8e4c2a4/attachment.html>

llvm dev - Aug 2017 - CUDA separate compilation

[llvm-dev] CUDA separate compilation

[llvm-dev] CUDA separate compilation

[llvm-dev] CUDA separate compilation

[llvm-dev] CUDA separate compilation