thr3ads.net - llvm dev - [llvm-dev] Executing OpenMP 4.0 code on Nvidia's GPU [Jan 2016]

Ahmed ElTantawy via llvm-dev

2016-Jan-20 13:09 UTC

[llvm-dev] Executing OpenMP 4.0 code on Nvidia's GPU

Hi Arpith,

That is exactly what it is :).

My bad. I thought I had copied the libraries to where LIBRARY_PATH
points, but apparently they were copied to the wrong destination.

Thanks a lot.

On Wed, Jan 20, 2016 at 4:51 AM, Arpith C Jacob <acjacob at us.ibm.com>
wrote:
> Hi Ahmed,
>
> nvlink is unable to find the GPU OMP runtime library in its path. Does
> LIBRARY_PATH point to the right location? You could try passing the "-v"
> option to clang to get more information.
>
> Regards,
> Arpith
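
A quick way to check the LIBRARY_PATH suggestion above is to walk each directory on the path and look for the runtime library file. The following is only a minimal sketch: the helper name find_in_library_path is made up for illustration, and the file name libomptarget-nvptx.a is an assumption (the thread does not name the file), so substitute whatever your build produced.

```python
import os


def find_in_library_path(lib_name, library_path=None):
    """Return every directory on LIBRARY_PATH that contains lib_name."""
    if library_path is None:
        library_path = os.environ.get("LIBRARY_PATH", "")
    hits = []
    for directory in library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries
        if os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Hypothetical file name, not taken from the thread.
    name = "libomptarget-nvptx.a"
    found = find_in_library_path(name)
    print(found if found else f"{name} not found on LIBRARY_PATH")
```

An empty result points to the same situation described in this thread: the library was copied somewhere that LIBRARY_PATH does not cover.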

Ahmed ElTantawy via llvm-dev

2016-Jan-20 13:44 UTC

[llvm-dev] Executing OpenMP 4.0 code on Nvidia's GPU

Hi,

I see now that the linking happens at the binary level. I was wondering
whether it is possible to link to the OpenMP runtime library at the LLVM IR
level (to enable LTO optimizations on the code after the library calls have
been replaced).

I have done this before by linking against the bitcode of a file that
contains the compiled CUDA implementation of the OpenMP runtime library.
But it was a bit hacky, and offloading was not supported yet. Is there a
cleaner or more standard way to do this?

Thanks.


Arpith C Jacob via llvm-dev

2016-Jan-20 15:07 UTC

[llvm-dev] Executing OpenMP 4.0 code on Nvidia's GPU

Hi Ahmed,

I am experimenting with LTO, but as you said, it's still *very* hacky.

Here's what I did. First, compile the CUDA GPU OMP runtime with Clang
(rather than nvcc) to bitcode. When I looked at Clang-CUDA a couple of
weeks ago, I could only get device-side bitcode by using the temporary
files generated after passing -save-temps to Clang. The OMP-GPU version of
LLVM that you are using is not up to date with trunk, so I had to do a bit
of massaging on the generated IR.

I then had to manually link the various device-side bitcode files, run opt,
llc, and ptxas, and finally link the result with the host object file.

We don't have support for this in the driver yet, but once we move to
trunk I will look into streamlining this.

Thanks,
Arpith
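
For readers who want to see the shape of the manual device-side steps described above (link bitcode, opt, llc, ptxas), here is a rough sketch that only builds the command lines and prints them. It is not the exact sequence used in this thread: the file names, the "device" stem, the -O3 level, and the sm_35 default are assumptions for illustration, and the final link against the host object file is left out because the thread does not say how it was done.

```python
def device_lto_commands(bitcode_files, arch="sm_35", stem="device"):
    """Build the command lines for a manual device-side LTO pipeline."""
    linked = f"{stem}.linked.bc"
    optimized = f"{stem}.opt.bc"
    ptx = f"{stem}.ptx"
    cubin = f"{stem}.cubin"
    return [
        # Merge the runtime and application device bitcode into one module.
        ["llvm-link", *bitcode_files, "-o", linked],
        # Optimize across the former library boundary.
        ["opt", "-O3", linked, "-o", optimized],
        # Lower the optimized module to PTX for the chosen GPU.
        ["llc", "-march=nvptx64", f"-mcpu={arch}", optimized, "-o", ptx],
        # Assemble the PTX into GPU machine code.
        ["ptxas", f"-arch={arch}", ptx, "-o", cubin],
    ]


if __name__ == "__main__":
    # Hypothetical input names; use the bitcode files from your own build.
    for cmd in device_lto_commands(["omp_runtime.bc", "kernel.bc"]):
        print(" ".join(cmd))
```

Printing the commands rather than running them makes it easy to inspect each step before trying it against a particular LLVM and CUDA installation.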


Ahmed ElTantawy via llvm-dev

2016-Jan-21 09:49 UTC

[llvm-dev] Executing OpenMP 4.0 code on Nvidia's GPU

Thanks Arpith.

I was doing it in almost the same way but with nvcc (or apc-llc
<https://github.com/apc-llc/nvcc-llvm-ir>), and of course I had to make
the produced LLVM IR match my version of LLVM.

But I imagine it would be less messy if I could compile the CUDA GPU OMP
runtime with Clang directly. I found that a patch was committed recently
to enable compiling CUDA with Clang (
http://llvm.org/docs/CompileCudaWithLLVM.html).

Do you know whether compiling CUDA with Clang is restricted to particular
CUDA versions?

Thanks a lot


Arpith C Jacob via llvm-dev

2016-Jan-21 12:04 UTC

[llvm-dev] Executing OpenMP 4.0 code on Nvidia's GPU

Hi Ahmed,

I was able to compile the CUDA GPU OMP runtime with Clang directly.  I did
have to remove some printf statements and asserts to get it to go through
Clang-CUDA.  Depending on your configuration, you may need some or all of
these flags: --cuda-gpu-arch=sm_35 -nocudalib -DOMPTARGET_NVPTX_TEST=0
-DOMPTARGET_NVPTX_DEBUG=0 -DOMPTARGET_NVPTX_WARNING=0

Cheers,
Arpith
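
As a rough illustration of how the flags above might be combined, the sketch below assembles a Clang-CUDA command line and prints it. The flags listed in the message are used as given; "-x cuda", "-c", the clang++ driver name, and the source file name are assumptions added for the example, and -save-temps is included because it was mentioned earlier in the thread as the way to keep the device-side bitcode.

```python
def clang_cuda_command(source, arch="sm_35", clang="clang++"):
    """Build a Clang-CUDA command line using the flags quoted in the thread."""
    return [
        clang,
        "-x", "cuda",                  # treat the input as CUDA source
        f"--cuda-gpu-arch={arch}",     # target GPU architecture
        "-nocudalib",
        "-DOMPTARGET_NVPTX_TEST=0",
        "-DOMPTARGET_NVPTX_DEBUG=0",
        "-DOMPTARGET_NVPTX_WARNING=0",
        "-save-temps",                 # keep intermediates, including device bitcode
        "-c", source,
    ]


if __name__ == "__main__":
    # Hypothetical source file name, not taken from the thread.
    print(" ".join(clang_cuda_command("omp_runtime.cu")))
```

Which of these flags are actually needed depends on the configuration, as noted above, so treat the list as a starting point rather than a fixed recipe.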

