thr3ads.net - llvm dev - [LLVMdev] [RFC] OpenMP offload infrastructure [Aug 2014]

Sergey Ostanevich

2014-Aug-08 22:22 UTC

[LLVMdev] [RFC] OpenMP offload infrastructure

Hello everybody!

I would like to present a proposal for the implementation of OpenMP
offloading in LLVM. It was written by a group of authors and covers
mostly the runtime part, at a very high level. I believe it will be
good to have input from the community at this early stage before moving
deeper into the details.

The driver part is intentionally not touched, since we have no clear
vision of how one can use a third-party compiler for target code
generation and incorporate its results into the final host link phase.
I hope to hear more from you on this.

I invite you to take part in the discussion of the document. Criticism,
proposals, updates - all are welcome!

Thank you,
Sergey Ostanevich
Open Source Compilers
Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: offload-proposal.pdf
Type: application/pdf
Size: 684900 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140809/cd6c7f7a/attachment.pdf>

Das, Dibyendu

2014-Aug-11 06:03 UTC

I didn’t see SPIR discussed anywhere.

"C. Bergström"

2014-Aug-11 09:15 UTC

On 08/11/14 01:03 PM, Das, Dibyendu wrote:
> I didn’t see SPIR discussed anywhere.

This isn't OpenCL, and depending on OpenCL for OpenMP may not really make
sense. While I have my own opinions, if you feel strongly that it will
help enable higher performance somewhere, please list those reasons.
----------
More specifically
LLVM has a native AMD dGPU backend that is tightly coupled to the
compiler, unlike other platforms, which use things like PTX or other
bytecodes. Those platforms lose performance or have to work around not
having hardware-level details. Assuming this is done correctly, it would
be a disservice to emit SPIR instead of native codegen (imagine Java JIT
vs. C performance).

This also keeps everything in the open.

In my experience, people don't use OpenMP because they want so-so
performance, and with Exascale this will be increasingly important.

John Leidel (jleidel)

2014-Aug-11 13:36 UTC

Sergey [et al.], thanks for putting this proposal together.  Overall, this looks
like a pretty solid approach to providing relatively hardware agnostic omp
target functionality.  I had several comments/questions as summarized below:

Pros: 
- We [local colleagues and myself] like the concise target API.  We’re big fans
of KISS development principles.
- We believe this provides a good basis for future work in heterogeneous OMP
support

Comments/Questions: 
- There doesn’t seem to be any mention of how mutable each runtime function is
with respect to its target execution region.  The core OMP spec document notes
in several places that certain user-visible runtime calls have “implementation
defined” behavior depending upon where/how they’re used.  For example, what
happens if the host runtime issues a __tgt_target_data_update() while the target
is currently executing (__tgt_rtl_run_target_region() )?  Is this implementation
defined?  I’m certainly ok with that answer, but I believe we need to explicitly
state what the behavior is.

- I noticed that Alexandre Eichenberger was one of the authors.  Has he
mentioned any support/compatibility with the profiling interfaces he (JMC
et al.) proposed?  How does one integrate the proposed profiling runtime logic
with a target region (specifically the dispatch & data movement interfaces)?
This would be very handy.

- I don’t see any mention of an interface to query the physical details of a
device.  I know this strays a bit from the notion of portability, but it would
be nice to have a simple interface (similar to ‘omp_get_max_threads’).  I stop
short of querying information as detailed as provided by hwloc, but it would be
nice for the user to have the ability to query the targets and see which ones
are appropriate for execution.  This would essentially provide you the ability
to build different implementations of a kernel and make a runtime decision on
which one to execute.  E.g.:

if (/* target of some specific type present */) {
    /* use the omp target interface */
} else {
    /* use the normal worksharing or tasking interfaces */
}

(I realize this is more of an OMP spec question)
  
- It would be nice to define a runtime and/or environment mechanism that permits
the user to enable/disable specific targets.  For example, if a system had four
GPUs, but you only wanted to enable two, it would be convenient to do so using
an environment variable.  I realize that one could do this using actual runtime
calls in the code with some amount of intelligence, but this somewhat defeats
the purpose of portability.  Again, this is more related to the 4.x spec, but it
does have implications in the lower-level runtime.
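
The runtime decision described in the device-query comment above can be
approximated with what OpenMP 4.0 already provides. This is only a sketch:
omp_get_num_devices() reports how many target devices exist, not what type
they are, so it is a weaker test than the one asked for. The names
example_num_devices and example_scale are hypothetical, and the code falls
back to plain host execution when built without OpenMP support.

```c
#include <assert.h>
#if defined(_OPENMP)
#include <omp.h>
#endif

/* Number of offload devices the OpenMP runtime reports; 0 without OpenMP. */
static int example_num_devices(void)
{
#if defined(_OPENMP) && _OPENMP >= 201307 /* OpenMP 4.0 or newer */
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Scale a vector in place, offloading only when a device is present. */
void example_scale(double *x, int n, double a)
{
    assert(n >= 0);

    if (example_num_devices() > 0) {
        /* use the omp target interface */
        #pragma omp target map(tofrom: x[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            x[i] *= a;
    } else {
        /* use the normal worksharing interface on the host */
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            x[i] *= a;
    }
}
```

Both branches compute the same result, so a caller cannot tell them apart
except by performance. Choosing between kernels tuned for specific device
types would still need the query interface discussed above.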


cheers
john 


Samuel Antão

2014-Aug-11 16:51 UTC

Hi John,

Thank you for the comments. I am addressing some of them below.

Regards,
Samuel


2014-08-11 9:36 GMT-04:00 John Leidel (jleidel) <jleidel at micron.com>:
> Sergey [et.al], thanks for putting this proposal together.  Overall, this
> looks like a pretty solid approach to providing relatively hardware
> agnostic omp target functionality.  I had several comments/questions as
> summarized below:
>
> Pros:
> - We [local colleagues and myself] like the concise target API.  We’re big
> fans of KISS development principles.
> - We believe this provides a good basis for future work in heterogeneous
> OMP support
>
> Comments/Questions:
> - There doesn’t seem to be any mention of how mutable each runtime
> function is with respect to its target execution region.  The core OMP spec
> document notes in several places that certain user-visible runtime calls
> have “implementation defined” behavior depending upon where/how they’re
> used.  For example, what happens if the host runtime issues a
> __tgt_target_data_update() while the target is currently executing
> (__tgt_rtl_run_target_region() )?  Is this implementation defined?  I’m
> certainly ok with that answer, but I believe we need to explicitly state
> what the behavior is.
>
In my view the user-visible OpenMP calls that apply to target regions
depend on the state kept in libtarget.so, and are therefore independent of
the device type. What is device dependent is how the OpenMP terminology is
mapped. For example, omp_get_num_teams() would operate on top of the state
kept in libtarget.so, but how the device interprets a team is device
dependent and decided by the target dependent runtime.

A different issue is how the RTL calls that are common to target and host
(i.e. the kmpc_ calls) should be implemented. I think it is a good idea to
have some flexibility in the codegen to tune the generation of these calls
if the default interface is not suitable for a given target. But in
general, the kmpc_ library implementation should be known to the toolchain
of that target so it can properly drive the linking.


About the specific example you mentioned: if I understand it correctly,
under the current version of the spec __tgt_rtl_run_target_region() has
to be blocking, so libtarget.so would have to wait for the region to finish
before the update is issued.  The actions in libtarget.so would have to be
sequential, exactly as the code generation expects. If for some reason these
constraints change in future specs, both the code generation and the
libtarget.so implementation would have to be made consistent.

> - I noticed that Alexandre Eichenberger was one of the authors.  Has he
> mentioned any support/compatibility with the profiling interfaces he (JMC,
> et.al.) proposed?  How does one integrate the proposed profiling runtime
> logic with a target region (specifically the dispatch & data movement
> interfaces)?  This would be very handy.
>
> - I don’t see any mention of an interface to query the physical details of
> a device.  I know this strays a bit from the notion of portability, but it
> would be nice to have a simple interface (similar to
> ‘omp_get_max_threads’).  I stop short of querying information as detailed
> as provided by hwloc, but it would be nice for the user to have the ability
> to query the targets and see which ones are appropriate for execution.
>  This would essentially provide you the ability to build different
> implementations of a kernel and make a runtime decision on which one to
> execute.  EG,
> if( /* target of some specific type present */ ){
>     /* use the omp target interface */
> }else{
>    /* use the normal worksharing or tasking interfaces */
> }
>
> (I realize this is more of an OMP spec question)
>
I agree this is more of an OMP spec issue. The fact that we are addressing
different device types is already an extension to the spec, which poses some
issues. One of them, somewhat related to this, is how the device ids are
mapped to device types. Should this depend on flags passed to the compiler
(e.g. omptargets=A,B with ids 0-1 assigned to A and 2-3 to B, given that
the runtime identified two devices of each type in the system), or should it
depend on the environment? In the current proposal, libtarget.so abstracts
a single target made of several targets. Do we want to let the user
prioritize which exact device to use? Should this be decided at compile
time or at runtime?
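
To make the id question concrete, here is a minimal sketch of the
compile-time ordering option, assuming device ids are handed out
contiguously in the order the types appear in the omptargets list. The
struct and function names are hypothetical and are not part of the
proposal.

```c
#include <assert.h>
#include <stddef.h>

/* A device located as (index of its type, index within that type). */
struct example_device_ref {
    int type_index;  /* position of the type in the omptargets list, or -1 */
    int local_index; /* device number within that type, or -1 */
};

/*
 * Map a global device id to its type and per-type index.
 * devices_per_type[t] is the number of devices the runtime found for type t.
 * Returns {-1, -1} when the id does not name any device.
 */
struct example_device_ref
example_resolve_device(const int *devices_per_type, size_t num_types,
                       int device_id)
{
    struct example_device_ref ref = { -1, -1 };
    int first_id = 0; /* lowest global id belonging to the current type */

    if (device_id < 0)
        return ref;

    for (size_t t = 0; t < num_types; ++t) {
        assert(devices_per_type[t] >= 0);
        if (device_id < first_id + devices_per_type[t]) {
            ref.type_index = (int)t;
            ref.local_index = device_id - first_id;
            return ref;
        }
        first_id += devices_per_type[t];
    }
    return ref;
}
```

With omptargets=A,B and two devices of each type, the counts are {2, 2}:
id 1 resolves to the second device of A, and id 2 to the first device of B.
An environment-driven scheme would only change how the counts and their
order are obtained.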

>
> - It would be nice to define a runtime and/or environment mechanism that
> permits the user to enable/disable specific targets.  For example, if a
> system had four GPUs, but you only wanted to enable two, it would be
> convenient to do so using an environment variable.  I realize that one
> could do this using actual runtime calls in the code with some amount of
> intelligence, but this somewhat defeats the purpose of portability.  Again,
> this is more related to the 4.x spec, but it does have implications in the
> lower-level runtime.
>
I think this can be solved by the target dependent RTL alone by returning
the number of available devices to libtarget.so based on some env variable
specified by the RTL.
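
A minimal sketch of that idea, assuming the RTL reads a cap from an
environment variable before reporting its device count to libtarget.so.
The variable name EXAMPLE_RTL_MAX_DEVICES and both function names are
hypothetical; the proposal does not define them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical environment variable name; not defined by the proposal. */
#define EXAMPLE_DEVICE_LIMIT_ENV "EXAMPLE_RTL_MAX_DEVICES"

/*
 * Apply a user-supplied cap to the number of detected devices.
 * A missing, empty, malformed, or negative cap leaves the count unchanged.
 */
int example_cap_devices(int detected, const char *limit_text)
{
    assert(detected >= 0);

    if (limit_text == NULL || *limit_text == '\0')
        return detected;

    char *end = NULL;
    long limit = strtol(limit_text, &end, 10);
    if (*end != '\0' || limit < 0)
        return detected;

    return limit < detected ? (int)limit : detected;
}

/* Device count a target RTL would report to libtarget.so. */
int example_rtl_visible_devices(int detected)
{
    return example_cap_devices(detected, getenv(EXAMPLE_DEVICE_LIMIT_ENV));
}
```

On a system with four GPUs, setting EXAMPLE_RTL_MAX_DEVICES=2 would make
this RTL report two devices. A cap like this can only hide the highest
numbered devices; selecting specific ones would need a list-valued
variable instead.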

