thr3ads.net - llvm dev - [llvm-dev] [Openmp-dev] [cfe-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries [Mar 2016]

Jason Henline via llvm-dev

2016-Mar-14 17:50 UTC

[llvm-dev] [Openmp-dev] [cfe-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

I think it would be great if StreamExecutor could use liboffload to perform
its offloading under the hood. Right now offloading is handled in
StreamExecutor using platform plugins, so I think it could be very natural
for us to write a plugin which basically forwards to liboffload. If that
worked out, we could delete our current plugins and depend only on those
based on liboffload, and then all the offloading code would be unified.
Then, just as James said, StreamExecutor would provide a nice C++ interface
on top of liboffload, and liboffload could continue to support OpenMP
directly.
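
The forwarding-plugin arrangement described above might look roughly like this. It assumes the C++ layer talks only to an abstract per-platform plugin interface. All names here (`PlatformPlugin`, `ForwardingPlugin`, `RoundTrip` and the `offload_*` C functions) are hypothetical. They are not liboffload's or StreamExecutor's real API, and the C functions are host-memory stubs so the sketch runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical C-level offload entry points (NOT liboffload's real API).
// These stubs use host memory; a real backend would call a device runtime.
extern "C" {
void *offload_alloc(int device, std::size_t bytes) {
  (void)device;
  return std::malloc(bytes);
}
int offload_copy_to_device(int device, void *dst, const void *src,
                           std::size_t bytes) {
  (void)device;
  std::memcpy(dst, src, bytes);
  return 0;  // 0 means success
}
int offload_copy_from_device(int device, void *dst, const void *src,
                             std::size_t bytes) {
  (void)device;
  std::memcpy(dst, src, bytes);
  return 0;
}
void offload_free(int device, void *ptr) {
  (void)device;
  std::free(ptr);
}
}  // extern "C"

// Hypothetical platform-plugin interface that the C++ layer programs against.
class PlatformPlugin {
 public:
  virtual ~PlatformPlugin() = default;
  virtual void *Allocate(std::size_t bytes) = 0;
  virtual bool CopyToDevice(void *dst, const void *src, std::size_t bytes) = 0;
  virtual bool CopyFromDevice(void *dst, const void *src,
                              std::size_t bytes) = 0;
  virtual void Deallocate(void *ptr) = 0;
};

// A plugin with no offloading logic of its own: every call is forwarded.
class ForwardingPlugin final : public PlatformPlugin {
 public:
  explicit ForwardingPlugin(int device) : device_(device) {
    assert(device >= 0 && "device ids are assumed to be non-negative");
  }
  void *Allocate(std::size_t bytes) override {
    return offload_alloc(device_, bytes);
  }
  bool CopyToDevice(void *dst, const void *src, std::size_t bytes) override {
    return offload_copy_to_device(device_, dst, src, bytes) == 0;
  }
  bool CopyFromDevice(void *dst, const void *src, std::size_t bytes) override {
    return offload_copy_from_device(device_, dst, src, bytes) == 0;
  }
  void Deallocate(void *ptr) override { offload_free(device_, ptr); }

 private:
  int device_;
};

// Device-agnostic helper: sends n ints to the device and reads them back.
bool RoundTrip(PlatformPlugin &plugin, const int *in, int *out, std::size_t n) {
  void *dev = plugin.Allocate(n * sizeof(int));
  if (dev == nullptr) return false;
  bool ok = plugin.CopyToDevice(dev, in, n * sizeof(int)) &&
            plugin.CopyFromDevice(out, dev, n * sizeof(int));
  plugin.Deallocate(dev);
  return ok;
}
```

The design choice is that user-facing C++ code depends only on `PlatformPlugin`. Swapping the existing plugins for forwarding ones would then, in principle, not touch that code.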

In this plan, I think it would make sense to move liboffload to the new
project being proposed by this RFC, and hopefully that would also make
liboffload more usable as a stand-alone project. Before moving forward with
any of these plans, I think it is right to wait to hear what IBM thinks.

On Mon, Mar 14, 2016 at 10:14 AM C Bergström <openmp-dev at lists.llvm.org> wrote:
> /* ignorable rant */
> I've publicly advocated that it shouldn't have been there in the 1st place.
> I have been quite vocal that the work wasn't for everyone else to pay for,
> but should have been part of the initial design. (Basically, getting it
> right the 1st time, instead of forcing someone else to wade through a
> bunch of CMake.)
>
> On Tue, Mar 15, 2016 at 1:10 AM, Cownie, James H
> <james.h.cownie at intel.com> wrote:
> >> I'd support some of James's comments if liboffload wasn't glued to OMP
> >> as it is now.
> >
> > I certainly have no objection to moving liboffload elsewhere if that
> > makes it more useful to people.
> > There is no real "glue" holding it there; it simply ended up in the
> > OpenMP directory structure because that was an easy place to put it,
> > not because that's the optimal place for it.
> >
> > To some extent it has stayed there because no-one has put in any effort
> > to do the work to move it.
> >
> > -- Jim
> >
> > James Cownie <james.h.cownie at intel.com>
> > SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
> > Tel: +44 117 9071438
> >
> > -----Original Message-----
> > From: C Bergström [mailto:cbergstrom at pathscale.com]
> > Sent: Monday, March 14, 2016 5:01 PM
> > To: Cownie, James H <james.h.cownie at intel.com>
> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; cfe-dev <cfe-dev at lists.llvm.org>; openmp-dev at lists.llvm.org
> > Subject: Re: [cfe-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
> >
> > I'd support some of James's comments if liboffload wasn't glued to OMP
> > as it is now. My attempts to decouple it into something with better
> > design layering, outside of the OMP source repo, have failed. For it to
> > be advocated as "the" offload lib, it needs a home (imnsho) outside
> > of OMP: somewhere that others can easily play with it and not pay the
> > OMP tax. It may tick some of the boxes which have been mentioned, but
> > I'm curious how well it does when put under real workloads.
> >
> > On Tue, Mar 15, 2016 at 12:53 AM, Cownie, James H via cfe-dev
> > <cfe-dev at lists.llvm.org> wrote:
> >> Jason,
> >>
> >> It’s great that Google are interested in contributing to the development of
> >> LLVM in this area, and that you have code to support offload.
> >>
> >> However, I’m not sure that all of it is needed, since LLVM already has the
> >> offload library, which has been developed in the context of OpenMP but
> >> actually provides a general facility. It has been a part of LLVM since April
> >> 2014, and is already being used to offload to both Intel Xeon Phi and (at
> >> least NVIDIA) GPUs. (The IBM folks can tell you more about that!)
> >>
> >> The main difference I see (at a very first glance!) is that your
> >> StreamExecutor interfaces seem to be aimed more at end-user code, whereas
> >> the interface to the existing offload library has not been designed for the
> >> user, but to be an interface from the compiler. That has advantages and
> >> disadvantages.
> >>
> >> Advantages:
> >>
> >> · It is a C-level interface, so it is callable from C, C++ and Fortran.
> >>
> >> Disadvantages:
> >>
> >> · Using it directly from C++ user code may be harder than using
> >>   StreamExecutor.
> >>
> >> However, there is nothing in the interface that prevents it from being used
> >> with CUDA or OpenCL, and it already seems to support the low-level features
> >> you cited as StreamExecutor’s advantages, though not the “looks just like
> >> CUDA” aspects, since it’s explicitly vendor neutral.
> >>
> >>> StreamExecutor:
> >>>
> >>> * abstracts the underlying accelerator platform (avoids locking you into a
> >>> single vendor, and lets you write code without thinking about which
> >>> platform you'll be running on).
> >>
> >> Liboffload does this (and has a specific design for how to abstract new
> >> devices and support them using device-specific libraries).
> >>
> >>> * provides an open-source alternative to the CUDA runtime library.
> >>
> >> I am not a CUDA expert, so I can’t comment on this! As before, IBM should
> >> comment.
> >>
> >>> * gives users a stream management model whose terminology matches that of
> >>> the CUDA programming model.
> >>
> >> This is not abstract, but seems CUDA-target specific, which is, if anything,
> >> worrying for a supposedly vendor-neutral interface!
> >>
> >>> * makes use of modern C++ to create a safe, efficient, easy-to-use
> >>> programming interface.
> >>
> >> No, because liboffload is an implementation layer, not intended to be
> >> user-visible.
> >>
> >>> StreamExecutor makes it easy to:
> >>>
> >>> * move data between host and accelerator (and also between peer
> >>> accelerators).
> >>
> >> Liboffload supports this.
> >>
> >>> * execute data-parallel kernels written in the OpenCL or CUDA kernel
> >>> languages.
> >>
> >> I believe this should be easy; IBM can comment better, since they have been
> >> working on GPU support.
> >>
> >>> * inspect the capabilities of a GPU-like device at runtime.
> >>> * manage multiple devices.
> >>
> >> Liboffload supports this.
> >>
> >> We’d therefore be very interested in seeing an approach that implemented a
> >> C++-specific, user-friendly interface on top of the existing liboffload
> >> functionality, but we don’t see a reason to rework the OpenMP implementation
> >> to use StreamExecutor (since what LLVM already has is working fine, and
> >> supporting offload to both GPUs and Xeon Phi).
> >>
> >> -- Jim
> >>
> >> James Cownie <james.h.cownie at intel.com>
> >> SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
> >>
> >> Tel: +44 117 9071438
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> Intel Corporation (UK) Limited
> >> Registered No. 1134945 (England)
> >> Registered Office: Pipers Way, Swindon SN3 1RJ
> >> VAT No: 860 2173 47
> >>
> >> This e-mail and any attachments may contain confidential material for
> >> the sole use of the intended recipient(s). Any review or distribution
> >> by others is strictly prohibited. If you are not the intended
> >> recipient, please contact the sender and delete all copies.
> >>
> >>
> >> _______________________________________________
> >> cfe-dev mailing list
> >> cfe-dev at lists.llvm.org
> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>
> _______________________________________________
> Openmp-dev mailing list
> Openmp-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
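
Jim's suggestion, quoted above, is a C++-specific, user-friendly interface layered over a C-level offload interface. It can be sketched roughly as follows. Everything here is hypothetical: the `hyp_buffer_*` functions are invented stand-ins backed by host memory, not liboffload's entry points, and `DeviceBuffer` is only one possible shape for the C++ layer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical C-level interface: an opaque handle plus integer status codes,
// the kind of ABI callable from C, C++ and Fortran (via ISO_C_BINDING).
// The bodies are host-memory stand-ins so that the sketch runs anywhere.
extern "C" {
struct hyp_buffer {
  void *data;
  std::size_t bytes;
};

int hyp_buffer_create(std::size_t bytes, hyp_buffer **out) {
  hyp_buffer *b = static_cast<hyp_buffer *>(std::malloc(sizeof(hyp_buffer)));
  if (b == nullptr) return 1;
  b->data = std::malloc(bytes == 0 ? 1 : bytes);
  if (b->data == nullptr) {
    std::free(b);
    return 1;
  }
  b->bytes = bytes;
  *out = b;
  return 0;
}
int hyp_buffer_write(hyp_buffer *b, const void *src, std::size_t bytes) {
  if (bytes > b->bytes) return 2;  // 2 means "out of range"
  if (bytes > 0) std::memcpy(b->data, src, bytes);
  return 0;
}
int hyp_buffer_read(const hyp_buffer *b, void *dst, std::size_t bytes) {
  if (bytes > b->bytes) return 2;
  if (bytes > 0) std::memcpy(dst, b->data, bytes);
  return 0;
}
void hyp_buffer_destroy(hyp_buffer *b) {
  if (b == nullptr) return;
  std::free(b->data);
  std::free(b);
}
}  // extern "C"

// C++ layer on top: RAII ownership, typed element counts, exceptions.
template <typename T>
class DeviceBuffer {
  static_assert(std::is_trivially_copyable<T>::value,
                "elements are copied byte-wise");

 public:
  explicit DeviceBuffer(std::size_t count) : count_(count) {
    if (hyp_buffer_create(count * sizeof(T), &handle_) != 0)
      throw std::runtime_error("device allocation failed");
  }
  ~DeviceBuffer() { hyp_buffer_destroy(handle_); }
  DeviceBuffer(const DeviceBuffer &) = delete;
  DeviceBuffer &operator=(const DeviceBuffer &) = delete;
  DeviceBuffer(DeviceBuffer &&other) noexcept
      : handle_(std::exchange(other.handle_, nullptr)), count_(other.count_) {}

  // Host -> device copy of up to count_ elements.
  void CopyFrom(const std::vector<T> &host) {
    assert(host.size() <= count_);
    if (hyp_buffer_write(handle_, host.data(), host.size() * sizeof(T)) != 0)
      throw std::runtime_error("host-to-device copy failed");
  }
  // Device -> host copy of the whole buffer.
  std::vector<T> CopyTo() const {
    std::vector<T> host(count_);
    if (hyp_buffer_read(handle_, host.data(), count_ * sizeof(T)) != 0)
      throw std::runtime_error("device-to-host copy failed");
    return host;
  }

 private:
  hyp_buffer *handle_ = nullptr;
  std::size_t count_;
};
```

The C entry points stay usable from C and Fortran. C++ callers get automatic cleanup and typed copies without the C layer having to change.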

Sergey Ostanevich via llvm-dev

2016-Mar-14 19:48 UTC

[llvm-dev] [cfe-dev] [Openmp-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

Jason,

It looks like a cool idea to provide a high-level interface for CUDA-based
development. So far, my understanding is that StreamExecutor is designed as
an abstract runtime layer over the target architecture for offloading, still
requiring target-specific code from the user. In my understanding, this
contradicts the PPM: the abstraction of the programming itself. To draw a
parallel: you provide something like inline assembly, giving the user the
best performance and flexibility, while OpenMP is more like a Fortran
compiler that abstracts the machine away entirely. I would name this the
biggest difference between the projects. Am I right?

The document you refer to was published a year ago, and there has been
significant progress since then, with a prototype compiler implemented at
https://github.com/clang-omp. The implementation supports NVIDIA and x86
targets, providing an abstraction of the target platform. Other parties have
contributed to the design, though we have yet to see open-source
contributions from them. Nevertheless, I can say that the design was
thoroughly reviewed by them and is approved by the full list of authors;
hence, their targets are satisfied as well. As per the design document, the
abstraction layer is the libomptarget library, which is meant to dispatch
different target binaries at runtime. Along with dispatching, the library
keeps track of all data mappings between the host and all target devices,
hiding this from the user and thus removing the burden of bookkeeping. This
could be a good point for StreamExecutor to integrate its interface.
Still, we need to review both sides: the libomptarget interface provided to
the compiler and the StreamExecutor internal interfaces.
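
The bookkeeping described above can be sketched as a reference-counted "present table" keyed by host address. This is a simplified illustration, not libomptarget's code. The class and method names are hypothetical, and "device" memory is simulated with host allocations. Entries are keyed by base address only; a real runtime would presumably also have to handle sub-ranges and overlapping mappings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>

// Hypothetical, simplified "present table": host address -> device copy.
// Device memory is simulated with host allocations so the sketch runs anywhere.
class MappingTable {
 public:
  MappingTable() = default;
  MappingTable(const MappingTable &) = delete;
  MappingTable &operator=(const MappingTable &) = delete;
  ~MappingTable() {
    for (auto &kv : entries_) std::free(kv.second.device);
  }

  // Maps [host, host + bytes). Allocates and optionally copies in only on the
  // first mapping; later mappings of the same address just bump the count.
  void *Enter(void *host, std::size_t bytes, bool copy_to_device) {
    auto it = entries_.find(host);
    if (it != entries_.end()) {
      ++it->second.ref_count;
      return it->second.device;
    }
    void *device = std::malloc(bytes == 0 ? 1 : bytes);
    assert(device != nullptr && "simulated device allocation failed");
    if (copy_to_device && bytes > 0) std::memcpy(device, host, bytes);
    entries_.emplace(host, Entry{device, bytes, 1});
    return device;
  }

  // Unmaps; copies back and frees only when the last reference is released.
  void Exit(void *host, bool copy_from_device) {
    auto it = entries_.find(host);
    assert(it != entries_.end() && "address was never mapped");
    if (it == entries_.end()) return;
    if (--it->second.ref_count > 0) return;
    if (copy_from_device && it->second.bytes > 0)
      std::memcpy(host, it->second.device, it->second.bytes);
    std::free(it->second.device);
    entries_.erase(it);
  }

  // Device address for a mapped host address, or nullptr if not present.
  void *Lookup(void *host) const {
    auto it = entries_.find(host);
    return it == entries_.end() ? nullptr : it->second.device;
  }

  int RefCount(void *host) const {
    auto it = entries_.find(host);
    return it == entries_.end() ? 0 : it->second.ref_count;
  }

 private:
  struct Entry {
    void *device;
    std::size_t bytes;
    int ref_count;
  };
  std::map<void *, Entry> entries_;
};
```

With reference counting, nested or repeated mappings of the same data reuse one device copy. Only the final release copies the data back.
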

> While the OpenMP model provides the convenience of allowing the author to
> write their kernel code in standard C/C++, the StreamExecutor model allows
> for the use of any kernel language (e.g. CUDA C++ or OpenCL C). This lets
> authors use platform-specific features that are only present in
> platform-specific kernel definition languages.



As I understand it, the OpenMP standard allows calling functions in target
regions. Those functions can be generated for the target by the compiler,
when the user annotates the appropriate functions with "#pragma omp declare
target". But the standard also allows using functions added to the target
binaries in any other way. For example, on the Xeon Phi platform one can use
any shared library that has been put on the device beforehand. For GPUs and
other targets, binaries obtained from other compilers or build tools should
be passed explicitly to the target image link.
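
For the compiler-generated case, here is a minimal sketch using the standard `declare target` directives (OpenMP 4.0 and later). The function names are illustrative only. If no offload device is available, the target region falls back to the host. Without `-fopenmp` the pragmas are ignored and the code simply runs serially.

```cpp
#include <cassert>

// Compiled for the host and, by an offloading compiler, for the target too.
#pragma omp declare target
int square(int x) { return x * x; }
#pragma omp end declare target

// Calls the declare-target function from inside a target region. If no device
// is available (or OpenMP is disabled), the region runs on the host instead.
int square_on_device(int x) {
  // Keep x * x within int range (46340 * 46340 < 2^31 - 1).
  assert(x >= -46340 && x <= 46340);
  int result = 0;
#pragma omp target map(to: x) map(from: result)
  {
    result = square(x);
  }
  return result;
}
```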


The OpenMP programming model aims for a program that is written once and
can be compiled by many compilers for multiple targets; in addition, the
same compiler can compile it in serial mode. This fits complex build systems
well and supports a number of targets with no additional changes to the
build scripts. My understanding is that StreamExecutor requires different
targets to be built separately and then combined to allow offloading.
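
To illustrate the write-once point, the sketch below compiles unchanged in three modes:

* serially, where the pragmas are ignored and `_OPENMP` is undefined;
* with OpenMP on the host;
* with an offloading OpenMP compiler.

`_OPENMP` and `omp_get_num_devices()` are standard OpenMP. The `saxpy` and `visible_devices` functions are only examples.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of offload devices visible at runtime; 0 in a serial build.
int visible_devices() {
#ifdef _OPENMP
  return omp_get_num_devices();
#else
  return 0;
#endif
}

// y = a*x + y. Offloaded when a device is available, run on the host
// otherwise, and compiled as a plain loop when OpenMP is disabled.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
  assert(x.size() == y.size());
  const float *xp = x.data();
  float *yp = y.data();
  const long n = static_cast<long>(y.size());
#pragma omp target map(to: xp[0:n]) map(tofrom: yp[0:n])
  {
#pragma omp parallel for
    for (long i = 0; i < n; ++i)
      yp[i] = a * xp[i] + yp[i];
  }
}
```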



I see some other differences in the programming model as well; for example,
the StreamExecutor project you referred to supports only C++. Are there any
plans to support plain C and Fortran? I would suggest we do more analysis on
extending the existing library to support StreamExecutor.


Collaboration is always a plus, and having more interested parties is
beneficial both ways. I would like to start with a description and review of
the interfaces. The libomptarget interfaces (both the compiler side and the
target RTL) are in the document. Could you prepare a short overview of the
(perhaps planned) StreamExecutor internal interfaces to CUDA and OpenCL, so
that we can derive the key common points of abstraction and try to map them
to the libomptarget interface?


Regards,

Sergos

Intel Compiler Team




Jason Henline via llvm-dev

2016-Mar-15 00:26 UTC

[llvm-dev] [cfe-dev] [Openmp-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

Sergos,

> It looks like a cool idea to provide a high-level interface for CUDA-based
> development. So far, my understanding is that StreamExecutor is designed as
> an abstract runtime layer over the target architecture for offloading, still
> requiring target-specific code from the user. In my understanding, this
> contradicts the PPM: the abstraction of the programming itself. To draw a
> parallel: you provide something like inline assembly, giving the user the
> best performance and flexibility, while OpenMP is more like a Fortran
> compiler that abstracts the machine away entirely. I would name this the
> biggest difference between the projects. Am I right?

Yes, it sounds to me like you have the right idea here.

> The document you refer to was published a year ago, and there has been
> significant progress since then, with a prototype compiler implemented at
> https://github.com/clang-omp. The implementation supports NVIDIA and x86
> targets, providing an abstraction of the target platform. Other parties have
> contributed to the design, though we have yet to see open-source
> contributions from them. Nevertheless, I can say that the design was
> thoroughly reviewed by them and is approved by the full list of authors;
> hence, their targets are satisfied as well. As per the design document, the
> abstraction layer is the libomptarget library, which is meant to dispatch
> different target binaries at runtime. Along with dispatching, the library
> keeps track of all data mappings between the host and all target devices,
> hiding this from the user and thus removing the burden of bookkeeping. This
> could be a good point for StreamExecutor to integrate its interface.
> Still, we need to review both sides: the libomptarget interface provided to
> the compiler and the StreamExecutor internal interfaces.

Thanks for letting me know about the implementation on GitHub. I will take
a look at the code for libomptarget there to see how I think it could work
with StreamExecutor. In general terms, I really like the idea of
StreamExecutor being able to call into libomptarget rather than
implementing the offloading itself, so I think it will be great if we can
work out those details.

> As I understand it, the OpenMP standard allows calling functions in target
> regions. Those functions can be generated for the target by the compiler,
> when the user annotates the appropriate functions with "#pragma omp declare
> target". But the standard also allows using functions added to the target
> binaries in any other way. For example, on the Xeon Phi platform one can use
> any shared library that has been put on the device beforehand. For GPUs and
> other targets, binaries obtained from other compilers or build tools should
> be passed explicitly to the target image link.

I was not aware that OpenMP had a mode for running code compiled by a
different compiler. That sounds very nice. I would like to learn more about
what the user interface looks like for this, specifically in the case of
CUDA.

> The OpenMP programming model aims for a program that is written once and
> can be compiled by many compilers for multiple targets; in addition, the
> same compiler can compile it in serial mode. This fits complex build systems
> well and supports a number of targets with no additional changes to the
> build scripts. My understanding is that StreamExecutor requires different
> targets to be built separately and then combined to allow offloading.

Yes, I think that is an accurate representation of StreamExecutor. Our hope
is to integrate StreamExecutor into clang itself so that clang can manage
the bundling of device code in object files and the launching of that code.

> I see some other differences in the programming model as well; for example,
> the StreamExecutor project you referred to supports only C++. Are there any
> plans to support plain C and Fortran? I would suggest we do more analysis on
> extending the existing library to support StreamExecutor.

We are only interested in supporting C++. One of the main goals of
StreamExecutor is to create a nice interface specifically for C++.

> Collaboration is always a plus, and having more interested parties is
> beneficial both ways. I would like to start with a description and review of
> the interfaces. The libomptarget interfaces (both the compiler side and the
> target RTL) are in the document. Could you prepare a short overview of the
> (perhaps planned) StreamExecutor internal interfaces to CUDA and OpenCL, so
> that we can derive the key common points of abstraction and try to map them
> to the libomptarget interface?

Yes, I will prepare a short overview of the internal StreamExecutor
interface to CUDA and OpenCL. This interface is already well defined, so I
will just need to copy and paste some things into a document. I plan to
have it completed sometime tomorrow so you can see how StreamExecutor
would like to interact with libomptarget.

Thanks very much for your input on this,
-Jason

On Mon, Mar 14, 2016 at 12:48 PM Sergey Ostanevich <sergos.gnu at
gmail.com>
wrote:
> Jason,
>
> It looks a cool thought on providing high-level interface for CUDA-based
> development.  By far, my understanding is that the StreamExecutor is
> designed as an abstract runtime layer of target architecture for
> offloading, still requiring the target specific code from the user. This is
> something that contradicts the PPM in my understanding: the abstraction of
> the programming itself. To make the parallel: you provide an inline
> assembly to enable a user with best performance/flexibility while OpenMP is
> rather a Fortran compiler that gives abstraction of machine at all. This I
> would name the biggest difference between the projects. Am I right?
>
> The document you refers to is published a year ago and there was a
> significant progress since then, with prototype compiler implemented at
> https://github.com/clang-omp. The implementation supports NVIDIA and x86
> targets, providing an abstraction of the target platform. There are other
> parties contributed to the design, yet we will see open source
> contributions from them. Nevertheless, I can say that the design was
> thoroughly reviewed by them and it is approved by the full list of authors
> - hence, their targets are satisfied also. As per the design document, the
> abstraction layer is the libomptarget library supposed to dispatch
> different target binaries at runtime. Along with dispatching the library
> keeps track of all data mapping between the host and all target devices,
> hiding this from the user and thus removes the burden of bookkeeping. This
> can be a good point for the StreamExecutor to integrate its interface.
> Still we need to review both sides: the libomptarget interface provided to
> the compiler and the StreamExecutor internal interfaces.
>
> > While the OpenMP model provides the convenience of allowing the author
> to write their kernel code in standard C/C++, the StreamExecutor model
> allows for the use of any kernel language (e.g. CUDA C++ or OpenCL C). This
> lets authors use  platform-specific features that are only present in
> platform-specific kernel definition languages.
>
>
>
> Per my understanding, the OpenMP standard allows calling of functions in
> target regions. Those functions can be generated for target either by the
> compiler, where user annotate appropriate functions with "#pragma omp
> declare target". But it also allows using of functions added to the
target
> binaries in any other way. For example, on Xeon PHI platform one can use
> any shared library that is put on the device beforehand. As for the GPU and
> other targets – binaries obtained from other compilers/build tools should
> be passed to the target image link explicitly.
>
>
> The OpenMP programming model was aimed the program is written once, and
>  can be compiled with many compilers for multi-targets, in addition, with
> the same compiler it can be compiled in serial mode. This fits well for
> complex build systems and provides support to number of targets with no
> additional interference with build scripts. My understanding is that  the
> StreamExecutor requires different targets to be built separately and
> thereafter put together to allow offloading.
>
>
>
> I see some other differences in programming model as well, for example,
> the executor project you referred to supports C++ only. Is there any plans
> to support plain C and Fortran? I would suggest we do a more analysis to
> extend the existing library for support StreamExecutor.
>
>
> To collaborate is always a plus and having more interested parties is
> beneficial both ways. I would like to start from the interfaces description
> and review. The interface of the libomptarget (both compiler side and the
> target RTL) are in the document. Could you prepare a small overview of the
> (perhaps, supposed) StreamExecutor internal interfaces with CUDA and OpenCL
> so that we can derive key common points of abstraction and try to map them
> to libomptarget interface?
>
>
> Regards,
>
> Sergos
>
> Intel Compiler Team
>
>
>
> On Mon, Mar 14, 2016 at 8:50 PM, Jason Henline via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> I think it would be great if StreamExecutor could use liboffload to
>> perform its offloading under the hood. Right now offloading is handled
in
>> StreamExecutor using platform plugins, so I think it could be very
natural
>> for us to write a plugin which basically forwards to liboffload. If
that
>> worked out, we could delete our current plugins and depend only on
those
>> based on liboffload, and then all the offloading code would be unified.
>> Then, just as James said, StreamExecutor would provide a nice C++
interface
>> on top of liboffload, and liboffload could continue to support OpenMP
>> directly.
>>
>> In this plan, I think it would make sense to move liboffload to the new
>> project being proposed by this RFC, and hopefully that would also make
>> liboffload more usable as a stand-alone project. Before moving forward
with
>> any of these plans, I think it is right to wait to hear what IBM
thinks.
>>
>> On Mon, Mar 14, 2016 at 10:14 AM C Bergström <openmp-dev at
lists.llvm.org>
>> wrote:
>>
>>> /* ignorable rant */
>>> I've publicly advocated that it shouldn't have been there in the 1st place.
>>> I have been quite vocal that the work wasn't for everyone else to pay for,
>>> but should have been part of the initial design. (Basically getting it
>>> right the 1st time - instead of forcing someone else to wade through a
>>> bunch of cmake)
>>>
>>> On Tue, Mar 15, 2016 at 1:10 AM, Cownie, James H
>>> <james.h.cownie at intel.com> wrote:
>>> >> I'd support some of James's comments if liboffload wasn't glued to OMP
>>> >> as it is now.
>>> >
>>> > I certainly have no objection to moving liboffload elsewhere if that
>>> > makes it more useful to people.
>>> > There is no real "glue" holding it there; it simply ended up in the
>>> > OpenMP directory structure because that was an easy place to put it,
>>> > not because that's the optimal place for it.
>>> >
>>> > To some extent it has stayed there because no-one has put in any
>>> > effort to do the work to move it.
>>> >
>>> > -- Jim
>>> >
>>> > James Cownie <james.h.cownie at intel.com>
>>> > SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
>>> > Tel: +44 117 9071438
>>> >
>>> > -----Original Message-----
>>> > From: C Bergström [mailto:cbergstrom at pathscale.com]
>>> > Sent: Monday, March 14, 2016 5:01 PM
>>> > To: Cownie, James H <james.h.cownie at intel.com>
>>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; cfe-dev <cfe-dev at lists.llvm.org>; openmp-dev at lists.llvm.org
>>> > Subject: Re: [cfe-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries
>>> >
>>> > I'd support some of James's comments if liboffload wasn't glued to OMP
>>> > as it is now. My attempts to decouple it into something with better
>>> > design layering, outside of the OMP source repo, have failed. For it to
>>> > be advocated as "the" offload lib, it needs a home (imnsho) outside
>>> > of OMP, somewhere that others can easily play with it and not pay the
>>> > OMP tax. It may tick some of the boxes which have been mentioned, but
>>> > I'm curious how well it does when put under real workloads.
>>> >
>>> > On Tue, Mar 15, 2016 at 12:53 AM, Cownie, James H via cfe-dev
>>> > <cfe-dev at lists.llvm.org> wrote:
>>> >> Jason,
>>> >>
>>> >> It’s great that Google are interested in contributing to the
>>> >> development of LLVM in this area, and that you have code to support
>>> >> offload.
>>> >>
>>> >> However, I’m not sure that all of it is needed, since LLVM already has
>>> >> the offload library, which has been developed in the context of
>>> >> OpenMP but actually provides a general facility. It has been a part of
>>> >> LLVM since April 2014, and is already being used to offload to both
>>> >> Intel Xeon Phi and (at least NVidia) GPUs. (The IBM folks can tell you
>>> >> more about that!)
>>> >>
>>> >> The main difference I see (at a very first glance!) is that your
>>> >> StreamExecutor interfaces seem to be aimed more at end user code,
>>> >> whereas the interface to the existing offload library has not been
>>> >> designed for the user, but to be an interface from the compiler. That
>>> >> has advantages and disadvantages.
>>> >>
>>> >> Advantages:
>>> >>
>>> >> ·         It is a C level interface, so it is callable from C, C++ and
>>> >>           Fortran.
>>> >>
>>> >> Disadvantages:
>>> >>
>>> >> ·         Using it directly from C++ user code may be harder than
>>> >>           using StreamExecutor.
>>> >>
>>> >> However, there is nothing in the interface that prevents it from being
>>> >> used with CUDA or OpenCL, and it already seems to support the low level
>>> >> features you cited as StreamExecutor’s advantages, though not the
>>> >> “looks just like CUDA” aspects, since it’s explicitly vendor neutral.
>>> >>
>>> >>> StreamExecutor:
>>> >>>
>>> >>> * abstracts the underlying accelerator platform (avoids locking you into a
>>> >>> single vendor, and lets you write code without thinking about which
>>> >>> platform you'll be running on).
>>> >>
>>> >> Liboffload does this (and has a specific design for how to abstract new
>>> >> devices and support them using device specific libraries).
>>> >>
>>> >>> * provides an open-source alternative to the CUDA runtime library.
>>> >>
>>> >> I am not a CUDA expert, so I can’t comment on this! As before, IBM should
>>> >> comment.
>>> >>
>>> >>> * gives users a stream management model whose terminology matches that of
>>> >>> the CUDA programming model.
>>> >>
>>> >> This is not abstract, but seems CUDA target specific, which is, if
>>> >> anything, worrying for a supposedly vendor-neutral interface!
>>> >>
>>> >>> * makes use of modern C++ to create a safe, efficient, easy-to-use
>>> >>> programming interface.
>>> >>
>>> >> No, because liboffload is an implementation layer, not intended to be
>>> >> user-visible.
>>> >>
>>> >>> StreamExecutor makes it easy to:
>>> >>>
>>> >>> * move data between host and accelerator (and also between peer
>>> >>> accelerators).
>>> >>
>>> >> Liboffload supports this.
>>> >>
>>> >>> * execute data-parallel kernels written in the OpenCL or CUDA kernel
>>> >>> languages.
>>> >>
>>> >> I believe this should be easy; IBM can comment better, since they have
>>> >> been working on GPU support.
>>> >>
>>> >>> * inspect the capabilities of a GPU-like device at runtime.
>>> >>>
>>> >>> * manage multiple devices.
>>> >>
>>> >> Liboffload supports this.
>>> >>
>>> >> We’d therefore be very interested in seeing an approach that implemented
>>> >> a C++ specific user-friendly interface on top of the existing liboffload
>>> >> functionality, but we don’t see a reason to rework the OpenMP
>>> >> implementation to use StreamExecutor (since what LLVM already has is
>>> >> working fine, and supporting offload to both GPUs and Xeon Phi).
>>> >>
>>> >> -- Jim
>>> >>
>>> >> James Cownie <james.h.cownie at intel.com>
>>> >> SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
>>> >>
>>> >> Tel: +44 117 9071438
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> Intel Corporation (UK) Limited
>>> >> Registered No. 1134945 (England)
>>> >> Registered Office: Pipers Way, Swindon SN3 1RJ
>>> >> VAT No: 860 2173 47
>>> >>
>>> >> This e-mail and any attachments may contain confidential
material for
>>> >> the sole use of the intended recipient(s). Any review or
distribution
>>> >> by others is strictly prohibited. If you are not the
intended
>>> >> recipient, please contact the sender and delete all
copies.
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> cfe-dev mailing list
>>> >> cfe-dev at lists.llvm.org
>>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>> >>
>>> >
>>> _______________________________________________
>>> Openmp-dev mailing list
>>> Openmp-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>>
>>
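
Jim's point in the quoted message contrasts a C-level interface, callable from C, C++ and Fortran, with a friendlier C++ layer built on top of it. The following is a minimal C++ sketch of that layering, under stated assumptions: the `hypo_buffer_*` functions are hypothetical stand-ins, not liboffload's or libomptarget's actual entry points, and they are backed by host memory so the example runs on its own. `DeviceBuffer` is likewise an illustrative name, not StreamExecutor's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical handle-based C interface (NOT liboffload's real API). It uses
// only C types, which is what makes this style of interface callable from C,
// C++ or Fortran. Stubbed with host memory so the sketch runs on its own.
extern "C" {
typedef struct hypo_buffer {
  void *data;
  std::size_t bytes;
} hypo_buffer;

hypo_buffer *hypo_buffer_create(int device, std::size_t bytes) {
  (void)device;  // a real implementation would allocate on the chosen device
  hypo_buffer *b = static_cast<hypo_buffer *>(std::malloc(sizeof(hypo_buffer)));
  if (b == nullptr) return nullptr;
  b->data = std::malloc(bytes > 0 ? bytes : 1);
  if (b->data == nullptr) {
    std::free(b);
    return nullptr;
  }
  b->bytes = bytes;
  return b;
}
void hypo_buffer_destroy(hypo_buffer *b) {
  if (b != nullptr) {
    std::free(b->data);
    std::free(b);
  }
}
int hypo_buffer_write(hypo_buffer *b, const void *src, std::size_t bytes) {
  if (b == nullptr || bytes > b->bytes) return -1;  // nonzero means failure
  std::memcpy(b->data, src, bytes);
  return 0;
}
int hypo_buffer_read(const hypo_buffer *b, void *dst, std::size_t bytes) {
  if (b == nullptr || bytes > b->bytes) return -1;
  std::memcpy(dst, b->data, bytes);
  return 0;
}
}

// Illustrative C++ layer: typed, move-only, and releases the handle automatically.
template <typename T>
class DeviceBuffer {
 public:
  DeviceBuffer(int device, std::size_t count)
      : count_(count), handle_(hypo_buffer_create(device, count * sizeof(T))) {
    if (handle_ == nullptr) throw std::runtime_error("device allocation failed");
  }
  ~DeviceBuffer() { hypo_buffer_destroy(handle_); }  // safe on a moved-from (null) handle
  DeviceBuffer(const DeviceBuffer &) = delete;
  DeviceBuffer &operator=(const DeviceBuffer &) = delete;
  DeviceBuffer(DeviceBuffer &&other) noexcept
      : count_(other.count_), handle_(std::exchange(other.handle_, nullptr)) {}
  DeviceBuffer &operator=(DeviceBuffer &&) = delete;  // omitted for brevity

  void copyFromHost(const std::vector<T> &host) {
    assert(host.size() == count_);
    if (hypo_buffer_write(handle_, host.data(), count_ * sizeof(T)) != 0)
      throw std::runtime_error("host-to-device copy failed");
  }
  std::vector<T> copyToHost() const {
    std::vector<T> host(count_);
    if (hypo_buffer_read(handle_, host.data(), count_ * sizeof(T)) != 0)
      throw std::runtime_error("device-to-host copy failed");
    return host;
  }
  std::size_t size() const { return count_; }

 private:
  std::size_t count_;
  hypo_buffer *handle_;
};
```

In this sketch, ownership and element typing live only in the C++ layer; a C or Fortran caller would use the same handle functions directly and manage lifetimes by hand.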

Chandler Carruth via llvm-dev

2016-Mar-15 10:44 UTC

[llvm-dev] [cfe-dev] [Openmp-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

On Mon, Mar 14, 2016 at 6:51 PM Jason Henline via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> I think it would be great if StreamExecutor could use liboffload to
> perform its offloading under the hood. Right now offloading is handled in
> StreamExecutor using platform plugins, so I think it could be very natural
> for us to write a plugin which basically forwards to liboffload.
>
I think that having a liboffload plugin would be nice, but I don't think we
should really base everything on top of this for a few reasons:

1) I think we already have a nice plugin interface specifically designed to
support out-of-tree platforms with StreamExecutor, and it wouldn't make a
lot of sense to force them to re-implement their stuff.

2) Some platforms may not want or be able to use the liboffload style
plugin.

It seems like if the OpenMP folks want to add a liboffload plugin to
StreamExecutor, that would be an awesome additional platform, but I don't
see why we need to force the coupling here.

My 2 cents.
-Chandler
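
To make the two positions concrete, here is a minimal C++ sketch of a plugin registry in which a forwarding plugin is just one more platform alongside a native one, which is the arrangement Chandler describes. All names (`PlatformPlugin`, `HostPlugin`, `ForwardingOffloadPlugin`, `PluginRegistry`, and the `hypo_offload_*` C functions) are hypothetical and do not reflect StreamExecutor's or liboffload's real interfaces; the C functions are stubbed with host memory so the sketch runs on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical C-level offload entry points (NOT a real liboffload API).
// Stubbed with host memory so the sketch is self-contained.
extern "C" {
void *hypo_offload_alloc(int device, std::size_t bytes) {
  (void)device;  // a real implementation would select the target device
  return std::malloc(bytes);
}
int hypo_offload_to_device(int device, void *dst, const void *src, std::size_t bytes) {
  (void)device;
  std::memcpy(dst, src, bytes);
  return 0;  // 0 means success in this hypothetical convention
}
int hypo_offload_to_host(int device, void *dst, const void *src, std::size_t bytes) {
  (void)device;
  std::memcpy(dst, src, bytes);
  return 0;
}
void hypo_offload_free(int device, void *ptr) {
  (void)device;
  std::free(ptr);
}
}

// Hypothetical platform-plugin interface seen by the C++ layer.
class PlatformPlugin {
 public:
  virtual ~PlatformPlugin() = default;
  virtual std::string name() const = 0;
  virtual void *allocate(std::size_t bytes) = 0;
  virtual bool copyToDevice(void *dst, const void *src, std::size_t bytes) = 0;
  virtual bool copyToHost(void *dst, const void *src, std::size_t bytes) = 0;
  virtual void deallocate(void *ptr) = 0;
};

// A "native" plugin that talks to its platform directly (plain host memory here).
class HostPlugin : public PlatformPlugin {
 public:
  std::string name() const override { return "host"; }
  void *allocate(std::size_t bytes) override { return std::malloc(bytes); }
  bool copyToDevice(void *dst, const void *src, std::size_t bytes) override {
    std::memcpy(dst, src, bytes);
    return true;
  }
  bool copyToHost(void *dst, const void *src, std::size_t bytes) override {
    std::memcpy(dst, src, bytes);
    return true;
  }
  void deallocate(void *ptr) override { std::free(ptr); }
};

// A plugin that simply forwards every call to the C-level interface.
class ForwardingOffloadPlugin : public PlatformPlugin {
 public:
  explicit ForwardingOffloadPlugin(int device) : device_(device) {}
  std::string name() const override { return "offload-forwarding"; }
  void *allocate(std::size_t bytes) override { return hypo_offload_alloc(device_, bytes); }
  bool copyToDevice(void *dst, const void *src, std::size_t bytes) override {
    return hypo_offload_to_device(device_, dst, src, bytes) == 0;
  }
  bool copyToHost(void *dst, const void *src, std::size_t bytes) override {
    return hypo_offload_to_host(device_, dst, src, bytes) == 0;
  }
  void deallocate(void *ptr) override { hypo_offload_free(device_, ptr); }

 private:
  int device_;
};

// Registry: platforms are looked up by name; nothing forces them to share a backend.
class PluginRegistry {
 public:
  void add(std::unique_ptr<PlatformPlugin> plugin) {
    assert(plugin != nullptr);
    std::string key = plugin->name();
    plugins_[key] = std::move(plugin);
  }
  PlatformPlugin *get(const std::string &key) const {
    auto it = plugins_.find(key);
    return it == plugins_.end() ? nullptr : it->second.get();
  }
  std::size_t size() const { return plugins_.size(); }

 private:
  std::map<std::string, std::unique_ptr<PlatformPlugin>> plugins_;
};
```

Under Jason's proposal, the native plugins would eventually be dropped and only the forwarding one kept; under Chandler's, both coexist and the registry does not care which kind of backend a platform uses.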

Cownie, James H via llvm-dev

2016-Mar-15 11:13 UTC

[llvm-dev] [cfe-dev] [Openmp-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

Chandler,

That raises a more meta-question for me, which is “Why should StreamExecutor be
in LLVM at all?”

AFAICS, with your approach

·         It is not a runtime library whose interface the compiler needs to
understand.

·         It does not depend on any LLVM runtime libraries.

·         It is expected to be used with out-of-tree plugins.

If I got all of that right, what connection does it have with LLVM that makes
having it in the LLVM tree necessary, or an improvement over simply having it on
github (or whatever your favourite open-source hosting location is)?

Did I misunderstand something?

-- Jim

James Cownie <james.h.cownie at intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: Tuesday, March 15, 2016 10:44 AM
To: Jason Henline <jhen at google.com>; C Bergström <cbergstrom at pathscale.com>; Cownie, James H <james.h.cownie at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; cfe-dev <cfe-dev at lists.llvm.org>; openmp-dev at lists.llvm.org
Subject: Re: [cfe-dev] [Openmp-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

On Mon, Mar 14, 2016 at 6:51 PM Jason Henline via cfe-dev <cfe-dev at lists.llvm.org> wrote:
I think it would be great if StreamExecutor could use liboffload to perform its
offloading under the hood. Right now offloading is handled in StreamExecutor
using platform plugins, so I think it could be very natural for us to write a
plugin which basically forwards to liboffload.

I think that having a liboffload plugin would be nice, but I don't think we
should really base everything on top of this for a few reasons:

1) I think we already have a nice plugin interface specifically designed to
support out-of-tree platforms with StreamExecutor, and it wouldn't make a
lot of sense to force them to re-implement their stuff.

2) Some platforms may not want or be able to use the liboffload style plugin.

It seems like if the OpenMP folks want to add a liboffload plugin to
StreamExecutor, that would be an awesome additional platform, but I don't
see why we need to force the coupling here.

My 2 cents.
-Chandler

C Bergström via llvm-dev

2016-Mar-15 15:09 UTC

[llvm-dev] [cfe-dev] [Openmp-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

On Tue, Mar 15, 2016 at 6:44 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Mar 14, 2016 at 6:51 PM Jason Henline via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>>
>> I think it would be great if StreamExecutor could use liboffload to
>> perform its offloading under the hood. Right now offloading is handled in
>> StreamExecutor using platform plugins, so I think it could be very natural
>> for us to write a plugin which basically forwards to liboffload.
>
>
> I think that having a liboffload plugin would be nice, but I don't think we
> should really base everything on top of this for a few reasons:
>
> 1) I think we already have a nice plugin interface specifically designed to
> support out-of-tree platforms with StreamExecutor, and it wouldn't make a
> lot of sense to force them to re-implement their stuff.
>
> 2) Some platforms may not want or be able to use the liboffload style
> plugin.
>
> It seems like if the OpenMP folks want to add a liboffload plugin to
> StreamExecutor, that would be an awesome additional platform, but I don't
> see why we need to force the coupling here.

I see the distinction of being "agnostic" and welcoming different
plugins, but at the same time I asked how Google was engaging hardware
stakeholders. If I heard them correctly, the feedback from Intel is that they
would warmly welcome some liboffload integration. (Which would enable
Xeon Phi support, if I'm not mistaken?)

While I don't think anyone will try to block the integration of this
on whether it does or doesn't have support for liboffload - I think
you may win more friends if it does.

IMHO, until this gets market traction I don't think it's ready for
inclusion. There are lots of really great ideas, but there should be
some threshold of _____________ (importance?) before it's included. (I
apologize; I can't word the previous sentence perfectly.) /* I really
wish there was some way to have it be an "incubator" before formal
inclusion */
------------
Down the road it raises other questions, like: is it something Google
would want enabled and packaged by default?

Also, on a technical level, I'd like to see a roadmap of what Google
plans, and whether and how it will get feedback from industry, users, etc.
This ties into previous comments: is it a "standard", or how will you
develop it - openly, semi-openly or entirely behind closed doors?

Without a good test suite and examples, it doesn't feel like there's a
lot of commitment to it. (I apologize, as I may have missed it if such a
thing exists.)

Andrey Bokhanko via llvm-dev

2016-Mar-15 19:28 UTC

[llvm-dev] [Openmp-dev] [cfe-dev] RFC: Proposing an LLVM subproject for parallelism runtime and support libraries

Hi Chandler,

On Tue, Mar 15, 2016 at 1:44 PM, Chandler Carruth via Openmp-dev <
openmp-dev at lists.llvm.org> wrote:
> It seems like if the OpenMP folks want to add a liboffload plugin to
> StreamExecutor, that would be an awesome additional platform, but I don't
> see why we need to force the coupling here.
>

Let me give you a reason: while the user-facing sides of StreamExecutor and
OpenMP are quite different (and each warrants its place under the sun!),
SE's internal offloading interface and liboffload do exactly the same
thing. Why would we want to duplicate code? As previous replies
demonstrated, SE can't serve OpenMP's needs, while the liboffload API seems
general enough to serve SE well (though this has to be verified, of
course -- as I understand it, Jason is going to do this).

Sure, there is no "must have" need to couple SE and liboffload, but this
sounds like a solid software engineering decision to me. Or, quoting Jason,
who said this much better than me:
> Although OpenMP and StreamExecutor support different programming models,
> some of the work they perform under the hood will likely be very similar.
> By sharing code and domain expertise, both projects will be improved and
> strengthened as their capabilities are expanded. The StreamExecutor
> community looks forward to much collaboration and discussion with OpenMP
> about the best places and ways to cooperate.

Hope to see you tomorrow!

Yours,
Andrey
Software Engineer
Intel Compiler Team